Out-of-order RDMA method and device with asynchronous notification
Technical Field
The invention relates to the field of out-of-order RDMA message communication, in particular to an out-of-order RDMA method and device with asynchronous notification.
Background
In high-performance computing systems, computing nodes use an RDMA messaging mechanism, typically integrated in the network interface chip, to access the memory of remote hosts.
At present, commercial network interface chips for high-performance computing are supplied almost exclusively by Mellanox, whose RDMA mechanism strictly follows the InfiniBand specification. RDMA message transmission under the InfiniBand specification must be strictly order-preserving; order-preserving transmission can generally use only deterministic routing and adapts poorly to dynamic network conditions. By contrast, a message mechanism that uses out-of-order transmission can support adaptive routing and places fewer constraints on the network.
In an out-of-order communication network, when an RDMA message has finished transferring its data and the source side needs to notify the target side of the completion event, an asynchronous notification initiated by software can only be issued after the source side's RDMA message has completed, so the latency is large.
Disclosure of Invention
To address the above technical problem, the invention provides an out-of-order RDMA method with asynchronous notification, comprising the following steps:
Step 1: the source side obtains and records the message packet information of an RDMA message, reads packet data from the source-side main memory according to the message packet information, encapsulates the packet data and the corresponding message packet information into RDMA data packets, and sends the RDMA data packets to the target side;
Step 2: on receiving the response packets returned by the target side, the source side counts the responses and, once all responses have been collected, sends an asynchronous notification message (a Send packet) to the target side;
Step 3: after the target side writes the Send packet into the receive queue and returns a response, the source side writes a completion event.
The method supports out-of-order transmission of both messages and message packets, which relaxes the constraints on the routing mode and makes network construction more flexible; the source-side hardware automatically initiates the asynchronous notification message that informs the target side of message completion, which gives fast notification of the message completion event and reduces message latency. When the length of an RDMA message exceeds the maximum transmission packet length, the message is split into multiple packets for transmission, and each packet carries its packet information. When different message packets of different RDMA messages, or different message packets of the same RDMA message, are sent, the source side obtains the message packet information in turn, reads the packet data from the source-side main memory according to that information, encapsulates the packet data and the corresponding message packet information into an RDMA data packet, and sends it to the target side. The target side returns a response to the source side each time it receives an RDMA data packet, and the source side counts the responses. Once all responses have been collected, which indicates that transmission of one RDMA message is complete, the source side sends an asynchronous notification message (a Send packet) to the target side; the target side writes it into the receive queue and returns a response. When the source side receives this response packet, execution of the message is complete and a completion event is written to the designated main memory space.
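The three steps can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware described here: the names Target, run_message, rdma_response and send_response are hypothetical, the network is modelled by shuffling the packet list, and destination addresses are modelled as byte offsets.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Target:
    """Hypothetical software stand-in for the target side."""
    memory: dict = field(default_factory=dict)         # destination offset -> payload
    receive_queue: list = field(default_factory=list)  # asynchronous notifications

    def on_rdma_packet(self, dst_offset: int, payload: bytes) -> str:
        # Each packet carries its own destination, so arrival order is irrelevant.
        self.memory[dst_offset] = payload
        return "rdma_response"

    def on_send_packet(self, msg_id: int) -> str:
        # Step 3 (target part): write the Send packet into the receive queue.
        self.receive_queue.append(msg_id)
        return "send_response"


def run_message(data: bytes, mtu: int, target: Target,
                msg_id: int = 1, seed: int = 0) -> list:
    # Step 1: split the message into packets of at most `mtu` bytes.
    packets = [(off, data[off:off + mtu]) for off in range(0, len(data), mtu)]
    remaining_responses = len(packets)
    # Assumption: an adaptive-routing network is modelled by a shuffle.
    random.Random(seed).shuffle(packets)

    completion_events = []
    # Step 2: count one response per delivered packet.
    for dst_offset, payload in packets:
        if target.on_rdma_packet(dst_offset, payload) == "rdma_response":
            remaining_responses -= 1
    # All responses collected: send the asynchronous notification (Send packet).
    if remaining_responses == 0:
        # Step 3 (source part): the Send response triggers the completion event.
        if target.on_send_packet(msg_id) == "send_response":
            completion_events.append(("complete", msg_id))
    return completion_events
```

Calling run_message(b"abcdefghij", 4, Target()) delivers three packets in shuffled order, after which the target holds the full payload and one notification, and the source holds one completion event.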
Preferably, in step 1, the message packet information of the RDMA message is obtained through a message pending buffer module, and includes the message ID, the remaining data amount, the next-packet source data address, the next-packet destination data address, the MTU, the remaining response count, the asynchronous notification receive queue number, the message descriptor, and the message status; in step 2, the response packet returned by the target side carries the message pending buffer number and the message ID number.
Preferably, responses are counted in step 2 as follows: when the source side receives a response packet, it matches the corresponding message information in the message pending buffer module according to the message pending buffer number and the message ID number in the response packet and, after a successful match, decrements the remaining response count in the message pending buffer module by one; when the remaining response count reaches zero, transmission of the message is complete. This arrangement uses source-side counting, which ensures reliable message transmission, simplifies the hardware design and saves hardware resources.
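The matching and counting rule can be sketched as follows. The class PendingBuffer and its methods are hypothetical names chosen for illustration; the handling of unmatched or surplus responses (they are ignored) is an assumption, since the text does not say what happens when matching fails.

```python
class PendingBuffer:
    """Hypothetical model of the message pending buffer, indexed by buffer number."""

    def __init__(self, num_entries: int):
        self.entries = [None] * num_entries

    def allocate(self, buffer_no: int, msg_id: int, num_packets: int) -> None:
        # One response is expected per packet of the message.
        self.entries[buffer_no] = {
            "msg_id": msg_id,
            "remaining_responses": num_packets,
        }

    def on_response(self, buffer_no: int, msg_id: int) -> bool:
        """Count one response; return True when the message is complete."""
        if not 0 <= buffer_no < len(self.entries):
            return False
        entry = self.entries[buffer_no]
        # Match on both the buffer number and the message ID.
        if entry is None or entry["msg_id"] != msg_id:
            return False  # assumption: an unmatched response is ignored
        if entry["remaining_responses"] == 0:
            return False  # assumption: a surplus response is ignored
        entry["remaining_responses"] -= 1
        return entry["remaining_responses"] == 0
```

Because only a counter is kept, the order in which the responses arrive does not matter; the source side needs no per-packet bookkeeping to detect completion.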
Preferably, in step 1, the source side reads data from the source side main memory according to information such as the remaining data amount of the message, the source data address of the next packet, the destination data address of the next packet, and the MTU (maximum transmission packet length).
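How the four fields drive packet generation can be sketched as below. This is an illustrative model only: MessageState, next_packet and all_packets are hypothetical names, and it is assumed that both addresses advance by the length of the packet just issued.

```python
from dataclasses import dataclass


@dataclass
class MessageState:
    """Subset of the per-message fields used to generate the next packet."""
    remaining_data: int   # bytes still to be sent
    next_src_addr: int    # source data address of the next packet
    next_dst_addr: int    # destination data address of the next packet
    mtu: int              # maximum transmission packet length


def next_packet(state: MessageState):
    """Return (src_addr, dst_addr, length) for the next packet, or None if done."""
    if state.remaining_data == 0:
        return None
    length = min(state.remaining_data, state.mtu)
    packet = (state.next_src_addr, state.next_dst_addr, length)
    # Update the recorded state so the following packet starts where this one ends.
    state.remaining_data -= length
    state.next_src_addr += length
    state.next_dst_addr += length
    return packet


def all_packets(state: MessageState) -> list:
    packets = []
    packet = next_packet(state)
    while packet is not None:
        packets.append(packet)
        packet = next_packet(state)
    return packets
```

For a 10-byte message with an MTU of 4, this yields two 4-byte packets and one 2-byte packet, each carrying its own source and destination address.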
The invention also discloses an out-of-order RDMA device with asynchronous notification, comprising a sending engine and a receiving engine, wherein the sending engine comprises a message pending buffer module, a response processing module, an unpacking module, a packing module, a sending engine memory access interface module and a data buffer module; the receiving engine comprises a request packet processing module, a response queue module, a response packet processing module, a receive queue management module and a receiving engine memory access interface module;
the message pending buffer module is used for acquiring and registering the message descriptor of the RDMA message packet and recording the message information of the RDMA message packet; the message information comprises the message ID, the remaining data amount, the next-packet source data address, the next-packet destination data address, the remaining response count, the asynchronous notification receive queue number, the message descriptor, the message status and the maximum transmission packet length (MTU).
The unpacking module is electrically connected with the message pending buffer module, the packing module and the sending engine memory access interface module, and is used for reading the message descriptor and unpacking it to generate packing control information, outputting the packing control information to the packing module, outputting a data fetch request to the sending engine memory access interface module, and updating the message status information and writing it back to the message pending buffer module; on receiving from the message pending buffer module a signal that a message has entered the send-asynchronous-notification state, the unpacking module generates asynchronous notification control information and message descriptor content and outputs them to the packing module, wherein the message descriptor content is the descriptor of the RDMA message;
the sending engine memory access interface module is electrically connected with the data buffer module and is used for handling all main memory accesses of the sending engine: after receiving a data fetch request from the unpacking module, it issues a memory access request to the main memory path and writes the memory access response into the data buffer module; the sending engine memory access interface module is also electrically connected with the message pending buffer module and is used for processing completion events input by the message pending buffer module;
the data buffer module is used for storing the data content of the RDMA message packet;
the packing module is electrically connected with the unpacking module and the data buffer module and is responsible for packing message packets and sending them to the network: it receives packing control information from the unpacking module, receives the data content of the RDMA message packet from the data buffer module, packs them into an RDMA message request packet, and sends the RDMA message request packet to the network port; when the unpacking module has received from the message pending buffer module the signal that a message has entered the send-asynchronous-notification state, the packing module obtains the asynchronous notification control information and the message descriptor content from the unpacking module, packs them into a request packet for sending an asynchronous notification message, and sends the request packet to the network port;
the request packet processing module is responsible for processing message request packets received from the network and is electrically connected with the response queue module, the receiving engine memory access interface module and the receive queue management module; when the message request packet is an RDMA message request packet, the module parses the request packet, extracts the data in the packet, generates a write data request and inputs it to the receiving engine memory access interface module, and at the same time generates an RDMA message response packet according to the content of the message request packet and inputs it to the response queue module; when the message request packet is a request packet for sending an asynchronous notification message, the module generates a response packet for the asynchronous notification message and sends it to the response queue module, and generates asynchronous notification information and writes it into the receive queue management module;
the response queue module is electrically connected with the request packet processing module, processes the response packets output by the request packet processing module, and sends them to the network;
the response packet processing module is responsible for processing response packets received from the network, generating remote response control information, and inputting the remote response control information into the response processing module;
the response processing module is electrically connected with the response packet processing module and the message pending buffer module, and is responsible for processing remote responses according to the type of the response packet; when the response packet is an RDMA message response packet, it controls the message pending buffer module to count the response and, once all responses have been collected, controls the message pending buffer module to generate and send the signal that the message has entered the send-asynchronous-notification state; when the response packet is a response packet for an asynchronous notification message, it controls the message pending buffer module to generate a completion event and write the completion event into the main memory;
the receive queue management module is electrically connected with the request packet processing module and is used for, on receiving asynchronous notification information input by the request packet processing module, locating the specified position in the corresponding receive queue space according to the receive queue number, generating a queue write request, and inputting the queue write request to the receiving engine memory access interface module;
the receiving engine memory access interface module is responsible for handling all memory access requests of the receiving engine, including write data requests input by the request packet processing module and queue write requests input by the receive queue management module, and sends all main memory write requests to the main memory path.
In the out-of-order RDMA device, when different message packets of different RDMA messages, or different message packets of the same RDMA message, are sent, the sending engine obtains the message packet information of each RDMA message in turn, reads packet data from the main memory according to the message packet information, encapsulates the packet data and the corresponding message packet information into an RDMA message request packet, and sends it to the receiving engine. The receiving engine returns a response to the sending engine each time it receives an RDMA message request packet, and the response processing module in the sending engine counts the responses. Once all responses have been collected, which indicates that transmission of one RDMA message is complete, the sending engine sends a request packet for sending an asynchronous notification message to the receiving engine. The request packet processing module in the receiving engine generates asynchronous notification information, writes it into the receive queue management module, and returns a response. When the response processing module in the sending engine receives this response packet, execution of the message is complete and a completion event is written to the specified main memory space. The device supports out-of-order transmission of both messages and message packets, which relaxes the constraints on the routing mode and makes network construction more flexible; the source-side hardware automatically initiates the asynchronous notification message that informs the target side of message completion, which gives fast notification of the message completion event and reduces message latency.
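The behaviour of the response processing module, which acts differently on the two response packet types, can be viewed as a small state machine over the message status. The sketch below is one possible reading: the state names SENDING, NOTIFYING and COMPLETE, the string tags rdma and notify, and the returned action strings are all hypothetical, because the text does not enumerate the values of the message status field.

```python
from enum import Enum


class State(Enum):
    SENDING = 1     # RDMA packets outstanding, responses being counted
    NOTIFYING = 2   # asynchronous notification sent, awaiting its response
    COMPLETE = 3    # completion event written


class Message:
    def __init__(self, num_packets: int):
        self.remaining_responses = num_packets
        self.state = State.SENDING


def process_response(msg: Message, kind: str, completion_log: list):
    """Dispatch on the response packet type; return the action triggered, if any."""
    if kind == "rdma" and msg.state is State.SENDING:
        msg.remaining_responses -= 1
        if msg.remaining_responses == 0:
            # Signal that the message enters the send-asynchronous-notification state.
            msg.state = State.NOTIFYING
            return "send_async_notification"
    elif kind == "notify" and msg.state is State.NOTIFYING:
        # Response to the asynchronous notification: generate the completion event.
        msg.state = State.COMPLETE
        completion_log.append("completion_event")
        return "write_completion_event"
    return None  # assumption: responses that do not fit the state are ignored
```

The completion event is produced only after the notification's own response arrives, so it also confirms that the target side has queued the notification.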
Preferably, when the response packet is an RDMA message response packet, the response processing module counts responses as follows: on receiving a response packet, it matches the corresponding message information in the message pending buffer module according to the message pending buffer number and the message ID number in the response packet and, after a successful match, decrements the remaining response count in the message pending buffer module by one. This arrangement uses source-side counting, which ensures reliable message transmission, simplifies the hardware design and saves hardware resources.
Preferably, the receive queue space of the receive queue management module is located in the main memory space.
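Locating the write position from the receive queue number might look like the following sketch. The layout is entirely assumed for illustration: the constants QUEUE_BASE, ENTRY_SIZE and ENTRIES_PER_QUEUE, the contiguous placement of queues, and the per-queue tail pointer that wraps around are not specified in the text.

```python
# Assumed layout: queues are contiguous in main memory, each holding a fixed
# number of fixed-size entries. All three constants are hypothetical.
QUEUE_BASE = 0x10000
ENTRY_SIZE = 64
ENTRIES_PER_QUEUE = 8


class ReceiveQueueManager:
    def __init__(self, num_queues: int):
        # One tail index per receive queue (assumption: ring-buffer queues).
        self.tails = [0] * num_queues

    def queue_write_request(self, queue_number: int, notification: bytes):
        """Return (main_memory_address, payload) for the queue write request."""
        tail = self.tails[queue_number]
        address = QUEUE_BASE + (queue_number * ENTRIES_PER_QUEUE + tail) * ENTRY_SIZE
        # Advance the tail; overflow handling is outside the scope of this sketch.
        self.tails[queue_number] = (tail + 1) % ENTRIES_PER_QUEUE
        return address, notification
```

With the queue space in main memory, software on the target side can observe the notification by reading the queue, and the returned pair is what would be handed to the receiving engine memory access interface module.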
The beneficial effects of the invention are as follows: message packets support out-of-order transmission, which relaxes the constraints on the network and the routing mode and makes network construction more flexible. A reliable message transmission mechanism based on source-side counting ensures reliable transmission of messages, simplifies the hardware design and saves hardware resources. The hardware sends the asynchronous notification message automatically, so that in an out-of-order communication network the source side can quickly notify the target side of the completion event, which effectively reduces message latency.
Drawings
FIG. 1 is a schematic diagram of the mechanism of the out-of-order RDMA method with asynchronous notification of embodiment 1;
FIG. 2 is a schematic structural diagram of the sending engine in embodiment 2;
FIG. 3 is a schematic structural diagram of the receiving engine in embodiment 2.
Detailed Description
The invention is described in further detail below:
Example 1: as shown in FIG. 1, an out-of-order RDMA method with asynchronous notification includes the following steps:
Step 1: the source side obtains and records the message packet information of an RDMA message, reads packet data from the source-side main memory according to the message packet information, encapsulates the packet data and the corresponding message packet information into RDMA data packets, and sends the RDMA data packets to the target side; preferably, in this step, the message packet information of the RDMA message is obtained through the message pending buffer module, and includes the message ID, the remaining data amount, the next-packet source data address, the next-packet destination data address, the MTU, the remaining response count, the asynchronous notification receive queue number, the message descriptor, and the message status;
Step 2: on receiving the response packets returned by the target side, the source side counts the responses and, once all responses have been collected, sends an asynchronous notification message (a Send packet) to the target side; in this step, the response packet returned by the target side carries the message pending buffer number and the message ID number;
Step 3: after the target side writes the Send packet into the receive queue and returns a response, the source side writes a completion event.
The method supports out-of-order transmission of both messages and message packets, which relaxes the constraints on the routing mode and makes network construction more flexible; the source-side hardware automatically initiates the asynchronous notification message that informs the target side of message completion, which gives fast notification of the message completion event and reduces message latency. When the length of an RDMA message exceeds the maximum transmission packet length, the message is split into multiple packets for transmission, and each packet carries its packet information. In this application, when different message packets of different RDMA messages, or different message packets of the same RDMA message, are sent, the source side obtains the message packet information in turn, reads the packet data from the source-side main memory according to that information, encapsulates the packet data and the corresponding message packet information into an RDMA data packet, and sends it to the target side. The target side returns a response to the source side each time it receives an RDMA data packet, and the source side counts the responses. Once all responses have been collected, which indicates that transmission of one RDMA message is complete, the source side sends an asynchronous notification message (a Send packet) to the target side; the target side writes it into the receive queue and returns a response. When the source side receives this response packet, execution of the message is complete and a completion event is written to the designated main memory space.
Preferably, responses are counted in step 2 as follows: when the source side receives a response packet, it matches the corresponding message information in the message pending buffer module according to the message pending buffer number and the message ID number in the response packet and, after a successful match, decrements the remaining response count in the message pending buffer module by one; when the remaining response count reaches zero, transmission of the message is complete. This arrangement uses source-side counting, which ensures reliable message transmission, simplifies the hardware design and saves hardware resources.
Preferably, in step 1, the source side reads data from the source side main memory according to information such as the remaining data amount of the message, the source data address of the next packet, the destination data address of the next packet, and the MTU (maximum transmission packet length).
Example 2: as shown in FIGS. 2-3, an out-of-order RDMA device with asynchronous notification comprises a sending engine and a receiving engine, wherein the sending engine comprises a message pending buffer module, a response processing module, an unpacking module, a packing module, a sending engine memory access interface module and a data buffer module; the receiving engine comprises a request packet processing module, a response queue module, a response packet processing module, a receive queue management module and a receiving engine memory access interface module;
the message pending buffer module is used for acquiring and registering the message descriptor of the RDMA message packet and recording the message information of the RDMA message packet; the message information comprises the message ID, the remaining data amount, the next-packet source data address, the next-packet destination data address, the remaining response count, the asynchronous notification receive queue number, the message descriptor, the message status and the maximum transmission packet length (MTU).
The unpacking module is electrically connected with the message pending buffer module, the packing module and the sending engine memory access interface module, and is used for reading the message descriptor and unpacking it to generate packing control information, outputting the packing control information to the packing module, outputting a data fetch request to the sending engine memory access interface module, and updating the message status information and writing it back to the message pending buffer module; on receiving from the message pending buffer module a signal that a message has entered the send-asynchronous-notification state, the unpacking module generates asynchronous notification control information and message descriptor content and outputs them to the packing module, wherein the message descriptor content is the descriptor of the RDMA message;
the sending engine memory access interface module is electrically connected with the data buffer module and is used for handling all main memory accesses of the sending engine: after receiving a data fetch request from the unpacking module, it issues a memory access request to the main memory path and writes the memory access response into the data buffer module; the sending engine memory access interface module is also electrically connected with the message pending buffer module and is used for processing completion events input by the message pending buffer module;
the data buffer module is used for storing the data content of the RDMA message packet;
the packing module is electrically connected with the unpacking module and the data buffer module and is responsible for packing message packets and sending them to the network: it receives packing control information from the unpacking module, receives the data content of the RDMA message packet from the data buffer module, packs them into an RDMA message request packet, and sends the RDMA message request packet to the network port; when the unpacking module has received from the message pending buffer module the signal that a message has entered the send-asynchronous-notification state, the packing module obtains the asynchronous notification control information and the message descriptor content from the unpacking module, packs them into a request packet for sending an asynchronous notification message, and sends the request packet to the network port;
the request packet processing module is responsible for processing message request packets received from the network and is electrically connected with the response queue module, the receiving engine memory access interface module and the receive queue management module; when the message request packet is an RDMA message request packet, the module parses the request packet, extracts the data in the packet, generates a write data request and inputs it to the receiving engine memory access interface module, and at the same time generates an RDMA message response packet according to the content of the message request packet and inputs it to the response queue module; when the message request packet is a request packet for sending an asynchronous notification message, the module generates a response packet for the asynchronous notification message and sends it to the response queue module, and generates asynchronous notification information and writes it into the receive queue management module;
the response queue module is electrically connected with the request packet processing module, processes the response packets output by the request packet processing module, and sends them to the network;
the response packet processing module is responsible for processing response packets received from the network, generating remote response control information, and inputting the remote response control information into the response processing module;
the response processing module is electrically connected with the response packet processing module and the message pending buffer module, and is responsible for processing remote responses according to the type of the response packet; when the response packet is an RDMA message response packet, it controls the message pending buffer module to count the response and, once all responses have been collected, controls the message pending buffer module to generate and send the signal that the message has entered the send-asynchronous-notification state; when the response packet is a response packet for an asynchronous notification message, it controls the message pending buffer module to generate a completion event and write the completion event into the main memory;
the receive queue management module is electrically connected with the request packet processing module and is used for, on receiving asynchronous notification information input by the request packet processing module, locating the specified position in the corresponding receive queue space according to the receive queue number, generating a queue write request, and inputting the queue write request to the receiving engine memory access interface module;
the receiving engine memory access interface module is responsible for handling all memory access requests of the receiving engine, including write data requests input by the request packet processing module and queue write requests input by the receive queue management module, and sends all main memory write requests to the main memory path.
In the out-of-order RDMA device, when different message packets of different RDMA messages, or different message packets of the same RDMA message, are sent, the sending engine obtains the message packet information of each RDMA message in turn, reads packet data from the main memory according to the message packet information, encapsulates the packet data and the corresponding message packet information into an RDMA message request packet, and sends it to the receiving engine. The receiving engine returns a response to the sending engine each time it receives an RDMA message request packet, and the response processing module in the sending engine counts the responses. Once all responses have been collected, which indicates that transmission of one RDMA message is complete, the sending engine sends a request packet for sending an asynchronous notification message to the receiving engine. The request packet processing module in the receiving engine generates asynchronous notification information, writes it into the receive queue management module, and returns a response. When the response processing module in the sending engine receives this response packet, execution of the message is complete and a completion event is written to the specified main memory space. The device supports out-of-order transmission of both messages and message packets, which relaxes the constraints on the routing mode and makes network construction more flexible; the source-side hardware automatically initiates the asynchronous notification message that informs the target side of message completion, which gives fast notification of the message completion event and reduces message latency.
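On the receiving side, the two branches of the request packet processing module can be sketched as below. The packet is modelled as a dictionary whose field names (type, dst_addr, data, buffer_no, msg_id, queue_no, descriptor) are hypothetical, main memory is modelled as a bytearray, and the response queue and the receive queue management module are modelled as plain lists.

```python
def process_request(packet: dict, main_memory: bytearray,
                    response_queue: list, notifications: list) -> None:
    """Handle one request packet received from the network."""
    if packet["type"] == "rdma":
        # Write data request: the packet carries its own destination address,
        # so packets of one message may be processed in any arrival order.
        addr = packet["dst_addr"]
        data = packet["data"]
        main_memory[addr:addr + len(data)] = data
        # The response echoes the identifiers the source side matches on.
        response_queue.append({
            "type": "rdma_response",
            "buffer_no": packet["buffer_no"],
            "msg_id": packet["msg_id"],
        })
    elif packet["type"] == "notify":
        # Asynchronous notification: respond, and pass the notification to
        # the receive queue management module.
        response_queue.append({
            "type": "notify_response",
            "buffer_no": packet["buffer_no"],
            "msg_id": packet["msg_id"],
        })
        notifications.append((packet["queue_no"], packet["descriptor"]))
```

The receiving side keeps no per-message state in this model; the ordering guarantee comes from the source side, which issues the notification request only after every RDMA response has been counted.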
Preferably, when the response packet is an RDMA message response packet, the response processing module counts responses as follows: on receiving a response packet, it matches the corresponding message information in the message pending buffer module according to the message pending buffer number and the message ID number in the response packet and, after a successful match, decrements the remaining response count in the message pending buffer module by one. This arrangement uses source-side counting, which ensures reliable message transmission, simplifies the hardware design and saves hardware resources.
Preferably, the receive queue space of the receive queue management module is located in the main memory space.
The foregoing is only a preferred embodiment of the present invention and all equivalent changes or modifications in the structure, characteristics and principles described in the present patent application are included in the scope of the present patent application.