WO2023138789A1 - Methods and devices for network interface card (nic) object coherency (noc) messages - Google Patents

Methods and devices for network interface card (NIC) object coherency (NOC) messages

Info

Publication number
WO2023138789A1
Authority
WO
WIPO (PCT)
Prior art keywords
value
rnic
request
transmitter
receiver
Application number
PCT/EP2022/051476
Other languages
French (fr)
Inventor
Ben-Shahar BELKAR
Sagiv Goren
David Yaron
Uri Hasson
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Priority to PCT/EP2022/051476
Publication of WO2023138789A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/90 Buffering arrangements
    • H04L49/9063 Intermediate storage in different physical parts of a node or terminal
    • H04L49/9068 Intermediate storage in different physical parts of a node or terminal in the network interface card
    • H04L49/901 Buffering arrangements using storage descriptor, e.g. read or write pointers

Definitions

  • The present disclosure, in some embodiments thereof, relates to communication systems, more specifically, but not exclusively, to methods and devices for network interface card (NIC) object coherency (NOC) messages.
  • NIC network interface card
  • NOC NIC object coherency
  • RDMA Remote Direct Memory Access
  • CPU central processing unit
  • RNIC RDMA NIC
  • WQE work queue element
  • WQEs are used for tagged (one-sided WRITE/READ/ATOMIC) operations and for untagged (two-sided SEND/RECV) operations.
  • RDMA peers communicate over queue pairs (QPs), which offer various transport services.
  • QPs queue pairs
  • The common QP types defined by the InfiniBand (IB) specification are Reliable Connection (RC), Reliable Datagram (RD), Unreliable Datagram (UD), Extended Reliable Connection (XRC), and Unreliable Connection (UC).
  • RC Reliable Connection
  • RD Reliable Datagram
  • UD Unreliable Datagram
  • XRC Extended Reliable Connection
  • UC Unreliable Connection
  • RBW read before write
  • CAS compare and swap
  • The present disclosure relates to a device for receiving a plurality of transactions, comprising a Remote Direct Memory Access, RDMA, Network Interface Card, RNIC, configured to: receive from a transmitter a request to read an object from a memory address; check whether a header version is unlocked and, if so, extract a version number, Vobj, from each cache line and verify that the Vobj matches the least significant bytes, LSB, of a version field of a header of the object; when the Vobj in each cache line matches the value in the version field: remove the Vobj from each cache line of the object, read data of the object, and transmit the object to the transmitter; and when the header version is locked or when the Vobj in any cache line does not match the value in the version field, retry, for a predefined number of times, to verify that the header version is unlocked and that each Vobj matches the value in the version field in the header of the object, and when there is still no match, transmit a failure response.
  • Performing the check in the RNIC and sending the object to the transmitter without the Vobj saves network and computing resources and reduces latency.
  • The RNIC is further configured to: receive from a transmitter a request to write an object to the memory address; extract the value in the version field of the header of the object, write the value into each cache line of the object, and write the object to the memory address; and transmit a success response to the transmitter.
  • The present disclosure relates to a device for receiving a plurality of transactions, comprising a Remote Direct Memory Access, RDMA, Network Interface Card, RNIC, configured to: receive from a transmitter a request to read an object from a memory address; extract data from each packet of the object; calculate a hash function value for the data in each packet of the object and update the calculated hash function value of a current packet at a temporary RNIC memory address; at a last packet, verify that the updated hash function value matches a value in a hash value field in a header of the object; when the hash value matches the value in the hash value field in the header of the object, transmit the object to the transmitter; and when the calculated hash value does not match the value in the hash value field of the header of the object, retry for a predefined number of times to: calculate the hash function value for the data in each packet and update the calculated hash function value of the current packet at the temporary RNIC memory and at the last packet, verify that the updated hash function value matches
  • The RNIC is further configured to: receive from a transmitter a request to write an object to the memory address; calculate the hash function value for the data in each packet of the object and update the calculated hash function value of the current packet at a temporary RNIC memory address and, at a last packet, update the calculated hash function value in the hash value field in the header of the object; write the object to the memory address; and transmit a success response to the transmitter.
  • The RNIC is further configured to: receive from the transmitter the request to write an object to the memory address, wherein the hash function value is calculated by a transmitter RNIC for the data in each packet of the object, the calculated hash function value of a current packet is updated at a temporary transmitter RNIC memory address, and, at a last packet, the updated hash function value is written into the hash value field in the header of the object; write the object to the memory address; and transmit a success response to the transmitter.
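The per-packet hash accumulation in these claims can be illustrated with the Python sketch below. The source does not name the hash function. CRC-32 (`zlib.crc32`, which accepts a running value) is used here only as a stand-in. The packet size, the split of the object into a separate hash field and data, and all function names are hypothetical.

```python
import zlib
from typing import Callable, List, Optional, Tuple

PACKET_PAYLOAD = 4096   # hypothetical payload bytes per packet


def packetize(data: bytes) -> List[bytes]:
    """Split object data into fixed-size packets (illustrative only)."""
    return [data[i:i + PACKET_PAYLOAD] for i in range(0, len(data), PACKET_PAYLOAD)]


def running_hash(packets: List[bytes]) -> int:
    """Fold each packet into a running CRC-32, as if kept in a temporary slot."""
    acc = 0                                   # stands in for the temporary RNIC memory
    for pkt in packets:
        acc = zlib.crc32(pkt, acc)            # update with the current packet
    return acc                                # value after the last packet


def write_object(data: bytes) -> Tuple[int, bytes]:
    """Write path: return (value for the header hash field, data to store)."""
    return running_hash(packetize(data)), data


def serve_read(read_memory: Callable[[], Tuple[int, bytes]],
               retries: int) -> Optional[bytes]:
    """Read path: compare the hash at the last packet with the header field."""
    for _ in range(retries):
        header_hash, data = read_memory()     # may change under a concurrent write
        if running_hash(packetize(data)) == header_hash:
            return data                       # transmit the object
    return None                               # failure response
```

A CRC is cheap to compute and can be updated packet by packet, which suits a NIC pipeline. It is not collision-resistant, however, so a stronger hash could be substituted without changing the flow.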
  • The RNIC is further configured to: receive from one or more transmitters a read before write, RBW, request to read an object from the memory address, where packets of the RBW request contain a bit indicating that a request to write another object to the memory address is expected to follow from the one or more transmitters; assign, in a table, a row with an identification, ID, of each of the one or more transmitters, a timestamp of the received RBW request, and the memory address to read the object from; when a request to write an object to the memory address is received from another transmitter that has a row in the table for an RBW request to read the object from the memory address with a timestamp smaller than the timestamps of the requests of the one or more transmitters: send a notification to the one or more transmitters, according to the transmitter IDs in the table, to refrain from sending the request to write to the memory address; and remove the rows with the IDs and timestamps of the other transmitter and of the one or more transmitters from the table.
  • The RNIC is further configured to send, by one or more transmitters, an RBW request to read the object from the memory address, where packets of the RBW request contain a bit indicating that a request to write another object to the memory address is expected to follow from the one or more transmitters.
  • The RNIC is further configured to send a zero read request indicating that no request to write an object to the memory address is expected to be sent from the one or more transmitters.
  • The RNIC is further configured to: receive from the one or more transmitters a zero read request indicating that no request to write an object to the memory address is expected to be sent from the one or more transmitters; and remove the rows with the IDs and timestamps of the one or more transmitters from the table.
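The receiver-side RBW bookkeeping in the preceding claims can be modelled with a small table. This Python sketch makes several assumptions:

- the class and method names are illustrative only;
- rows are keyed by transmitter ID and memory address;
- timestamps are plain integers;
- only the case described in the claims is handled, namely that the writer's RBW row is older than the other rows for the same address.

The claims do not say what happens to rows that are older than the writer's, so the sketch leaves them in place.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RbwTable:
    """Hypothetical receiver-side table of pending read-before-write reads."""
    rows: Dict[Tuple[str, int], int] = field(default_factory=dict)

    def on_rbw_read(self, tx_id: str, addr: int, ts: int) -> None:
        self.rows[(tx_id, addr)] = ts             # assign a row for this RBW read

    def on_zero_read(self, tx_id: str, addr: int) -> None:
        self.rows.pop((tx_id, addr), None)        # no write will follow: drop row

    def on_write(self, tx_id: str, addr: int) -> List[str]:
        """Return IDs of transmitters told not to send their own write."""
        mine = self.rows.get((tx_id, addr))
        if mine is None:
            return []                             # writer has no RBW row
        losers = [(t, a) for (t, a), ts in self.rows.items()
                  if a == addr and t != tx_id and ts > mine]
        for key in losers + [(tx_id, addr)]:
            del self.rows[key]                    # remove notified rows and writer row
        return sorted(t for t, _ in losers)
```

For example, if A, B and C issue RBW reads of the same address at times 1, 2 and 3 and A's write arrives, B and C are notified and all three rows are removed.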
  • The present disclosure relates to a device for transmitting a plurality of transactions, comprising a Remote Direct Memory Access, RDMA, Network Interface Card, RNIC, configured to: transmit to a receiver a request to read an object from a memory address, wherein a version number, Vobj, is extracted by the receiver from each cache line of the object and the Vobj of each cache line is verified to match the LSB value in a version field of a header of the object; and either receive the object from the receiver, with the Vobj removed from each cache line by the receiver, when the Vobj in each cache line matches the LSB value in the version field, or receive a failure response from the receiver when the Vobj in a cache line does not match the LSB value in the version field.
  • The RNIC is further configured to: transmit to a receiver a request to write an object to the memory address, wherein the LSB value in the version field of the header of the object is extracted by the receiver and written into each cache line of the object, and the object is written to the memory address; and receive a success response from the receiver.
  • The present disclosure relates to a device for transmitting a plurality of transactions, comprising a Remote Direct Memory Access, RDMA, Network Interface Card, RNIC, configured to: transmit to a receiver a request to read an object from a memory address, wherein data is extracted by the receiver from each packet of the object, a hash function value is calculated for the data in each packet of the object, and, at a last packet, the updated hash function value is verified by the receiver to match a value in a hash value field in a header of the object; when the hash value matches the value in the hash value field in the header of the object, receive the object from the receiver; and when the calculated hash value does not match the value in the hash value field of the header of the object, receive a failure response.
  • The RNIC is further configured to: transmit to a receiver a request to write an object to a memory address, wherein the hash function value is calculated by the receiver for the data in each packet of the object, the calculated hash function value of the current packet is updated by the receiver at a temporary receiver RNIC memory address, and, at a last packet, the calculated hash function value is updated in the hash value field in the header of the object; and receive a success response from the receiver after the object is written to the memory address by the receiver.
  • The RNIC is further configured to: calculate the hash function value for the data in each packet of the object and update the calculated hash function value of a current packet at a temporary RNIC memory address; at a last packet, write the updated hash function value into a hash value field in the header of the object; transmit the request to write the object to the receiver; and receive a success response from the receiver after the object is written to the memory address by the receiver.
  • The present disclosure relates to a device for optimizing the flow of a compare and swap operation and a read request into a single operation in Remote Direct Memory Access, RDMA, transactions, comprising an RDMA Network Interface Card, RNIC, configured to: receive from a transmitter a request for an optimized compare and swap and read operation; compare the content of a first memory address to a first value; when the content of the first memory address is equal to the first value: replace the content of the first memory address with a second value, read the content of a second memory address, and send a success response with the content read from the second memory address to the transmitter; and when the content of the first memory address is not equal to the first value, send a failure response to the transmitter.
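The combined compare-and-swap and read operation can be sketched as a single function over a word-addressed memory. This Python model is an illustration built on assumptions. Memory is a dictionary of integer words, and the function name `cas_and_read` is hypothetical. The atomicity that the RNIC would have to guarantee across the compare, swap and read is not modelled.

```python
from typing import Dict, Optional, Tuple


def cas_and_read(memory: Dict[int, int], first_addr: int, first_value: int,
                 second_value: int, second_addr: int) -> Tuple[bool, Optional[int]]:
    """Compare-and-swap on first_addr; on success also return memory[second_addr]."""
    if memory.get(first_addr) != first_value:
        return False, None                        # failure response, nothing read
    memory[first_addr] = second_value             # replace with the second value
    return True, memory.get(second_addr)          # success response + read content
```

One plausible use is to acquire a lock word with the CAS and fetch the data it protects in the same round trip, instead of issuing a CAS followed by a separate READ.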
  • The present disclosure relates to a method for receiving a plurality of transactions, comprising, at a Remote Direct Memory Access, RDMA, Network Interface Card, RNIC: receiving from a transmitter a request to read an object from a memory address; extracting a version number, Vobj, from each cache line of the object and verifying that the Vobj of each cache line matches a least significant bytes, LSB, value in a version field of a header of the object; when the Vobj in each cache line matches the value in the version field: removing the Vobj from each cache line of the object, reading data of the object, and transmitting the object to the transmitter; and when the Vobj in a cache line does not match the LSB value in the version field, retrying, a predefined number of times, to verify that each Vobj matches the LSB value in the version field in the header of the object, and when there is still no match, transmitting a failure response.
  • The method further comprises: receiving from a transmitter a request to write an object to the memory address; extracting the LSB value in the version field of the header of the object, writing the value into each cache line of the object, and writing the object to the memory address; and transmitting a success response to the transmitter.
  • The present disclosure relates to a method for receiving a plurality of transactions, comprising, at a Remote Direct Memory Access, RDMA, Network Interface Card, RNIC: receiving from a transmitter a request to read an object from a memory address; extracting data from each packet of the object; calculating a hash function value for the data in each packet of the object and updating the calculated hash function value of a current packet at a temporary RNIC memory address; at a last packet, verifying that the updated hash function value matches a value in a hash value field in a header of the object; when the hash value matches the value in the hash value field in the header of the object, transmitting the object to the transmitter; and when the calculated hash value does not match the value in the hash value field of the header of the object, retrying for a predefined number of times: calculating the hash function value for the data in each packet and updating the calculated hash function value of the current packet at the temporary RNIC memory and at the
  • The method further comprises: receiving from a transmitter a request to write an object to the memory address; calculating the hash function value for the data in each packet of the object, updating the calculated hash function value of the current packet at a temporary RNIC memory address and, at a last packet, updating the calculated hash function value in the hash value field in the header of the object; and writing the object to the memory address and transmitting a success response.
  • The method further comprises: calculating the hash function value at a transmitter RNIC for the data in each packet of the object and updating the calculated hash function value of a current packet at a temporary transmitter RNIC memory address; at a last packet, writing the updated hash function value into the hash value field in the header of the object; transmitting the request to write the object to the receiver; receiving, at the receiver RNIC, the request to write the object; and transmitting a success response to the transmitter.
  • The present disclosure relates to a method for transmitting a plurality of transactions, comprising, at a Remote Direct Memory Access, RDMA, Network Interface Card, RNIC: transmitting to a receiver a request to read an object from a memory address, wherein a version number, Vobj, is extracted by a receiver RNIC from each cache line of the object and the Vobj of each cache line is verified by the receiver RNIC to match the LSB value in a version field of a header of the object; and either receiving the object from the receiver RNIC, with the Vobj removed from each cache line by the receiver, when the Vobj in each cache line matches the LSB value in the version field, or receiving a failure response from the receiver RNIC when the Vobj in a cache line does not match the LSB value in the version field.
  • The present disclosure relates to a method for transmitting a plurality of transactions, comprising, at a Remote Direct Memory Access, RDMA, Network Interface Card, RNIC: transmitting to a receiver RNIC a request to read an object from a memory address, wherein data is extracted by the receiver RNIC from each packet of the object, a hash function value is calculated for the data in each packet of the object, and, at a last packet, the updated hash function value is verified by the receiver RNIC to match a value in a hash value field in a header of the object; when the hash value matches the value in the hash value field in the header of the object, receiving the object from the receiver; and when the calculated hash value does not match the value in the hash value field of the header of the object, receiving a failure response.
  • The present disclosure relates to a method for optimizing the flow of a compare and swap operation and a read request into a single operation in Remote Direct Memory Access, RDMA, transactions, comprising, at a receiver, when receiving from a transmitter a request for an optimized compare and swap and read operation: comparing the content of a first memory address to a first value; when the content of the first memory address is equal to the first value: replacing the content of the first memory address with a second value, reading the content of a second memory address, and sending a success response and the content read from the second memory address to the transmitter; and when the content is not equal to the first value, sending a failure response to the transmitter.
  • FIG. 1 schematically shows a block diagram of an apparatus for checking a NIC object coherency in RDMA transactions, according to some embodiments of the present disclosure
  • FIG. 2 schematically shows a layout of an object memory with an object version field, ObV, of 64 bits in the beginning of the header of the object (i.e. in the first cache line), and 16 bits of the LSB of the ObV, denoted as Vobj, in the beginning of each of the following cache lines, according to some embodiments of the present disclosure;
  • ObV object version field
  • FIG. 3a schematically shows an object memory layout before removing the Vobj from the cache lines, according to some embodiments of the present disclosure
  • FIG. 3b schematically shows an object memory layout after removing the Vobj from the cache lines, according to some embodiments of the present disclosure
  • FIG. 4 is a schematic sequence diagram of an example for checking a NIC object coherency in RDMA transactions, when receiving a read request, by verifying the Vobj in each cache line is equal to the LSB of the object version number ObV in the header of the object, according to some embodiments of the present disclosure;
  • FIG. 5 is a schematic sequence diagram of an example for checking a NIC object coherency in RDMA transactions, when receiving a write request, by verifying the Vobj in each cache line is equal to the LSB of the object version number ObV in the header of the object, according to some embodiments of the present disclosure
  • FIG. 6 schematically shows a flowchart of a method for checking a NIC object coherency in RDMA transactions, when receiving a read request, by verifying the Vobj in each cache line is equal to the LSB of the ObV in the header of the object, according to some embodiments of the present disclosure
  • FIG. 7 schematically shows a flowchart of a method for checking a NIC object coherency in RDMA transactions, when transmitting a read request, by inserting the LSB of the ObV in the header of the object into each cache line of the object, according to some embodiments of the present disclosure
  • FIG. 8 schematically shows a layout of a memory in which an object is kept and has a header field holding a value calculated by a hash function over the object, according to some embodiments of the present disclosure
  • FIG. 9 schematically shows a flowchart of a method for checking a NIC object coherency in RDMA transactions, by a hash function, when receiving a read request, according to some embodiments of the present disclosure
  • FIG. 10 schematically shows a sequence diagram of an example for checking a NIC object coherency in RDMA transactions, when receiving a read request, by a hash function, without locking the object first, according to some embodiments of the present disclosure
  • FIG. 11 schematically shows a flow chart of an example of an RDMA write request, where the hash function is calculated in the receiver RNIC 122, according to some embodiments of the present disclosure
  • FIG. 12 schematically shows a sequence diagram of an example for writing an object in RDMA transactions, with a hash function calculated at the receiver RNIC, according to some embodiments of the present disclosure
  • FIG. 13 schematically shows a sequence diagram of an example for writing an object in RDMA transactions, with a hash function calculated at the transmitter RNIC, according to some embodiments of the present disclosure
  • FIG. 14 schematically shows a flow chart of an example of an RDMA write request, where the hash function is calculated in the transmitter RNIC 112, according to some embodiments of the present disclosure
  • FIG. 15 schematically shows a flow chart of a method for checking a NIC object coherency in RDMA transactions, by a hash function, when transmitting a read request, according to some embodiments of the present disclosure
  • FIG. 16 schematically shows a flow chart of a method for sending a read before write (RBW) request, which is a read request carrying a notification that a write request is expected to be sent after the read request, according to some embodiments of the present disclosure
  • FIGs. 17a-17g schematically show an example for the method of the read before write (RBW) request, according to some embodiments of the present disclosure
  • FIG. 18 schematically shows a flow chart of a method for optimizing the flow of a compare and swap (CAS) operation and a read request into a single operation, in RDMA transactions, according to some embodiments of the present disclosure
  • FIG. 19 schematically shows a sequence diagram of an optimized compare and swap (CAS) and read operation, according to some embodiments of the present disclosure.
  • Existing lock-free read methods place a heavy load on compute and network resources.
  • LSB least significant bytes
  • In those methods, a remote read is implemented by issuing an RDMA read request over an RC QP and testing the LSB stored in each cache line against the LSB of the version number; if the test fails, the reader waits for a timeout and issues the request again, which wastes CPU cycles and bandwidth and results in high latency.
  • In these methods, a remote write is implemented by adding the LSB to each cache line on the remote side and then sending the object back to its original machine, which wastes CPU resources and adds network latency.
  • Currently, RDMA provides no way to test object coherency, to strip the object least significant bytes (LSB) from each cache line on reading, or to add the LSB to each cache line on a remote or local NIC.
  • Devices and methods are presented for testing object coherency, reducing the use of computing resources and network bandwidth, and shortening the latency of large packets, by offloading object coherency manipulation to the NIC and optimizing the object coherency flow.
  • The present disclosure may be a system, a method, and/or a computer program product.
  • The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.
  • The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
  • The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
  • The computer readable program instructions may execute entirely on the user's computer and/or computerized device, partly on the user's computer and/or computerized device, as a stand-alone software package, partly on the user's computer (and/or computerized device) and partly on a remote computer, or entirely on the remote computer or server.
  • The remote computer may be connected to the user's computer and/or computerized device through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • Electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.
  • FPGA field-programmable gate arrays
  • PLA programmable logic arrays
  • Each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
  • The functions noted in the block may occur out of the order noted in the figures.
  • Two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Apparatus 100 includes a transmitter 110 and a receiver 120.
  • The transmitter 110 includes a memory 111, an RDMA NIC (RNIC) 112 that includes a processor 113, and one or more applications 114.
  • The receiver 120 includes a memory 121, an RNIC 122 that includes a processor 123, and one or more applications 124.
  • Transmitter 110 sends a lock-free read request to receiver 120 in order to read an object from memory 121, which is a remote memory for transmitter 110.
  • The RNIC 122 verifies object coherency by using the least significant bytes (LSB) of the header version field, which are stored in each and every cache line.
  • When transmitter 110 requests to read the object from memory 121, the object has the LSB of the header version number appended at a fixed position in each and every cache line of the object.
  • RNIC 122 checks, by processor 123, that the header version is unlocked; if it is unlocked, RNIC 122 extracts the version number, Vobj, from each cache line and verifies that it matches the LSB of the object version field, ObV, in the header of the object. The verification is done by comparing the LSB of the ObV to the Vobj of each cache line.
  • Fig. 2 schematically shows a layout of an object memory with an object version field, ObV, of 64 bits in the beginning of the header of the object (i.e. in the first cache line), and 16 bits of the LSB of the ObV, denoted as Vobj, in the beginning of each of the following cache lines, according to some embodiments of the present disclosure.
  • Vobj 16 bits of the LSB of the ObV
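The FIG. 2 layout can be modelled in a few lines. The following is a minimal Python sketch, not the patented RNIC implementation. It assumes:

- 64-byte cache lines and little-endian fields;
- an 8-byte ObV at the start of the first cache line;
- a 2-byte Vobj at the start of every following line;
- zero padding of the last line.

The function names `vobj_of` and `build_object` are hypothetical.

```python
import struct

CACHE_LINE = 64                          # assumed cache-line size in bytes
OBV_SIZE = 8                             # 64-bit object version field, ObV
VOBJ_SIZE = 2                            # 16-bit Vobj per following cache line
FIRST_PAYLOAD = CACHE_LINE - OBV_SIZE    # 56 data bytes share line 0 with ObV
NEXT_PAYLOAD = CACHE_LINE - VOBJ_SIZE    # 62 data bytes in each later line


def vobj_of(obv: int) -> int:
    """Return the 16 least significant bits of the ObV, i.e. the Vobj."""
    return obv & 0xFFFF


def build_object(obv: int, payload: bytes) -> bytes:
    """Lay out payload as in FIG. 2 (hypothetical helper, zero-padded)."""
    out = bytearray(struct.pack("<Q", obv))                       # ObV in header
    out += payload[:FIRST_PAYLOAD].ljust(FIRST_PAYLOAD, b"\0")
    rest = payload[FIRST_PAYLOAD:]
    for i in range(0, len(rest), NEXT_PAYLOAD):
        chunk = rest[i:i + NEXT_PAYLOAD].ljust(NEXT_PAYLOAD, b"\0")
        out += struct.pack("<H", vobj_of(obv)) + chunk            # Vobj prefix
    return bytes(out)
```

For example, a 100-byte payload with ObV 0x12340005 occupies two cache lines, and the second line starts with the Vobj 0x0005.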
  • A verification is made by looping over all the cache lines of the object and checking whether the Vobj in each cache line is the same as the LSB of the ObV. When it is, the object is read and transmitted to the transmitter 110 without the Vobj in each cache line.
  • When transmitter 110 reads an object from receiver 120, it allocates a buffer into which the object is written without the Vobj in each cache line. When the RNIC writes the object into the buffer, it skips the LSB fields and does not write them into the buffer, so the buffer receives the object without the LSB fields, i.e. without the Vobj in each cache line. If the header version of the object is locked, or if the Vobj of one or more cache lines is not equal to the LSB of the ObV, the receiver retries, a predefined number of times, to verify that the header version is unlocked and that each Vobj matches the value in the version field in the header of the object. If there is still no match, the receiver 120 transmits a failure response to transmitter 110.
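The receiver-side check described above (lock test, per-line comparison and Vobj removal) can be sketched as follows. It is an illustrative Python model that carries over the FIG. 2 assumptions: 64-byte cache lines, a little-endian 8-byte ObV and a 2-byte Vobj. The source does not say how the header lock is encoded, so using the top bit of the ObV as a lock flag (`LOCK_BIT`) is purely hypothetical, as is the function name `check_and_strip`. Retrying is left to the caller.

```python
import struct
from typing import Optional

CACHE_LINE = 64
VOBJ_SIZE = 2
LOCK_BIT = 1 << 63   # hypothetical: top bit of the ObV marks the header as locked


def check_and_strip(obj: bytes) -> Optional[bytes]:
    """Return the object without per-line Vobj fields, or None if the check fails."""
    (obv,) = struct.unpack_from("<Q", obj, 0)
    if obv & LOCK_BIT:
        return None                          # header version is locked
    expected = obv & 0xFFFF                  # 16-bit LSB of the ObV
    out = bytearray(obj[:CACHE_LINE])        # header line keeps the ObV
    for off in range(CACHE_LINE, len(obj), CACHE_LINE):
        (vobj,) = struct.unpack_from("<H", obj, off)
        if vobj != expected:
            return None                      # line belongs to another version
        out += obj[off + VOBJ_SIZE:off + CACHE_LINE]   # copy data, skip Vobj
    return bytes(out)
```

Returning `None` instead of raising keeps a caller's retry loop simple. In an RNIC, that outcome would presumably map to another attempt or to the failure response.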
  • Fig. 3a schematically shows an object memory layout before removing the Vobj from the cache lines, according to some embodiments of the present disclosure.
  • Fig. 3b schematically shows an object memory layout after removing the Vobj from the cache lines, according to some embodiments of the present disclosure.
  • When transmitter 110 sends a write request to receiver 120 to write an object to memory 121, the RNIC 122 of receiver 120 extracts the value ObV in the version field of the header of the object, writes its LSB (the Vobj) into each cache line of the object, and then writes the object to a memory address in memory 121.
  • RNIC 122 transmits a success response to the transmitter 110.
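Below is a matching sketch of the write path, in which RNIC 122 inserts the Vobj before storing the object. Again, this is a hypothetical Python model, not the actual RNIC logic. The function name `insert_vobj`, the 64-byte lines, the little-endian fields and the zero padding of the last line are all assumptions.

```python
import struct

CACHE_LINE = 64
VOBJ_SIZE = 2


def insert_vobj(obj: bytes) -> bytes:
    """Receiver-side write path: add the ObV's 16-bit LSB to every later line.

    `obj` is the object as sent by the transmitter: a 64-byte header line that
    starts with the 64-bit ObV, followed by data that carries no Vobj fields.
    """
    (obv,) = struct.unpack_from("<Q", obj, 0)
    vobj = struct.pack("<H", obv & 0xFFFF)
    out = bytearray(obj[:CACHE_LINE])             # header line is stored as-is
    body = obj[CACHE_LINE:]
    step = CACHE_LINE - VOBJ_SIZE                 # data bytes per later line
    for i in range(0, len(body), step):
        out += vobj + body[i:i + step].ljust(step, b"\0")   # zero-pad last line
    return bytes(out)
```

Doing this on the receiver lets the transmitter send plain data and leaves the version bookkeeping to the NIC that owns the memory.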
  • FIG. 4 is a schematic sequence diagram of an example for checking a NIC object coherency in RDMA transactions, when receiving a read request, by verifying the Vobj in each cache line is equal to the LSB of the ObV in the header of the object, according to some embodiments of the present disclosure.
  • a QP is created on the application 114 and sent to the RNIC 112 of the transmitter.
  • a read request is sent from the application 114 to the RNIC 112, to read address A of length L in memory 121.
  • When the QP is created at the transmitter, a QP is created in parallel at the receiver, and at 403 the receiver RNIC 122 waits to receive a read request from the transmitter.
  • the read request is transmitted from the RNIC 112 at the transmitter to the RNIC 122 at the receiver and the read request is received.
  • a counter is set to zero, and at 406, the RNIC 122 checks whether the counter equals a predefined number X.
  • the RNIC 122 extracts all the values Vobj from each cache line in the requested address and verifies it is equal to the value of the LSB of the object version ObV in the header of the object.
  • When the values do not match, the RNIC 122 waits a predefined time interval T and adds 1 to the counter. Then the RNIC 122 returns to 406 to try to verify again.
  • After the receiver RNIC 122 has tried and failed X predefined times, i.e. when the counter is equal to X, at 409 a failure response is transmitted to the transmitter, and the RNIC 122 returns to 403 and waits for a new read request.
  • When the values of the Vobj extracted from each cache line are equal to the LSB of the ObV in the header of the object, at 411 the Vobj values are removed from each cache line of the object, and the object is read and sent to the transmitter without the Vobj in each cache line. Then the RNIC 122 returns to 403 and waits for a new read request.
  • FIG. 5 is a schematic sequence diagram of an example for checking a NIC object coherency in RDMA transactions, when receiving a write request, by verifying the Vobj in each cache line is equal to the LSB of the object version number ObV in the header of the object, according to some embodiments of the present disclosure.
  • a QP is created on the application 114 and sent to the RNIC 112 of the transmitter.
  • a write request is sent from the application 114 to the RNIC 112, to write an object to address A of length L in memory 121.
  • the packets that are sent from the transmitter in the write request are without the value of the Vobj in the cache lines.
  • When the QP is created at the transmitter, a QP is created in parallel at the receiver, and at 503 the receiver RNIC 122 waits to receive a write request from the transmitter.
  • the write request is transmitted from the RNIC 112 at the transmitter to the RNIC 122 at the receiver and the write request is received.
  • the RNIC 122 at 505 extracts the LSB value of the ObV in the header of the object and inserts the value into the beginning of each cache line. After the LSB value is inserted to each cache line, at 506, the object is written to the memory 121, and at 507, the RNIC 122 transmits a success response to the transmitter 110.
  • FIG. 6 schematically shows a flowchart of a method for checking a NIC object coherency in RDMA transactions, when receiving a read request, by verifying the Vobj in each cache line is equal to the LSB of the ObV in the header of the object, according to some embodiments of the present disclosure.
  • a request to read an object from a memory address on memory 121 is received at the RNIC 122 of receiver 120.
  • the RNIC 122 extracts a data version number, Vobj, from each cache line of the object and verifies that the Vobj of each cache line matches the LSB value of the version field of the header of the object, ObV.
  • When the Vobj in each cache line matches the value in the version field, the RNIC 122 removes the Vobj from each cache line of the object, the data of the object is read, and the object is transmitted to the transmitter 110.
  • Otherwise, the RNIC 122 retries, a predefined number of times, to verify that each Vobj matches the LSB of the ObV in the header of the object. When there is still no match, a failure response is transmitted to the transmitter 110.
  • FIG. 7 schematically shows a flowchart of a method for checking a NIC object coherency in RDMA transactions, when transmitting a read request, by inserting the LSB of the ObV in the header of the object into each cache line of the object, according to some embodiments of the present disclosure.
  • a request to read an object from a memory address is transmitted from a transmitter 110, by the RNIC 112 to receiver 120.
  • a version number, Vobj is extracted by the receiver from each cache line of the object and the Vobj of each cache line is verified to match the LSB value of an object version number, ObV, field of a header of the object.
  • the transmitter 110 receives the object from the receiver 120, when the Vobj in each cache line matches the LSB value of the ObV, with the Vobj removed from each cache line of the object by the receiver, i.e. the object is received in the transmitter 110 without the Vobj in each cache line. Otherwise, at 703, when the Vobj in each cache line does not match the LSB value of the ObV, the transmitter 110 receives a failure response from the receiver.
  • the LSB value of the ObV is extracted by the receiver 120 and written into each cache line of the object. The object is written to the memory address in memory 121, and the transmitter 110 receives a success response from the receiver 120.
  • the LSB value of the version field in the header of the object is written into the beginning of each cache line.
  • a NIC coherency in RDMA transactions when receiving a read or write request, may be checked by a hash function.
  • FIG. 8 schematically shows a layout of a memory, where an object is kept in the memory and has a header field with a value calculated by a hash function over the object, according to some embodiments of the present disclosure.
  • a hash value is calculated over the object and stored in a hash value header field.
  • a test is done by calculating the hash value over the object and comparing the result with the value in the hash value header field. If they are the same, the object is serializable and coherent; if not, the object is being worked on by the application and should be read again or discarded.
  • any hash function that has a good distribution may be used, including Cyclic Redundancy Check-16 (CRC-16), Cyclic Redundancy Check-32 (CRC-32), Cyclic Redundancy Check-64 (CRC-64), Message-Digest algorithm 5 (MD5), and the like.
  • the size of the hash value header field is chosen according to the hash function used; for example, MD5 uses a 128-bit field, CRC-64 a 64-bit field, CRC-32 a 32-bit field, and so on.
  • the hash function covers the entire content of the object and is calculated by the RNIC that writes the object into the memory.
  • the object is coherent if the hash function calculated by reader matches the value in the hash field in the header of the object.
  • FIG. 9 schematically shows a flowchart of a method for checking a NIC coherency in RDMA transactions, by a hash function, when receiving a read request, according to some embodiments of the present disclosure.
  • a read request is received in receiver 120, to read from a memory address in memory 121.
  • the RNIC 122 at the receiver 120 extracts data from each packet of the object.
  • the RNIC 122 calculates a hash function value for the data in each packet of the object and updates the calculated hash function value of the current packet at a temporary memory address in RNIC 122.
  • the RNIC 122 verifies that the updated hash function value matches a value in a hash value field in a header of the object.
  • When the values match, at 906 the RNIC 122 transmits the object to the transmitter 110 in response to the request to read the object.
  • When the values do not match, the RNIC 122 retries, for a predefined number of times, to calculate the hash function value for the data in each packet.
  • the RNIC 122 updates the calculated hash function value of the current packet at the temporary RNIC memory.
  • at the last packet, the RNIC 122 verifies that the updated hash function value matches the value in the hash value field of the header of the object.
  • When there is still no match, the RNIC 122 transmits a failure response to transmitter 110.
  • FIG. 10 schematically shows a sequence diagram of an example for checking a NIC object coherency in RDMA transactions, when receiving a read request, by a hash function, without locking the object first, according to some embodiments of the present disclosure.
  • a QP is created at the application 114 and sent to the RNIC 112 of the transmitter 110.
  • a read request is sent from the application 114 to the RDMA NIC 112, to read address A of length L in memory 121.
  • the QP in the receiver is created with a hash function that is calculated over the object stored in memory 121.
  • the result of the hash function is stored at a hash value field in the header of the object, together with the header length.
  • the receiver's RDMA NIC 122 at 1003 waits to receive a read request from the transmitter 110.
  • the read request is transmitted from the RDMA NIC 112 at the transmitter 110 to the RDMA NIC 122 at the receiver 120 and the read request is received.
  • a counter is set to zero, and at 1006, the RDMA NIC 122 checks whether the counter equals a predefined number X.
  • the RDMA NIC 122 extracts data from each packet of the object and calculates the hash function for each packet and updates the calculated hash function value of the current packet at a temporary memory address in RNIC 122.
  • When the RNIC gets to the last packet of the object, the RNIC 122 verifies that the updated hash function value matches the value in the hash value field in the header of the object.
  • When the updated hash function value stored in the temporary memory of the RNIC 122 is not equal to the value of the hash value field in the header of the object, at 1008, the RDMA NIC 122 waits a predefined time interval T and adds 1 to the counter.
  • the RDMA NIC 122 then returns to 1006 to try to verify again.
  • After the receiver's RDMA NIC 122 has tried and failed X predefined times, i.e. when the counter is equal to X, a failure response is transmitted to the transmitter 110, and at 1010 the RDMA NIC 122 returns to 1003 and waits for a new read request.
  • When the updated hash function value stored in the temporary memory of the RNIC 122 is equal to the value of the hash value field in the header of the object, at 1011 the RNIC 122 transmits the object to the transmitter 110 in response to the request to read the object.
  • the object is then sent at 1012, to the application 114.
  • the RDMA NIC 122 returns to 1003 and waits for a new read request.
  • FIG. 11 schematically shows a flow chart of an example of an RDMA write request, where the hash function is calculated in the receiver RNIC 122, according to some embodiments of the present disclosure.
  • an RDMA write request is sent from the transmitter 110 to the receiver 120.
  • the packets are divided into write first, write middle, write last, and immediate data packets (for example, as in InfiniBand (IB), RDMA over Converged Ethernet (RoCE), and RoCE version 2 (RoCEv2)).
  • the write request is received at the receiver RNIC 122, which at 1102 extracts data from all the packets of the object in the memory
  • the RNIC 122 calculates the hash function over each packet of the object.
  • the result of each calculation is stored on a temporary memory of the RNIC 122, e.g. in the QP context (QPC) and is updated with every calculation of a packet.
  • QPC: QP context.
  • When the hash function is calculated for the last packet, the result is updated, e.g. at the QPC, and the final hash function value is then written to the hash value field in the header of the object.
  • the RNIC 122 writes the object to the memory 121, and a success response is sent to the transmitter 110.
  • FIG. 12 schematically shows a sequence diagram of an example for writing an object in RDMA transactions, with a hash function calculated at the receiver RNIC, according to some embodiments of the present disclosure.
  • a QP is created at the application 114 and sent to the RNIC 112 of the transmitter 110.
  • a write request is sent from the application 114 to the RNIC 112, to write to an address A of length L in memory 121.
  • the QP in the receiver is created with a hash function that is calculated over the object stored in memory 121.
  • the result of the hash function is stored at a hash value field in the header of the object, together with the header length.
  • the receiver RNIC 122 at 1203 waits to receive a write request from the transmitter 110.
  • the write request is transmitted from the RNIC 112 at the transmitter 110 to the RNIC 122 at the receiver 120 and the write request is received.
  • the RNIC 122 extracts data from all the packets of the object and calculates the hash function over each packet of the object. The result of each calculation is stored in a temporary memory of the RNIC 122, e.g. in the QP context (QPC), and is updated with every calculation of a packet. When the hash function is calculated for the last packet, the result is updated, e.g. at the QPC.
  • the RNIC 122 writes the object to the memory 121, and at 1207 a success response is sent to the transmitter 110.
  • the hash function may be calculated at the transmitter RNIC 112 instead of being calculated at the receiver RNIC 122.
  • the receiver 120 receives from the transmitter 110 the request to write an object to a memory address in memory 121, together with the hash function value calculated by the transmitter RNIC 112 for the data in each packet of the object.
  • the receiver RNIC 122 writes the object to the memory 121 and transmits a success response to the transmitter 110.
  • the hash function is calculated in the transmitter 110 instead of being calculated in the receiver 120.
  • FIG. 13 schematically shows a sequence diagram of an example for writing an object in RDMA transactions, with a hash function calculated at the transmitter RNIC, according to some embodiments of the present disclosure.
  • a QP is created at the application 114 and sent to the RNIC 112 of the transmitter 110.
  • the QP in the transmitter is created with a hash function that is calculated over the object stored in memory 111.
  • the result of the final value of the hash function is stored at a hash offset field in the header of the object, together with the header length.
  • a write request is sent from the application 114 to the RNIC 112, to write to an address A of length L in memory 121.
  • the hash function is calculated over the whole object and at 1305, the write request is sent from the transmitter RNIC 112 to the receiver RNIC 122.
  • When the QP is created at the transmitter, a QP is created in parallel at the receiver 120.
  • the receiver RNIC 122 at 1306 waits to receive a write request from the transmitter 110.
  • the RNIC 122 writes the object to the memory 121 and inserts the hash function value calculated at the transmitter RNIC 112 into the hash field in the header of the object written to memory 121.
  • a success response is sent to the transmitter 110.
  • FIG. 14 schematically shows a flow chart of an example of an RDMA write request, over RoCEv2, where the hash function is calculated in the transmitter RNIC 112, according to some embodiments of the present disclosure.
  • an opcode of write-and-checksum is created.
  • for the last SGE, the destination virtual address (VA), i.e. where the data has to be written, is specified instead of the source location from which the data is taken.
  • the data, i.e. the hash function value
  • the RNIC 112 at 1402 reads the data from the memory 111, which is divided into scatter-gather entries (SGEs).
  • the RNIC 112 extracts the data from each SGE, calculates the hash function, and stores the result in a temporary memory inside the QPC of the RNIC 112.
  • the RNIC 112 sends the packets that were read by the RNIC 112 to the network as a write request, i.e. divided into write first, write middle, and write last packets.
  • the destination VA is written, so the RNIC knows the destination, i.e. the address to write the object.
  • the last SGE is read by the RNIC 112 and at 1406, the final value of the hash function, which is stored at the QPC in the RNIC 112, is written to an additional packet “write only”, which is sent at 1407 to the network after the packets of the write first, write middle and write last. At 1408, when there is an immediate data packet it is sent after the “write only” packet.
  • FIG. 15 schematically shows a flow chart of a method for checking a NIC coherency in RDMA transactions, by a hash function, when transmitting a read request, according to some embodiments of the present disclosure.
  • the RNIC 112 of transmitter 110 sends a read request to receiver 120 to read an object from a memory address in memory 121.
  • the RNIC 122 of the receiver 120 extracts data from each packet of the object and calculates a hash function value for the data in each packet of the object.
  • the value of the calculated hash function is stored at a QPC, using a temporary memory of the RNIC 122, where, for each packet, the hash function value is updated at the QPC, and at the last packet the RNIC 122 verifies that the updated hash function value matches a value in a hash value field in a header of the object.
  • When the values match, the transmitter 110 receives the object from the receiver (i.e. the object that was read successfully).
  • Otherwise, at 1506, the transmitter 110 receives a failure response.
  • the hash function may be calculated by the RNIC 112 of transmitter 110.
  • Since the RDMA protocol defines specific read behavior, the RNIC enables the receiver to return a special read response for failure detection (no payload returned). The acknowledgment extended transport header (AETH) syndrome field may be used for this purpose. As a result, a transmitter that receives the special read response or AETH syndrome field skips the packet sequence number (PSN) gap reserved for that read response, as it would if it had received all the packets (PSNs) from the receiver. The transmitter reports the failure to the application through the completion queue even if the read operation was not signaled for completion generation.
  • AETH: acknowledgement extended transport header.
  • FIG. 16 schematically shows a flow chart of a method for sending a read before write request (RBW) which is a read request with a notification on a write request which is expected to be sent after the read request, according to some embodiments of the present disclosure.
  • RNIC 122 receives from one or more transmitters a read before write (RBW) request to read an object from a memory address in memory 121, where packets of the RBW request contain a bit indicating that a request to write to the object's memory address is expected to be received from the one or more transmitters.
  • RBW: read before write.
  • the RNIC 122 assigns, in a table, a row with an identification (ID) of each of the one or more transmitters and a timestamp of the exact time it received that transmitter's request to read the object from the memory address.
  • a request to write an object to the memory address is received from another transmitter, wherein the other transmitter has a row in the table for a request to read the object from the memory address with a timestamp smaller than the timestamps of the requests of the one or more transmitters.
  • the RNIC 122 sends a notification to the one or more transmitters, according to the transmitter IDs in the table, to refrain from sending the request to write to the memory address, as the content of the memory address has changed. Then, at 1605, the RNIC 122 removes the rows (ID and timestamp) of the other transmitter and of the one or more transmitters from the table, as the RBW request of the other transmitter is accomplished and the write requests of the one or more transmitters are avoided.
  • the one or more transmitters send an RBW request to read the object from the memory, where packets of the RBW request contain a bit indicating that a request to write another object to the memory address is expected to be received from the one or more transmitters.
  • the one or more transmitters may send a zero read request indicating that no request to write an object to the memory address is expected to be sent from the one or more transmitters.
  • the receiver RNIC 122 receives from the one or more transmitters the zero read request indicating that no request to write an object to the memory address is expected to be sent from them, and removes the rows (IDs and timestamps) of the one or more transmitters from the table.
  • FIGs. 17a-17g schematically show an example for the method of the read before write (RBW) request, according to some embodiments of the present disclosure.
  • FIG. 17a shows transmitter B 1702, which sends to the RNIC 1705 of receiver 1704 an RBW request to read from address 0x1234.
  • the read request also contains a bit indicating that a write request is expected to be sent from transmitter B 1702; when this bit is enabled, the read request is an RBW request.
  • the RNIC 1705 receives the request and assigns a row in table 1706 for transmitter B 1702.
  • FIG. 17b schematically shows transmitter C 1703, which sends to the RNIC 1705 of receiver 1704 an RBW request to read from address 0x1234.
  • the read request also contains a bit indicating that a write request is expected to be sent from transmitter C 1703.
  • the RNIC 1705 receives the RBW request and assigns a row in table 1706 for transmitter C 1703.
  • FIG. 17c schematically shows transmitter A 1701, which sends to the RNIC 1705 of receiver 1704 a request to read from address 0x1234.
  • the read request does not contain any indication for a write request that is expected to be sent from transmitter A 1701 later.
  • the RNIC 1705 receives the request; however, no row is assigned for it in table 1706.
  • FIG. 17d schematically shows transmitter B 1702 sending a write request to address 0x1234, as indicated in the RBW request sent earlier by transmitter B 1702 (as seen in FIG. 17a).
  • FIG. 17e schematically shows that the row of transmitter B 1702 is removed, since the RBW request is accomplished, the write request of transmitter B 1702 is accomplished, and a write is performed on the content of address 0x1234. Since the content of address 0x1234 was changed, the RNIC 1705 notifies the rest of the transmitters in the table, that were expected to write to address 0x1234, i.e. transmitter C 1703 in this case, that the content of address 0x1234 was changed, and removes the row of transmitter C 1703 from table 1706.
  • FIG. 17f schematically shows transmitter A 1701 sending a write request to address 0x1234.
  • FIG. 17g schematically shows that the RNIC 1705 sends a failure notice to transmitter A 1701. This example shows that, by using the read request with the bit indicating that a write request is expected to be sent, the write request transmission, the failure, and the failure-notice transmission are avoided when the content of the address has been changed. This saves time and computational resources.
  • FIG. 18 schematically shows a flow chart of a method optimizing a flow operation of a compare and swap (CAS) operation and a read request to a single operation, in RDMA transactions, according to some embodiments of the present disclosure.
  • CAS: compare and swap.
  • 2PL: two-phase locking.
  • the optimized CAS and read operation is a new opcode CAS-and-READ that combines the two actions: namely, if CAS on address X was successful, READ N bytes from address Y. If not, return failure.
  • RNIC 122 receives from transmitter 110 a request for an optimized compare and swap and read (CAS and read) operation.
  • the RNIC 122 compares a content of a first memory address to a first value.
  • When the content equals the first value, the RNIC 122 replaces the content of the first memory address with a second value.
  • the RNIC 122 reads a content of a second memory address and at 1805 the RNIC 122 sends a success response and the content read from the second memory address to the transmitter 110. However, when the content is not equal to the first value at 1806, the RNIC 122 sends a failure response to the transmitter 110.
  • FIG. 19 schematically shows a sequence diagram of an optimized compare and swap (CAS) and read operation, according to some embodiments of the present disclosure.
  • a QP is created at the transmitter 110, and at the receiver 120.
  • the RNIC 122 in the receiver 120 waits for a CAS and read request to arrive from a transmitter.
  • a CAS and read operation is sent from application 114 to the RNIC 112 in the transmitter 110.
  • the CAS and read operation includes the compare and swap destination address, a first value to which the content of the destination address is compared, a second value to replace the content of the CAS destination address, a read destination address, and the read length.
  • the CAS and read request is transmitted from the RNIC 112 in the transmitter to the RNIC 122 in the receiver.
  • the RNIC 122 goes to the CAS address and at 1906, the RNIC 122 compares the content of the CAS address to the first value.
  • When the content is not equal to the first value, the RNIC 122 sends a failure response to the RNIC 112 in transmitter 110, and at 1908 the failure response is sent to the application 114.
  • When the content equals the first value, the RNIC 122 replaces the content of the CAS address with the second value.
  • the RNIC goes to the second memory address (the read address) and reads the content of a second memory address.
  • the RNIC 122 checks whether the read operation was successful. When the read fails and the content of the second memory address is not read successfully, a failure response is sent to the RNIC 112 at 1911, and the failure response is sent from the RNIC 112 to the application 114, at 1911. However, when the content of the second memory address is read successfully, at 1913 the RNIC 122 sends a success response with the content read from the second memory address to the transmitter 110, and at 1914 the success response is sent to the application 114.
  • composition or method may include additional ingredients and/or steps, but only if the additional ingredients and/or steps do not materially alter the basic and novel characteristics of the claimed composition or method.
  • the singular forms “a”, “an” and “the” include plural references unless the context clearly dictates otherwise.
  • the term “a compound” or “at least one compound” may include a plurality of compounds, including mixtures thereof.
  • range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the disclosure. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
  • whenever a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range.
  • the phrases “ranging/ranges between” a first indicated number and a second indicated number and “ranging/ranges from” a first indicated number “to” a second indicated number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.
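The special read failure response described above for FIG. 15 depends on the transmitter skipping the PSN range it reserved for the read. The following Python code is a minimal sketch of that bookkeeping, under several assumptions. As in InfiniBand RC transport, an RDMA READ request is assumed to reserve one PSN per expected response packet (one packet per path MTU, at least one) in a 24-bit PSN space. The class `Requester`, its fields and its methods are hypothetical. The completion queue is modelled as a plain list, not a verbs API.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

PSN_MOD = 1 << 24  # PSNs are 24-bit in InfiniBand/RoCE


@dataclass
class Requester:
    """Hypothetical transmitter-side model of PSN tracking for RDMA reads."""
    next_psn: int = 0            # PSN assigned to the next request
    expected_resp_psn: int = 0   # next read-response PSN the transmitter expects
    cq: List[Tuple[int, str]] = field(default_factory=list)  # stand-in for the CQ

    def post_read(self, length: int, mtu: int) -> Tuple[int, int]:
        """Reserve one PSN per expected read-response packet."""
        n = max(1, math.ceil(length / mtu))
        first = self.next_psn
        self.next_psn = (first + n) % PSN_MOD
        return first, n

    def on_failure_response(self, wr_id: int, first: int, n: int) -> None:
        """Handle the special failure response (no payload returned)."""
        # Skip the whole PSN gap reserved for this read, as if all n
        # response packets had arrived.
        self.expected_resp_psn = (first + n) % PSN_MOD
        # Report the error even if the read was posted unsignaled.
        self.cq.append((wr_id, "error"))
```

For example, a 10000-byte read over a 4096-byte MTU reserves three PSNs. On a failure response, the transmitter jumps straight to the PSN that follows that range and posts an error completion.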

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Methods and devices for checking a network interface card (NIC) coherency in Remote Direct Memory Access (RDMA) transactions are presented. The method comprises receiving, by an RDMA NIC at a receiver, from a transmitter a request to read an object from a memory address; checking whether a header version is unlocked; extracting from each cache line a version number, Vobj; and verifying that the Vobj matches the least significant bytes, LSB, of a version field of a header of the object, ObV. When the Vobj in each cache line matches the ObV, the Vobj is removed from each cache line of the object, the data of the object is read, and the object is transmitted to the transmitter. When there is no match, verifying that each Vobj matches the ObV in the header of the object is retried a predefined number of times, and when there is still no match, a failure response is transmitted.

Description

METHODS AND DEVICES FOR NETWORK INTERFACE CARD (NIC) OBJECT COHERENCY (NOC) MESSAGES
Technical Field
The present disclosure, in some embodiments thereof, relates to communication systems and, more specifically but not exclusively, to methods and devices for network interface card (NIC) object coherency (NOC) messages.
Background
Remote Direct Memory Access (RDMA) is a direct memory access from the memory of one computer into that of another computer without involving the operating system of either computer. RDMA is a technology widely used in modern datacenters and computer clusters for low-latency, high-bandwidth networking. RDMA offloads memory operations from the central processing unit (CPU) to the RDMA NIC (RNIC), which accesses the memory directly. This offloading saves CPU time, leaving the CPU free to perform other tasks. The RDMA software layer uses instructions called “verbs” to perform RDMA operations; these are transformed into work requests written to a queue of the RNIC. Each work request is called a work queue element (WQE). WQEs are used for tagged (one-sided WRITE/READ/ATOMIC) operations and for untagged (two-sided SEND/RECV) operations. RDMA peers communicate over queue pairs (QPs), which offer various transport services. The common QP types (defined by the InfiniBand (IB) specification) are Reliable Connection (RC), Reliable Datagram (RD), Unreliable Datagram (UD), Extended Reliable Connection (XRC), and Unreliable Connection (UC). Some companies have proprietary QP types, such as Amazon SRD and Mellanox DCT.
Summary
It is an object of the present disclosure to provide devices and methods for testing object coherency that reduce computing resources and network bandwidth and shorten the latency of large packets, by offloading the object coherency check and manipulation to the RNIC and by optimizing the object coherency flow.
It is another object of the present disclosure to provide a method for performing a read before write (RBW) request, which reduces computing resources and saves time.
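The table handling described for FIG. 16, and illustrated in FIGs. 17a-17e, can be sketched as a single-threaded Python model. This is a minimal sketch under several assumptions. Rows are keyed by (transmitter ID, address). A monotonically increasing counter stands in for the RNIC timestamp. The class and method names are hypothetical. The sketch does not model how the failure notice to transmitter A in FIG. 17g is produced, and it returns `None` for writes it does not cover.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class RbwTable:
    """Hypothetical model of the receiver RNIC's read-before-write table."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, int], int] = {}  # (tx_id, addr) -> timestamp
        self._clock = itertools.count()              # stand-in for RNIC timestamps

    def on_read(self, tx: str, addr: int, write_expected: bool) -> None:
        # Only a read whose "write expected" bit is set (an RBW request) gets a row.
        if write_expected:
            self._rows[(tx, addr)] = next(self._clock)

    def on_zero_read(self, tx: str, addr: int) -> None:
        # A zero read announces that no write will follow, so the row is dropped.
        self._rows.pop((tx, addr), None)

    def on_write(self, tx: str, addr: int) -> Optional[List[str]]:
        """Return the transmitters to notify, or None if not covered here."""
        waiting = sorted((ts, t) for (t, a), ts in self._rows.items() if a == addr)
        if not waiting or waiting[0][1] != tx:
            return None
        # The earliest RBW holder wrote: its row is done, and every other
        # pending writer is told the content changed and its row is removed.
        for _, t in waiting:
            del self._rows[(t, addr)]
        return [t for _, t in waiting[1:]]

    def pending(self) -> List[Tuple[str, int]]:
        return sorted(self._rows)
```

To replay FIGs. 17a-17e, B and C each issue an RBW to 0x1234 and A issues a plain read. When B then writes, `on_write("B", 0x1234)` returns `["C"]` and leaves the table empty.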
It is yet another object of the present disclosure to provide a method for performing an RDMA protocol compare and swap (CAS) and read opcode as one single operation, in which the RDMA behavior is aware of the packet structure and of the manipulation needed on the packet using the CAS and read opcode, thereby saving time and computing resources.
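The CAS-and-READ semantics described for FIGs. 18 and 19 can be sketched in Python. If the CAS on address X succeeds, N bytes are read from address Y; otherwise a failure is returned. The sketch makes several assumptions. Memory is a flat `bytearray`. The CAS operand is a 64-bit little-endian word. An out-of-range read stands in for the read failure of FIG. 19. The function name is hypothetical. A real RNIC performs the CAS atomically, whereas this model is single-threaded.

```python
from typing import Optional, Tuple

WORD = 8  # 64-bit CAS operand (assumed)


def cas_and_read(mem: bytearray, cas_addr: int, compare: int, swap: int,
                 read_addr: int, read_len: int) -> Tuple[str, Optional[bytes]]:
    """Hypothetical CAS-and-READ: swap at cas_addr, then read at read_addr."""
    current = int.from_bytes(mem[cas_addr:cas_addr + WORD], "little")
    if current != compare:
        return "fail", None                      # compare failed, nothing is read
    mem[cas_addr:cas_addr + WORD] = swap.to_bytes(WORD, "little")
    if read_addr < 0 or read_addr + read_len > len(mem):
        return "fail", None                      # read failure path, swap is kept
    return "ok", bytes(mem[read_addr:read_addr + read_len])
```

The sequence of FIG. 19, as described, does not mention undoing the swap when the read fails, so the sketch leaves the swapped value in place in that case.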
Other systems, methods, features, and advantages of the present disclosure will be or become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the present disclosure, and be protected by the accompanying claims.
In one aspect, the present disclosure relates to a device for receiving a plurality of transactions comprising a Remote Direct Memory Access, RDMA, Network Interface Card, RNIC, configured to: receive from a transmitter a request to read an object from a memory address; check whether a header version is unlocked, extract from each cache line a version number, Vobj, and verify that the Vobj matches the least significant bytes, LSB, of a version field of a header of the object; when the Vobj in each cache line matches the value in the version field: remove the Vobj from each cache line of the object, read data of the object, and transmit the object to the transmitter; and when the header version is locked or when the Vobj in each cache line does not match the value in the version field, retry to verify that the header version is unlocked and that each Vobj matches the value in the version field in the header of the object for a predefined number of times and, when there is no match, transmit a failure response.
Performing the check by the RNIC and sending the object to the transmitter without the Vobj saves network and computing resources as well as improves latency.
In a further implementation of the first aspect, the RNIC is further configured to: receive from a transmitter a request to write an object to the memory address; extract the value in the version field of the header of the object and write the value into each cache line of the object and write the object to the memory address; transmit a success response to the transmitter.
In a second aspect, the present disclosure relates to a device for receiving a plurality of transactions, comprising a Remote Direct Memory Access, RDMA, Network Interface Card, RNIC, configured to: receive from a transmitter a request to read an object from a memory address; extract data from each packet of the object; calculate a hash function value for the data in each packet of the object and update the calculated hash function value of a current packet at a temporary RNIC memory address; at a last packet, verify that the updated hash function value matches a value in a hash value field in a header of the object; when the hash value matches the value in the hash value field in the header of the object, transmit the object to the transmitter; and when the calculated hash value does not match the value in the hash value field of the header of the object, retry, for a predefined number of times, calculating the hash function value for the data in each packet, updating the calculated hash function value of the current packet at the temporary RNIC memory address and, at the last packet, verifying that the updated hash function value matches the value in the hash value field of the header of the object, and when there is still no match, transmit a failure response.
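The read-side hash check of the second aspect can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed RNIC implementation: the disclosure does not name a hash function, so SHA-256 from `hashlib` stands in for it; the running hash object plays the role of the value kept at the temporary RNIC memory address; and `read_object` and `verify_by_hash` are hypothetical names for fetching the header hash and packets from memory and for the check itself. Returning `None` models the failure response.

```python
import hashlib
from typing import Callable, List, Optional, Tuple

# Hypothetical reader: returns (hash value field from the object header, payload packets).
ReadObject = Callable[[], Tuple[bytes, List[bytes]]]


def verify_by_hash(read_object: ReadObject, max_tries: int) -> Optional[List[bytes]]:
    """Return the packets if the hash over them matches the header field, else None."""
    for _ in range(max_tries):
        header_hash, packets = read_object()
        # Running hash state, standing in for the temporary RNIC memory address.
        # SHA-256 is an assumption; the disclosure leaves the hash function open.
        running = hashlib.sha256()
        for packet in packets:
            running.update(packet)  # update the running value packet by packet
        # At the last packet, compare against the hash value field in the header.
        if running.digest() == header_hash:
            return packets
        # Mismatch: the object may have changed concurrently, so read and hash it again.
    return None
```

Because the hash is accumulated packet by packet, the receiver never needs to buffer the whole object before deciding whether it is coherent.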
In a further implementation of the second aspect, the RNIC is further configured to: receive from a transmitter a request to write an object to the memory address; calculate the hash function value for the data in each packet of the object and update the calculated hash function value of the current packet at a temporary RNIC memory address; at a last packet, write the updated hash function value into the hash value field in the header of the object; write the object to the memory address; and transmit a success response to the transmitter.
In a further implementation of the second aspect, the RNIC is further configured to: receive from the transmitter the request to write an object to the memory address, wherein the hash function value is calculated by a transmitter RNIC for the data in each packet of the object, the calculated hash function value of a current packet is updated at a temporary transmitter RNIC memory address, and at a last packet the updated hash function value is written into the hash value field in the header of the object; write the object to the memory address; and transmit a success response to the transmitter.
In a further implementation of the second aspect, the RNIC is further configured to: receive from one or more transmitters a read before write, RBW, request to read an object from the memory address, where packets of the RBW request contain a bit indicating that a request to write another object to the memory address is expected from the one or more transmitters; assign, in a table, a row for each transmitter containing an identification, ID, of the transmitter, a timestamp of the received RBW request, and the memory address to read the object from; when a request to write an object to the memory address is received from another transmitter, wherein the other transmitter has a row in the table for an RBW request to read the object from the memory address with a timestamp smaller than the timestamps of the requests of the one or more transmitters: send a notification to the one or more transmitters, according to the transmitter IDs in the table, to refrain from sending the request to write to the memory address; and remove the rows of the other transmitter and of the one or more transmitters from the table.
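The RBW bookkeeping described above can be modeled in Python as a small table. This is a hypothetical sketch, not the disclosed hardware table: the class and method names (`RbwTable`, `on_read`, `on_write`, `on_zero_read`) are illustrative, the `write_expected` flag stands for the bit carried in the RBW packets, and the handling of rows older than the writer's row (they are kept) is an assumption, since the text does not address that case.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RbwRow:
    transmitter_id: str  # ID of the transmitter that sent the RBW request
    timestamp: int       # time at which the RBW request was received
    address: int         # memory address the object is read from


class RbwTable:
    """Hypothetical receiver-side table of outstanding read before write (RBW) requests."""

    def __init__(self) -> None:
        self.rows: List[RbwRow] = []

    def on_read(self, transmitter_id: str, timestamp: int, address: int,
                write_expected: bool) -> None:
        # Only RBW requests (write-expected bit set) get a row.
        if write_expected:
            self.rows.append(RbwRow(transmitter_id, timestamp, address))

    def on_zero_read(self, transmitter_id: str, address: int) -> None:
        # A zero read means no write will follow, so the transmitter's row is dropped.
        self.rows = [r for r in self.rows
                     if not (r.transmitter_id == transmitter_id and r.address == address)]

    def on_write(self, writer_id: str, address: int) -> List[str]:
        """Handle a write to `address`; return the IDs of transmitters told not to write."""
        writer_ts = [r.timestamp for r in self.rows
                     if r.transmitter_id == writer_id and r.address == address]
        if not writer_ts:
            return []  # the writer holds no RBW row for this address
        first = min(writer_ts)
        # Transmitters whose RBW to the same address is later than the writer's lose the race.
        losers = sorted({r.transmitter_id for r in self.rows
                         if r.address == address and r.transmitter_id != writer_id
                         and r.timestamp > first})
        # Remove the writer's rows and the notified transmitters' rows for this address.
        self.rows = [r for r in self.rows
                     if r.address != address
                     or (r.transmitter_id != writer_id and r.timestamp <= first)]
        return losers
```

For example, if transmitters A and B both issue RBW requests to the same address, A first, then a write from A causes B to be notified, and both rows are removed; a plain read (bit cleared) never enters the table.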
In a further implementation of the second aspect, the RNIC is further configured to: send, by one or more transmitters, an RBW request to read the object from the memory address, where packets of the RBW request contain a bit indicating that a request to write another object to the memory address is expected from the one or more transmitters.
In a further implementation of the second aspect, the RNIC is further configured to: send a zero read request indicating that no request to write an object to the memory address is expected to be sent from the one or more transmitters.
In a further implementation of the second aspect, the RNIC is further configured to: receive from the one or more transmitters a zero read request indicating that no request to write an object to the memory address is expected to be sent from the one or more transmitters; and remove the rows holding the IDs and timestamps of the one or more transmitters from the table.
In a third aspect, the present disclosure relates to a device for transmitting a plurality of transactions comprising a Remote Direct Memory Access, RDMA, Network Interface Card, RNIC, configured to: transmit to a receiver a request to read an object from a memory address, wherein a version number, Vobj, is extracted by the receiver from each cache line of the object and the Vobj of each cache line is verified to match the LSB value in a version field of a header of the object; receive the object from the receiver, when the Vobj in each cache line matches the LSB value in the version field, with the Vobj removed from each cache line of the object by the receiver; or receive a failure response from the receiver, when the Vobj in each cache line does not match the LSB value in the version field.
In a further implementation of the third aspect, the RNIC is further configured to: transmit to a receiver a request to write an object to the memory address; wherein the LSB value in the version field of the header of the object is extracted by the receiver and written into each cache line of the object and the object is written to the memory address; receive a success response from the receiver.
In a fourth aspect, the present disclosure relates to a device for transmitting a plurality of transactions, comprising a Remote Direct Memory Access, RDMA, Network Interface Card, RNIC, configured to: transmit to a receiver a request to read an object from a memory address, wherein data is extracted by the receiver from each packet of the object, a hash function value is calculated for the data in each packet of the object, and at a last packet the updated hash function value is verified by the receiver to match a value in a hash value field in a header of the object; when the hash value matches the value in the hash value field in the header of the object, receive the object from the receiver; and when the calculated hash value does not match the value in the hash value field of the header of the object, receive a failure response.
In a further implementation of the fourth aspect, the RNIC is further configured to: transmit to a receiver a request to write an object to a memory address, wherein the hash function value is calculated by the receiver for the data in each packet of the object, the calculated hash function value of the current packet is updated by the receiver at a temporary receiver RNIC memory address, and at a last packet the calculated hash function value is updated in the hash value field in the header of the object; and receive a success response from the receiver after the object is written to the memory address by the receiver.
In a further implementation of the fourth aspect, the RNIC is further configured to: calculate the hash function value for the data in each packet of the object and update the calculated hash function value of a current packet at a temporary RNIC memory address; at a last packet, write the updated hash function value into a hash value field in the header of the object; transmit the request to write the object to the receiver; and receive a success response from the receiver after the object is written to the memory address by the receiver.
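The transmitter-side hash computation of this implementation can be sketched in Python as follows. It is illustrative only: SHA-256 is assumed because the disclosure does not name a hash function, the running hash object stands in for the temporary transmitter RNIC memory, and `build_write_request` is a hypothetical helper that returns the value destined for the header hash field together with the packets.

```python
import hashlib
from typing import List, Tuple


def build_write_request(payload: bytes, packet_size: int) -> Tuple[bytes, List[bytes]]:
    """Split an object into packets and compute the value for its header hash field."""
    if packet_size <= 0:
        raise ValueError("packet_size must be positive")
    packets = [payload[i:i + packet_size] for i in range(0, len(payload), packet_size)]
    # Running hash, standing in for the temporary transmitter RNIC memory (SHA-256 assumed).
    running = hashlib.sha256()
    for packet in packets:
        running.update(packet)  # update the running value with the current packet
    # After the last packet, this value is written into the header's hash value field.
    hash_field = running.digest()
    return hash_field, packets
```

Incremental `update()` calls yield the same digest as hashing the whole payload at once, so a receiver that hashes the same packets in order can verify the object with the per-packet procedure of the read path.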
In a fifth aspect, the present disclosure relates to a device for optimizing a flow operation of a compare and swap operation and a read request to a single operation, in Remote Direct Memory Access, RDMA, transactions, comprising a RDMA, Network Interface Card, RNIC, configured to: receive from a transmitter a request for an optimized compare and swap and read operation: compare a content of a first memory address to a first value; when the content of the first memory address is equal to the first value: replace the content of the first memory address with a second value; read a content of a second memory address; and send a success response with the content read from the second memory address to the transmitter; and when the content of the first memory address is not equal to the first value: send a failure response to the transmitter.
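The semantics of the combined compare and swap and read operation can be modeled in Python as follows. This is a toy model, not an RNIC implementation or a real verbs API: `RemoteMemory` and `cas_and_read` are hypothetical names, memory is modeled as a dictionary of word addresses, and a `threading.Lock` stands in for whatever mechanism makes the receiver execute the compare, swap, and read as one indivisible operation.

```python
import threading
from typing import Dict, Optional, Tuple


class RemoteMemory:
    """Toy model of the receiver memory; the lock makes CAS-and-read indivisible."""

    def __init__(self, words: Dict[int, int]) -> None:
        self.words = dict(words)
        self._lock = threading.Lock()

    def cas_and_read(self, addr1: int, expected: int, new: int,
                     addr2: int) -> Tuple[bool, Optional[int]]:
        with self._lock:
            if self.words.get(addr1) != expected:
                return False, None  # failure response: first address unchanged
            self.words[addr1] = new  # replace the content with the second value
            # Success response carries the content of the second address.
            return True, self.words.get(addr2)
```

Compared with issuing a CAS and then a separate read, the transmitter receives both the CAS outcome and the second address's content in a single response.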
In a sixth aspect, the present disclosure relates to a method for receiving a plurality of transactions, comprising: at a Remote Direct Memory Access, RDMA, Network Interface Card, RNIC: receiving from a transmitter a request to read an object from a memory address; extracting a version number, Vobj, from each cache line of the object and verifying that the Vobj of each cache line matches a Least Significant Bytes, LSB, value in a version field of a header of the object; when the Vobj in each cache line matches the value in the version field: removing the Vobj from each cache line of the object, reading data of the object, and transmitting the object to the transmitter; and when the Vobj in any cache line does not match the LSB value in the version field, retrying, a predefined number of times, to verify that each Vobj matches the LSB value in the version field in the header of the object, and when there is still no match, transmitting a failure response.
In a further implementation of the sixth aspect, the method further comprises: receiving from a transmitter a request to write an object to the memory address; extracting the LSB value in the version field of the header of the object, writing the value into each cache line of the object, and writing the object to the memory address; and transmitting a success response to the transmitter.
In a seventh aspect, the present disclosure relates to a method for receiving a plurality of transactions, comprising: at a Remote Direct Memory Access, RDMA, Network Interface Card, RNIC: receiving from a transmitter a request to read an object from a memory address; extracting data from each packet of the object; calculating a hash function value for the data in each packet of the object and updating the calculated hash function value of a current packet at a temporary RNIC memory address; at a last packet, verifying that the updated hash function value matches a value in a hash value field in a header of the object; when the hash value matches the value in the hash value field in the header of the object, transmitting the object to the transmitter; and when the calculated hash value does not match the value in the hash value field of the header of the object, retrying, for a predefined number of times, calculating the hash function value for the data in each packet, updating the calculated hash function value of the current packet at the temporary RNIC memory address and, at the last packet, verifying that the updated hash function value matches the value in the hash value field of the header of the object, and when there is still no match, transmitting a failure response.
In a further implementation of the seventh aspect, the method further comprises: receiving from a transmitter a request to write an object to the memory address; calculating the hash function value for the data in each packet of the object and updating the calculated hash function value of the current packet at a temporary RNIC memory address; at a last packet, updating the calculated hash function value in the hash value field in the header of the object; and writing the object to the memory address and transmitting a success response.
In a further implementation of the seventh aspect, the method further comprises: calculating the hash function value at a transmitter RNIC for the data in each packet of the object and updating the calculated hash function value of a current packet at a temporary transmitter RNIC memory address; at a last packet, writing the updated hash function value into the hash value field in the header of the object; transmitting the request to write the object to the receiver; receiving, at the receiver RNIC, the request to write the object; and transmitting a success response to the transmitter.
In an eighth aspect, the present disclosure relates to a method for transmitting a plurality of transactions, comprising: at a Remote Direct Memory Access, RDMA, Network Interface Card, RNIC: transmitting to a receiver a request to read an object from a memory address, wherein a version number, Vobj, is extracted by a receiver RNIC from each cache line of the object and the Vobj of each cache line is verified by the receiver RNIC to match the LSB value in a version field of a header of the object; receiving the object from the receiver RNIC, when the Vobj in each cache line matches the LSB value in the version field, with the Vobj removed from each cache line of the object by the receiver; or receiving a failure response from the receiver RNIC, when the Vobj in each cache line does not match the LSB value in the version field.
In a ninth aspect, the present disclosure relates to a method for transmitting a plurality of transactions, comprising: at a Remote Direct Memory Access, RDMA, Network Interface Card, RNIC: transmitting to a receiver RNIC a request to read an object from a memory address, wherein data is extracted by the receiver RNIC from each packet of the object, a hash function value is calculated for the data in each packet of the object, and at a last packet the updated hash function value is verified by the receiver RNIC to match a value in a hash value field in a header of the object; when the hash value matches the value in the hash value field in the header of the object, receiving the object from the receiver; and when the calculated hash value does not match the value in the hash value field of the header of the object, receiving a failure response.
In a tenth aspect, the present disclosure relates to a method for optimizing a flow operation of a compare and swap operation and a read request to a single operation, in Remote Direct Memory Access, RDMA, transactions, comprising: at a receiver, when receiving from a transmitter a request for an optimized compare and swap and read operation: comparing a content of a first memory address to a first value; when the content of the first memory address is equal to the first value: replacing the content of the first memory address with a second value; reading a content of a second memory address; and sending a success response and the content read from the second memory address to the transmitter; and when the content is not equal to the first value: sending a failure response to the transmitter.
Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the embodiments pertain. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be necessarily limiting.
Brief Description of the Several Views of the Drawings
Some embodiments of the disclosure are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the disclosure. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the disclosure may be practiced.
In the drawings:
FIG. 1 schematically shows a block diagram of an apparatus for checking a NIC object coherency in RDMA transactions, according to some embodiments of the present disclosure;
FIG. 2 schematically shows a layout of an object memory with an object version field, ObV, of 64 bits in the beginning of the header of the object (i.e. in the first cache line), and 16 bits of the LSB of the ObV, denoted as Vobj, in the beginning of each of the following cache lines, according to some embodiments of the present disclosure;
FIG. 3a schematically shows an object memory layout before removing the Vobj from the cache lines, according to some embodiments of the present disclosure;
FIG. 3b schematically shows an object memory layout after removing the Vobj from the cache lines, according to some embodiments of the present disclosure;
FIG. 4 is a schematic sequence diagram of an example for checking a NIC object coherency in RDMA transactions, when receiving a read request, by verifying the Vobj in each cache line is equal to the LSB of the object version number ObV in the header of the object, according to some embodiments of the present disclosure;
FIG. 5 is a schematic sequence diagram of an example for checking a NIC object coherency in RDMA transactions, when receiving a write request, by verifying the Vobj in each cache line is equal to the LSB of the object version number ObV in the header of the object, according to some embodiments of the present disclosure;
FIG. 6 schematically shows a flowchart of a method for checking a NIC object coherency in RDMA transactions, when receiving a read request, by verifying the Vobj in each cache line is equal to the LSB of the ObV in the header of the object, according to some embodiments of the present disclosure;
FIG. 7 schematically shows a flowchart of a method for checking a NIC object coherency in RDMA transactions, when transmitting a read request, by inserting the LSB of the ObV in the header of the object into each cache line of the object, according to some embodiments of the present disclosure;
FIG. 8 schematically shows a layout of a memory, where an object is kept in the memory and has a header field with a value calculated by a hash function over the object, according to some embodiments of the present disclosure;
FIG. 9 schematically shows a flowchart of a method for checking a NIC coherency in RDMA transactions, by a hash function, when receiving a read request, according to some embodiments of the present disclosure;
FIG. 10 schematically shows a sequence diagram of an example for checking a NIC object coherency in RDMA transactions, when receiving a read request, by a hash function, without locking the object first, according to some embodiments of the present disclosure;
FIG. 11 schematically shows a flow chart of an example of an RDMA write request, where the hash function is calculated in the receiver RNIC 122, according to some embodiments of the present disclosure;
FIG. 12 schematically shows a sequence diagram of an example for writing an object in RDMA transactions, with a hash function calculated at the receiver RNIC, according to some embodiments of the present disclosure;
FIG. 13 schematically shows a sequence diagram of an example for writing an object in RDMA transactions, with a hash function calculated at the transmitter RNIC, according to some embodiments of the present disclosure;
FIG. 14 schematically shows a flow chart of an example of an RDMA write request, where the hash function is calculated in the transmitter RNIC 112, according to some embodiments of the present disclosure;
FIG. 15 schematically shows a flow chart of a method for checking a NIC coherency in RDMA transactions, by a hash function, when transmitting a read request, according to some embodiments of the present disclosure;
FIG. 16 schematically shows a flow chart of a method for sending a read before write request (RBW) which is a read request with a notification on a write request, which is expected to be sent after the read request, according to some embodiments of the present disclosure;
FIGs. 17a-17g schematically show an example for the method of the read before write (RBW) request, according to some embodiments of the present disclosure;
FIG. 18 schematically shows a flow chart of a method optimizing a flow operation of a compare and swap (CAS) operation and a read request to a single operation, in RDMA transactions, according to some embodiments of the present disclosure; and
FIG. 19 schematically shows a sequence diagram of an optimized compare and swap (CAS) and read operation, according to some embodiments of the present disclosure.
Detailed Description
The present disclosure, in some embodiments thereof, relates to communication systems, and more specifically, but not exclusively, to methods and apparatuses for network interface card (NIC) object coherency (NOC) messages.
In multi-thread applications running on multi-core systems or on distributed computing platforms, there are often data structures that are read frequently but change relatively seldom. One example is a database server that keeps a list of databases that rarely changes but must be consulted for every query hitting the database; another example is a main-memory distributed object store exploiting RDMA. In such situations, extremely fast read access must be guaranteed together with protection against inconsistencies. This is achieved by using lock-free read-only operations.
In distributed shared memory systems with RDMA, lock-free read methods place a heavy load on compute and network resources. Some methods provide object coherency by inserting the least significant bytes (LSB) of a version number into every cache line and then checking that the version number has not changed. However, this wastes a large amount of compute and network resources when reading a remote object. In those methods, a remote read is implemented by issuing an RDMA read request over a reliable connection (RC) queue pair (QP) and testing the LSB in each cache line against the LSB of the version number; if the test fails, the reader waits for a timeout and issues the request again, wasting CPU cycles and bandwidth and incurring high latency.
In these methods, a remote write is implemented by adding the LSB to each cache line on the remote side and then sending the object back to its original machine, wasting CPU cycles and adding network latency.
Currently, RDMA provides no way to test object coherency, to strip the object least significant bytes (LSB) from each cache line on reading, or to add the LSB to each cache line on a remote or local NIC.
There is therefore a need for a device and a method for testing object coherency that, on the one hand, check that the memory area is strictly serializable and coherent and, on the other hand, reduce the use of computing resources and network bandwidth and shorten the latency of large packets.
According to some embodiments of the present disclosure, devices and methods are presented for testing object coherency that reduce the use of computing resources and network bandwidth and shorten the latency of large packets, by offloading object coherency manipulation to the NIC and by optimizing the object coherency flow.
Before explaining at least one embodiment of the disclosure in detail, it is to be understood that the disclosure is not necessarily limited in its application to the details of construction and the arrangement of the components and/or methods set forth in the following description and/or illustrated in the drawings and/or the Examples. The disclosure is capable of other embodiments or of being practiced or carried out in various ways.
The present disclosure may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure. The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
The computer readable program instructions may execute entirely on the user's computer and/or computerized device, partly on the user's computer and/or computerized device, as a stand-alone software package, partly on the user's computer (and/or computerized device) and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer and/or computerized device through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.
Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
Reference is now made to FIG. 1, which schematically shows a block diagram of an apparatus for checking a NIC object coherency in RDMA transactions, according to some embodiments of the present disclosure. Apparatus 100 includes a transmitter 110 and a receiver 120. The transmitter 110 includes a memory 111, an RDMA NIC (RNIC) 112, which includes a processor 113, and one or more applications 114. The receiver 120 includes a memory 121, an RNIC 122, which includes a processor 123, and one or more applications 124. Transmitter 110 sends a lock-free read request to receiver 120 in order to read an object from memory 121, which is a remote memory for transmitter 110. According to some embodiments of the present disclosure, the RNIC 122 verifies object coherency by using the least significant bytes (LSB) of the header version field, which are copied into each and every cache line. When transmitter 110 requests to read the object from memory 121, the object has the LSB of the header version number appended at a fixed position in each and every object cache line. RNIC 122 then uses processor 123 to check whether the header version is unlocked and, if it is unlocked, RNIC 122 extracts from each cache line the version number, Vobj, and verifies that it matches the LSB of the object version field, ObV, in the header of the object. The verification is done by comparing the LSB of the ObV to the Vobj of each cache line.
Fig. 2 schematically shows a layout of an object memory with an object version field, ObV, of 64 bits in the beginning of the header of the object (i.e. in the first cache line), and 16 bits of the LSB of the ObV, denoted as Vobj, in the beginning of each of the following cache lines, according to some embodiments of the present disclosure. When a new object is created or an object is updated, the ObV in the first cache line is incremented, the LSB of the ObV are written into a fixed position of each and every cache line spanning the object, and then the memory 121 is updated. According to some embodiments of the present disclosure, when the object is read by a remote transmitter 110 or by a thread on the local receiver 120, a verification is made by looping over all the cache lines of the object and checking whether the Vobj in each cache line is the same as the LSB of the ObV. When they are the same, the object is read and transmitted to the transmitter 110 without the Vobj in each cache line.
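The FIG. 2 layout and the per-cache-line check can be sketched in Python as follows. This is an illustrative model only, not the disclosed RNIC logic: the 64-byte cache line, little-endian byte order, zero padding of the last cache line, and the helper names `encode_object` and `vobjs_match` are assumptions; only the 64-bit ObV in the first cache line and the 16-bit Vobj at the start of each following cache line come from the description.

```python
import struct

CACHE_LINE = 64   # assumed cache-line size in bytes
OBV_SIZE = 8      # 64-bit object version field, ObV, at the start of the first cache line
VOBJ_SIZE = 2     # 16-bit LSB copy of the ObV, Vobj, at the start of every following line


def encode_object(obv: int, payload: bytes) -> bytes:
    """Lay out an object as in FIG. 2, zero-padding the last cache line (assumption)."""
    vobj = struct.pack("<H", obv & 0xFFFF)  # 16 LSB of the ObV (little-endian assumed)
    first_room = CACHE_LINE - OBV_SIZE
    out = bytearray(struct.pack("<Q", obv))
    out += payload[:first_room].ljust(first_room, b"\0")
    room = CACHE_LINE - VOBJ_SIZE
    rest = payload[first_room:]
    for i in range(0, len(rest), room):
        out += vobj + rest[i:i + room].ljust(room, b"\0")
    return bytes(out)


def vobjs_match(obj: bytes) -> bool:
    """Loop over the cache lines and compare each Vobj with the LSB of the ObV."""
    lsb = struct.unpack_from("<Q", obj, 0)[0] & 0xFFFF
    return all(struct.unpack_from("<H", obj, off)[0] == lsb
               for off in range(CACHE_LINE, len(obj), CACHE_LINE))
```

A reader that observes a cache line written under a different ObV (for example, a line still carrying Vobj 0x0004 while the header already holds 0x...0005) fails the check, which is what exposes a torn read of a concurrently updated object.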
When transmitter 110 reads an object from receiver 120, it allocates a buffer into which the object is written without the Vobj in each cache line. When the RNIC writes the object into the buffer, it skips the LSB fields and does not write them into the buffer, so the buffer receives the object without the LSB fields, i.e., without the Vobj in each cache line. In case the header version of the object is locked or the Vobj of one or more cache lines is not equal to the LSB of the ObV, the receiver retries, for a predefined number of times, to verify that the header version is unlocked and that each Vobj matches the value in the version field in the header of the object. When there is still no match, the receiver 120 transmits a failure response to transmitter 110.
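Skipping the LSB fields while filling the destination buffer can be sketched in Python as follows. This is a hypothetical helper (`strip_vobj`), assuming a 64-byte cache line, a 2-byte Vobj at the start of every cache line after the first, and that the header cache line holding the ObV is copied unchanged; the description does not state how the header line itself is treated.

```python
CACHE_LINE = 64  # assumed cache-line size in bytes
VOBJ_SIZE = 2    # assumed 16-bit Vobj at the start of every cache line after the first


def strip_vobj(obj: bytes) -> bytes:
    """Copy an object into a destination buffer, skipping the Vobj of each later cache line."""
    buf = bytearray(obj[:CACHE_LINE])  # header cache line (holding the ObV) copied as-is
    for off in range(CACHE_LINE, len(obj), CACHE_LINE):
        buf += obj[off + VOBJ_SIZE:off + CACHE_LINE]  # data only, Vobj skipped
    return bytes(buf)
```

The resulting buffer corresponds to the layout of FIG. 3b: the application on the transmitter sees contiguous object data and never has to parse or remove version bytes itself.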
Fig. 3a schematically shows an object memory layout before removing the Vobj from the cache lines, according to some embodiments of the present disclosure. Fig. 3b schematically shows an object memory layout after removing the Vobj from the cache lines, according to some embodiments of the present disclosure.
According to some embodiments of the present disclosure, when transmitter 110 sends a write request to receiver 120 to write an object to memory 121, the RNIC 122 of receiver 120 extracts the ObV value in the version field of the header of the object, writes its LSB into each cache line of the object, and then writes the object to a memory address in memory 121. When the object is successfully written to memory 121, RNIC 122 transmits a success response to the transmitter 110.
FIG. 4 is a schematic sequence diagram of an example of checking NIC object coherency in RDMA transactions when a read request is received, by verifying that the Vobj in each cache line equals the LSB of the ObV in the header of the object, according to some embodiments of the present disclosure. At 401, a QP is created at the application 114 and sent to the RNIC 112 of the transmitter. At 402, a read request to read address A of length L in memory 121 is sent from the application 114 to the RNIC 112. In RDMA, when the QP is created at the transmitter, a corresponding QP is created at the receiver, and at 403 the receiver RNIC 122 waits to receive a read request from the transmitter. At 404, the read request is transmitted from the RNIC 112 at the transmitter to the RNIC 122 at the receiver, and the read request is received. At 405, when the read request is received, a counter is set to zero, and at 406, the RNIC 122 checks whether the counter equals a predefined number X. At 407, when the counter differs from X, the RNIC 122 extracts the Vobj from each cache line at the requested address and verifies that each is equal to the LSB of the object version ObV in the header of the object. When they are not equal, at 408, the RNIC 122 waits a predefined time interval T and increments the counter by 1. The RNIC 122 then returns to 406 to try to verify again. When the receiver RNIC 122 has tried and failed X times, i.e. when the counter equals X, at 409, a failure response is transmitted to the transmitter, and the RNIC 122 returns to 403 and waits for a new read request. When the Vobj values extracted from every cache line are equal to the LSB of the ObV in the header of the object, at 411, the Vobj values are removed from each cache line of the object, and the object is read and sent to the transmitter without the Vobj in each cache line. The RNIC 122 then returns to 403 and waits for a new read request.
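The retry loop of FIG. 4 (steps 405 to 411) can be sketched as below. `MAX_RETRIES` and `RETRY_INTERVAL_S` stand for X and T, whose values the disclosure does not fix; the `read_memory` callback, the 64-byte cache line and the field offsets are assumptions made for illustration, and returning `None` stands in for the failure response.

```python
import struct
import time
from typing import Callable, Optional

CACHE_LINE = 64          # assumed cache-line size
VOBJ_SIZE = 2            # assumed 16-bit Vobj at offset 0 of lines 1..n
MAX_RETRIES = 3          # "X" in FIG. 4; value assumed
RETRY_INTERVAL_S = 0.0   # "T" in FIG. 4; value assumed


def vobj_matches(obj: bytes) -> bool:
    """Step 407: every Vobj must equal the 16 LSBs of the 64-bit ObV."""
    obv_lsb = struct.unpack_from("<Q", obj, 0)[0] & 0xFFFF
    return all(struct.unpack_from("<H", obj, off)[0] == obv_lsb
               for off in range(CACHE_LINE, len(obj), CACHE_LINE))


def strip_vobj(obj: bytes) -> bytes:
    """Step 411: drop the Vobj field from every cache line after the first."""
    out = bytearray(obj[:CACHE_LINE])
    for off in range(CACHE_LINE, len(obj), CACHE_LINE):
        out += obj[off + VOBJ_SIZE:off + CACHE_LINE]
    return bytes(out)


def serve_read(read_memory: Callable[[], bytes]) -> Optional[bytes]:
    """Return the stripped object, or None to stand for the failure response."""
    counter = 0                              # step 405
    while counter != MAX_RETRIES:            # step 406
        obj = read_memory()                  # snapshot of address A, length L
        if vobj_matches(obj):                # step 407
            return strip_vobj(obj)           # step 411
        time.sleep(RETRY_INTERVAL_S)         # step 408: wait T ...
        counter += 1                         # ... and increment the counter
    return None                              # step 409
```

Taking a fresh snapshot on every attempt is what lets a concurrent writer finish its update between retries.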
FIG. 5 is a schematic sequence diagram of an example of checking NIC object coherency in RDMA transactions when a write request is received, by verifying that the Vobj in each cache line equals the LSB of the object version number ObV in the header of the object, according to some embodiments of the present disclosure. At 501, a QP is created at the application 114 and sent to the RNIC 112 of the transmitter. At 502, a write request to write an object to address A of length L in memory 121 is sent from the application 114 to the RNIC 112. The packets sent by the transmitter in the write request do not contain the Vobj values of the cache lines. In RDMA, when the QP is created at the transmitter, a corresponding QP is created at the receiver, and at 503 the receiver RNIC 122 waits to receive a write request from the transmitter. At 504, the write request is transmitted from the RNIC 112 at the transmitter to the RNIC 122 at the receiver, and the write request is received. When the write request is received, at 505, the RNIC 122 extracts the LSB of the ObV in the header of the object and inserts this value at the beginning of each cache line. After the LSB value has been inserted into each cache line, at 506, the object is written to the memory 121, and at 507, the RNIC 122 transmits a success response to the transmitter 110.
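On the write path of FIG. 5 the receiver reverses the stripping step. The sketch below, with a hypothetical `insert_vobj` helper, rebuilds the in-memory layout from a write payload that carries no Vobj fields. It assumes a 64-byte first cache line containing the 64-bit little-endian ObV at offset 0 and 62 payload bytes per following cache line; zero-padding a short final line is also an assumption.

```python
import struct

CACHE_LINE = 64                            # assumed cache-line size
VOBJ_SIZE = 2                              # assumed 16-bit Vobj per cache line after the first
PAYLOAD_PER_LINE = CACHE_LINE - VOBJ_SIZE


def insert_vobj(wire: bytes) -> bytes:
    """Rebuild the memory layout from a write payload that carries no Vobj fields."""
    header = wire[:CACHE_LINE]                                # first line holds ObV
    vobj = struct.unpack_from("<Q", header, 0)[0] & 0xFFFF    # step 505: LSB of ObV
    out = bytearray(header)
    for off in range(CACHE_LINE, len(wire), PAYLOAD_PER_LINE):
        chunk = wire[off:off + PAYLOAD_PER_LINE].ljust(PAYLOAD_PER_LINE, b"\0")
        out += struct.pack("<H", vobj) + chunk                # Vobj at the start of the line
    return bytes(out)
```

The result is what would be written to memory 121 at step 506.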
FIG. 6 schematically shows a flowchart of a method for checking NIC object coherency in RDMA transactions when a read request is received, by verifying that the Vobj in each cache line equals the LSB of the ObV in the header of the object, according to some embodiments of the present disclosure. At 601, a request to read an object from a memory address in memory 121 is received at the RNIC 122 of receiver 120. At 602, the RNIC 122 extracts a data version number, Vobj, from each cache line of the object and verifies that the Vobj of each cache line matches the LSB of the version field ObV in the header of the object. At 603, when the Vobj in each cache line matches the LSB of the version field, the RNIC 122 removes the Vobj from each cache line of the object, reads the data of the object, and transmits the object to the transmitter 110. At 604, when the Vobj in any cache line does not match the LSB of the version field, the RNIC 122 retries, up to a predefined number of times, to verify that each Vobj matches the LSB of the ObV in the header of the object. When there is still no match, a failure response is transmitted to the transmitter 110.
FIG. 7 schematically shows a flowchart of a method for checking NIC object coherency in RDMA transactions when a read request is transmitted, by inserting the LSB of the ObV in the header of the object into each cache line of the object, according to some embodiments of the present disclosure. At 701, the RNIC 112 of transmitter 110 transmits a request to read an object from a memory address to receiver 120. When the read request is received at receiver 120, the receiver extracts a version number, Vobj, from each cache line of the object and verifies that the Vobj of each cache line matches the LSB of the object version number field, ObV, in the header of the object. At 702, when the Vobj in each cache line matches the LSB of the ObV, the transmitter 110 receives the object from the receiver 120 with the Vobj removed from each cache line by the receiver, i.e. the object is received at the transmitter 110 without the Vobj in each cache line. Otherwise, at 703, when the Vobj in each cache line does not match the LSB of the ObV, the transmitter 110 receives a failure response from the receiver. When a write request to write an object to an address in memory 121 is transmitted from transmitter 110 to receiver 120, the receiver 120 extracts the LSB of the ObV and writes it into each cache line of the object. The object is written to the memory address in memory 121, and the transmitter 110 receives a success response from the receiver 120. According to some embodiments of the present disclosure, the LSB of the version field in the header of the object is written at the beginning of each cache line.
According to some other embodiments of the present disclosure, NIC object coherency in RDMA transactions may be checked by a hash function when a read or write request is received.
FIG. 8 schematically shows a memory layout in which an object kept in the memory has a header field holding a value calculated by a hash function over the object, according to some embodiments of the present disclosure. In some embodiments of the present disclosure, when an object is created or updated, a hash value is calculated over the object and stored in a hash value header field. When the object is read by a transmitter (or any remote node) or by a thread on a local node, the hash value is recalculated over the object and compared with the hash value header field. If they are the same, the object is serializable and coherent; if not, the object is being modified by the application and should be read again or discarded.
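A minimal sketch of the hash-based check follows, assuming CRC-32 (one of the hash functions the disclosure names) computed with Python's `zlib.crc32`, a 32-bit hash field at offset 0 of the header, and a hash calculated over the bytes that follow the field. How the field itself is excluded from the calculation is not specified in the disclosure, so that choice is an assumption, and `seal` and `is_coherent` are hypothetical names.

```python
import struct
import zlib

HASH_FMT = "<I"                          # assumed: CRC-32 stored as a 32-bit field at offset 0
HASH_SIZE = struct.calcsize(HASH_FMT)


def seal(obj: bytearray) -> None:
    """Writer side: store the CRC-32 of the object body in the hash value field."""
    struct.pack_into(HASH_FMT, obj, 0, zlib.crc32(obj[HASH_SIZE:]))


def is_coherent(obj: bytes) -> bool:
    """Reader side: recompute the hash and compare it with the stored value."""
    (stored,) = struct.unpack_from(HASH_FMT, obj, 0)
    return zlib.crc32(obj[HASH_SIZE:]) == stored
```

With MD5 instead of CRC-32, the field would be 16 bytes wide (`hashlib.md5().digest_size` is 16), matching the 128-bit field size given for MD5.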
According to some embodiments of the present disclosure, any hash function with a good distribution may be used, including Cyclic Redundancy Check-16 (CRC-16), Cyclic Redundancy Check-32 (CRC-32), Cyclic Redundancy Check-64 (CRC-64), Message-Digest algorithm 5 (MD5), and the like.
The size of the hash value header field is chosen according to the hash function used; for example, MD5 uses a 128-bit field, CRC-64 a 64-bit field, CRC-32 a 32-bit field, and so on.
The hash function covers the entire content of the object and is calculated by the RNIC that writes the object into the memory. The object is coherent if the hash value calculated by the reader matches the value in the hash field in the header of the object.
FIG. 9 schematically shows a flowchart of a method for checking NIC object coherency in RDMA transactions by a hash function when a read request is received, according to some embodiments of the present disclosure. At 901, a request to read from a memory address in memory 121 is received at receiver 120. At 902, the RNIC 122 at the receiver 120 extracts the data from each packet of the object. At 903, the RNIC 122 calculates the hash function over the data in each packet of the object and stores the running hash value, updated with the current packet, at a temporary memory address in the RNIC 122. At 904, when the RNIC reaches the last packet of the object, the RNIC 122 verifies that the updated hash value matches the value in a hash value field in the header of the object. At 905, when the hash value matches the value in the hash value field in the header of the object, at 906 the RNIC 122 transmits the object to the transmitter 110 in response to the request to read the object. Alternatively, at 907, when the calculated hash value does not match the value in the hash value field of the header of the object, the RNIC 122 retries, up to a predefined number of times, to calculate the hash function over the data in each packet, updating the hash value of the current packet in the temporary RNIC memory and, at the last packet, verifying that the updated hash value matches the value in the hash value field of the header of the object. At 908, when there is still no match, the RNIC 122 transmits a failure response to transmitter 110.
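Because CRC-32 can be computed incrementally, the per-packet update of FIG. 9 can be sketched with a single running value standing in for the temporary RNIC memory: feeding the packets one by one to `zlib.crc32` yields the same result as hashing the whole object at once. The `read_object` callback, which returns the stored hash together with the object's packets, and the retry count are hypothetical; `None` represents the failure response of step 908.

```python
import zlib
from typing import Callable, Iterable, List, Optional, Tuple

MAX_RETRIES = 3   # the "predefined number of times"; value assumed


def packet_hash(packets: Iterable[bytes]) -> int:
    """Steps 902-903: fold each packet into a running CRC-32 (the temporary value)."""
    running = 0
    for pkt in packets:
        running = zlib.crc32(pkt, running)
    return running


def serve_hashed_read(
    read_object: Callable[[], Tuple[int, List[bytes]]]
) -> Optional[List[bytes]]:
    """Return the packets on a match, or None to stand for the failure response."""
    for _ in range(1 + MAX_RETRIES):          # first attempt plus retries (step 907)
        stored, packets = read_object()       # stored hash from the object header
        if packet_hash(packets) == stored:    # step 904: check at the last packet
            return packets                    # step 906
    return None                               # step 908
```

Keeping only one running value per request is what makes the check feasible inside the NIC: the packets can be streamed out without buffering the whole object.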
FIG. 10 schematically shows a sequence diagram of an example of checking NIC object coherency in RDMA transactions by a hash function when a read request is received, without first locking the object, according to some embodiments of the present disclosure. At 1001, a QP is created at the application 114 and sent to the RNIC 112 of the transmitter 110. At 1002, a read request to read address A of length L in memory 121 is sent from the application 114 to the RDMA NIC 112. When the QP is created at the transmitter, a corresponding QP is created at the receiver 120; the QP at the receiver is created with a hash function that is calculated over the object stored in memory 121. The result of the hash function is stored in a hash value field in the header of the object, together with the header length. At 1003, the receiver's RDMA NIC 122 waits to receive a read request from the transmitter 110. At 1004, the read request is transmitted from the RDMA NIC 112 at the transmitter 110 to the RDMA NIC 122 at the receiver 120, and the read request is received. At 1005, when the read request is received, a counter is set to zero, and at 1006, the RDMA NIC 122 checks whether the counter equals a predefined number X. At 1007, when the counter differs from X, the RDMA NIC 122 extracts the data from each packet of the object, calculates the hash function over each packet, and stores the running hash value, updated with the current packet, at a temporary memory address in the RNIC 122. When the RNIC reaches the last packet of the object, the RNIC 122 verifies that the updated hash value matches the value in the hash value field in the header of the object. When the updated hash value stored in the temporary memory of the RNIC 122 is not equal to the value of the hash value field in the header of the object, at 1008, the RDMA NIC 122 waits a predefined time interval T and increments the counter by 1. The RDMA NIC 122 then returns to 1006 to try to verify again.
When the receiver's RDMA NIC 122 has tried and failed X times, i.e. when the counter equals X, at 1009, a failure response is transmitted to the transmitter 110, and at 1010 the RDMA NIC 122 returns to 1003 and waits for a new read request. When the updated hash value stored in the temporary memory of the RNIC 122 equals the value of the hash value field in the header of the object, at 1011, the RNIC 122 transmits the object to the transmitter 110 in response to the request to read the object. At 1012, the object is then delivered to the application 114. Then, at 1013, the RDMA NIC 122 returns to 1003 and waits for a new read request.
According to some other embodiments of the present disclosure, in an RDMA write request the hash function is calculated by the RNIC 122 in the receiver 120. FIG. 11 schematically shows a flow chart of an example of an RDMA write request in which the hash function is calculated in the receiver RNIC 122, according to some embodiments of the present disclosure. At 1101, an RDMA write request is sent from the transmitter 110 to the receiver 120. In the write request of the example of FIG. 11, the packets are divided into write first, write middle, write last and immediate data packets (for example, as in InfiniBand (IB), RDMA over Converged Ethernet (RoCE) or RoCE version 2 (RoCEv2)). The write request is received at the receiver RNIC 122, which at 1102 extracts the data from all the packets of the object to be written to memory 121 and calculates the hash function over each packet of the object. The result of each calculation is stored in a temporary memory of the RNIC 122, e.g. in the QP context (QPC), and is updated with every packet. When the hash function is calculated for the last packet, at 1103, the result is updated, e.g. in the QPC, and the final hash value is written into the hash value field in the header of the object. At 1104, the RNIC 122 writes the object to the memory 121 and a success response is sent to the transmitter 110.
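The receiver-side write path of FIG. 11 can be sketched as a per-packet handler. This sketch assumes a layout in which a 32-bit CRC-32 hash field sits at the start of the object and the write payload carries only the object body; `QPContext`, `on_write_packet` and the opcode strings are hypothetical, immediate data is not modeled, and each payload is placed in memory as it arrives.

```python
import struct
import zlib
from dataclasses import dataclass

HASH_FMT = "<I"                        # assumed: 32-bit CRC-32 field at offset 0 of the object
HASH_SIZE = struct.calcsize(HASH_FMT)


@dataclass
class QPContext:
    """Hypothetical per-QP state standing in for the temporary memory in the QPC."""
    running_hash: int = 0
    offset: int = 0


def on_write_packet(qpc: QPContext, memory: bytearray, base: int,
                    opcode: str, payload: bytes) -> bool:
    """Handle one write packet; return True when a success response is due."""
    start = base + HASH_SIZE + qpc.offset          # the body follows the hash field
    memory[start:start + len(payload)] = payload   # place the payload in memory 121
    qpc.running_hash = zlib.crc32(payload, qpc.running_hash)   # step 1102: per-packet update
    qpc.offset += len(payload)
    if opcode in ("WRITE_LAST", "WRITE_ONLY"):
        # Step 1103: the final value goes into the hash value field of the header.
        struct.pack_into(HASH_FMT, memory, base, qpc.running_hash)
        qpc.running_hash, qpc.offset = 0, 0        # reset for the next request
        return True                                # step 1104: send success response
    return False
```

Because the hash state lives in the QP context, the transmitter does not need to send any extra packet for this variant.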
FIG. 12 schematically shows a sequence diagram of an example of writing an object in RDMA transactions with a hash function calculated at the receiver RNIC, according to some embodiments of the present disclosure. At 1201, a QP is created at the application 114 and sent to the RNIC 112 of the transmitter 110. At 1202, a write request to write to an address A of length L in memory 121 is sent from the application 114 to the RNIC 112. When the QP is created at the transmitter, a corresponding QP is created at the receiver 120; the QP at the receiver is created with a hash function that is calculated over the object stored in memory 121. The result of the hash function is stored in a hash value field in the header of the object, together with the header length. At 1203, the receiver RNIC 122 waits to receive a write request from the transmitter 110. At 1204, the write request is transmitted from the RNIC 112 at the transmitter 110 to the RNIC 122 at the receiver 120, and the write request is received. At 1205, when the write request is received, the RNIC 122 extracts the data from all the packets of the object to be written to memory 121 and calculates the hash function over each packet of the object. The result of each calculation is stored in a temporary memory of the RNIC 122, e.g. in the QP context (QPC), and is updated with every packet. When the hash function is calculated for the last packet, the result is updated, e.g. in the QPC. At 1206, the RNIC 122 writes the object to the memory 121, and at 1207 a success response is sent to the transmitter 110.
According to some other embodiments of the present disclosure, in the case of an RDMA write request, the hash function may be calculated at the transmitter RNIC 112 instead of at the receiver RNIC 122. In that case, the receiver 120 receives from the transmitter 110 the request to write an object to a memory address in memory 121, together with the hash value calculated by the transmitter RNIC 112 over the data in each packet of the object. The receiver RNIC 122 writes the object to the memory 121 and transmits a success response to the transmitter 110.
According to some embodiments of the present disclosure, the hash function is calculated in the transmitter 110 instead of in the receiver 120. FIG. 13 schematically shows a sequence diagram of an example of writing an object in RDMA transactions with a hash function calculated at the transmitter RNIC, according to some embodiments of the present disclosure. At 1301, a QP is created at the application 114 and sent to the RNIC 112 of the transmitter 110. The QP at the transmitter is created with a hash function that is calculated over the object stored in memory 111. At 1302, the final value of the hash function is stored at a hash offset field in the header of the object, together with the header length. At 1303, a write request to write to an address A of length L in memory 121 is sent from the application 114 to the RNIC 112. At 1304, the hash function is calculated over the whole object, and at 1305, the write request is sent from the transmitter RNIC 112 to the receiver RNIC 122. When the QP is created at the transmitter, a corresponding QP is created at the receiver 120. At 1306, the receiver RNIC 122 waits to receive a write request from the transmitter 110. At 1307, when the write request is received, the RNIC 122 writes the object to the memory 121 and inserts the hash value calculated by the transmitter RNIC 112 into the hash field in the header of the object written to memory 121. At 1307 a success response is sent to the transmitter 110.
FIG. 14 schematically shows a flow chart of an example of an RDMA write request over RoCEv2 in which the hash function is calculated in the transmitter RNIC 112, according to some embodiments of the present disclosure. At 1401, a write-and-checksum opcode is created. In this opcode, the last SGE specifies the destination virtual address (VA) to which the data is to be written, instead of the source location from which the data is taken. This is because the data in this case, i.e. the hash value, is calculated by the transmitter RNIC 112 and stored in the QPC, and is therefore already known to the RNIC 112. At 1402, the RNIC 112 reads the data, divided into scatter-gather entries (SGEs), from the memory 111. At 1403, the RNIC 112 extracts the data from each SGE, calculates the hash function, and stores the result in a temporary memory inside the QPC of the RNIC 112. At 1404, the RNIC 112 sends the packets it has read to the network as a write request, i.e. divided into write first, write middle and write last packets. The destination VA is written in the last SGE read by the RNIC 112, so the RNIC knows the destination, i.e. the address at which to write. In addition, at 1405, the last SGE is read by the RNIC 112, and at 1406, the final value of the hash function, which is stored in the QPC of the RNIC 112, is written to an additional “write only” packet, which is sent to the network at 1407 after the write first, write middle and write last packets. At 1408, when there is an immediate data packet, it is sent after the “write only” packet.
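The packet sequence of FIG. 14 can be sketched as follows. The packet model (an opcode, an optional RETH virtual address and a payload), the use of CRC-32, the little-endian checksum encoding and the `IMMEDIATE` opcode label are simplifications chosen for illustration, not the RoCEv2 wire format. Only the first data packet carries the destination VA, as in an InfiniBand RDMA write, and the extra write only packet carries the hash to the checksum address.

```python
import struct
import zlib
from typing import List, Optional, Tuple

# (opcode, RETH virtual address or None, payload): a simplified, hypothetical packet model
Packet = Tuple[str, Optional[int], bytes]


def build_write_with_checksum(sges: List[bytes], dest_va: int, checksum_va: int,
                              mtu: int, immediate: Optional[int] = None) -> List[Packet]:
    running = 0                                    # running hash kept in the QPC
    data = bytearray()
    for sge in sges:                               # steps 1402-1403: read SGEs, update hash
        running = zlib.crc32(sge, running)
        data += sge
    chunks = [bytes(data[i:i + mtu]) for i in range(0, len(data), mtu)]
    packets: List[Packet] = []
    for i, chunk in enumerate(chunks):             # step 1404: write first/middle/last
        if len(chunks) == 1:
            packets.append(("WRITE_ONLY", dest_va, chunk))
        elif i == 0:
            packets.append(("WRITE_FIRST", dest_va, chunk))   # only the first carries the VA
        elif i == len(chunks) - 1:
            packets.append(("WRITE_LAST", None, chunk))
        else:
            packets.append(("WRITE_MIDDLE", None, chunk))
    # Steps 1405-1407: the final hash is sent as an extra "write only" to the checksum VA.
    packets.append(("WRITE_ONLY", checksum_va, struct.pack("<I", running)))
    if immediate is not None:                      # step 1408: immediate data goes last
        packets.append(("IMMEDIATE", None, struct.pack(">I", immediate)))
    return packets
```

Since the hash is appended as an ordinary write, the receiver needs no special handling beyond placing it at the checksum address.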
Reference is now made to FIG. 15, which schematically shows a flow chart of a method for checking NIC object coherency in RDMA transactions by a hash function when a read request is transmitted, according to some embodiments of the present disclosure. At 1501, the RNIC 112 of transmitter 110 sends a read request to receiver 120 to read an object from a memory address in memory 121. At 1502, the RNIC 122 of the receiver 120 extracts the data from each packet of the object and calculates the hash function over the data in each packet of the object. The calculated hash value is stored in a QPC, using a temporary memory of the RNIC 122, and is updated there for each packet; at the last packet, the RNIC 122 verifies that the updated hash value matches the value in a hash value field in the header of the object. At 1503, when the hash value matches the value in the hash value field in the header of the object, at 1504, the transmitter 110 receives the object (i.e. the object that was read successfully) from the receiver. However, at 1505, when the calculated hash value does not match the value in the hash value field of the header of the object, the transmitter 110 receives a failure response at 1506. According to some embodiments of the present disclosure, the hash function may be calculated by the RNIC 112 of transmitter 110.
According to some embodiments of the present disclosure, since the RDMA protocol defines specific read behavior, the RNIC enables the receiver to return a special read response, with no payload, to signal a detected failure. The acknowledgment extended transport header (AETH) syndrome field may be used for this purpose. A transmitter that receives the special read response or AETH syndrome therefore skips the packet sequence number (PSN) gap reserved for that read response, as it would if it had received all the packets (PSNs) from the receiver. The transmitter reports the failure to the application through the completion queue, even if the read operation was not signaled for completion generation.
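The transmitter-side bookkeeping can be illustrated with a small sketch. It relies on two properties of InfiniBand/RoCE reads: a read reserves one PSN per read response packet (at least one, even for a zero-length read), and PSNs are 24-bit values. The `Requester` class and the `READ_COHERENCY_FAILURE` completion status are hypothetical names, and decoding of the AETH syndrome itself is not modeled.

```python
from dataclasses import dataclass, field
from typing import List

PSN_MODULO = 1 << 24          # IB/RoCE PSNs are 24 bits wide


def read_response_span(read_len: int, pmtu: int) -> int:
    """Number of PSNs a read of read_len bytes reserves (at least one)."""
    return max(1, -(-read_len // pmtu))      # ceiling division


@dataclass
class Requester:
    expected_psn: int
    cq: List[str] = field(default_factory=list)   # completion queue entries

    def on_read_failure_response(self, request_psn: int, read_len: int, pmtu: int) -> None:
        # Skip the whole PSN gap reserved for this read, as if all packets had arrived.
        self.expected_psn = (request_psn + read_response_span(read_len, pmtu)) % PSN_MODULO
        # Report the failure even if the read was posted unsignaled.
        self.cq.append("READ_COHERENCY_FAILURE")
```

Skipping the full gap keeps the transmitter's PSN window aligned with the receiver's, so later responses on the same QP are not mistaken for duplicates or treated as lost.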
Reference is now made to FIG. 16, which schematically shows a flow chart of a method for sending a read before write (RBW) request, which is a read request carrying a notification that a write request is expected to be sent after it, according to some embodiments of the present disclosure. At 1601, RNIC 122 receives from one or more transmitters a read before write (RBW) request to read an object from a memory address in memory 121, where packets of the RBW request contain a bit indicating that a request to write to the object memory address is expected from the one or more transmitters. At 1602, the RNIC 122 assigns, in a table, a row holding the identification (ID) of each of the one or more transmitters and a timestamp of the exact time at which it received that transmitter's request to read the object from the memory address. At 1603, a request to write an object to the memory address is received from another transmitter, where the other transmitter has a row in the table for a request to read the object from the memory address with a timestamp smaller than the timestamps of the requests of the one or more transmitters. In this case, at 1604, according to some embodiments of the present disclosure, the RNIC 122 sends a notification to the one or more transmitters, according to the transmitter IDs in the table, to refrain from sending the request to write to the memory address, since the content of the memory address has changed. Then, at 1605, the RNIC 122 removes the rows (ID and timestamp) of the other transmitter and of the one or more transmitters from the table, as the RBW request of the other transmitter is accomplished and the write requests of the one or more transmitters are avoided.
In some embodiments of the present disclosure, the one or more transmitters send an RBW request to read the object from the memory, where packets of the RBW request contain a bit indicating that a request to write another object to the memory address is expected from the one or more transmitters.
In some other embodiments of the present disclosure, after the one or more transmitters send an RBW request, they may send a zero read request indicating that no request to write an object to the memory address is expected from them. In this case, the receiver RNIC 122 receives the zero read request from the one or more transmitters and removes their rows (IDs and timestamps) from the table.
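The table handling of FIG. 16 can be sketched as below. `RbwTable` and its methods are hypothetical. Following step 1603, only rows for the same address with a later timestamp than the writer's are treated as stale, and the actual write to memory and the sending of notifications are left to the caller.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RbwRow:
    transmitter_id: str
    timestamp: int
    address: int


class RbwTable:
    """Hypothetical receiver-side RBW table (one row per pending read-before-write)."""

    def __init__(self) -> None:
        self.rows: List[RbwRow] = []

    def on_read(self, tx: str, address: int, timestamp: int, rbw_bit: bool) -> None:
        if rbw_bit:                                 # plain reads get no row
            self.rows.append(RbwRow(tx, timestamp, address))

    def on_zero_read(self, tx: str, address: int) -> None:
        # The transmitter no longer intends to write: drop its row.
        self.rows = [r for r in self.rows
                     if not (r.transmitter_id == tx and r.address == address)]

    def on_write(self, tx: str, address: int) -> List[str]:
        """Apply the table update for a write; return the transmitters to notify."""
        own = [r for r in self.rows if r.transmitter_id == tx and r.address == address]
        if not own:
            return []
        writer_ts = min(r.timestamp for r in own)
        stale = [r for r in self.rows
                 if r.address == address and r.timestamp > writer_ts
                 and r.transmitter_id != tx]
        drop = {id(r) for r in own + stale}        # step 1605: remove both sets of rows
        self.rows = [r for r in self.rows if id(r) not in drop]
        return [r.transmitter_id for r in stale]   # step 1604: notify these transmitters
```

Replaying FIGs. 17a-17e with this sketch: B registers at timestamp 1111 and C at 1130 for address 0x1234, a plain read from A adds no row, and B's write returns `["C"]` and leaves the table empty.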
FIGs. 17a-17g schematically show an example of the read before write (RBW) request method, according to some embodiments of the present disclosure. FIG. 17a shows transmitter B 1702, which sends to the RNIC 1705 of receiver 1704 an RBW request to read from address 0x1234. According to some embodiments of the present disclosure, a read request also contains a bit indicating that a write request is expected from transmitter B 1702; when this bit is enabled, the read request is an RBW request. The RNIC 1705 receives the request and assigns a row for transmitter B 1702 in table 1706. This row stores the ID of transmitter B 1702, the timestamp 1111 at which the RBW request from transmitter B 1702 was received, and the address 0x1234 at which the read and write will be performed. FIG. 17b schematically shows transmitter C 1703, which sends to the RNIC 1705 of receiver 1704 an RBW request to read from address 0x1234. The read request also contains a bit indicating that a write request is expected from transmitter C 1703. The RNIC 1705 receives the RBW request and assigns a row for transmitter C 1703 in table 1706. This row stores the ID of transmitter C 1703, the timestamp 1130 at which the RBW request from transmitter C 1703 was received, and the address 0x1234 at which the read and write will be performed. FIG. 17c schematically shows transmitter A 1701, which sends to the RNIC 1705 of receiver 1704 a request to read from address 0x1234. This read request does not indicate that a write request is expected later from transmitter A 1701. The RNIC 1705 receives the request but does not assign it a row in table 1706.
FIG. 17d schematically shows transmitter B 1702 sending a write request to address 0x1234, as announced in the RBW request sent earlier by transmitter B 1702 (see FIG. 17a). FIG. 17e schematically shows that the write request of transmitter B 1702 is performed on the content of address 0x1234 and that the row of transmitter B 1702 is removed, since its RBW request has been accomplished. Since the content of address 0x1234 has changed, the RNIC 1705 notifies the remaining transmitters in the table that were expected to write to address 0x1234, i.e. transmitter C 1703 in this case, that the content of address 0x1234 has changed, and removes the row of transmitter C 1703 from table 1706. As a result of the notification, the write request expected from transmitter C 1703 is avoided. In FIG. 17f, transmitter A 1701 sends a write request to address 0x1234. However, since the content of this address changed between the read request and the write request of transmitter A 1701, the write request fails. FIG. 17g schematically shows the RNIC 1705 sending a failure notice to transmitter A 1701. This example shows that, by using a read request with the bit indicating that a write request is expected, the write request transmission, the failure, and the failure notice transmission are avoided when the content of the address has changed. This saves time and computational resources.
Reference is now made to FIG. 18, which schematically shows a flow chart of a method for combining a compare and swap (CAS) operation and a read request into a single optimized operation in RDMA transactions, according to some embodiments of the present disclosure. In the two-phase locking (2PL) protocol, an object must be locked before its content is read. When the object is on a remote node, this is currently done with two single-sided RDMA requests: a CAS request (to lock the object) followed by a read request. This takes two round trips from local node L to remote node R. The optimized CAS and read operation is a new opcode, CAS-and-READ, that combines the two actions: if the CAS on address X succeeds, read N bytes from address Y; otherwise, return failure. At 1801, RNIC 122 receives from transmitter 110 a request for an optimized compare and swap and read (CAS and read) operation. At 1802, the RNIC 122 compares the content of a first memory address to a first value. At 1803, when the content of the first memory address equals the first value, the RNIC 122 replaces the content of the first memory address with a second value. At 1804, the RNIC 122 reads the content of a second memory address, and at 1805 the RNIC 122 sends a success response and the content read from the second memory address to the transmitter 110. However, at 1806, when the content is not equal to the first value, the RNIC 122 sends a failure response to the transmitter 110.
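The semantics of the combined CAS-and-READ opcode can be sketched as a single function over the receiver's memory. It assumes 8-byte little-endian CAS operands (RDMA atomics operate on 64-bit values) and models only the FIG. 18 flow in a single thread, so the atomicity a real RNIC provides is not shown. `cas_and_read` is a hypothetical name, and out-of-range arguments simply raise an error rather than producing the read-failure response of FIG. 19.

```python
import struct
from typing import Optional, Tuple


def cas_and_read(memory: bytearray, cas_addr: int, compare: int, swap: int,
                 read_addr: int, read_len: int) -> Tuple[bool, Optional[bytes]]:
    """Return (True, data) on success or (False, None) for the failure response."""
    if not (0 <= cas_addr <= len(memory) - 8
            and 0 <= read_addr and read_addr + read_len <= len(memory)):
        raise ValueError("address range outside the registered memory")
    (current,) = struct.unpack_from("<Q", memory, cas_addr)    # step 1802: compare
    if current != compare:
        return False, None                                     # step 1806
    struct.pack_into("<Q", memory, cas_addr, swap)             # step 1803: swap
    data = bytes(memory[read_addr:read_addr + read_len])       # step 1804: read
    return True, data                                          # step 1805
```

Used as a 2PL lock-and-read, `compare` would be the "unlocked" value and `swap` the lock owner's value, so one round trip both acquires the lock and returns the object.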
FIG. 19 schematically shows a sequence diagram of an optimized compare and swap (CAS) and read operation, according to some embodiments of the present disclosure. At 1901, a QP is created at the transmitter 110 and at the receiver 120. At 1902, the RNIC 122 in the receiver 120 waits for a CAS and read request to arrive from a transmitter. At 1903, a CAS and read operation is sent from application 114 to the RNIC 112 in the transmitter 110. The CAS and read operation includes the CAS destination address, a first value to which the content of the CAS destination address is compared, a second value to replace the content of the CAS destination address, a read destination address, and the read length. At 1904, the CAS and read request is transmitted from the RNIC 112 in the transmitter to the RNIC 122 in the receiver. At 1905, after the CAS and read request is received, the RNIC 122 goes to the CAS address, and at 1906, the RNIC 122 compares the content of the CAS address to the first value. At 1907, when the content of the CAS address is not equal to the first value, the RNIC sends a failure response to the RNIC 112 in transmitter 110, and at 1908, the failure response is passed to the application 114. However, at 1905, when the content of the CAS address is equal to the first value, the RNIC 122 replaces the content of the CAS address with the second value. Then, at 1909, the RNIC goes to the second memory address (the read address) and reads its content. At 1910, the RNIC 122 checks whether the read operation was successful. When the read fails and the content of the second memory address is not read successfully, a failure response is sent to the RNIC 112 at 1911, and the failure response is passed from the RNIC 112 to the application 114 at 1911.
However, when the content of the second memory address is read successfully, at 1913, the RNIC 122 sends a success response with the content read from the second memory address to the transmitter 110. At 1914, the success response is then passed to the application 114.
Other systems, methods, features, and advantages of the present disclosure will be or become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the present disclosure, and be protected by the accompanying claims.
The descriptions of the various embodiments of the present disclosure have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
It is expected that during the life of a patent maturing from this application many relevant methods and apparatuses for checking NIC object coherency, reducing computing resources and network bandwidth, and shortening the latency of large packets will be developed, and the scope of the term methods and apparatuses for checking NIC object coherency, reducing computing resources and network bandwidth, and shortening the latency of large packets is intended to include all such new technologies a priori.
As used herein the term “about” refers to ± 10 %.
The terms "comprises", "comprising", "includes", "including", "having" and their conjugates mean "including but not limited to". This term encompasses the terms "consisting of" and "consisting essentially of".
The phrase "consisting essentially of' means that the composition or method may include additional ingredients and/or steps, but only if the additional ingredients and/or steps do not materially alter the basic and novel characteristics of the claimed composition or method. As used herein, the singular form "a", "an" and "the" include plural references unless the context clearly dictates otherwise. For example, the term "a compound" or "at least one compound" may include a plurality of compounds, including mixtures thereof.
The word “exemplary” is used herein to mean “serving as an example, instance or illustration”. Any embodiment described as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments and/or to exclude the incorporation of features from other embodiments.
The word “optionally” is used herein to mean “is provided in some embodiments and not provided in other embodiments”. Any particular embodiment of the disclosure may include a plurality of “optional” features unless such features conflict.
Throughout this application, various embodiments of this disclosure may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the disclosure. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
Whenever a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range. The phrases “ranging/ranges between” a first indicated number and a second indicated number and “ranging/ranges from” a first indicated number “to” a second indicated number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.
It is appreciated that certain features of the disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the disclosure, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as suitable in any other described embodiment of the disclosure. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.
All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present disclosure. To the extent that section headings are used, they should not be construed as necessarily limiting. In addition, any priority document(s) of this application is/are hereby incorporated herein by reference in its/their entirety.

Claims

1. A device for receiving a plurality of transactions comprising a Remote Direct Memory Access, RDMA, Network Interface Card, RNIC, configured to: receive from a transmitter a request to read an object from a memory address; check whether a header version is unlocked, extract from each cache line a version number, Vobj, and verify that the Vobj matches the least significant bytes, LSB, of a version field of a header of the object; when the Vobj in each cache line matches the value in the version field: remove the Vobj from each cache line of the object, read data of the object; transmit the object to the transmitter; and when the header version is locked or when the Vobj in each cache line does not match the value in the version field, retry verifying that the header version is unlocked and that each Vobj matches the value in the version field in the header of the object for a predefined number of times, and when there is no match, transmit a failure response.
2. The RNIC of claim 1, further configured to: receive from a transmitter a request to write an object to the memory address; extract the value in the version field of the header of the object and write the value into each cache line of the object and write the object to the memory address; transmit a success response to the transmitter.
3. A device for receiving a plurality of transactions comprising a Remote Direct Memory Access, RDMA, Network Interface Card, RNIC, configured to: receive from a transmitter a request to read an object from a memory address; extract data from each packet of the object; calculate a hash function value for the data in each packet of the object and update the calculated hash function value of a current packet at a temporary RNIC memory address; at a last packet, verify that the updated hash function value matches a value in a hash value field in a header of the object; and when the hash value matches the value in the hash value field in the header of the object, transmit the object to the transmitter; when the calculated hash value does not match the value in the hash value field of the header of the object, retry for a predefined number of times to: calculate the hash function value for the data in each packet and update the calculated hash function value of the current packet at the temporary RNIC memory address and, at the last packet, verify that the updated hash function value matches the value in the hash value field of the header of the object, and when there is no match, transmit a failure response.
4. The RNIC of claim 3, further configured to: receive from a transmitter a request to write an object to the memory address; calculate the hash function value for the data in each packet of the object and update the calculated hash function value of the current packet at a temporary RNIC memory address and at a last packet, update the calculated hash function value in the hash value field in the header of the object; and write the object to the memory address; and transmit a success response.
5. The RNIC of claim 4, further configured to: receive from the transmitter the request to write an object to the memory address; wherein the hash function value is calculated by a transmitter RNIC for data in each packet of the object and the calculated hash function value of a current packet is updated at a temporary transmitter RNIC memory address and wherein at a last packet, the updated hash function value is written into the hash value field in the header of the object; write the object to the memory address; and transmit a success response to the transmitter.
6. The device of claim 3, further configured to: receive from one or more transmitters a read before write, RBW, request to read an object from the memory address, where packets of the RBW request contain a bit which indicates that a request to write another object to the memory address is expected to be received from the one or more transmitters; assign in a table, for each of the one or more transmitters, a row with an identification, ID, of the transmitter, a timestamp for the received RBW request, and the memory address from which the object is to be read; when a request to write an object to the memory address is received from another transmitter, wherein the other transmitter has a row in the table for an RBW request to read the object from the memory address with a timestamp smaller than the timestamps of the requests of the one or more transmitters: send a notification to the one or more transmitters, according to the transmitter IDs in the table, to refrain from sending the request to write to the memory address; and remove the rows of the other transmitter and of the one or more transmitters, including their IDs and timestamps, from the table.
7. The device of claim 6, further configured to: send by one or more transmitters an RBW request to read the object from the memory address, where packets of the RBW request contain a bit which indicates that a request to write another object to the memory address is expected to be received from the one or more transmitters.
8. The device of claim 7, further configured to: send a zero read request indicating that no request to write an object to the memory address is expected to be sent from the one or more transmitters.
9. The device of claim 7, further configured to: receive from the one or more transmitters a zero read request indicating that no request to write an object to the memory address is expected to be sent from the one or more transmitters; and remove the rows of the one or more transmitters, including their IDs and timestamps, from the table.
10. A device for transmitting a plurality of transactions comprising a Remote Direct Memory Access, RDMA, Network Interface Card, RNIC, configured to: transmit to a receiver a request to read an object from a memory address, wherein a version number, Vobj, is extracted by the receiver from each cache line of the object and the Vobj of each cache line is verified to match the LSB value in a version field of a header of the object; receive the object from the receiver, when the Vobj in each cache line matches the LSB value in the version field, with the Vobj removed from each cache line of the object by the receiver; or receive a failure response from the receiver, when the Vobj in each cache line does not match the LSB value in the version field.
11. The RNIC of claim 10, further configured to: transmit to a receiver a request to write an object to the memory address; wherein the LSB value in the version field of the header of the object is extracted by the receiver and written into each cache line of the object and the object is written to the memory address; receive a success response from the receiver.
12. A device for transmitting a plurality of transactions comprising a Remote Direct Memory Access, RDMA, Network Interface Card, RNIC, configured to: transmit to a receiver a request to read an object from a memory address; wherein data is extracted by the receiver from each packet of the object and a hash function value is calculated for the data in each packet of the object, and at a last packet, the updated hash function value is verified by the receiver to match a value in a hash value field in a header of the object; and when the hash value matches the value in the hash value field in the header of the object, receive the object from the receiver; and when the calculated hash value does not match the value in the hash value field of the header of the object, receive a failure response.
13. The RNIC of claim 12, further configured to: transmit to a receiver a request to write an object to a memory address; wherein the hash function value is calculated by the receiver for the data in each packet of the object and the calculated hash function value of the current packet is updated by the receiver at a temporary receiver RNIC memory address and at a last packet, the calculated hash function value is updated in the hash value field in the header of the object; and receive a success response from the receiver after the object is written to the memory address by the receiver.
14. The RNIC of claim 13, further configured to: calculate the hash function value for the data in each packet of the object and update the calculated hash function value of a current packet at a temporary RNIC memory address; at a last packet, write the updated hash function value into the hash value field in the header of the object; transmit the request to write the object to the receiver; and receive a success response from the receiver after the object is written to the memory address by the receiver.
15. A device for optimizing a flow operation of a compare and swap operation and a read request to a single operation, in Remote Direct Memory Access, RDMA, transactions, comprising a RDMA, Network Interface Card, RNIC, configured to: receive from a transmitter a request for an optimized compare and swap and read operation: compare a content of a first memory address to a first value; when the content of the first memory address is equal to the first value: replace the content of the first memory address with a second value; read a content of a second memory address; and send a success response with the content read from the second memory address to the transmitter; and when the content of the first memory address is not equal to the first value: send a failure response to the transmitter.
16. A method for receiving a plurality of transactions, comprising: at a Remote Direct Memory Access, RDMA, Network Interface Card, RNIC: receiving from a transmitter a request to read an object from a memory address; extracting a data version number, Vobj, from each cache line of the object and verifying that the Vobj of each cache line matches a Least Significant Bytes, LSB, value in a version field of a header of the object; when the Vobj in each cache line matches the value in the version field: removing the Vobj from each cache line of the object, reading data of the object; transmitting the object to the transmitter; and when the Vobj in each cache line does not match the LSB value in the version field, retrying verification that each Vobj matches the LSB value in the version field in the header of the object a predefined number of times, and when there is no match, transmitting a failure response.
17. The method of claim 16, further comprising: receiving from a transmitter a request to write an object to the memory address; extracting the LSB value in the version field of the header of the object and writing the value into each cache line of the object and writing the object to the memory address; transmitting a success response to the transmitter.
18. A method for receiving a plurality of transactions comprising: at a Remote Direct Memory Access, RDMA, Network Interface Card, RNIC: receiving from a transmitter a request to read an object from a memory address; extracting data from each packet of the object; calculating a hash function value for the data in each packet of the object and updating the calculated hash function value of a current packet at a temporary RNIC memory address; at a last packet, verifying that the updated hash function value matches a value in a hash value field in a header of the object; and when the hash value matches the value in the hash value field in the header of the object, transmitting the object to the transmitter; when the calculated hash value does not match the value in the hash value field of the header of the object, retrying for a predefined number of times: calculating the hash function value for the data in each packet and updating the calculated hash function value of the current packet at the temporary RNIC memory address and, at the last packet, verifying that the updated hash function value matches the value in the hash value field of the header of the object, and when there is no match, transmitting a failure response.
19. The method of claim 18, further comprising: receiving from a transmitter a request to write an object to the memory address; calculating the hash function value for the data in each packet of the object and updating the calculated hash function value of the current packet at a temporary RNIC memory address and, at a last packet, updating the calculated hash function value in the hash value field in the header of the object; and writing the object to the memory address and transmitting a success response.
20. The method of claim 19, further comprising: calculating the hash function value at a transmitter RNIC for the data in each packet of the object and updating the calculated hash function value of a current packet at a temporary transmitter RNIC memory address; at a last packet, writing the updated hash function value into the hash value field in the header of the object; transmitting the request to write the object to the receiver; at the receiver RNIC, receiving the request to write the object and writing the object to the memory address; and transmitting a success response to the transmitter.
21. A method for transmitting a plurality of transactions, comprising: at a Remote Direct Memory Access, RDMA, Network Interface Card, RNIC: transmitting to a receiver a request to read an object from a memory address, wherein a version number, Vobj, is extracted by a receiver RNIC from each cache line of the object and the Vobj of each cache line is verified by the receiver RNIC to match the LSB value in a version field of a header of the object; receiving the object from the receiver RNIC, when the Vobj in each cache line matches the LSB value in the version field, with the Vobj removed from each cache line of the object by the receiver; or receiving a failure response from the receiver RNIC, when the Vobj in each cache line does not match the LSB value in the version field.
22. A method for transmitting a plurality of transactions comprising: at a Remote Direct Memory Access, RDMA, Network Interface Card, RNIC: transmitting to a receiver RNIC a request to read an object from a memory address; wherein data is extracted by the receiver RNIC from each packet of the object and a hash function value is calculated for the data in each packet of the object, and at a last packet, the updated hash function value is verified by the receiver RNIC to match a value in a hash value field in a header of the object; and when the hash value matches the value in the hash value field in the header of the object, receiving the object from the receiver; and when the calculated hash value does not match the value in the hash value field of the header of the object, receiving a failure response.
23. A method for optimizing a flow operation of a compare and swap operation and a read request to a single operation, in Remote Direct Memory Access, RDMA, transactions, comprising: at a receiver, when receiving from a transmitter a request for an optimized compare and swap and read operation: comparing a content of a first memory address to a first value; when the content of the first memory address is equal to the first value: replacing the content of the first memory address with a second value; reading a content of a second memory address; and sending a success response and the content read from the second memory address to the transmitter; and when the content is not equal to the first value: sending a failure response to the transmitter.
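The two receiver-side consistency checks recited in claims 1-2 (a version number stamped into every cache line) and claims 3-4 (a hash accumulated packet by packet) can be illustrated in software. The sketch below is a hypothetical Python model, not the claimed hardware, and it rests on several assumptions. Cache lines are 64 bytes. Vobj is a single byte taken from the least significant byte of the header version. CRC-32 (`zlib.crc32`) stands in for the unspecified hash function. A caller-supplied `snapshot` function re-reads the object from memory on each retry. All function names are illustrative.

```python
import zlib
from typing import Callable, List, Optional, Tuple

LINE_SIZE = 64    # assumed cache-line size in bytes
VOBJ_MASK = 0xFF  # assumed: Vobj is the single least significant byte of the version


# Version-per-cache-line scheme (claims 1-2)

def stamp_lines(version: int, data: bytes) -> List[bytes]:
    """Write path: copy the version LSB into the first byte of every cache line."""
    vobj = bytes([version & VOBJ_MASK])
    chunk = LINE_SIZE - 1  # one byte of each line is reserved for Vobj
    return [vobj + data[i:i + chunk] for i in range(0, len(data), chunk)]


def read_versioned(snapshot: Callable[[], Tuple[bool, int, List[bytes]]],
                   max_retries: int = 3) -> Optional[bytes]:
    """Read path: each snapshot() call re-reads (locked, header_version, lines)."""
    for _ in range(max_retries):
        locked, version, lines = snapshot()
        expected = version & VOBJ_MASK
        if not locked and all(line[0] == expected for line in lines):
            return b"".join(line[1:] for line in lines)  # strip Vobj from each line
    return None  # no consistent copy within the retry budget: failure response


# Per-packet hash scheme (claims 3-4)

def packet_digest(packets: List[bytes]) -> int:
    """Running CRC-32; 'running' stands in for the temporary RNIC memory address."""
    running = 0
    for pkt in packets:
        running = zlib.crc32(pkt, running)  # update with the current packet
    return running


def read_hashed(snapshot: Callable[[], Tuple[int, List[bytes]]],
                max_retries: int = 3) -> Optional[bytes]:
    """Read path: each snapshot() call re-reads (header_hash_field, packets)."""
    for _ in range(max_retries):
        stored, packets = snapshot()
        if packet_digest(packets) == stored:  # compared after the last packet
            return b"".join(packets)
    return None  # failure response
```

Passing a `snapshot` callable, rather than a fixed object, is what makes the retry loop meaningful. Each attempt may observe a different, possibly torn, copy while a concurrent writer is updating the object. The version scheme detects such tearing at cache-line granularity without hashing the payload. The hash scheme leaves the payload layout untouched but must process every packet before it can decide.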

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2022/051476 WO2023138789A1 (en) 2022-01-24 2022-01-24 Methods and devices for network interface card (nic) object coherency (noc) messages

Publications (1)

Publication Number Publication Date
WO2023138789A1 (en) 2023-07-27

Family

ID=80168170

Country Status (1)

Country Link
WO (1) WO2023138789A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100312941A1 (en) * 2004-10-19 2010-12-09 Eliezer Aloni Network interface device with flow-oriented bus interface
US20130332557A1 (en) * 2012-06-12 2013-12-12 International Business Machines Corporation Redundancy and load balancing in remote direct memory access communications
WO2019118255A1 (en) * 2017-12-15 2019-06-20 Microsoft Technology Licensing, Llc Multi-path rdma transmission

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22702227

Country of ref document: EP

Kind code of ref document: A1