WO2020001459A1 - Data processing method, remote direct memory access network card, and device - Google Patents
- Publication number
- WO2020001459A1 (PCT/CN2019/092937)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- rnic
- request
- processor
- rdma
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/20—Handling requests for interconnection or transfer for access to input/output bus
- G06F13/28—Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17306—Intercommunication techniques
- G06F15/17331—Distributed shared memory [DSM], e.g. remote direct memory access [RDMA]
Definitions
- the present application relates to the field of computer technology, and in particular, to a data processing method, a remote direct memory access network card, and a device.
- SCM: storage class memory
- NVM: non-volatile memory
- the SCM can be installed in a motherboard slot like dynamic random access memory (DRAM). Unlike DRAM, the SCM retains its data when power is lost. The SCM also provides faster read and write speeds than flash memory and is cheaper than DRAM.
- the SCM is used as memory.
- multiple SCM-based computing devices are connected in an interconnected manner to form an SCM resource pool to expand the capacity of the SCM, thereby achieving redundant backup of data.
- any two computing devices can communicate and transfer data based on remote direct memory access (RDMA) technology.
- remote direct memory access is also referred to as remote direct access.
- RDMA technology can directly transfer data from the memory of one computing device to the memory of another computing device without the intervention of the operating systems or kernels of the two computing devices.
- for an SCM resource pool that uses RDMA technology for communication and data transfer, the problem of remote memory persistence must be solved.
- memory persistence refers to writing data from a volatile storage medium of a computing device back to a non-volatile storage medium of that computing device; remote memory persistence refers to storing data in the SCM of another computing device after a computing device writes the data from its own SCM to that other computing device.
- consider an SCM storage system that uses SCM as memory.
- data in the SCM of computing device A (the computing device that initiates the remote operation request) is to be saved to the SCM of computing device B (the computing device that receives the remote operation request).
- the data must pass through the processor of computing device B. Because that processor contains a cache, the data may be held temporarily in the cache, where it can be lost on power failure.
- computing device A therefore generally writes data to computing device B through an RDMA write request and then initiates an RDMA read request or RDMA send request, so that the data written by the RDMA write request is written back to the SCM of computing device B.
- the RDMA read request or RDMA send request initiated by the computing device A can be regarded as a remote memory persistence request.
- this solution has the following problem: because computing device A must initiate a remote memory persistence request after each RDMA write request, the network load of the RDMA network increases.
- this application provides a data processing method, a remote direct memory access network card, and a device, which solve the problem of the high network load caused by the additional remote memory persistence request.
- a data processing method including:
- the first remote direct memory access network card (RNIC) receives a remote direct memory access (RDMA) write request sent by the second RNIC.
- the RDMA write request includes the first data and a data persistence flag. The RDMA write request is used to request that the first data be written to the first device, and the data persistence flag indicates that the first data is data to be persisted. The first RNIC is the RNIC of the first device, that is, the RNIC of the device receiving the RDMA request.
- the second RNIC is the RNIC of the second device, that is, the RNIC of the device sending the RDMA request.
- the first device and the second device communicate based on RDMA.
- the first RNIC determines that the first data needs to be written to the first device.
- the first RNIC sends a direct memory access (DMA) write request to the first processor.
- the DMA write request includes the first data and instructs the first processor to write the first data into the first device, where the first processor is the processor of the first device, and the first RNIC and the first processor communicate in a DMA-based manner.
- according to the data persistence flag, the first RNIC determines that the first data is data to be persisted, and the first RNIC instructs the first processor to save the first data to a non-volatile memory of the first device.
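The receive-side steps above can be illustrated with a toy model. This is only a sketch under stated assumptions: `RdmaWrite`, `ToyRemoteDevice`, and `handle_rdma_write` are hypothetical names, the two dictionaries merely stand in for volatile buffers and the SCM, and no real RNIC or verbs API is involved.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RdmaWrite:
    addr: int
    data: bytes
    persist: bool  # hypothetical encoding of the data persistence flag


@dataclass
class ToyRemoteDevice:
    # Stand-in for processor caches and bus buffers (lost on power failure).
    volatile: Dict[int, bytes] = field(default_factory=dict)
    # Stand-in for the SCM / non-volatile memory of the first device.
    nvm: Dict[int, bytes] = field(default_factory=dict)

    def dma_write(self, addr: int, data: bytes) -> None:
        # The RNIC hands the data to the processor; it may still sit in a buffer.
        self.volatile[addr] = data

    def persist(self, addr: int) -> None:
        # The processor moves the buffered data into non-volatile memory.
        self.nvm[addr] = self.volatile.pop(addr)

    def power_loss(self) -> None:
        self.volatile.clear()


def handle_rdma_write(dev: ToyRemoteDevice, req: RdmaWrite) -> None:
    """First-RNIC behaviour: always DMA-write, persist only if flagged."""
    dev.dma_write(req.addr, req.data)
    if req.persist:
        dev.persist(req.addr)
```

In this model, a flagged write survives `power_loss()` while an unflagged write that is still buffered does not, which is the risk the persistence flag is meant to remove.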
- the non-volatile memory may be an SCM.
- the SCM can be phase-change random access memory (PRAM, for example 3D XPoint), resistive random access memory (ReRAM), magnetic random access memory (MRAM), and so on.
- a device that receives an RDMA request can be called a remote device, and a device that sends an RDMA request can be called a local device.
- when the RNIC of the remote device determines, according to the data persistence flag in the RDMA write request, that the first data is to be persisted, it directly instructs the processor of the remote device to save the first data to the non-volatile memory of the remote device.
- the processor of the remote device thus completes memory persistence of the first data in the non-volatile memory without the RNIC of the local device initiating a remote memory persistence request. Because the local device does not need to initiate a remote memory persistence request after initiating an RDMA write request, the load on the RDMA network is reduced. In addition, carrying the data persistence flag in the RDMA write request turns the remote write operation and the remote memory persistence operation into operations that are performed consecutively, which ensures that data is saved to the non-volatile memory after being written to the remote device and avoids data inconsistency.
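The network-load argument can be made concrete by counting the requests the local device places on the wire. The sketch below is illustrative only; `Request`, `baseline_plan`, and `flagged_plan` are hypothetical names, and using an RDMA read as the persistence request is just one of the two options (read or send) mentioned in the background.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Request:
    kind: str              # "write", or "read" used as a persistence request
    persist: bool = False  # hypothetical data persistence flag


def baseline_plan(num_writes: int) -> List[Request]:
    """Background scheme: every RDMA write is followed by a persistence request."""
    plan: List[Request] = []
    for _ in range(num_writes):
        plan.append(Request("write"))
        plan.append(Request("read"))  # extra request that only forces persistence
    return plan


def flagged_plan(num_writes: int) -> List[Request]:
    """Scheme described here: the write itself carries the persistence flag."""
    return [Request("write", persist=True) for _ in range(num_writes)]
```

For N writes the baseline issues 2N requests and the flagged scheme issues N, which is the saving the text attributes to carrying the flag in the write request.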
- the data persistence flag may be a write persistence instruction.
- the RDMA write request can also be called an RDMA write-persistence request, such as rdma write durable.
- alternatively, the data persistence flag is the destination storage address corresponding to the first data. The storage space corresponding to the destination storage address is used to store the first data, and the destination storage address is a persistent storage address in the first device.
- a persistent storage address is an address whose storage space is used to store data to be persisted.
- the destination storage address corresponding to the first data is a storage address corresponding to a storage space for storing the first data in the first device designated by the second device.
- the remote device allocates, in advance, storage space for storing data to be persisted. When the RNIC of the remote device determines that the destination storage address corresponding to the first data is an address of that pre-allocated storage space, it may determine that the first data in the RDMA write request is data to be persisted.
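The address-based variant of the flag amounts to a range check against the pre-allocated persistent space. The following is a minimal sketch, assuming regions are described as hypothetical `(start, size)` pairs and assuming that a write counts as persistent only when it lies entirely inside one region; the text does not specify how partial overlaps are handled.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (start address, size in bytes); hypothetical layout


def is_to_be_persisted(dest_addr: int, length: int,
                       persistent_regions: List[Region]) -> bool:
    """Return True if [dest_addr, dest_addr + length) lies inside a persistent region."""
    end = dest_addr + length
    return any(start <= dest_addr and end <= start + size
               for start, size in persistent_regions)
```

With one region starting at `0x1000` and spanning `0x1000` bytes, a 256-byte write at `0x1800` is treated as data to be persisted, while a write at `0x3000` is not.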
- the first RNIC may instruct the first processor to save the first data to the non-volatile memory of the first device in the following manner:
- the first RNIC adds a DMA least significant bit (LSB) read request to the receive queue (RQ) corresponding to the second RNIC;
- the first RNIC sends the DMA LSB read request to the first processor, and the DMA LSB read request is used to instruct the first processor to write all data buffered in the peripheral bus link of the first device to the non-volatile memory of the first device.
- This method may be applicable to the case where the data does not pass through the last level cache (LLC) of the first processor during the data writing process of the first RNIC.
- LLC last level cache
- the data may not yet have been written and may still be held in the cache of the processor's input/output (I/O) controller.
- PCIe: peripheral component interconnect express
- the read operation corresponding to the LSB read request can write data that is still pending on the peripheral bus to the non-volatile memory, and can therefore also save the first data, which may still be buffered on the peripheral bus, to the non-volatile memory.
- the first RNIC may also instruct the first processor to write all data buffered in the peripheral bus link of the first device to the non-volatile memory of the first device by sending another DMA read request to the processor.
- the above DMA read request may read any address in the SCM.
- for example, the address is the start address or the end address of the address range used to store the first data, or any other address in that range.
- the address to be read by the read request may also be an address segment.
- the address segment is the address range used to store the first data, or any address range in the SCM.
- the first RNIC may instruct the first processor to save the first data to the non-volatile memory of the first device in the following manner: the first RNIC generates, according to the first data, a work queue entry (WQE) corresponding to an RDMA receive request and then clears the WQE, where the RDMA receive request is used to receive an RDMA send request initiated by the second device; after clearing the WQE, the first RNIC generates a completion queue entry (CQE) corresponding to the RDMA receive request, to instruct the first processor to store the first data buffered in a volatile storage medium of the first processor to the non-volatile memory of the first device.
- the volatile storage medium of the first processor may be LLC.
- this method is applicable to the case where the data passes through the LLC of the processor of the remote device while the RNIC of the remote device writes the data. Because the first data passes through the LLC of the first processor, the first RNIC generates and clears the WQE corresponding to the RDMA receive request and generates the CQE corresponding to the RDMA receive request according to the first data. When the first processor obtains the CQE, an interrupt can be generated that causes the first processor to write the data at the corresponding address back to the non-volatile memory. Because the WQE and the CQE are both generated based on the first data and therefore correspond to it, writing the data at the corresponding address back to the non-volatile memory stores the first data in the non-volatile memory.
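The two ways of instructing the first processor, as described above, differ only in whether the written data passes through the LLC. A minimal decision sketch follows; `PersistAction` and `choose_persist_action` are hypothetical names introduced for illustration, not part of any RNIC interface.

```python
from enum import Enum


class PersistAction(Enum):
    NONE = "none"                      # write carries no persistence flag
    DMA_READ_FLUSH = "dma_read_flush"  # DMA read pushes bus-buffered data to NVM
    WQE_CQE_NOTIFY = "wqe_cqe_notify"  # generated CQE makes the CPU write back LLC data


def choose_persist_action(persist_flag: bool, passes_through_llc: bool) -> PersistAction:
    """Pick how the first RNIC asks the first processor to persist the data."""
    if not persist_flag:
        return PersistAction.NONE
    if passes_through_llc:
        # Data may be sitting in the LLC: generate and clear a WQE, then raise a CQE.
        return PersistAction.WQE_CQE_NOTIFY
    # Data bypasses the LLC: a DMA read request flushes the peripheral bus link.
    return PersistAction.DMA_READ_FLUSH
```

Keeping the choice in one function mirrors the text's framing: the persistence flag decides whether anything is done, and the processor's data path decides which mechanism is used.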
- the first RNIC may also send a persistence confirmation message corresponding to the first data to the second RNIC.
- the persistence confirmation message includes a second packet sequence number (PSN), and the second PSN is the sequence number of the first data.
- the persistence confirmation message is used to indicate that memory persistence of the first data is complete.
- the second RNIC can determine, according to the second PSN, that the first data is stored in the non-volatile memory of the first device.
- after the first RNIC receives the RDMA write request sent by the second RNIC, it sends a reception confirmation message corresponding to the first data to the second RNIC, and the reception confirmation message includes the first PSN.
- the first PSN is the sequence number of the first data.
- the reception confirmation message is used to indicate that the first RNIC has received the first data.
- another data processing method, including: the second RNIC receives an RDMA write persistence request from a second processor, where the RDMA write persistence request includes a data persistence flag, the RDMA write persistence request is used to request that the first data be stored in the non-volatile memory of the first device, and the data persistence flag indicates that the first data is data to be persisted; the second RNIC and the second processor are respectively the RNIC and the processor of the second device, the second RNIC and the second processor communicate in a DMA-based manner, the second device is the device initiating the RDMA request, the first device is the device receiving the RDMA request, and the second device and the first device communicate in an RDMA-based manner; the second RNIC generates an RDMA write request according to the RDMA write persistence request, where the RDMA write request includes the first data and the data persistence flag; the second RNIC sends the RDMA write request to the first RNIC, and the RDMA write request is used to request that the first data be written to the first device.
- the device receiving the RDMA request may be called a remote device, and the device sending the RDMA request may be called a local device, that is, the first device is a remote device, and the second device is a local device.
- the RDMA write request is generated according to the RDMA write persistence request.
- the RDMA write request itself causes the RNIC of the remote device to write the first data to the remote device.
- the data persistence flag in the RDMA write request tells the RNIC of the remote device that the first data is data to be persisted, so the RNIC of the remote device can have the first data stored in the non-volatile memory of the first device to complete memory persistence of the first data.
- the local device does not need to send additional memory persistence requests, which reduces the load on the RDMA network.
- because the data persistence flag is carried in the RDMA write request, the remote write operation and the remote memory persistence operation are performed consecutively, which ensures that data is saved to the non-volatile memory after being written to the remote device and avoids data inconsistency.
- the foregoing data persistence flag is a write persistence instruction.
- the RDMA write request can also be called an RDMA write-persistence request, such as rdma write durable.
- after parsing the RDMA write request and obtaining the write persistence instruction, the RNIC of the remote device can determine that the first data in the RDMA write request is data to be persisted.
- the data persistence flag is the destination storage address corresponding to the first data.
- the destination storage address corresponding to the first data is a persistent storage address in the first device.
- the storage space corresponding to the persistent storage address is used to store data to be persisted.
- the destination storage address corresponding to the first data refers to a storage address in the first device designated by the second device for storing the first data.
- after receiving the persistence confirmation message corresponding to the first data sent by the first RNIC, the second RNIC generates the CQE corresponding to the RDMA write persistence request.
- the persistence confirmation message corresponding to the first data is used to indicate that memory persistence of the first data is complete.
- the CQE corresponding to the RDMA write persistence request is used to notify the second processor that memory persistence is complete.
- This method can be applied to the case that the data does not pass through the LLC of the processor of the remote device during the process of writing data by the RNIC of the remote device.
- upon receiving the persistence confirmation message sent by the first RNIC, the second RNIC generates a CQE corresponding to the RDMA write persistence request.
- after the second processor obtains the CQE, it can determine from the CQE that memory persistence of the first data is complete. The second processor therefore does not need to initiate a separate remote memory persistence request, which reduces the load on the RDMA network.
- when the second RNIC receives the reception confirmation message corresponding to the first data sent by the first RNIC, the second RNIC buffers that reception confirmation message.
- the reception confirmation message corresponding to the first data includes a first PSN, the first PSN is the sequence number of the first data, and the reception confirmation message is used to indicate that the first RNIC has received the first data;
- the second RNIC receives a first confirmation message sent by the first RNIC, and the first confirmation message includes a second PSN.
- when the second PSN is the same as the first PSN, the second RNIC determines that it has received the persistence confirmation message corresponding to the first data.
- an acknowledgement (ACK) message only conveys acknowledgement. Which request an ACK message acknowledges must be judged from the request sent before the ACK message was received or from the order in which ACK messages are received.
- the first acknowledgement message that carries the sequence number of the first data is necessarily the reception confirmation for the first data.
- the RNIC of the second device does not need to wait for the persistence confirmation message for the first data before sending the next data to the RNIC of the first device.
- memory persistence of one piece of data and the write operation for the next piece of data can therefore proceed in parallel, which improves the efficiency of memory persistence and reduces latency.
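The PSN comparison on the second RNIC can be sketched as a small tracker: the first ACK carrying a given PSN is taken as the reception confirmation and is buffered, and a later ACK with the same PSN is taken as the persistence confirmation. `AckTracker` and its method names are hypothetical, and reporting a third ACK with the same PSN as a duplicate is an assumption of this sketch rather than something the text specifies.

```python
from typing import Set


class AckTracker:
    """Classifies ACKs from the first RNIC by comparing packet sequence numbers."""

    def __init__(self) -> None:
        self._received: Set[int] = set()   # PSNs with a buffered reception ACK
        self._persisted: Set[int] = set()  # PSNs confirmed as persisted

    def on_ack(self, psn: int) -> str:
        if psn in self._persisted:
            return "duplicate"             # assumption: ignore further repeats
        if psn in self._received:
            # Second ACK with the same PSN: persistence confirmation.
            self._received.discard(psn)
            self._persisted.add(psn)
            return "persisted"
        # First ACK with this PSN: reception confirmation, buffer it.
        self._received.add(psn)
        return "received"

    def is_persisted(self, psn: int) -> bool:
        return psn in self._persisted
```

Because each PSN is tracked independently, the reception ACK for the next write can arrive before the persistence ACK for the previous one, which is the overlap between persistence and subsequent writes that the text describes.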
- after the second RNIC sends the RDMA write request to the first RNIC, upon receiving the reception confirmation message corresponding to the first data sent by the first RNIC, the second RNIC generates the CQE corresponding to the RDMA write persistence request.
- the reception confirmation message corresponding to the first data is used to indicate that the first RNIC has received the first data.
- the CQE corresponding to the RDMA write persistence request is used to notify the second processor that the first data has been written to the first device.
- the second RNIC receives the RDMA receive request sent by the second processor; upon receiving the persistence confirmation message corresponding to the first data, the second RNIC generates a CQE corresponding to the RDMA receive request.
- the persistence confirmation message corresponding to the first data is used to indicate that memory persistence of the first data is complete, and the CQE corresponding to the RDMA receive request is used to notify the second processor that memory persistence of the first data is complete.
- This method can be applicable to the case where the data passes through the LLC of the processor of the remote device during the data writing process of the RNIC of the remote device.
- the CQE corresponding to the RDMA write persistence request is generated when the reception confirmation message corresponding to the first data is received, so that the second processor can initiate an RDMA receive request according to that CQE. This eliminates the RDMA send request otherwise initiated by the second processor and reduces the load on the RDMA network.
- the second RNIC may also buffer the reception confirmation message corresponding to the first data, where the reception confirmation message includes the first PSN and the first PSN is the sequence number of the first data; the second RNIC receives the first confirmation message sent by the first RNIC, and the first confirmation message includes the second PSN; when the second PSN is the same as the first PSN, the second RNIC determines that a persistence confirmation message corresponding to the first data has been received.
- the second device can send the data that follows the first data to the first device without waiting for the persistence confirmation for the first data. By comparing PSNs, memory persistence of one piece of data and the write operation for the next piece of data can proceed in parallel, which improves the efficiency of memory persistence and reduces latency.
- another data processing method, including: the second processor sends an RDMA write persistence request to the second RNIC, where the RDMA write persistence request includes a data persistence flag and is used to request that the first data be stored in the non-volatile memory of the first device, and the data persistence flag indicates that the first data is data to be persisted.
- the second RNIC and the second processor are respectively the RNIC and the processor of the second device, and the second RNIC and the second processor communicate in a DMA-based manner.
- the second device is the device initiating the RDMA request.
- the first device is the device receiving the RDMA request.
- the first device and the second device communicate in an RDMA-based manner.
- the second processor determines that the first data has been stored in the non-volatile memory of the first device.
- the device that receives the RDMA request may be called a remote device, and the device that sends the RDMA request may be called a local device. That is, the first device is a remote device and the second device is a local device.
- This solution can be applied to the case that the data does not pass through the LLC of the processor of the remote device during the process of writing data by the RNIC of the remote device.
- the processor of the local device only needs to send an RDMA write request to have the first data written to and stored in the remote device. It does not need to initiate a remote memory persistence request after initiating the RDMA write request, which reduces the load on the RDMA network. In addition, carrying the data persistence flag in the RDMA request makes the remote write operation and the remote memory persistence operation execute consecutively, which ensures that data is saved to the non-volatile memory after being written to the remote device and avoids data inconsistency.
- the foregoing data persistence flag is a write persistence instruction.
- after parsing the RDMA request and obtaining the RDMA write persistence instruction, the RNIC of the remote device can determine that the first data in the RDMA request is data to be persisted.
- the data persistence flag is the destination storage address corresponding to the first data.
- the destination storage address corresponding to the first data is a persistent storage address in the first device.
- the storage space corresponding to the persistent storage address is used to store data to be persisted.
- the destination storage address corresponding to the first data refers to a storage address in the first device designated by the second device for storing the first data.
- another data processing method, including: the second processor sends an RDMA write persistence request to the second RNIC, where the RDMA write persistence request includes a data persistence flag and is used to request that the first data be stored in the non-volatile memory of the first device, and the data persistence flag indicates that the first data is data to be persisted; the second RNIC and the second processor are respectively the RNIC and the processor of the second device, the second RNIC and the second processor communicate in a DMA-based manner, the second device is the device that initiates the RDMA request, the first device is the device that receives the RDMA request, and the first device and the second device communicate in an RDMA-based manner; when the CQE corresponding to the RDMA write persistence request is obtained, the second processor sends an RDMA receive request to the second RNIC; when the CQE corresponding to the RDMA receive request sent by the second RNIC is obtained, the second processor determines that the first data has been stored in the non-volatile memory of the first device.
- this solution can be applied to the case where the data does not pass through the LLC of the processor of the remote device while the RNIC of the remote device writes the data. It eliminates the need for the local device to send an RDMA send request, which reduces the load on the RDMA network.
- the foregoing data persistence flag is a write persistence instruction.
- after parsing the RDMA request and obtaining the RDMA write persistence instruction, the RNIC of the remote device can determine that the first data in the RDMA request is data to be persisted.
- the data persistence flag is the destination storage address corresponding to the first data.
- the destination storage address corresponding to the first data is a persistent storage address in the first device.
- the storage space corresponding to the persistent storage address is used to store data to be persisted.
- the destination storage address corresponding to the first data refers to a storage address in the first device designated by the second device for storing the first data.
- an RNIC including various modules for executing the data processing method in the first aspect or any possible implementation manner of the first aspect.
- another RNIC including various modules for executing the data processing method in the second aspect or any one of the possible implementation manners of the second aspect.
- a processor is provided to execute part or all of the processes involved in the third aspect or the fourth aspect.
- a first device including a processor, a non-volatile memory, and an RNIC, and the RNIC is configured to execute the operation steps in the method flow described in the first aspect.
- a second device including a processor, a non-volatile memory, and an RNIC.
- the RNIC is configured to execute the operation steps in the method flow described in the second aspect
- the processor is configured to execute the operation steps in the method flow described in the third aspect or the fourth aspect.
- a computer-readable storage medium stores instructions that, when run on a computer, cause the computer to execute the methods described in the above aspects.
- a computer program product containing instructions, which when executed on a computer, causes the computer to perform the methods described in the above aspects.
- FIG. 1 is a schematic structural diagram of a computing device according to an embodiment of the present application.
- FIG. 2 is a schematic flowchart of remote memory persistence provided by an embodiment of the present application.
- FIG. 3 is a schematic flowchart of another remote memory persistence process provided by an embodiment of the present application.
- FIG. 4 is a schematic diagram of a computing device network that communicates through RDMA technology according to an embodiment of the present application
- FIG. 5 is a schematic diagram of a communication system composed of a local device and a remote device according to an embodiment of the present application
- FIG. 6 is a schematic structural diagram of an implementation manner of an RNIC according to an embodiment of the present application.
- FIG. 7 is another schematic structural diagram of an RNIC according to an embodiment of the present application.
- FIG. 8 is a schematic diagram of a channel between two computing devices according to an embodiment of the present application.
- FIG. 9 is a schematic flowchart of a data processing method according to an embodiment of the present application.
- FIG. 10 is a schematic flowchart of another data processing method according to an embodiment of the present application.
- FIG. 11 is a schematic flowchart of another data processing method according to an embodiment of the present application.
- FIG. 12 is a schematic structural diagram of a computing device according to an embodiment of the present application.
- FIG. 13 is a schematic structural diagram of another computing device according to an embodiment of the present application.
- FIG. 1 is a schematic diagram of a computing device.
- the computing device 10 includes an RNIC 101, a processor 102, and an SCM 103.
- the RNIC 101, the processor 102, and the SCM 103 are connected through a bus 104.
- the bus 104 includes, but is not limited to, a peripheral bus (such as a peripheral component interconnect (PCI) bus or a PCIe bus) and a system bus; the processor may be a central processing unit (CPU).
- the processor 102 includes an integrated input/output controller (IIO), a last level cache (LLC), an integrated memory controller (iMC), and one or more processor cores.
- IIO is used to process messages that interact with peripherals such as RNICs.
- the messages can be PCI messages, PCIe messages, and so on.
- IIO, LLC, iMC, and processor cores can be connected through the chip's internal bus.
- for example, the processor is a chip of the advanced RISC machine (ARM) architecture
- in that case, the IIO, LLC, iMC, and processor cores are connected through an advanced high performance bus (AHB).
- AHB Advanced high performance bus
- data written by the RNIC to the SCM can follow one of two data flows, as shown in FIG. 1.
- the first data flow includes: (1) the RNIC writes the data to the processor's IIO; (2) the processor's IIO writes the data to the processor's iMC, and the iMC buffers the data in its asynchronous DRAM refresh (ADR) region; (3) the processor's iMC writes the data to the SCM.
- the second data flow includes: (1) the RNIC writes the data to the processor's IIO; (2) the processor's IIO writes the data to the processor's LLC; (3) the data in the processor's LLC is flushed to the processor's iMC, and the iMC buffers the data in its ADR region; (4) the processor's iMC writes the data to the SCM. Because data in the iMC's ADR region is not lost after a power failure, data can normally be considered persisted once it has been written into the ADR region.
- the flow of data in the processor is determined by the characteristics of the processor architecture.
- The flow of data in the processor is related to the data direct I/O (DDIO) function. If the DDIO function in the IIO is enabled, the data flow in the processor is the second data flow described above; if the DDIO function in the IIO is not enabled, the data flow in the processor is the first data flow described above.
- The data flow in the processor is related to both the DDIO function in the IIO and the message sent by the RNIC to the IIO.
- If the no-snoop (NS) flag bit in the message sent by the RNIC to the IIO is 1, the data flow in the processor is the first data flow described above; if the NS flag bit in the message sent by the RNIC to the IIO is not 1 and the DDIO function of the processor is enabled, the data flow in the processor is the second data flow described above.
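- The dependence of the data flow on the DDIO function and the NS flag can be summarized in a short Python sketch. The names `DataFlow` and `select_data_flow` are hypothetical, and the rule that an NS flag of 1 (or a disabled DDIO function) selects the first data flow is an assumption drawn from the description above rather than a statement of any particular processor's behavior.

```python
from enum import Enum


class DataFlow(Enum):
    FIRST = 1   # RNIC -> IIO -> iMC (ADR) -> SCM
    SECOND = 2  # RNIC -> IIO -> LLC -> iMC (ADR) -> SCM


def select_data_flow(ddio_enabled: bool, ns_flag: int = 0) -> DataFlow:
    """Pick the data flow for a write arriving from the RNIC.

    Assumption: the data bypasses the LLC when DDIO is disabled or when
    the message carries NS = 1; otherwise it passes through the LLC.
    """
    if ns_flag == 1 or not ddio_enabled:
        return DataFlow.FIRST
    return DataFlow.SECOND
```

- Under this sketch, only a message with an NS flag other than 1 on a DDIO-enabled processor takes the path through the LLC.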
- a storage system formed by a computing device using SCM as a memory may include multiple computing devices using SCM as a memory.
- remote access between two computing devices is implemented by the RNIC of the computing device.
- The two computing devices can be divided into two roles: local device and remote device. "Local" and "remote" are relative concepts. The local device is the computing device that initiates an RDMA request, that is, the computing device that requests access to another computing device.
- The remote device is the computing device that receives an RDMA request, that is, the computing device that is accessed by another computing device. The access of the local device to the remote device may be the local device writing data to the remote device.
- In this case, the local device transmits its data to the RNIC of the remote device through the RNIC of the local device.
- The RNIC of the remote device receives the data, so that the data in the local device is transmitted to the remote device.
- the access of the local device to the remote device may also be that the local device reads data from the remote device.
- the local device can read the data in the SCM of the remote device through the RNIC of the local device.
- The RNIC of the remote device sends the data to be read by the local device to the RNIC of the local device, and the RNIC of the local device receives the data, which completes the reading of the data in the remote device.
- Remote memory persistence is implemented in two computing devices. For the above two different data flows, in some design schemes, there are different remote memory persistence processes.
- the processor of the local device initiates a first RDMA write request to the RNIC of the local device.
- the RNIC of the local device processes the first RDMA write request, and generates a second RDMA write request according to the first RDMA write request.
- S203 The RNIC of the local device sends a second RDMA write request to the RNIC of the remote device, and the RNIC of the remote device receives the second RDMA write request.
- the RNIC of the remote device writes the data in the second RDMA write request to the SCM in a DMA manner.
- The data in the second RDMA write request follows the first data flow in the remote device. Because transmitting data on the bus involves a certain delay and the bus may be occupied, some of the data may be buffered in an intermediate medium, such as the IIO, on the peripheral bus link.
- the RNIC of the remote device sends an ACK to the RNIC of the local device, and the RNIC of the local device receives the ACK.
- the RNIC of the local device generates a CQE corresponding to the first RDMA write request.
- S207 The processor of the local device obtains a CQE corresponding to the first RDMA write request, and determines that data writing is completed.
- the processor of the local device initiates a first RDMA read request to the RNIC of the local device.
- the RNIC of the local device processes the first RDMA read request, and generates a second RDMA read request according to the first RDMA read request.
- S210 The RNIC of the local device sends a second RDMA read request to the RNIC of the remote device, and the RNIC of the remote device receives the second RDMA read request.
- the RNIC of the remote device reads data from the SCM through DMA.
- According to peripheral bus protocols such as the PCI protocol and the PCIe protocol, all write operations issued before a read operation must be completed before the read operation is performed. Therefore, reading data from the SCM through DMA causes the data buffered in an intermediate medium on the peripheral bus link, such as the IIO, to be written into the SCM.
- In this way, data of the first RDMA write request that may still be buffered in the intermediate medium is written into the SCM.
- The RNIC of the remote device sends an ACK to the RNIC of the local device, and the RNIC of the local device receives the ACK.
- the RNIC of the local device generates a CQE corresponding to the first RDMA read request.
- the processor of the local device obtains a CQE corresponding to the first RDMA read request, and determines that the data persistence is completed.
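- The reason a read following the write completes persistence in this flow can be illustrated with a toy Python model. The class `RemoteMemoryPath` and its methods are hypothetical names; the model only assumes the ordering rule described above (writes issued before a read are completed before the read is served) and does not represent real bus hardware.

```python
class RemoteMemoryPath:
    """Toy model of the path RNIC -> intermediate medium (e.g. IIO) -> SCM."""

    def __init__(self):
        self.buffered = []  # writes still held in the intermediate medium
        self.scm = {}       # contents of the non-volatile memory

    def dma_write(self, addr, value):
        # A write may be acknowledged while it is still buffered.
        self.buffered.append((addr, value))

    def dma_read(self, addr):
        # Ordering rule: all earlier writes reach the SCM before the read.
        for buffered_addr, buffered_value in self.buffered:
            self.scm[buffered_addr] = buffered_value
        self.buffered.clear()
        return self.scm.get(addr)
```

- In this model, an ACK sent right after `dma_write` says nothing about persistence, whereas the completion of `dma_read` implies that every earlier write has reached the SCM.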
- the data passes through the LLC of the processor, and the remote memory persistence process is shown in Figure 3, including the following steps:
- the processor of the local device initiates a first RDMA write request to the RNIC of the local device.
- the RNIC of the local device processes the first RDMA write request, and generates a second RDMA write request according to the first RDMA write request.
- the RNIC of the local device sends a second RDMA write request to the RNIC of the remote device, and the RNIC of the remote device receives the second RDMA write request.
- the RNIC of the remote device writes the data in the second RDMA write request to the SCM through DMA.
- the data flow direction of the remote device is the second data flow direction. Since the data passes through the LLC, the data is buffered in the LLC during the process of writing the data to the SCM.
- the RNIC of the remote device sends an ACK to the RNIC of the local device, and the RNIC of the local device receives the ACK.
- the RNIC of the local device generates a CQE corresponding to the first RDMA write request.
- the processor of the local device obtains a CQE corresponding to the first RDMA write request, and determines that data writing is completed.
- the processor of the local device initiates a first RDMA transmission request to the RNIC of the local device, and the first RDMA transmission request carries a persistence flag.
- the RNIC of the local device processes the first RDMA transmission request and generates a second RDMA transmission request according to the first RDMA transmission request; the second RDMA transmission request carries the persistence flag and a destination virtual memory address of the data.
- S310 The RNIC of the local device sends the second RDMA transmission request to the RNIC of the remote device.
- the processor of the remote device initiates a first RDMA reception request to the RNIC of the remote device.
- S312 The RNIC of the remote device generates a WQE corresponding to the first RDMA reception request.
- the RNIC of the remote device generates a CQE corresponding to the first RDMA reception request according to the second RDMA transmission request.
- S314 The RNIC of the remote device sends an ACK to the RNIC of the local device, and the RNIC of the local device receives the ACK.
- the RNIC of the local device generates a CQE corresponding to the first RDMA transmission request.
- the processor of the local device obtains a CQE corresponding to the first RDMA transmission request.
- the processor of the local device initiates a second RDMA reception request to the RNIC of the local device.
- S318 The RNIC of the local device generates a WQE corresponding to the second RDMA reception request.
- S319 The processor of the remote device obtains a CQE corresponding to the first RDMA reception request, and determines that data corresponding to the first RDMA reception request needs to be persisted.
- the processor of the remote device flushes the data at the corresponding address to the ADR of the iMC.
- the processor of the remote device initiates a third RDMA transmission request to the RNIC of the remote device, and the third RDMA transmission request carries an indication of data flushing completion.
- the RNIC of the remote device processes the third RDMA transmission request, generates a fourth RDMA transmission request according to the third RDMA transmission request, and the fourth RDMA transmission request carries an indication of data flushing completion.
- the RNIC of the remote device sends a fourth RDMA transmission request to the RNIC of the local device, and the RNIC of the local device receives the fourth RDMA transmission request.
- the RNIC of the local device generates a CQE corresponding to the second RDMA reception request according to an indication of data flushing completion in the fourth RDMA transmission request.
- the processor of the local device obtains a CQE corresponding to the second RDMA receiving request, and determines that the data persistence is completed.
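- To make the network cost of the two processes concrete, the following Python sketch lists only the messages that cross the network between the two RNICs, as described in the steps above, and counts them. The message labels and the helper `count_by_sender` are illustrative; messages between a processor and its own RNIC are deliberately left out.

```python
LOCAL, REMOTE = "local RNIC", "remote RNIC"

# RNIC-to-RNIC messages in the process of FIG. 2 (write followed by read).
FIG2_MESSAGES = [
    (LOCAL, REMOTE, "second RDMA write request"),
    (REMOTE, LOCAL, "ACK for the write request"),
    (LOCAL, REMOTE, "second RDMA read request"),
    (REMOTE, LOCAL, "ACK for the read request"),
]

# RNIC-to-RNIC messages in the process of FIG. 3 (write, then send/receive).
FIG3_MESSAGES = [
    (LOCAL, REMOTE, "second RDMA write request"),
    (REMOTE, LOCAL, "ACK for the write request"),
    (LOCAL, REMOTE, "second RDMA transmission request (persistence flag)"),
    (REMOTE, LOCAL, "ACK for the transmission request"),
    (REMOTE, LOCAL, "fourth RDMA transmission request (flush complete)"),
]


def count_by_sender(trace):
    """Return how many network messages each RNIC sends in a trace."""
    counts = {}
    for sender, _receiver, _label in trace:
        counts[sender] = counts.get(sender, 0) + 1
    return counts
```

- Counted this way, persisting one piece of data takes four network messages in the process of FIG. 2 and five in the process of FIG. 3, compared with two for a plain write and its ACK.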
- the embodiments of the present application provide a data processing method, an RNIC, and a device to solve the problem of large network load in the remote memory persistence process shown in FIG. 2 and FIG. 3.
- the embodiments of the present application can be applied to a computer network that interconnects computing devices and communicates through RDMA technology.
- the computer network can be as shown in FIG. 4.
- The computing devices can be connected to and communicate with one another through a wired network (such as Ethernet).
- In this computer network, there are two roles: local device and remote device.
- the embodiments of the present application can be specifically applied to the communication system formed by the local device and the remote device.
- the communication system formed by the local device and the remote device can be as shown in FIG. 5, and the structure of the local device and the remote device can refer to the computing device shown in FIG. 1.
- Each computing device includes an RNIC, a processor, and an SCM. For the definitions of the local device and the remote device, refer to the foregoing description.
- the local device and the remote device perform data transmission with the peer computing device through their respective RNICs.
- the embodiments of the present application reduce the network load required by the remote memory persistence process by improving the structure of the RNIC and the remote memory persistence process.
- the RNIC of the embodiment of the present application further includes a persistent memory (PM) module.
- The functions of the PM module can be implemented by a hardware circuit composed of a gate array, or by a software program running in the RNIC.
- If the control method of the control component in the RNIC is a microprogram control method with micro storage as the core, the functions of the PM module can be implemented by a software program running in the RNIC; if the control method of the control component in the RNIC is a hardware control method based mainly on a logical hard-wired structure, the functions of the PM module can be implemented by a hardware circuit composed of a gate array.
- The PM module is used to perform operations related to memory persistence. The PM module can determine, according to the data and parameters carried in a received RDMA write request, whether the data in the RDMA write request needs to be persisted. When it determines according to the parameters that the data needs to be persisted, it directly instructs the processor to write the data in the RDMA write request to the non-volatile memory of the computing device, which completes the memory persistence of the data.
- the non-volatile memory may be an SCM.
- the SCM may specifically be PRAM, and PRAM may be, for example, 3D Xpoint, ReRAM, MRAM, and so on.
- FIG. 6 is a schematic structural diagram of an implementation manner of an RNIC 60 provided by an embodiment of the present application.
- the RNIC can be used as an RNIC of a remote device or an RNIC of a local device.
- the RNIC 60 may include a receiving module 601, a scheduling module 602, a sending module 603, and a persistent memory module 604.
- the receiving module 601 is configured to receive a message sent by an external computing device.
- the receiving module 601 may be configured to receive a request or data sent by the RNIC of the remote device.
- the receiving module 601 may be configured to perform the receive operations performed by the second RNIC in the interaction between the second RNIC and the first RNIC in the method embodiments shown in FIG. 9 to FIG. 11.
- the receiving module 601 may be configured to receive a request or data sent by the RNIC of the local device.
- the receiving module 601 may be configured to perform the receive operations performed by the first RNIC in the interaction between the first RNIC and the second RNIC in the method embodiments shown in FIG. 9 to FIG. 11.
- the sending module 603 is configured to send a message to an external computing device.
- the sending module 603 may be configured to send a request or data to the RNIC of the remote device.
- the sending module 603 may be configured to perform the send operations performed by the second RNIC in the interaction between the second RNIC and the first RNIC in the method embodiments shown in FIG. 9 to FIG. 11.
- the sending module 603 may be used to send a request or data to the RNIC of the local device.
- the sending module 603 may be configured to perform the send operations performed by the first RNIC in the interaction between the second RNIC and the first RNIC in the method embodiments shown in FIG. 9 to FIG. 11.
- the scheduling module 602 is configured to communicate with a processor of a computing device and perform corresponding data processing.
- the scheduling module 602 is configured to communicate with a processor of the local device and perform data processing.
- the scheduling module 602 may be configured to perform the operations, performed by the second RNIC in the method embodiments shown in FIG. 9 to FIG. 11, that involve interacting with the second processor or that relate to scheduling.
- the scheduling module 602 is configured to communicate with the processor of the remote device and perform data processing.
- the scheduling module 602 may be configured to perform the operations, performed by the first RNIC in the method embodiments shown in FIG. 9 to FIG. 11, that involve interacting with the first processor or that relate to scheduling.
- the PM module 604 is used to perform operations related to memory persistence.
- In one implementation, the PM module 604 is configured to determine, when the receiving module 601 receives an RDMA write request, whether the first data in the RDMA write request is data to be persisted.
- If the first data is data to be persisted, an RDMA LSB read request is added to the SQ corresponding to the RDMA write request, so that the scheduling module 602 can send the RDMA LSB read request to the processor. The processor then writes the data buffered on the peripheral bus link into the non-volatile memory, thereby writing the first data into the non-volatile memory.
- In another implementation, the PM module 604 is configured to determine, when the receiving module 601 receives an RDMA write request, whether the first data in the RDMA write request is data to be persisted.
- If the first data is data to be persisted, a WQE corresponding to an RDMA reception request is generated according to the first data in the RDMA write request, the WQE is then cleared, and a CQE corresponding to the RDMA reception request is generated, so that the processor corresponding to the RNIC 60 (that is, the processor of the remote device) can write the first data buffered in the volatile storage medium back to the non-volatile memory according to the CQE.
- the functions implemented by the receiving module 601, the scheduling module 602, the sending module 603, and the persistent memory module 604 may be performed by the operation logic component 701, the register 702, the control component 703, and the input / output interface 704.
- the operation logic component 701, the register 702, the control component 703, and the input / output interface 704 may be connected through one or more internal buses 705.
- The operation logic component 701 can be used to execute arithmetic commands, such as addition, subtraction, multiplication, and division. It can also be used to execute logical commands, such as OR, AND, and NOT commands. The operation logic component 701 may be further configured to obtain a control signal from the control component 703, obtain the data corresponding to the control signal from the register 702, and perform the corresponding operation.
- Register 702 is a kind of memory with a small storage space. Register 702 can be used to store various instructions. Register 702 can also be used to store register operands and intermediate or final operation results temporarily stored during instruction execution.
- The control component 703 is used to decode the instructions stored in the register and to send out the control signals needed to complete each operation that an instruction requires.
- The control component 703 can use one of two control methods. One is a microprogram control method with micro storage as the core, in which the microprogram can be stored in the register 702. The other is a hardware control method based mainly on a logical hard-wired structure, in which the control component 703 can be composed of, for example, various AND-OR arrays.
- The input/output interface 704 is used to send or receive data. There may be multiple input/output interfaces 704, which can be used to receive data from or send data to the processor, or to receive data from or send data to an external computing device.
- the RNIC may further include a crystal oscillator, a media access controller, a physical interface transceiver, and the like, which are not limited in the embodiments of the present application.
- RDMA unilateral operation means that when the application on the local device accesses the memory of the remote device, only the processor of the local device participates, and the processor of the remote device is not required to participate, that is, only one side of the processor is working. As long as the local device knows the source and destination addresses of the data, it can complete the reading and writing of data in the memory of the remote device.
- the RDMA unilateral operation mainly involves an RDMA write operation (RDMA-Write).
- RDMA bilateral operation means that when the application on the local device accesses the memory of the remote device, the processor of the local device and the processor of the remote device are required to participate, that is, the processors of both devices are working.
- the RDMA bilateral operations mainly involved include an RDMA send operation (RDMA-Send) and an RDMA receive operation (RDMA-Receive). If the local device is to transmit data to the memory of the remote device through the RDMA send operation, the remote device must first initiate an RDMA receive operation to receive the RDMA send operation initiated by the local device.
- the queue in the embodiment of the present application is similar to the concept of a message queue in socket communication. It can be understood that the queue is a container for storing various information or data for asynchronous processing.
- In RDMA technology, a channel connection is established between the RNICs of the two computing devices, and the two ends of each channel are two queue pairs (QPs).
- the channel between the RNICs of the two computing devices may be as shown in FIG. 8.
- Each QP consists of a send queue (SQ) and a receive queue (RQ).
- SQ and RQ manage various types of messages.
- the QP is directly mapped to the virtual address space of the application (client) of the computing device, so that the application in the computing device can directly access the RNIC through it.
- Both the SQ and the RQ can be called a work queue (WQ). For a computing device that sends data, the WQ is the SQ; for a computing device that receives data, the WQ is the RQ.
- An application in a computing device can create a work request (WR) and use the WR to notify a WQ in the QP.
- The WR describes the remote operation request of the application (such as a remote read operation request or a remote write operation request), so that the RNIC of the computing device can determine the operations to be scheduled and executed.
- In the WQ, the WR is converted into the work queue element (WQE) format and waits for the RNIC to schedule and parse it.
- For example, an application of computer A wants to transfer the content stored at address A to address B (address A is an address in computer A, and address B is an address in computer B). The application uses a WR to inform the RNIC of computer A of address A, address B, and the write instruction.
- The RNIC of computer A adds a WQE to the SQ.
- The WQE includes address A, address B, and the write instruction.
- An application of a computing device may send an RDMA request as a WR to the RNIC of the computing device.
- After receiving the RDMA request, the RNIC of the computing device adds a WQE corresponding to the RDMA request to the WQ.
- The WQE corresponding to an RDMA request of the transmission type is added to the SQ, as shown in FIG. 8;
- the WQE corresponding to the RDMA request of the reception type is added to the RQ.
- The completion queue (CQ) stores completion queue entries (CQEs). When the RNIC completes the operation described by a WQE, it generates a corresponding CQE, and the application obtains the CQE to learn that the operation has completed.
- For example, an application of computer A wants to transfer the content stored at address A to address B (address A is an address in computer A, and address B is an address in computer B). After the RNIC of computer A sends the content of address A to the RNIC of computer B and determines that the RNIC of computer B has received the content, the RNIC of computer A generates a CQE. After the application of computer A obtains the CQE, it determines that the content stored at address A has been transferred to address B.
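- The life cycle described above (a WR becomes a WQE in the SQ or RQ, and a completed WQE yields a CQE) can be sketched in Python as follows. The names `WorkRequest`, `RnicQueues`, `post`, and `complete_send` are hypothetical and do not correspond to any real RDMA library; the sketch models only the queueing, not the data transfer.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WorkRequest:
    opcode: str                       # "write", "read", "send" or "receive"
    local_addr: int
    remote_addr: Optional[int] = None


@dataclass
class RnicQueues:
    sq: deque = field(default_factory=deque)  # send queue
    rq: deque = field(default_factory=deque)  # receive queue
    cq: deque = field(default_factory=deque)  # completion queue

    def post(self, wr: WorkRequest) -> dict:
        """Convert a WR into a WQE and add it to the matching WQ."""
        wqe = {"opcode": wr.opcode,
               "local_addr": wr.local_addr,
               "remote_addr": wr.remote_addr}
        # Reception-type requests go to the RQ, all others to the SQ.
        target = self.rq if wr.opcode == "receive" else self.sq
        target.append(wqe)
        return wqe

    def complete_send(self) -> dict:
        """Retire the oldest WQE in the SQ and generate its CQE."""
        wqe = self.sq.popleft()
        cqe = {"opcode": wqe["opcode"], "status": "success"}
        self.cq.append(cqe)
        return cqe
```

- In the example of computer A, the application would post a "write" WR with address A as the local address and address B as the remote address, and would later find the matching CQE in the CQ.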
- FIG. 9 is a schematic flowchart of a data processing method according to an embodiment of the present application.
- the first RNIC and the first processor are the RNIC and the processor of the first device, respectively, and the first device is the remote device.
- the second RNIC and the second processor are the RNIC and the processor of the second device, respectively, and the second device is a local device.
- the method includes:
- the second processor sends an RDMA write persistence request to the second RNIC.
- the second RNIC receives the RDMA write persistence request.
- the RDMA write persistence request includes a data persistence flag.
- the RDMA write persistence request can be understood as a WR created by the second processor and is used to describe the remote operation request of the second processor.
- the RDMA write persistence request is used to request that the first data be stored in the non-volatile memory of the first device to complete the memory persistence of the first data.
- the RDMA write persistence request may include an RDMA operation instruction, a source virtual memory address of the first data, and a destination virtual memory address of the first data.
- the source virtual memory address of the first data is a virtual memory address of the first data in the second device, and a storage space corresponding to the source virtual memory address is used to store the first data in the second device.
- the destination virtual memory address of the first data is a virtual memory address in the first device, and a storage space corresponding to the destination virtual memory address is used to store the first data after the first data is written to the first device.
- The destination virtual memory address is a virtual memory address registered by the first processor through a virtual memory address registration process. The source virtual memory address and the destination virtual memory address are illustrated by the following example.
- The first data is stored in the storage space corresponding to virtual memory address A of the second device, and the second processor wants to save the first data to the storage space corresponding to virtual memory address B of the first device. In this case, the source virtual memory address of the first data is virtual memory address A in the second device, and the destination virtual memory address of the first data is virtual memory address B in the first device.
- the virtual memory address is a logical storage address that has a mapping relationship with a physical address in the memory of the computing device, and is used to implement isolation between programs in the computing device and ensure the normal operation of the program.
- In a computing device, after an application is compiled, it forms multiple subroutines.
- The addresses of these subroutines usually start at "0", and the other addresses in a subroutine are calculated relative to the starting address (that is, "0"). The address range formed by these addresses is called the address space, and the addresses in the address space are logical storage addresses.
- The address range formed by the addresses of the storage space of the computing device's memory is called the memory space, and the addresses in the memory space are physical addresses.
- Every subroutine expects to be loaded starting at address "0". Because the computer has only one physical address 0, some subroutines cannot be loaded starting at "0", so the logical storage addresses in the address space and the physical addresses in the memory space are inconsistent. For example, if subroutine A expects to be loaded from logical storage address 0 but is actually loaded from physical address 10, address mapping is used to convert each logical storage address in the address space into the corresponding physical address in the memory space.
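- For the subroutine example above, the simplest form of address mapping adds the actual load address to each logical storage address. The Python function below is a minimal sketch of that idea; the name `logical_to_physical` is hypothetical, and real systems typically use page tables rather than a single base offset.

```python
def logical_to_physical(logical_addr: int, load_base: int) -> int:
    """Map a logical storage address (relative to 0) to a physical address.

    load_base is the physical address at which the subroutine was loaded.
    """
    if logical_addr < 0:
        raise ValueError("logical storage addresses start at 0")
    return load_base + logical_addr
```

- With subroutine A loaded at physical address 10, logical storage address 0 maps to physical address 10 and logical storage address 7 maps to physical address 17.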
- The first processor allocates a piece of storage space in memory in advance, which is used to store data related to RDMA operations, and then sends a target page table to the RNIC through the virtual memory address registration process.
- The target page table indicates the correspondence between the virtual memory addresses and the physical memory addresses of the storage space, so that the RNIC can determine, according to the target page table, the virtual memory addresses and the physical memory addresses of the storage space used to store data related to RDMA operations.
- The RNIC can also determine the physical memory address corresponding to a given virtual memory address according to the target page table.
- For example, the processor uses the storage space with physical addresses 1001 to 3000 to store data related to RDMA operations, and the corresponding virtual memory addresses are 1 to 2000. The processor then sends the page table shown in Table 1 to the RNIC.
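- The lookup that the RNIC performs with the target page table can be sketched in Python as follows. The contents of Table 1 are not reproduced here, so the sketch assumes a single contiguous entry that maps virtual memory addresses 1 to 2000 onto physical addresses 1001 to 3000, as in the example; `PageTableEntry` and `translate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageTableEntry:
    virt_start: int  # first virtual memory address of the range
    phys_start: int  # physical address that virt_start maps to
    length: int      # number of addresses in the range


# Assumed layout: virtual 1..2000 maps to physical 1001..3000.
TARGET_PAGE_TABLE = [PageTableEntry(virt_start=1, phys_start=1001, length=2000)]


def translate(table, virt_addr: int) -> int:
    """Return the physical address registered for virt_addr."""
    for entry in table:
        if entry.virt_start <= virt_addr < entry.virt_start + entry.length:
            return entry.phys_start + (virt_addr - entry.virt_start)
    raise LookupError(f"virtual address {virt_addr} is not registered")
```

- Addresses outside the registered range raise an error in this sketch, which reflects that the RNIC can only resolve addresses covered by the target page table.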
- In the process of registering the virtual memory address, in addition to sending the target page table to the RNIC, the processor also specifies the access mark of the registered address space.
- The access mark is used to indicate the attributes of the storage space corresponding to the address space. For example, the storage space may be remotely readable (data can be read from it), remotely writable (data can be written to it), or remotely readable and writable (data can be both read from it and written to it), and so on.
- The processor sends the starting virtual memory address of the address space, the length of the address space, and the access mark of the address space to the RNIC, so as to inform the RNIC of the virtual memory addresses of each address space and the attributes of its corresponding storage space.
- For example, the processor uses the storage space with physical addresses 1001 to 3000 to store data related to RDMA operations, and the corresponding virtual memory addresses are 1 to 2000. The storage space corresponding to the address space with virtual memory addresses 1 to 500 is designated as readable storage space, the storage space corresponding to the address space 501 to 1000 is designated as writable storage space, and the storage space corresponding to the address space 1001 to 1500 is designated as readable and writable storage space. The processor may then send the information shown in Table 2 to the RNIC.
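- A check of a remote access against the registered access marks can be sketched in Python as follows. Table 2 is not reproduced here, so the regions below are taken from the example in the text (1 to 500 readable, 501 to 1000 writable, 1001 to 1500 readable and writable); the names `Access`, `REGIONS`, and `is_allowed` are hypothetical.

```python
from enum import Flag, auto


class Access(Flag):
    REMOTE_READ = auto()
    REMOTE_WRITE = auto()


# (starting virtual memory address, length, access mark)
REGIONS = [
    (1, 500, Access.REMOTE_READ),
    (501, 500, Access.REMOTE_WRITE),
    (1001, 500, Access.REMOTE_READ | Access.REMOTE_WRITE),
]


def is_allowed(regions, virt_addr: int, wanted: Access) -> bool:
    """Return True if virt_addr lies in a region whose mark covers wanted."""
    for start, length, mark in regions:
        if start <= virt_addr < start + length:
            return (mark & wanted) == wanted
    return False  # address not covered by any registered region
```

- Each tuple carries exactly the three items the processor sends to the RNIC: the starting virtual memory address, the length, and the access mark.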
- The first device may send the related information of the address space registered during the virtual memory address registration process to the second device through an RDMA send operation, so that the second device can learn how the registered address space of the first device may be used.
- the related information of the address space includes a starting virtual memory address of the registered address space, a length of the address space, and an access mark.
- the data persistence flag is used to indicate that the first data is data to be persisted.
- The data persistence flag can take either of the following two forms:
- In the first form, the flag is carried in the RDMA operation instruction. The RNIC of a computing device performs the operation indicated by the RDMA operation instruction that it obtains by parsing. A write persistence instruction may therefore be added to the RDMA operation instructions; the write persistence instruction indicates that the first data is data that needs to be written and persisted in memory. That is, the data persistence flag is the write persistence instruction.
- the RDMA write persistence request may include a write persistence instruction, a source virtual memory address of the first data, and a destination virtual memory address of the first data.
- In the second form, the data persistence flag is the destination storage address corresponding to the first data, where the storage space corresponding to the destination storage address is used to store the first data.
- The destination storage address is a persistent storage address in the first device, and the storage space corresponding to a persistent storage address is used to store data to be persisted.
- The processor of the first device specifies, through the virtual memory address registration process, the address space corresponding to the storage space for storing data related to RDMA operations and the access mark of that address space. After the virtual memory address registration process, the first device sends the registered information to the second device through an RDMA send operation.
- a remote write persistence mark can be added to the access mark.
- the remote write persistence mark indicates that the storage space corresponding to the address space is used to store data to be persisted.
- the destination storage address may be a destination virtual memory address, and the destination virtual memory address is a logical storage address in an address space where an access mark registered by the first processor through the virtual memory address registration process is a write persistence mark.
- the storage space corresponding to the address space marked by the write-persistent mark is used for storing the data to be persisted.
- the RDMA write persistence request may include a write instruction, a source virtual memory address of the first data, and a destination virtual memory address of the first data.
- the destination virtual memory address is a logical storage address in an address space whose access mark, registered by the first processor through the virtual memory address registration process, is the write persistence mark.
- the second RNIC generates an RDMA write request according to the RDMA write persistence request.
- the RDMA write request includes the first data and a data persistence mark.
- the second RNIC may create, in the SQ, a WQE corresponding to the RDMA write persistence request; the WQE may include the source virtual memory address of the first data, the RDMA operation instruction, and the destination virtual memory address of the first data. When the WQE is scheduled, the second RNIC obtains the first data according to its source virtual memory address, and encapsulates the first data, the destination virtual memory address of the first data, and the RDMA operation instruction in an RDMA transmission message to form the RDMA write request.
- the RDMA transmission message refers to a message transmitted between the first RNIC and the second RNIC.
- the operation of generating a RDMA write request by the second RNIC according to the RDMA write persistence request may be specifically performed by a scheduling module of the second RNIC.
- if the data persistence mark is the write persistence instruction, the RDMA write request may include the write persistence instruction, the destination virtual memory address of the first data, and the first data; in this case the RDMA write request may also be referred to as a write persistence request. If the data persistence mark is the destination virtual memory address of the first data, the RDMA write request may include a write instruction, the destination virtual memory address of the first data, and the first data, where the destination virtual memory address is a persistent virtual memory address registered by the first processor through the virtual memory address registration process.
- the RDMA write request may further include a serial number of the first data and a QP serial number of the first device.
- the serial number of the first data uniquely identifies the first data during transmission between the first device and the second device, which makes it easy to detect missing or duplicate data packets;
- the QP serial number of the remote device identifies the unique channel between the local device and the remote device.
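The way the second RNIC turns a scheduled WQE into an RDMA write request can be sketched as follows. This is a minimal illustration under stated assumptions: the class names `WQE` and `RdmaWriteRequest`, the opcode string `"WRITE_PERSIST"`, and the use of a `bytearray` as local memory are all hypothetical, and real RNICs build these messages in hardware.

```python
from dataclasses import dataclass


@dataclass
class WQE:
    opcode: str      # illustrative names: "WRITE" or "WRITE_PERSIST"
    src_addr: int    # source virtual memory address of the first data
    dst_addr: int    # destination virtual memory address of the first data
    length: int      # number of bytes to transfer


@dataclass
class RdmaWriteRequest:
    opcode: str
    dst_addr: int
    payload: bytes
    psn: int         # serial number of the data
    qp_num: int      # QP serial number identifying the channel


def build_write_request(wqe, local_memory, psn, qp_num):
    """Fetch the data at the source address and wrap it with the WQE fields."""
    payload = bytes(local_memory[wqe.src_addr:wqe.src_addr + wqe.length])
    return RdmaWriteRequest(wqe.opcode, wqe.dst_addr, payload, psn, qp_num)


local_memory = bytearray(64)
local_memory[16:21] = b"hello"
wqe = WQE("WRITE_PERSIST", src_addr=16, dst_addr=0x8000, length=5)
request = build_write_request(wqe, local_memory, psn=7, qp_num=3)
```

Here the persistence mark travels as the opcode; in the address-based variant the opcode would stay `"WRITE"` and the destination address alone would carry the meaning.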
- S803 The second RNIC sends an RDMA write request to the first RNIC, and the first RNIC receives the RDMA write request.
- the second RNIC may send the RDMA write request to the first RNIC based on the InfiniBand (IB) protocol, the RDMA over Converged Ethernet (RoCE) protocol, or the Internet Wide Area RDMA Protocol (iWARP), which carries RDMA over the Transmission Control Protocol.
- the second RNIC sends the RDMA write request to the first RNIC through the sending module of the second RNIC, and the first RNIC receives the RDMA write request through the receiving module of the first RNIC.
- after receiving the RDMA write request, the receiving module of the first RNIC puts the RDMA write request into the RQ corresponding to the first RNIC, where it waits for the scheduling module of the first RNIC to schedule it.
- when the scheduling module of the first RNIC schedules the RDMA write request, it parses the request to obtain the first data, the destination virtual memory address of the first data, and the RDMA operation instruction.
- when the RDMA operation instruction is a write persistence instruction, the scheduling module of the first RNIC may send the write persistence instruction to the PM module of the first RNIC; the PM module determines, according to the write persistence instruction, that the first data is data to be persisted, and step S805 is performed. The scheduling module of the first RNIC determines, according to the destination virtual memory address of the first data and the write persistence instruction, that the first data needs to be written to the destination virtual memory address, and executes step S804.
- when the RDMA operation instruction is a write instruction, the scheduling module of the first RNIC may send the destination virtual memory address of the first data to the PM module. Because that address is a logical storage address in an address space whose access mark, registered by the first processor through the virtual memory address registration process, is the remote write persistence mark, the PM module of the first RNIC determines that the first data is data to be persisted, and step S805 is executed. The scheduling module of the first RNIC determines, according to the destination virtual memory address of the first data and the write instruction, that the first data needs to be written to the destination virtual memory address, and executes step S804.
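The two ways of recognizing data to be persisted (by instruction or by destination address) can be sketched as a single decision function. The function name, the opcode strings, and the `(start, length)` range list are hypothetical; the step labels S804 and S805 come from the text, and their reading here as "DMA write" and "persist" is inferred from the surrounding sentences.

```python
def steps_for_write(opcode, dst_addr, persistent_ranges):
    """Return the step labels the first RNIC would run for a parsed request."""
    if opcode not in ("WRITE", "WRITE_PERSIST"):
        return []
    steps = ["S804"]  # scheduling module: DMA write toward the first processor
    # Case 1: the persistence mark is the write persistence instruction itself.
    by_instruction = opcode == "WRITE_PERSIST"
    # Case 2: the persistence mark is a destination address that falls in an
    # address space registered with the remote write persistence mark.
    by_address = any(start <= dst_addr < start + length
                     for start, length in persistent_ranges)
    if by_instruction or by_address:
        steps.append("S805")  # PM module: instruct the processor to persist
    return steps


# One registered write-persistent address space: [0x8000, 0xA000).
persistent_ranges = [(0x8000, 0x2000)]
```

A plain write to an address outside every write-persistent range yields only `["S804"]`, so ordinary RDMA writes are unaffected.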
- the first RNIC sends a DMA write request to the first processor.
- the first processor receives the DMA write request, and the DMA write request includes the first data.
- the operation in which the first RNIC sends a DMA write request to the first processor may be performed by the scheduling module of the first RNIC.
- the first RNIC sends the DMA write request to the first processor so that the first data is written into the non-volatile memory of the first device in a DMA manner.
- the first RNIC instructs the first processor to save the first data to a non-volatile memory of the first device.
- after the RNIC of the remote device receives the RDMA write request, it determines, according to the data persistence mark in the RDMA write request, that the data in the request is data to be persisted in memory, and directly instructs the first processor to save the first data to the non-volatile memory of the first device. This is equivalent to combining the RDMA write request and the RDMA persistence request into one request, which eliminates the need to initiate a separate RDMA persistence request toward the remote device.
- because each request consumes a certain amount of network bandwidth, eliminating the RDMA persistence request saves the bandwidth of one request and reduces the load on the RDMA network.
- in addition, combining the RDMA write request and the RDMA persistence request into one request makes the data write and the memory persistence two operations that are performed back to back. This ensures that data is saved to the non-volatile memory of the remote device after it is written to the remote device, and avoids data inconsistency.
- when data received by an RNIC is written into the non-volatile memory of the computing device to which the RNIC belongs, the data passes through the processor of that device. Because the bus may be occupied, among other reasons, the data may be temporarily cached in the caches at various levels of the processor. Depending on the architecture of the processor, data can flow through the processor in two ways; that is, the first data has two possible data flows in the first processor. In the embodiment of the present application, in combination with the different data flows described in FIG. 1, the first RNIC instructs the first processor to save the first data to the non-volatile memory of the first device in different ways. The data processing methods corresponding to the two data flows are described below with reference to FIG. 10 and FIG. 11.
- FIG. 10 is a schematic flowchart of another data processing method according to an embodiment of the present application. The process applies when the data flow is the first data flow described above. The first RNIC and the first processor are the RNIC and the processor of the first device, respectively, and the first device is a remote device.
- the second RNIC and the second processor are the RNIC and the processor of the second device, respectively, and the second device is a local device.
- the method includes the following processes:
- the second processor sends an RDMA write persistence request to the second RNIC.
- the second RNIC receives the RDMA write persistence request.
- the RDMA write persistence request includes a data persistence flag.
- the second RNIC generates an RDMA write request according to the RDMA write persistence request.
- the RDMA write request includes the first data and a data persistence mark.
- the second RNIC sends an RDMA write request to the first RNIC, and the first RNIC receives the RDMA write request.
- the first RNIC sends a DMA write request to the first processor.
- the first processor receives the DMA write request, and the DMA write request includes the first data.
- the PM module of the first RNIC determines that the first data is data to be persisted according to the persistence flag, and the PM module of the first RNIC performs step S905.
- the first RNIC adds a DMA LSB read request to the RQ corresponding to the second RNIC.
- the PM module of the first RNIC may determine the RQ corresponding to the second RNIC according to the QP of the first device in the RDMA write request, and then add a DMA LSB read request to the RQ.
- a DMA LSB read request is a request corresponding to a DMA read operation
- the virtual memory address corresponding to the DMA LSB read request may be a logical storage address in any segment of the address space registered by the first processor through the virtual memory address registration process.
- the virtual memory address corresponding to the DMA LSB read request refers to the virtual memory address corresponding to the storage space to be read by the DMA LSB read operation corresponding to the DMA LSB read request.
- the first RNIC may also send another DMA read request to the first processor to instruct the first processor to write all data buffered in the peripheral bus link of the first device to the non-volatile memory of the first device.
- this DMA read request may read any address in the SCM.
- for example, the address may be the start address or the end address of the address range used to store the first data, or any address other than the start address and the end address.
- the address to be read by the read request may also be an address segment.
- the address segment is an address range for storing the first data, or any one of the storage ranges in the SCM.
- the scheduling module of the first RNIC executes step S906.
- the first RNIC sends a DMA LSB read request to the first processor, and the first processor receives the DMA LSB read request.
- the first RNIC sets the NS flag in the DMA LSB read request to 1.
- the first RNIC may notify the second RNIC of the completion of the memory persistence of the first data by sending a confirmation message.
- the method shown in FIG. 10 may further include:
- the first RNIC sends a persistence confirmation message corresponding to the first data to the second RNIC, and the second RNIC receives the persistence confirmation message corresponding to the first data.
- the persistence confirmation message corresponding to the first data includes a second PSN, and the second PSN is a sequence number of the first data.
- the first RNIC sends the persistence confirmation message corresponding to the first data to the second RNIC through the sending module of the first RNIC, and the second RNIC receives the message through the receiving module of the second RNIC.
- the scheduling module of the first RNIC obtains the serial number of the first data from the RDMA write request, uses it as the second PSN, encapsulates the second PSN into an RDMA transmission message to form a confirmation message, and sends the confirmation message to the second RNIC through the sending module of the first RNIC. After receiving the confirmation message, the receiving module of the second RNIC sends it to the scheduling module of the second RNIC.
- the scheduling module of the second RNIC parses the confirmation message to obtain the second PSN, and determines according to the second PSN that the confirmation message is the persistence confirmation message corresponding to the first data.
- the second RNIC generates a CQE corresponding to the RDMA write persistence request.
- the scheduling module of the second RNIC determines, according to the persistence confirmation message corresponding to the first data, that the first data has been stored in the non-volatile memory of the first device; it then determines the WQE corresponding to the first data according to the second PSN, obtains the content of the WQE, and generates the CQE corresponding to the RDMA write persistence request.
- the CQE corresponding to the RDMA write persistence request may include the source virtual memory address of the first data, the second PSN, and/or the destination virtual memory address.
- the second processor obtains a CQE corresponding to the RDMA write persistence request, and determines that the first data has been stored in the non-volatile memory of the first device.
- the second processor may determine that the first data has been stored in the non-volatile memory of the first device according to the content in the CQE.
- for example, if the content of the CQE is the source virtual memory address of the first data, the second processor determines that the data stored at the source virtual memory address has been stored in the non-volatile memory of the first device; that is, the first data has been stored in the non-volatile memory of the first device.
- the memory persistence of the first data is completed.
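The completion path on the local side (confirmation message, then PSN lookup, then WQE, then CQE) can be sketched as below. This is an illustrative model only: the class `SendSide`, its method names, and the dictionary-shaped WQE and CQE are assumptions, and the text does not say how a real RNIC indexes outstanding WQEs.

```python
from collections import deque


class SendSide:
    """Toy model of the second RNIC's bookkeeping for write persistence requests."""

    def __init__(self):
        self.outstanding = {}            # psn -> WQE content awaiting confirmation
        self.completion_queue = deque()  # CQEs visible to the second processor

    def post_write_persist(self, psn, src_addr, dst_addr):
        self.outstanding[psn] = {"src_addr": src_addr, "dst_addr": dst_addr}

    def on_persistence_ack(self, psn):
        """Match the second PSN to its WQE and turn the WQE content into a CQE."""
        wqe = self.outstanding.pop(psn, None)
        if wqe is None:
            return False                 # unknown or duplicate confirmation
        self.completion_queue.append({"psn": psn, **wqe})
        return True

    def poll(self):
        """What the second processor does to learn that the data is persistent."""
        return self.completion_queue.popleft() if self.completion_queue else None


side = SendSide()
side.post_write_persist(psn=7, src_addr=0x100, dst_addr=0x8000)
accepted = side.on_persistence_ack(7)
duplicate = side.on_persistence_ack(7)
cqe = side.poll()
```

The CQE carries the source address, the destination address, and the PSN, so the second processor can identify which data became persistent by any of them.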
- in the embodiment of the present application, the local device only needs to initiate one RDMA write request for the data to be written to the non-volatile memory of the remote device.
- one unilateral operation is omitted, the load on the processor of the local device is reduced, and the bandwidth that initiating an RDMA persistence request would occupy is saved, which reduces the network load.
- FIG. 11 is a schematic flowchart of another data processing method according to an embodiment of the present application. The process applies when the data flow is the second data flow. The first RNIC and the first processor are the RNIC and the processor of the first device, respectively, and the first device is a remote device.
- the second RNIC and the second processor are the RNIC and the processor of the second device, respectively, and the second device is a local device.
- the method includes the following processes:
- the second processor sends an RDMA write persistence request to the second RNIC.
- the second RNIC receives the RDMA write persistence request.
- the RDMA write persistence request includes a data persistence flag.
- the second RNIC generates an RDMA write request according to the RDMA write persistence request.
- the RDMA write request includes the first data and a data persistence mark.
- the second RNIC sends an RDMA write request to the first RNIC, and the first RNIC receives the RDMA write request.
- the first RNIC sends a DMA write request to the first processor.
- the first processor receives the DMA write request, and the DMA write request includes the first data.
- after steps S1001 to S1004, once the PM module of the first RNIC determines, according to the data persistence flag, that the first data is data to be persisted, steps S1006 to S1007 are performed.
- the first processor sends a first RDMA reception request to the first RNIC, and the first RNIC receives the first RDMA reception request.
- the first processor is required to write the data in the LLC back to the non-volatile memory; that is, the first processor, which is the processor of the remote device, must participate, so writing data from the LLC back to the non-volatile memory involves an RDMA bilateral operation.
- for an RDMA bilateral operation, the remote device must first initiate an RDMA receive operation, so the first processor sends the first RDMA reception request to the first RNIC in order to receive the RDMA transmission request sent by the second device.
- the second RNIC generates a WQE corresponding to the first RDMA reception request according to the first data.
- the WQE corresponding to the first RDMA reception request may include the destination virtual memory address of the first data; in another possible implementation, the WQE may include the serial number of the first data; in yet another possible implementation, the WQE may include both the destination virtual memory address of the first data and the serial number of the first data.
- the second RNIC clears the WQE corresponding to the first RDMA reception request, and generates a CQE corresponding to the first RDMA reception request.
- the PM module of the second RNIC may obtain the WQE content from the WQE corresponding to the first RDMA reception request, and then generate the CQE corresponding to the first RDMA reception request according to the acquired WQE content.
- the CQE corresponding to the first RDMA reception request may include a destination virtual memory address of the first data and / or a serial number of the first data.
- in steps S1006 to S1007, the operation performed by the PM module of the second RNIC replaces the RDMA transmission request that the second device would otherwise initiate, and achieves the same effect as the second device initiating an RDMA transmission request to instruct that the first data be stored in the non-volatile memory of the first device.
- the first processor obtains a CQE corresponding to the first RDMA reception request, and stores the first data in a non-volatile memory of the first device.
- in the case that the CQE corresponding to the first RDMA reception request includes the destination virtual memory address of the first data, the first processor may find the first data in the LLC according to that address and store the first data in the non-volatile memory of the first device; in the case that the CQE corresponding to the first RDMA reception request includes the serial number of the first data, the first processor may find the first data in the LLC according to the serial number and store the first data in the non-volatile memory of the first device.
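The lookup described above, in which the first processor finds cached data by destination address or by serial number and writes it back, can be sketched as follows. The class `RemoteProcessor`, the list standing in for the LLC, and the dictionary standing in for non-volatile memory are hypothetical simplifications; a real processor would use cache write-back instructions rather than a search.

```python
class RemoteProcessor:
    """Toy model of the first processor with an LLC and non-volatile memory."""

    def __init__(self):
        self.llc = []   # cached entries: dicts with dst_addr, psn, data
        self.nvm = {}   # dst_addr -> data that reached non-volatile memory

    def cache(self, dst_addr, psn, data):
        self.llc.append({"dst_addr": dst_addr, "psn": psn, "data": data})

    def persist_from_cqe(self, cqe):
        """Write back the LLC entries matching whichever keys the CQE carries."""
        keys = [k for k in ("dst_addr", "psn") if k in cqe]
        if not keys:
            return 0    # nothing in the CQE identifies the data
        hits = [e for e in self.llc if all(e[k] == cqe[k] for k in keys)]
        for entry in hits:
            self.nvm[entry["dst_addr"]] = entry["data"]
            self.llc.remove(entry)
        return len(hits)


cpu = RemoteProcessor()
cpu.cache(0x8000, 7, b"first data")
cpu.cache(0x9000, 8, b"other data")
by_addr = cpu.persist_from_cqe({"dst_addr": 0x8000})  # CQE with address only
by_psn = cpu.persist_from_cqe({"psn": 8})             # CQE with serial number only
```

Both CQE variants lead to the same result: the matching entry leaves the cache model and appears in the non-volatile memory model.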
- the first RNIC may send a confirmation message to the second RNIC to inform the second RNIC that the first data is received, and after step S1003, the method further includes:
- the first RNIC sends a reception confirmation message corresponding to the first data to the second RNIC, and the second RNIC receives a reception confirmation message corresponding to the first data.
- the reception confirmation message corresponding to the first data includes a first PSN, the first PSN is a serial number of the first data, and the reception confirmation message corresponding to the first data is used to indicate that the first RNIC receives the first data.
- the first RNIC sends a reception confirmation message corresponding to the first data to the second RNIC through the sending module of the first RNIC, and the second RNIC receives the reception confirmation message corresponding to the first data through the receiving module of the second RNIC.
- the scheduling module of the first RNIC obtains the serial number of the first data from the RDMA write request and uses it as the first PSN.
- the first PSN is encapsulated into an RDMA transmission message to form a confirmation message, which is sent to the second RNIC through the sending module of the first RNIC.
- after receiving the confirmation message, the receiving module of the second RNIC sends it to the scheduling module of the second RNIC.
- the scheduling module of the second RNIC parses the confirmation message to obtain the first PSN, and determines according to the first PSN that the confirmation message is the reception confirmation message corresponding to the first data.
- the second RNIC generates a CQE corresponding to the RDMA write persistence request.
- the second processor obtains a CQE corresponding to the RDMA write persistence request, and determines that the first data transmission is completed.
- for specific implementations of steps S1010 to S1011, reference may be made to the description of steps S908 to S909, and details are not described herein again.
- the first RNIC initiates an RDMA transmission request operation instead of the second device, and then the second processor may initiate an RDMA reception request after determining that the first data transmission is completed, so as to receive the RDMA transmission request initiated by the first processor.
- the method may further include:
- S1012 The second processor initiates a second RDMA reception request to the second RNIC, and the second RNIC receives the second RDMA reception request.
- the second RNIC generates a WQE corresponding to the second RDMA reception request.
- the first RNIC may send a confirmation message to notify the second RNIC that the memory persistence of the first data is complete.
- the method may further include:
- the first processor sends an RDMA transmission request to the first RNIC, and the first RNIC receives the RDMA transmission request.
- the RDMA transmission request is used to indicate that the first data has been stored in the non-volatile memory of the first device, and the first RDMA transmission request may include a serial number of the first data.
- S1015 The first RNIC generates a WQE corresponding to the RDMA transmission request.
- the first RNIC sends a persistence confirmation message corresponding to the first data to the second RNIC, and the second RNIC receives the persistence confirmation message corresponding to the first data.
- the persistence confirmation message corresponding to the first data includes a second PSN, and the second PSN is a sequence number of the first data.
- the first RNIC sends the persistence confirmation message corresponding to the first data to the second RNIC through the sending module of the first RNIC, and the second RNIC receives the message through the receiving module of the second RNIC.
- the scheduling module of the first RNIC obtains the sequence number of the first data packet from the first RDMA transmission request as the second PSN, encapsulates the second PSN into an RDMA transmission message to form a confirmation message, and sends the confirmation message to the second RNIC through the sending module of the first RNIC.
- the receiving module of the second RNIC sends the confirmation message to the scheduling module of the second RNIC.
- the scheduling module of the second RNIC parses the confirmation message to obtain the second PSN, and determines according to the second PSN that the confirmation message is the persistence confirmation message corresponding to the first data.
- the second RNIC executes step S1018.
- the first RNIC clears the WQE corresponding to the RDMA transmission request, and generates a CQE corresponding to the RDMA transmission request.
- the second RNIC generates a CQE corresponding to the second RDMA reception request.
- the second processor obtains a CQE corresponding to the second RDMA reception request, and determines that the first data has been stored in the non-volatile memory of the first device.
- the memory persistence of the first data is completed.
- in the embodiment of the present application, after issuing the unilateral operation, the processor of the local device needs to issue only one bilateral operation for the data to be saved to the non-volatile memory of the remote device.
- one bilateral operation is omitted, the load on the processor of the local device is reduced, and the bandwidth that initiating an RDMA persistence request would occupy is saved, which reduces the network load.
- the processes shown in FIG. 10 and FIG. 11 above are related to a process of saving one data to the non-volatile memory of the first device.
- the above solution can also be used to save multiple data to the first device.
- the RDMA transmission protocol stipulates that the RNIC of the local device sends the next data to the RNIC of the remote device only after determining that the transmission of the previous data has completed; that is, only after the RNIC of the local device receives the ACK message for the previous data from the RNIC of the remote device does it send the next data to the remote device.
- for two consecutive data, the memory persistence of the former and the write of the latter can be performed in parallel to improve transmission efficiency.
- one specific way to run the memory persistence of the former data in parallel with the write of the latter data is as follows: after receiving the RDMA write request, the first RNIC sends a second confirmation message to the second RNIC, and the second confirmation message carries the PSN of the first data. When the second RNIC receives the second confirmation message, this is the first time it has received a confirmation message whose PSN is the PSN of the first data, so the second RNIC determines, according to the PSN of the first data, that it has received the reception confirmation message corresponding to the first data and therefore that the first RNIC has received the first data; the second RNIC then sends the RDMA write request corresponding to the data following the first data to the first RNIC. In addition, the second RNIC caches the second confirmation message.
- when the second RNIC later receives the first confirmation message, whose PSN is also the PSN of the first data, it compares the PSN in the first confirmation message with the PSN in the previously cached second confirmation message and determines that this is the second time it has received a confirmation message whose PSN is the PSN of the first data. The second RNIC therefore determines that it has received the persistence confirmation corresponding to the first data, and executes step S908 or S1018.
- a confirmation message only indicates acknowledgement; which request it acknowledges must be judged from the requests that were sent or from the order in which confirmation messages are received. Because the write operation occurs before the memory persistence operation, the first confirmation message received that carries the serial number of the first data must be the reception confirmation message of the first data.
- the RNIC of the second device can send the next data of the first data to the first device RNIC without waiting for the persistent acknowledgement message of the first data to be received.
- distinguishing confirmation messages by the PSN they carry allows the memory persistence of one data and the write operation of the next data to proceed in parallel, which improves the efficiency of saving multiple data to the non-volatile memory of the remote device and reduces latency.
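The rule that the first confirmation carrying a given PSN means "received" and the second means "persisted" can be sketched as a small counter. The class name `AckTracker` and the returned labels are hypothetical; the source only states the ordering rule, and the handling of a third confirmation is an assumption.

```python
class AckTracker:
    """Classify confirmation messages by how often their PSN has been seen."""

    def __init__(self):
        self.seen = {}  # psn -> number of confirmation messages received so far

    def on_ack(self, psn):
        count = self.seen.get(psn, 0) + 1
        self.seen[psn] = count
        if count == 1:
            return "received"    # reception confirmed: the next data may be sent
        if count == 2:
            return "persisted"   # persistence confirmed: generate the CQE
        return "ignored"         # assumption: further duplicates are dropped


tracker = AckTracker()
# Data 1 is received, data 2 is sent and received while data 1 is still being
# persisted, then both persistence confirmations arrive.
events = [tracker.on_ack(1), tracker.on_ack(2), tracker.on_ack(1), tracker.on_ack(2)]
```

In this trace the write of data 2 overlaps with the persistence of data 1, which is the parallelism the text describes.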
- the RDMA write request sent by the second device may not include a data persistence flag.
- in this case, after the first device receives the RDMA write request, the first RNIC in the first device adds a DMA read request to the receive queue corresponding to the second RNIC, with which it is communicating, and sends the DMA read request to the first processor, instructing the first processor to store the first data to the SCM.
- the DMA read request causes all write operations of the first processor issued before it to complete, so that all data that has not yet been written to the non-volatile memory is written to the non-volatile memory. This guarantees that the first data, written by the DMA write request before the read request was initiated, reaches the non-volatile memory, which completes the memory persistence of the first data.
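The effect of following a DMA write with a DMA read can be pictured with a toy model in which writes are buffered and a read drains them. The class `BufferedLink` and its fields are hypothetical, and the model assumes, as the text describes, that a read cannot complete before earlier writes have been committed; it does not model a real peripheral bus.

```python
class BufferedLink:
    """Toy model: DMA writes are buffered; a DMA read drains earlier writes first."""

    def __init__(self):
        self.pending = []  # writes accepted but not yet in non-volatile memory
        self.nvm = {}      # address -> data that reached non-volatile memory

    def dma_write(self, addr, data):
        self.pending.append((addr, bytes(data)))

    def dma_read(self, addr):
        # Ordering assumption: the read may not overtake earlier writes, so
        # every pending write is committed before the read is served.
        for pending_addr, data in self.pending:
            self.nvm[pending_addr] = data
        self.pending.clear()
        return self.nvm.get(addr, b"")


link = BufferedLink()
link.dma_write(0x8000, b"first data")
persisted_before_read = 0x8000 in link.nvm  # still only buffered at this point
link.dma_read(0x8000)
```

The read may target any address in the registered space; its purpose in this model is the drain, not the returned value.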
- the foregoing method may be implemented on a RNIC and a processor of a computing device.
- the embodiment of the present application further provides a corresponding computing device.
- FIG. 12 is a schematic structural diagram of a computing device according to an embodiment of the present application.
- the computing device 130 includes an RNIC 131, a processor 132, and a non-volatile memory 133.
- the structure of the RNIC 131 may be as shown in FIG. 6, and the structure of the processor 132 may be shown as the processor 102 in FIG. 1.
- the non-volatile memory 133 may be an SCM.
- the RNIC 131 is configured to execute the steps performed by the first RNIC in the method embodiments shown in FIG. 9 to FIG. 11, and the processor 132 is configured to execute the steps performed by the first processor in the method embodiments shown in FIG. 9 to FIG. 11.
- FIG. 13 is a schematic structural diagram of a computing device according to an embodiment of the present application.
- the computing device 140 includes an RNIC 141, a processor 142, and a non-volatile memory 143.
- the structure of the RNIC 141 may be as shown in FIG. 6, and the structure of the processor 142 may be shown as the processor 102 in FIG. 1.
- the non-volatile memory 143 may be an SCM, an NVRAM, or an NVDIMM.
- the RNIC 141 is configured to execute the steps performed by the second RNIC in the method embodiments shown in FIG. 9 to FIG. 11, and the processor 142 is configured to execute the steps performed by the second processor in the method embodiments shown in FIG. 9 to FIG. 11.
- An embodiment of the present application further provides a processor.
- the structure of the processor may be as shown for the processor in FIG. 1, and the processor is configured to execute the steps performed by the second processor in the method embodiments shown in FIG. 9 to FIG. 11.
- the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any other combination.
- the above embodiments may be implemented in whole or in part in the form of a computer program product.
- the computer program product includes one or more computer instructions.
- the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
- the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, a computer, a server, or a data center.
- the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server, a data center, and the like, including one or more sets of available media.
- the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium.
- the semiconductor medium may be a solid state drive (SSD).
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Quality & Reliability (AREA)
- Memory System Of A Hierarchy Structure (AREA)
- Bus Control (AREA)
Abstract
The present application provides a data processing method, an RNIC, and a device. The method comprises: a first RNIC receiving a remote direct memory access (RDMA) write request sent by a second RNIC, the RDMA write request comprising first data and a data persistence flag, the data persistence flag being used to indicate that the first data is data to be persisted; according to the RDMA write request, the first RNIC sending a DMA write request to a first processor, to instruct the first processor to write the first data into a first device, the first RNIC and the first processor belonging to the first device, the second RNIC belonging to a second device, and the two devices communicating on the basis of an RDMA approach; and according to the data persistence flag, the first RNIC determining that the first data is the data to be persisted, and the first RNIC instructing the first processor to save the first data into a non-volatile memory of the first device. Thus, the present invention reduces the load of an RDMA network during remote memory persistence.
Description
The present application relates to the field of computer technology, and in particular, to a data processing method, a remote direct memory access network interface card, and a device.
Storage class memory (SCM), such as 3D XPoint, is a new type of non-volatile memory (NVM) that combines the characteristics of traditional storage media (such as mechanical hard disks and solid-state drives) with those of memory (such as dynamic random access memory). Like dynamic random access memory, SCM can be installed in a slot on the motherboard; unlike dynamic random access memory, SCM retains its data when power is lost. SCM offers faster read and write speeds than flash memory and costs less than dynamic random access memory. In some computing device system architectures, SCM is used as main memory. In some solutions, multiple computing devices that use SCM as memory are interconnected to form an SCM resource pool, which expands the SCM capacity and enables redundant backup of data.
In an SCM resource pool composed of multiple computing devices that use SCM as memory, any two computing devices can communicate and transfer data by using remote direct memory access (RDMA) technology. Unlike traditional network transmission technologies, RDMA transfers data directly from the memory of one computing device to the memory of another computing device without involving the operating system or kernel of either device. An SCM resource pool that communicates and transfers data over RDMA must solve the problem of remote memory persistence. Memory persistence refers to writing data from a volatile storage medium of a computing device back to a non-volatile storage medium of that computing device. Remote memory persistence refers to writing data from the SCM of one computing device to another computing device and then having that data saved in the SCM of the other computing device.
In a storage system that uses SCM as memory, when data in the SCM of computing device A (the computing device that initiates a remote operation request) is to be saved to the SCM of computing device B (the computing device that receives the remote operation request), the data must pass through the processor of computing device B. Because that processor contains a cache, the data may be held temporarily in the cache, where it is at risk of being lost on power failure. In current solutions, computing device A typically writes the data to computing device B by means of an RDMA write request and then initiates an RDMA read request or an RDMA send request, so that the data written by the RDMA write request is written back to the SCM of computing device B. The RDMA read request or RDMA send request initiated by computing device A can be regarded as a remote memory persistence request. This approach has the following problem: because computing device A must initiate a remote memory persistence request after every RDMA write request, the network load of the RDMA network increases.
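As a rough illustration of the overhead described above, the following Python sketch counts the requests that computing device A places on the RDMA network. It is only a counting model, assuming that in the current approach each write is followed by exactly one persistence request; the function name `requests_on_wire` and its parameters are hypothetical and do not come from the application.

```python
def requests_on_wire(num_writes: int, separate_persist_request: bool) -> int:
    """Count requests sent by the initiator for num_writes remote writes.

    separate_persist_request=True models the current approach, in which
    every RDMA write is followed by an RDMA read or send request that acts
    as the remote memory persistence request.
    """
    per_write = 2 if separate_persist_request else 1
    return num_writes * per_write


baseline = requests_on_wire(1000, separate_persist_request=True)
combined = requests_on_wire(1000, separate_persist_request=False)
print(baseline, combined)
```

Under this assumption, folding the persistence indication into the write request halves the number of requests issued by the initiator.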
Summary of the invention
The present application provides a data processing method, a remote direct memory access network interface card, and a device, to solve the problem of the heavy network load caused by additionally sent remote memory persistence requests.
According to a first aspect, a data processing method is provided, including:
A first remote direct memory access network interface card (RNIC) receives a remote direct memory access (RDMA) write request sent by a second RNIC. The RDMA write request includes first data and a data persistence flag; the RDMA write request is used to request that the first data be written to a first device, and the data persistence flag is used to indicate that the first data is data to be persisted. The first RNIC is the RNIC of the first device, that is, of the device that receives the RDMA request; the second RNIC is the RNIC of a second device, that is, of the device that sends the RDMA request; and the first device and the second device communicate by means of RDMA. According to the RDMA write request, the first RNIC determines that the first data needs to be written to the first device, and the first RNIC sends a direct memory access (DMA) write request to a first processor. The DMA write request includes the first data and instructs the first processor to write the first data to the first device. The first processor is the processor of the first device, and the first RNIC and the first processor communicate by means of DMA. According to the data persistence flag, the first RNIC determines that the first data is data to be persisted, and the first RNIC instructs the first processor to save the first data to a non-volatile memory of the first device.
The non-volatile memory may be an SCM. Specifically, the SCM may be a phase-change random access memory (PRAM), for example 3D XPoint, a resistive random access memory (ReRAM), a magnetic random access memory (MRAM), or the like. The device that receives the RDMA request may be referred to as the remote device, and the device that sends the RDMA request may be referred to as the local device.
In the above solution, after the RNIC of the remote device determines, according to the data persistence flag in the RDMA write request, that the first data is data to be persisted, it directly instructs the processor of the remote device to save the first data to the non-volatile memory of the first device. Memory persistence of the first data is thereby completed without the RNIC of the local device initiating a separate remote memory persistence request. Because the local device does not need to initiate a remote memory persistence request after initiating the RDMA write request, the load on the RDMA network is reduced. In addition, carrying the data persistence flag in the RDMA write request makes the remote write operation and the remote memory persistence operation execute consecutively, which ensures that the data is saved to the non-volatile memory after it is written to the remote device and avoids data inconsistency.
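The receive-side flow of the first aspect can be sketched as a toy Python model. This is a minimal sketch under stated assumptions, not the implementation of the application: the class names `RdmaWriteRequest`, `Processor`, and `Rnic`, the field `persist`, and the dictionary-based cache and non-volatile memory are all hypothetical stand-ins for hardware behavior.

```python
from dataclasses import dataclass, field


@dataclass
class RdmaWriteRequest:
    dest_addr: int
    data: bytes
    persist: bool  # hypothetical name for the data persistence flag


@dataclass
class Processor:
    """Toy first processor: a volatile cache in front of non-volatile memory."""
    cache: dict = field(default_factory=dict)
    nvm: dict = field(default_factory=dict)

    def dma_write(self, addr: int, data: bytes) -> None:
        # Written data may linger in a volatile cache.
        self.cache[addr] = data

    def persist(self, addr: int) -> None:
        # Move the data for addr from the cache into non-volatile memory.
        if addr in self.cache:
            self.nvm[addr] = self.cache.pop(addr)


class Rnic:
    """Toy first RNIC: forwards the write, then persists if the flag is set."""

    def __init__(self, processor: Processor) -> None:
        self.processor = processor

    def on_rdma_write(self, req: RdmaWriteRequest) -> None:
        self.processor.dma_write(req.dest_addr, req.data)
        if req.persist:
            self.processor.persist(req.dest_addr)


cpu = Processor()
nic = Rnic(cpu)
nic.on_rdma_write(RdmaWriteRequest(0x10, b"durable", persist=True))
nic.on_rdma_write(RdmaWriteRequest(0x20, b"volatile", persist=False))
```

In this model only the request carrying the flag reaches `nvm`; the other stays in `cache`, which mirrors the risk of loss on power failure that the application describes.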
In a possible implementation, the data persistence flag may be a write persistence instruction. When the data persistence flag is a write persistence instruction, the RDMA write request may also be referred to as an RDMA write persistence request, for example rdma write durable. A write persistence instruction is added to the existing RDMA operation instructions; the RNIC of the remote device parses the RDMA write request to obtain the write persistence instruction and can thereby determine that the first data in the RDMA write request is data to be persisted.
In another possible implementation, the data persistence flag is the destination storage address corresponding to the first data. The storage space corresponding to the destination storage address is used to store the first data, and the destination storage address is a persistent storage address in the first device, where a persistent storage address is used to store data to be persisted. Here, the destination storage address corresponding to the first data is the storage address, specified by the second device, of the storage space in the first device that is used to store the first data. The remote device allocates in advance storage space for storing data to be persisted. When the RNIC of the remote device determines that the destination storage address corresponding to the first data is a storage address of that pre-allocated storage space, it can determine that the first data in the RDMA write request is data to be persisted.
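The address-based variant amounts to a range check on the destination address. The following Python sketch illustrates one way to express it, assuming the pre-allocated persistent storage space is described by half-open address ranges; the constant `PERSISTENT_REGIONS`, its example values, and the function `is_persistent_address` are hypothetical.

```python
# Hypothetical pre-allocated persistent regions as [start, end) address ranges.
PERSISTENT_REGIONS = [(0x1000, 0x2000)]


def is_persistent_address(addr: int, length: int, regions=PERSISTENT_REGIONS) -> bool:
    """Return True if [addr, addr + length) lies entirely inside one region."""
    end = addr + length
    return any(start <= addr and end <= stop for start, stop in regions)


print(is_persistent_address(0x1000, 0x100))  # inside the region
print(is_persistent_address(0x0, 16))        # outside the region
```

Requiring the whole write to fall inside a region is a design choice of this sketch; the application only states that the destination address is a persistent storage address.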
In another possible implementation, the first RNIC may instruct the first processor to save the first data to the non-volatile memory of the first device as follows: the first RNIC adds a DMA least significant bit (LSB) read request to the receive queue (RQ) corresponding to the second RNIC; the first RNIC sends the DMA LSB read request to the first processor, where the DMA LSB read request instructs the first processor to write all data buffered in the peripheral bus link of the first device to the non-volatile memory of the first device. This approach is applicable when the data written by the first RNIC does not pass through the last level cache (LLC) of the first processor. Because the data does not pass through the LLC of the first processor, the data may not yet have been completely written and may still be held in the buffer of the input/output (I/O) controller of the processor. According to the Peripheral Component Interconnect Express (PCIe) protocol, all preceding write operations must be completed before any read operation is performed. Therefore, the first RNIC adds the DMA LSB read request to the receive queue and then sends it to the processor, so that the processor performs the corresponding read operation. The read operation causes data that is buffered on the peripheral bus and not yet completely written to be written to the non-volatile memory, so that first data that may still be buffered on the peripheral bus is saved to the non-volatile memory.
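The ordering argument above (a read forces earlier buffered writes to complete) can be illustrated with a toy Python model. It is a simplification that assumes writes are held in a single buffer and that any read drains that buffer first; the class `PeripheralLink` and its methods are hypothetical and do not model real PCIe hardware.

```python
class PeripheralLink:
    """Toy model: writes wait in a buffer until a read forces them out."""

    def __init__(self) -> None:
        self.pending = []  # writes buffered in the I/O controller
        self.nvm = {}      # non-volatile memory contents

    def dma_write(self, addr: int, data: bytes) -> None:
        # The write is accepted but may not have reached memory yet.
        self.pending.append((addr, data))

    def dma_read(self, addr: int):
        # Ordering rule assumed here: all earlier writes complete
        # before the read is served.
        for pending_addr, pending_data in self.pending:
            self.nvm[pending_addr] = pending_data
        self.pending.clear()
        return self.nvm.get(addr)


link = PeripheralLink()
link.dma_write(0x40, b"first data")
link.dma_read(0x0)  # reading any address flushes the buffered write
```

Note that the read address is irrelevant in this model, which matches the statement that the read request may target any address.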
In another possible implementation, the first RNIC may also send another DMA read request to the processor to instruct the first processor to write all data buffered in the peripheral bus link of the first device to the non-volatile memory of the first device. The DMA read request may read any address in the SCM; for example, the address may be the start address or the end address of the address range used to store the first data, or any address other than the start address and the end address. Optionally, the read request may read an address segment; for example, the address segment may be the address range used to store the first data, or any storage range in the SCM.
In another possible implementation, the first RNIC may instruct the first processor to save the first data to the non-volatile memory of the first device as follows: after generating, according to the first data, a work queue entry (WQE) corresponding to an RDMA receive request, the first RNIC clears the WQE, where the RDMA receive request is used to receive an RDMA send request initiated by the second device; after clearing the WQE, the first RNIC generates a completion queue entry (CQE) corresponding to the RDMA receive request, to instruct the first processor to store the first data cached in the volatile storage medium of the first processor to the non-volatile memory of the first device. The volatile storage medium of the first processor may be the LLC. This approach is applicable when the data written by the RNIC of the remote device passes through the LLC of the processor of the remote device. Because the first data passes through the LLC of the first processor, the first RNIC generates and then clears the WQE corresponding to the RDMA receive request and generates the CQE corresponding to the RDMA receive request, all according to the first data. When the first processor obtains the CQE, an interrupt can be generated that causes the first processor to write the data at the corresponding address back to the non-volatile memory. Because both the WQE and the CQE are generated according to the first data and therefore correspond to the first data, writing the data at the corresponding address back to the non-volatile memory stores the first data in the non-volatile memory.
In another possible implementation, after the first RNIC instructs the first processor to save the first data to the non-volatile memory of the first device, the first RNIC may further send a persistence confirmation message corresponding to the first data to the second RNIC. The persistence confirmation message includes a second packet sequence number (PSN), where the second PSN is the sequence number of the first data, and the persistence confirmation message indicates that memory persistence of the first data is complete. From the persistence confirmation message, the second RNIC can determine, according to the second PSN, that the first data has been stored in the non-volatile memory of the first device.
In another possible implementation, after receiving the RDMA write request sent by the second RNIC, the first RNIC sends a reception confirmation message corresponding to the first data to the second RNIC. The reception confirmation message includes a first PSN, where the first PSN is the sequence number of the first data, and the reception confirmation message indicates that the first RNIC has received the first data. From the reception confirmation message, the second RNIC can determine, according to the first PSN, that transmission of the first data is complete.
According to a second aspect, another data processing method is provided, including: a second RNIC receives an RDMA write persistence request from a second processor, where the RDMA write persistence request includes a data persistence flag, the RDMA write persistence request is used to request that first data be stored in a non-volatile memory of a first device, and the data persistence flag is used to indicate that the first data is data to be persisted. The second RNIC and the second processor are respectively the RNIC and the processor of a second device, and they communicate by means of DMA; the second device is the device that initiates the RDMA request, the first device is the device that receives the RDMA request, and the second device and the first device communicate by means of RDMA. The second RNIC generates an RDMA write request according to the RDMA write persistence request, where the RDMA write request includes the first data and the data persistence flag. The second RNIC sends the RDMA write request to a first RNIC, where the RDMA write request is used to request that the first data be written to the first device, and the first RNIC is the RNIC of the first device.
The device that receives the RDMA request may be referred to as the remote device, and the device that sends the RDMA request may be referred to as the local device; that is, the first device is the remote device and the second device is the local device.
In the above solution, when the RNIC of the local device receives the RDMA write persistence request initiated by the processor of the local device, it generates an RDMA write request according to the RDMA write persistence request. The RDMA write request itself causes the RNIC of the remote device to write the first data to the remote device, and the data persistence flag in the RDMA write request informs the RNIC of the remote device that the first data is data to be persisted. The RNIC of the remote device can therefore, after writing the first data to the remote device, store the first data in the non-volatile memory of the first device, completing memory persistence of the first data. This solution in effect merges the write request and the memory persistence request into a single request, so the local device does not need to send an additional memory persistence request, which reduces the load on the RDMA network. In addition, carrying the data persistence flag in the RDMA write request makes the remote write operation and the remote memory persistence operation execute consecutively, which ensures that the data is saved to the non-volatile memory after it is written to the remote device and avoids data inconsistency.
In a possible implementation, the data persistence flag is a write persistence instruction. When the data persistence flag is a write persistence instruction, the RDMA request may also be referred to as an RDMA write persistence request, for example rdma write durable. Adding a write persistence instruction to the existing RDMA operation instructions enables the RNIC of the remote device, after parsing the RDMA write request to obtain the write persistence instruction, to determine that the first data in the RDMA write request is data to be persisted.
In another possible implementation, the data persistence flag is the destination virtual memory address corresponding to the first data. The destination storage address corresponding to the first data is a persistent storage address in the first device, and the storage space corresponding to the persistent storage address is used to store data to be persisted. The destination storage address corresponding to the first data refers to the storage address, specified by the second device, in the first device at which the first data is to be stored. Setting the destination storage address corresponding to the first data to a storage address of the storage space in the remote device that is used to store data to be persisted enables the RNIC of the remote device to determine, according to the destination storage address, that the first data is data to be persisted.
In another possible implementation, after the second RNIC sends the RDMA write request to the first RNIC, when the second RNIC receives a persistence confirmation message corresponding to the first data from the first RNIC, the second RNIC generates a CQE corresponding to the RDMA write persistence request. The persistence confirmation message corresponding to the first data indicates that memory persistence of the first data is complete, and the CQE corresponding to the RDMA write persistence request is used to notify the second processor that memory persistence is complete. This approach is applicable when the data written by the RNIC of the remote device does not pass through the LLC of the processor of the remote device. When the second processor obtains the CQE, it can determine according to the CQE that memory persistence of the first data is complete; no additional remote memory persistence request needs to be initiated, which reduces the load on the RDMA network.
In another possible implementation, before the second RNIC generates the CQE corresponding to the RDMA write persistence request, when the second RNIC receives the reception confirmation message corresponding to the first data from the first RNIC, the second RNIC caches that reception confirmation message. The reception confirmation message corresponding to the first data includes a first PSN, where the first PSN is the sequence number of the first data, and the message indicates that the first RNIC has received the first data. The second RNIC then receives a first confirmation message sent by the first RNIC, where the first confirmation message includes a second PSN. If the second PSN is the same as the first PSN, the second RNIC determines that it has received the persistence confirmation message corresponding to the first data. In the RDMA protocol, an acknowledgement (ACK) message only expresses acknowledgement; which request it acknowledges must be inferred from the requests sent before the ACK message was received or from the order in which ACK messages are received.
In this design, because the write operation occurs before the memory persistence operation, the first confirmation message received that carries the sequence number of the first data must be the reception confirmation message of the first data. By caching the reception confirmation message after receiving it, the RNIC of the second device can send the data that follows the first data to the RNIC of the first device without waiting for the persistence confirmation message of the first data. Comparing PSNs thus allows the memory persistence of one piece of data and the write operation of the next piece of data to proceed in parallel, which improves the efficiency of memory persistence and reduces latency.
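The PSN comparison described above can be sketched as a small state tracker on the second RNIC. This is a minimal Python sketch, assuming each PSN is acknowledged exactly twice (first for reception, then for persistence) and that messages for different PSNs may interleave; the class `AckTracker` and the return strings are hypothetical.

```python
class AckTracker:
    """Classify confirmation messages by whether their PSN was seen before."""

    def __init__(self) -> None:
        self.received = set()  # PSNs whose reception confirmation is cached

    def on_ack(self, psn: int) -> str:
        if psn in self.received:
            # Second message with the same PSN: persistence confirmation.
            self.received.discard(psn)
            return "persisted"
        # First message with this PSN: reception confirmation, cache it.
        self.received.add(psn)
        return "received"


tracker = AckTracker()
events = [tracker.on_ack(psn) for psn in (7, 8, 7, 8)]
print(events)
```

Because the reception confirmation for PSN 8 can arrive before the persistence confirmation for PSN 7, the next write does not have to wait for the previous persistence to finish.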
In another possible implementation, after the second RNIC sends the RDMA write request to the first RNIC, when the second RNIC receives the reception confirmation message corresponding to the first data from the first RNIC, the second RNIC generates a CQE corresponding to the RDMA write persistence request. The reception confirmation message indicates that the first RNIC has received the first data, and the CQE corresponding to the RDMA write persistence request is used to notify the second processor that the first data has been written to the first device. The second RNIC receives an RDMA receive request sent by the second processor. When the second RNIC receives the persistence confirmation message corresponding to the first data, the second RNIC generates a CQE corresponding to the RDMA receive request. The persistence confirmation message indicates that memory persistence of the first data is complete, and the CQE corresponding to the RDMA receive request is used to notify the second processor that memory persistence of the first data is complete. This approach is applicable when the data written by the RNIC of the remote device passes through the LLC of the processor of the remote device.
Because the first data passes through the LLC of the processor of the first device, the CQE corresponding to the RDMA write persistence request is generated when the reception confirmation message corresponding to the first data is received, so that the second processor can initiate the RDMA receive request according to that CQE. This eliminates the step in which the second processor initiates an RDMA send request and reduces the load on the RDMA network.
In another possible implementation, before the second RNIC generates the CQE corresponding to the RDMA receive request, the second RNIC may also cache the reception confirmation message corresponding to the first data, where the reception confirmation message includes a first PSN and the first PSN is the sequence number of the first data. The second RNIC receives a first confirmation message sent by the first RNIC, where the first confirmation message includes a second PSN. If the second PSN is the same as the first PSN, the second RNIC determines that it has received the persistence confirmation message corresponding to the first data. By caching the reception confirmation message of the first data, the second device can send the data that follows the first data to the first device without waiting for the persistence confirmation of the first data. Comparing PSNs thus allows the memory persistence of one piece of data and the write operation of the next piece of data to proceed in parallel, which improves the efficiency of memory persistence and reduces latency.
According to a third aspect, another data processing method is provided, including: a second processor sends an RDMA write persistence request to a second RNIC, where the RDMA write persistence request includes a data persistence flag. The RDMA write persistence request requests that first data be stored in the non-volatile memory of a first device, and the data persistence flag indicates that the first data is data to be persisted. The second RNIC and the second processor are the RNIC and the processor of a second device, respectively, and communicate with each other by DMA. The second device is the device that initiates the RDMA request, the first device is the device that receives the RDMA request, and the first device and the second device communicate by RDMA. When the second processor obtains the CQE corresponding to the RDMA write persistence request, the second processor determines that the first data has been stored in the non-volatile memory of the first device.
The device that receives the RDMA request may be referred to as the remote device, and the device that sends the RDMA request may be referred to as the local device. That is, the first device is the remote device and the second device is the local device.
This solution is applicable when, while the RNIC of the remote device writes data, the data does not pass through the LLC of the remote device's processor. The processor of the local device needs to issue only one RDMA write request to have the first data written to the remote device and persisted in its memory. It does not need to initiate a separate remote memory persistence request after the RDMA write request, which reduces the load on the RDMA network. In addition, carrying the data persistence flag in a single RDMA request makes the remote write and the remote memory persistence consecutively executed operations, which ensures that data written to the remote device is saved to the non-volatile memory and avoids data inconsistency.
In another possible implementation, the data persistence flag is a write persistence instruction. Adding an RDMA write persistence instruction to the existing RDMA operation instructions allows the RNIC of the remote device, after parsing the RDMA request and obtaining the RDMA write persistence instruction, to determine that the first data in the RDMA request is data to be persisted.
In another possible implementation, the data persistence flag is the destination storage address corresponding to the first data. This destination storage address is a persistent storage address in the first device, and the storage space corresponding to the persistent storage address is used to store data to be persisted. The destination storage address corresponding to the first data is the storage address in the first device that the second device designates for storing the first data. Setting the destination storage address to an address within the storage space that the remote device uses for data to be persisted allows the RNIC of the remote device to determine from the destination storage address that the first data is data to be persisted.
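The two forms of the data persistence flag can be sketched as a single check on the remote side. The following Python sketch is illustrative only: the opcode name `WRITE_PERSIST`, the function `needs_persistence`, and the representation of the persistent storage space as half-open address ranges are assumptions, not details given in this application.

```python
# Hypothetical opcode name for the added RDMA write persistence instruction.
WRITE_PERSIST = "WRITE_PERSIST"


def needs_persistence(opcode, dest_addr, persistent_ranges):
    """Decide whether the data in a request is data to be persisted.

    opcode            -- operation parsed from the RDMA request
    dest_addr         -- destination storage address designated by the local device
    persistent_ranges -- assumed list of (start, end) half-open address ranges
                         reserved on the remote device for data to be persisted
    """
    # Flag form 1: the request carries the write persistence instruction.
    if opcode == WRITE_PERSIST:
        return True
    # Flag form 2: the destination address lies in the persistent storage space.
    return any(start <= dest_addr < end for start, end in persistent_ranges)


ranges = [(0x1000, 0x2000)]
print(needs_persistence("WRITE", 0x1800, ranges))
print(needs_persistence("WRITE", 0x3000, ranges))
```

Either form lets the remote RNIC recognize data to be persisted from the write request itself, without a second request from the local device.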
According to a fourth aspect, another data processing method is provided, including: a second processor sends an RDMA write persistence request to a second RNIC, where the RDMA write persistence request includes a data persistence flag. The RDMA write persistence request requests that first data be stored in the non-volatile memory of a first device, and the data persistence flag indicates that the first data is data to be persisted. The second RNIC and the second processor are the RNIC and the processor of a second device, respectively, and communicate with each other by DMA. The second device is the device that initiates the RDMA request, the first device is the device that receives the RDMA request, and the first device and the second device communicate by RDMA. When the second processor obtains the CQE corresponding to the RDMA write persistence request, the second processor sends an RDMA receive request to the second RNIC. When the second processor obtains, from the second RNIC, the CQE corresponding to the RDMA receive request, the second processor determines that the first data has been stored in the non-volatile memory of the first device.
This solution is applicable when, while the RNIC of the remote device writes data, the data does not pass through the LLC of the remote device's processor. It eliminates the RDMA send request from the local device and reduces the load on the RDMA network.
In a possible implementation, the data persistence flag is a write persistence instruction. Adding an RDMA write persistence instruction to the existing RDMA operation instructions allows the RNIC of the remote device, after parsing the RDMA request and obtaining the RDMA write persistence instruction, to determine that the first data in the RDMA request is data to be persisted.
In another possible implementation, the data persistence flag is the destination storage address corresponding to the first data. This destination storage address is a persistent storage address in the first device, and the storage space corresponding to the persistent storage address is used to store data to be persisted. The destination storage address corresponding to the first data is the storage address in the first device that the second device designates for storing the first data. Setting the destination storage address to an address within the storage space that the remote device uses for data to be persisted allows the RNIC of the remote device to determine from the destination storage address that the first data is data to be persisted.
According to a fifth aspect, an RNIC is provided, including modules for performing the data processing method in the first aspect or any possible implementation of the first aspect.
According to a sixth aspect, another RNIC is provided, including modules for performing the data processing method in the second aspect or any possible implementation of the second aspect.
According to a seventh aspect, a processor is provided, configured to perform some or all of the procedures in the third aspect or the fourth aspect.
According to an eighth aspect, a first device is provided, including a processor, a non-volatile memory, and an RNIC, where the RNIC is configured to perform the operation steps of the method described in the first aspect.
According to a ninth aspect, a second device is provided, including a processor, a non-volatile memory, and an RNIC, where the RNIC is configured to perform the operation steps of the method described in the second aspect, and the processor is configured to perform the operation steps of the method described in the third aspect or the fourth aspect.
According to a tenth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores instructions that, when run on a computer, cause the computer to perform the methods described in the foregoing aspects.
According to an eleventh aspect, a computer program product containing instructions is provided. When run on a computer, the computer program product causes the computer to perform the methods described in the foregoing aspects.
On the basis of the implementations provided in the foregoing aspects, this application may further combine them to provide more implementations.
FIG. 1 is a schematic diagram of the composition of a computing device according to an embodiment of this application;
FIG. 2 is a schematic flowchart of remote memory persistence according to an embodiment of this application;
FIG. 3 is a schematic flowchart of another remote memory persistence procedure according to an embodiment of this application;
FIG. 4 is a schematic diagram of a network of computing devices that communicate through RDMA technology according to an embodiment of this application;
FIG. 5 is a schematic diagram of a communication system composed of a local device and a remote device according to an embodiment of this application;
FIG. 6 is a schematic structural diagram of one implementation of an RNIC according to an embodiment of this application;
FIG. 7 is another schematic structural diagram of an RNIC according to an embodiment of this application;
FIG. 8 is a schematic diagram of a channel between two computing devices according to an embodiment of this application;
FIG. 9 is a schematic flowchart of a data processing method according to an embodiment of this application;
FIG. 10 is a schematic flowchart of another data processing method according to an embodiment of this application;
FIG. 11 is a schematic flowchart of another data processing method according to an embodiment of this application;
FIG. 12 is a schematic structural diagram of a computing device according to an embodiment of this application;
FIG. 13 is a schematic structural diagram of another computing device according to an embodiment of this application.
The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings in the embodiments of this application.
First, conventional data processing methods are described with reference to FIG. 1 to FIG. 3.
FIG. 1 is a schematic diagram of a computing device. As shown in FIG. 1, the computing device 10 includes an RNIC 101, a processor 102, and an SCM 103, which are connected through a bus 104. The bus 104 includes, but is not limited to, a peripheral bus (such as a peripheral component interconnect (PCI) bus or a PCIe bus) and a system bus. The processor may be a central processing unit (CPU). The processor 102 includes an integrated input/output controller (IIO), a last level cache (LLC), an integrated memory controller (iMC), and one or more processor cores. Within the processor, the IIO handles packets exchanged with peripherals such as the RNIC. Such a packet may be a PCI packet, a PCIe packet, and so on. The IIO, the LLC, the iMC, and the processor cores may be connected through an on-chip bus. For example, if the processor is a chip based on the advanced RISC machine (ARM) architecture, the IIO, the LLC, the iMC, and the processor cores are connected through an advanced high performance bus (AHB).
In a computing device that uses an SCM as memory, data written from the RNIC to the SCM must pass through the processor of the computing device. As shown in FIG. 1, the data can take one of two data flows. The first data flow includes: (1) the RNIC writes the data to the IIO of the processor; (2) the IIO writes the data to the iMC of the processor, and the iMC buffers the data in its asynchronous DRAM refresh (ADR) region; (3) the iMC writes the data to the SCM. The second data flow includes: (1) the RNIC writes the data to the IIO of the processor; (2) the IIO writes the data to the LLC of the processor; (3) the data in the LLC is flushed to the iMC of the processor, and the iMC buffers the data in its ADR region; (4) the iMC writes the data to the SCM. Because the ADR region of the iMC retains its contents on power loss, data is generally considered to have completed memory persistence once it has been written to the ADR region.
As FIG. 1 shows, data can take one of two possible flows within the processor, and which flow it takes is determined by the characteristics of the processor architecture. In some processor architectures, the flow depends on the data direct I/O (DDIO) function of the IIO: if DDIO is enabled in the IIO, the data takes the second data flow; if DDIO is not enabled, the data takes the first data flow. In other processor architectures (such as Intel's Skylake architecture), the flow depends on both the DDIO function of the IIO and the packet that the RNIC sends to the IIO. If the no-snoop (NS) flag bit in the packet that the RNIC sends to the IIO is 1, or DDIO is not enabled in the IIO, the data takes the first data flow. If the NS flag bit in the packet is not 1 and DDIO is enabled in the processor, the data takes the second data flow.
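The selection rule just described can be written as a small decision function. This Python sketch is only a restatement of the paragraph above: the function name `data_flow` and the parameter `honors_ns_bit`, which distinguishes the two kinds of architecture, are hypothetical.

```python
def data_flow(ddio_enabled, ns_bit=0, honors_ns_bit=False):
    """Return 1 for the first data flow (IIO -> iMC -> SCM, bypassing the LLC)
    or 2 for the second data flow (IIO -> LLC -> iMC -> SCM).

    honors_ns_bit -- assumed switch: True for architectures in which the
                     no-snoop (NS) flag bit of the packet also affects the flow.
    """
    if not ddio_enabled:
        return 1            # DDIO off: data goes straight to the iMC
    if honors_ns_bit and ns_bit == 1:
        return 1            # NS bit set: data bypasses the LLC
    return 2                # DDIO on (and NS bit not 1): data passes through the LLC


print(data_flow(ddio_enabled=True))
print(data_flow(ddio_enabled=True, ns_bit=1, honors_ns_bit=True))
```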
A storage system built from computing devices that use an SCM as memory may include multiple such computing devices. In this storage system, remote access between two computing devices is implemented by the RNICs of the computing devices. The two computing devices play two roles, local device and remote device, where "local" and "remote" are relative concepts. The local device is the computing device that initiates an RDMA request, that is, the computing device that requests access to another computing device. The remote device is the computing device that receives the RDMA request, that is, the computing device that is accessed by another computing device. The local device may access the remote device to write data to it. Specifically, the local device transmits its data through its own RNIC to the RNIC of the remote device, and the remote device receives the data through its RNIC, so that the data in the local device is transferred to the remote device. The local device may also access the remote device to read data from it. Specifically, the local device reads data in the SCM of the remote device through its own RNIC: the remote device sends the requested data through its RNIC to the RNIC of the local device, and the RNIC of the local device receives the data, which completes the read.
To implement remote memory persistence between two computing devices, some designs use a different remote memory persistence procedure for each of the two data flows described above.
For the first data flow, in which the data does not pass through the LLC of the processor, the remote memory persistence procedure is shown in FIG. 2 and includes the following steps:
S201. The processor of the local device initiates a first RDMA write request to the RNIC of the local device.
S202. The RNIC of the local device processes the first RDMA write request and generates a second RDMA write request based on the first RDMA write request.
S203. The RNIC of the local device sends the second RDMA write request to the RNIC of the remote device, and the RNIC of the remote device receives the second RDMA write request.
S204. The RNIC of the remote device writes the data in the second RDMA write request to the SCM by DMA.
Here, the data in the second RDMA write request follows the first data flow in the remote device. Transmission over the bus involves some latency, and for reasons such as the bus being occupied, part of the data may be held in the buffers of intermediate media on the peripheral bus link, such as the IIO.
S205. The RNIC of the remote device sends an ACK to the RNIC of the local device, and the RNIC of the local device receives the ACK.
S206. The RNIC of the local device generates a CQE corresponding to the first RDMA write request.
S207. The processor of the local device obtains the CQE corresponding to the first RDMA write request and determines that the data write is complete.
S208. The processor of the local device initiates a first RDMA read request to the RNIC of the local device.
S209. The RNIC of the local device processes the first RDMA read request and generates a second RDMA read request based on the first RDMA read request.
S210. The RNIC of the local device sends the second RDMA read request to the RNIC of the remote device, and the RNIC of the remote device receives the second RDMA read request.
S211. The RNIC of the remote device reads data from the SCM by DMA.
Peripheral bus protocols (such as the PCI and PCIe protocols) require that all write operations preceding a read operation be completed before that read operation is performed. Reading data from the SCM by DMA therefore causes data buffered in intermediate media on the peripheral bus link, such as the IIO, to be written to the SCM. In this way, the DMA read writes to the SCM any data of the first RDMA write request that may still be buffered in an intermediate medium.
S212. The RNIC of the remote device sends an ACK to the RNIC of the local device, and the RNIC of the local device receives the ACK.
S213. The RNIC of the local device generates a CQE corresponding to the first RDMA read request.
S214. The processor of the local device obtains the CQE corresponding to the first RDMA read request and determines that memory persistence of the data is complete.
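The reason the read in S208 to S214 completes persistence can be shown with a toy model of the ordering rule. The Python sketch below is a simplification under stated assumptions: the class `RemoteWritePath` is hypothetical, every DMA write is held in an intermediate buffer (in practice only part of the data may be buffered), and the only behavior modeled is that a read forces all earlier writes to complete first.

```python
class RemoteWritePath:
    """Toy model of the path RNIC -> intermediate buffer (e.g. IIO) -> SCM."""

    def __init__(self):
        self.buffered = []  # writes still held in an intermediate medium
        self.scm = {}       # contents that have reached the SCM

    def dma_write(self, addr, value):
        # Assumption: the write is acknowledged while still buffered (as in S204/S205).
        self.buffered.append((addr, value))

    def dma_read(self, addr):
        # Ordering rule: all earlier writes complete before the read is performed.
        for a, v in self.buffered:
            self.scm[a] = v
        self.buffered.clear()
        return self.scm.get(addr)


path = RemoteWritePath()
path.dma_write(0x10, b"payload")
print(0x10 in path.scm)     # the data has not reached the SCM yet
path.dma_read(0x10)         # the read drains the buffered write
print(0x10 in path.scm)     # the data is now in the SCM
```

The model also makes the cost visible: the read carries no useful payload for the local device and exists only to drain the earlier write.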
For the second data flow, in which the data passes through the LLC of the processor, the remote memory persistence procedure is shown in FIG. 3 and includes the following steps:
S301. The processor of the local device initiates a first RDMA write request to the RNIC of the local device.
S302. The RNIC of the local device processes the first RDMA write request and generates a second RDMA write request based on the first RDMA write request.
S303. The RNIC of the local device sends the second RDMA write request to the RNIC of the remote device, and the RNIC of the remote device receives the second RDMA write request.
S304. The RNIC of the remote device writes the data in the second RDMA write request to the SCM by DMA.
The data in the second RDMA write request follows the second data flow in the remote device. Because the data must pass through the LLC, the data is buffered in the LLC while it is being written to the SCM.
S305. The RNIC of the remote device sends an ACK to the RNIC of the local device, and the RNIC of the local device receives the ACK.
S306. The RNIC of the local device generates a CQE corresponding to the first RDMA write request.
S307. The processor of the local device obtains the CQE corresponding to the first RDMA write request and determines that the data write is complete.
S308. The processor of the local device initiates a first RDMA send request to the RNIC of the local device, where the first RDMA send request carries a persistence flag.
S309. The RNIC of the local device processes the first RDMA send request and generates a second RDMA send request based on the first RDMA send request, where the second RDMA send request carries the persistence flag and the destination virtual memory address of the data.
S310. The RNIC of the local device sends the second RDMA send request to the RNIC of the remote device.
S311. The processor of the remote device initiates a first RDMA receive request to the RNIC of the remote device.
S312. The RNIC of the remote device generates a WQE corresponding to the first RDMA receive request.
S313. The RNIC of the remote device generates, based on the second RDMA send request, a CQE corresponding to the first RDMA receive request.
S314. The RNIC of the remote device sends an ACK to the RNIC of the local device, and the RNIC of the local device receives the ACK.
S315. The RNIC of the local device generates a CQE corresponding to the first RDMA send request.
S316. The processor of the local device obtains the CQE corresponding to the first RDMA send request.
S317. The processor of the local device initiates a second RDMA receive request to the RNIC of the local device.
S318. The RNIC of the local device generates a WQE corresponding to the second RDMA receive request.
S319. The processor of the remote device obtains the CQE corresponding to the first RDMA receive request and determines that the data corresponding to the first RDMA receive request needs to be persisted.
S320. The processor of the remote device flushes the data at the corresponding address back to the ADR region of the iMC.
S321. The processor of the remote device initiates a third RDMA send request to the RNIC of the remote device, where the third RDMA send request carries an indication that the data flush is complete.
S322. The RNIC of the remote device processes the third RDMA send request and generates a fourth RDMA send request based on the third RDMA send request, where the fourth RDMA send request carries the indication that the data flush is complete.
S323. The RNIC of the remote device sends the fourth RDMA send request to the RNIC of the local device, and the RNIC of the local device receives the fourth RDMA send request.
S324. The RNIC of the local device generates a CQE corresponding to the second RDMA receive request based on the indication in the fourth RDMA send request that the data flush is complete.
S325. The processor of the local device obtains the CQE corresponding to the second RDMA receive request and determines that memory persistence of the data is complete.
As can be seen, in the remote memory persistence procedures shown in FIG. 2 and FIG. 3, after the local device initiates an RDMA write request, to ensure that the data in the RDMA write request is written to the SCM, the local device must additionally initiate an RDMA read request or an RDMA send request so that data that may not yet have reached the SCM is written back to it. The RDMA read request and the RDMA send request may be referred to as RDMA persistence requests. Because the local device initiates an RDMA persistence request after every RDMA write request, and every RDMA request consumes a certain amount of bandwidth, the additional RDMA persistence requests increase the load on the RDMA network.
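The overhead can be made concrete by counting the steps in FIG. 2 and FIG. 3 in which a message crosses the network between the two RNICs. The Python sketch below only tallies the steps listed in those procedures. The dictionary names and the helper `extra_messages` are hypothetical, and messages not listed as steps (for example, an ACK for the fourth RDMA send request) are not counted.

```python
# Steps in which a message travels between the local and remote RNICs.
FIG2_NETWORK_STEPS = {
    "S203": "RDMA write", "S205": "ACK",
    "S210": "RDMA read", "S212": "ACK",
}
FIG3_NETWORK_STEPS = {
    "S303": "RDMA write", "S305": "ACK",
    "S310": "RDMA send", "S314": "ACK",
    "S323": "RDMA send",
}


def extra_messages(steps):
    """Network messages beyond the initial RDMA write and its ACK."""
    return len(steps) - 2


print(extra_messages(FIG2_NETWORK_STEPS))  # messages added by the RDMA read
print(extra_messages(FIG3_NETWORK_STEPS))  # messages added by the RDMA sends
```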
The embodiments of this application provide a data processing method, an RNIC, and a device to address the heavy network load in the remote memory persistence procedures shown in FIG. 2 and FIG. 3.
The embodiments of this application are applicable to a computer network in which computing devices are interconnected and communicate through RDMA technology. Such a computer network may be as shown in FIG. 4, in which computing devices are connected to and communicate with one another through a wired network (for example, Ethernet). The computer network has two roles, local device and remote device, and the embodiments of this application are specifically applicable to a communication system formed by a local device and a remote device. Such a communication system may be as shown in FIG. 5. For the structure of the local device and the remote device, see the computing device shown in FIG. 1: each computing device includes an RNIC, a processor, and an SCM. The local device and the remote device are defined as described above, and each exchanges data with the peer computing device through its own RNIC.
The embodiments of this application improve the structure of the RNIC and the remote memory persistence procedure to reduce the network load that remote memory persistence requires.
首先,介绍本申请实施例提供的RNIC的一种实现方式,相较于一般的RNIC,本申请实施例的RNIC还包括持久化内存(persistent memory,PM)模块,该PM模块所实现的功能可以由门阵列组成的硬件电路实现,也可以由运行在RNIC中的软件程序实现。在RNIC中的控制部件的控制方式为以微存储为核心的微程序控制方式的情况下,该PM模块所实现的功能可以由运行在RNIC中的软件程序实现;在RNIC的控制部件的控制方式为以逻辑布线结构为主的控制方式的情况下,该PM模块所实现的功能可以由门阵列组成的硬件电路实现。PM模块用于执行与内存持久化有关的操作。其中,PM模块可以根据接收到的RDMA写请求中所携带的数据和参数判定该RDMA写请求中的数据是否需要进行内存持久化,并在根据该参数确定RDMA写请求中的数据需要进行内存持久化的情况下,直接指示处理器将该RDMA写请求中将该数据写入到计算设备的非易失性存储器中,完成对该数据的内存持久化。First, an implementation manner of the RNIC provided by the embodiment of the present application is introduced. Compared with a general RNIC, the RNIC of the embodiment of the present application further includes a persistent memory (PM) module. The functions implemented by the PM module can be The hardware circuit composed of the gate array can also be implemented by a software program running in the RNIC. In the case where the control method of the control component in RNIC is a microprogram control method with micro storage as the core, the functions implemented by the PM module can be implemented by a software program running in the RNIC; the control method of the control component in the RNIC In the case of a control method mainly based on a logical wiring structure, the functions implemented by the PM module can be implemented by a hardware circuit composed of a gate array. The PM module is used to perform operations related to memory persistence. Among them, the PM module can determine whether the data in the RDMA write request needs to be persisted according to the data and parameters carried in the received RDMA write request, and determine that the data in the RDMA write request needs to be persisted according to the parameters. In the case of data storage, the processor is directly instructed to write the data in the RDMA write request to the non-volatile memory of the computing device to complete the memory persistence of the data.
In the embodiments of this application, the non-volatile memory may be an SCM. The SCM may specifically be PRAM, and the PRAM may be, for example, 3D XPoint, ReRAM, or MRAM.
FIG. 6 is a schematic structural diagram of an implementation of an RNIC 60 provided in an embodiment of this application. The RNIC may be used as the RNIC of a remote device or as the RNIC of a local device. As shown in FIG. 6, the RNIC 60 may include a receiving module 601, a scheduling module 602, a sending module 603, and a persistent memory module 604.
The receiving module 601 is configured to receive messages sent by an external computing device. When the RNIC 60 serves as the RNIC of the local device, the receiving module 601 may receive requests or data sent by the RNIC of the remote device; in the embodiments of this application, it may perform the receive operations performed by the second RNIC in the interaction between the second RNIC and the first RNIC in the method embodiments shown in FIG. 9 to FIG. 11. When the RNIC 60 serves as the RNIC of the remote device, the receiving module 601 may receive requests or data sent by the RNIC of the local device; in the embodiments of this application, it may perform the receive operations performed by the first RNIC in the interaction between the first RNIC and the second RNIC in the method embodiments shown in FIG. 9 to FIG. 11.
The sending module 603 is configured to send messages to an external computing device. When the RNIC 60 serves as the RNIC of the local device, the sending module 603 may send requests or data to the RNIC of the remote device; in the embodiments of this application, it may perform the send operations performed by the second RNIC in the interaction between the second RNIC and the first RNIC in the method embodiments shown in FIG. 9 to FIG. 11. When the RNIC 60 serves as the RNIC of the remote device, the sending module 603 may send requests or data to the RNIC of the local device; in the embodiments of this application, it may perform the send operations performed by the first RNIC in the interaction between the second RNIC and the first RNIC in the method embodiments shown in FIG. 9 to FIG. 11.
The scheduling module 602 is configured to communicate with the processor of the computing device and perform the corresponding data processing. When the RNIC 60 serves as the RNIC of the local device, the scheduling module 602 communicates with the processor of the local device and performs data processing; in the embodiments of this application, it may perform the operations in which the second RNIC interacts with the second processor, or the scheduling-related operations of the second RNIC, in the method embodiments shown in FIG. 9 to FIG. 11. When the RNIC 60 serves as the RNIC of the remote device, the scheduling module 602 communicates with the processor of the remote device and performs data processing; in the embodiments of this application, it may perform the operations in which the first RNIC interacts with the first processor, or the scheduling-related operations of the first RNIC, in the method embodiments shown in FIG. 9 to FIG. 11.
The PM module 604 is configured to perform operations related to memory persistence. For the first data flow described above, when the receiving module 601 receives an RDMA write request, the PM module 604 determines whether the first data in the request is data to be persisted. If so, it adds an RDMA LSB read request to the SQ corresponding to the RDMA write request, so that the scheduling module 602 can send the RDMA LSB read request to the processor. The processor then writes the data buffered on the peripheral bus link into the non-volatile memory, so that the first data is written into the non-volatile memory. For the second data flow described above, when the receiving module 601 receives an RDMA write request, the PM module 604 likewise determines whether the first data in the request is data to be persisted. If so, it generates, based on the first data in the RDMA write request, a WQE corresponding to an RDMA receive request, clears that WQE, and then generates the CQE corresponding to the RDMA receive request. Based on this CQE, the processor associated with the RNIC 60 (that is, the processor of the remote device) can write the first data cached in the volatile storage medium back to the non-volatile memory.
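The two PM-module behaviors described above can be summarized in a minimal sketch. It only models queue bookkeeping, not hardware; the class name `PmModuleSketch`, the string labels, and the integer `data_flow` selector are hypothetical names introduced for illustration and do not come from the application.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PmModuleSketch:
    """Toy model of the PM module's reaction to a received RDMA write request."""
    send_queue: List[str] = field(default_factory=list)        # stands in for the SQ
    completion_queue: List[str] = field(default_factory=list)  # stands in for the CQ

    def on_rdma_write(self, needs_persistence: bool, data_flow: int) -> str:
        # Nothing extra happens for data that is not to be persisted.
        if not needs_persistence:
            return "no-op"
        if data_flow == 1:
            # First data flow: queue an RDMA LSB read request so that data
            # buffered on the peripheral bus link is written to non-volatile memory.
            self.send_queue.append("RDMA_LSB_READ")
            return "queued RDMA_LSB_READ"
        if data_flow == 2:
            # Second data flow: create a WQE for an RDMA receive request,
            # clear it, then generate the matching CQE for the processor.
            receive_wqe = {"op": "RDMA_RECEIVE"}
            receive_wqe.clear()
            self.completion_queue.append("RDMA_RECEIVE_CQE")
            return "posted RDMA_RECEIVE_CQE"
        raise ValueError("data_flow must be 1 or 2")
```

In this sketch the decision whether data needs persistence is passed in as a boolean, because that check depends on the data persistence flag carried in the request.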
In a possible implementation, the functions of the receiving module 601, the scheduling module 602, the sending module 603, and the persistent memory module 604 may be implemented jointly by an arithmetic logic unit 701, a register 702, a control unit 703, and an input/output interface 704. As shown in FIG. 7, the arithmetic logic unit 701, the register 702, the control unit 703, and the input/output interface 704 may be connected through one or more internal buses 705. The arithmetic logic unit 701 may execute arithmetic commands, such as add, subtract, multiply, and divide commands; it may also obtain logic commands, such as OR, AND, and NOT commands; and it may obtain a control signal from the control unit 703, fetch the data corresponding to that control signal from the register 702, and perform the corresponding operation. The register 702 is a memory with a small storage capacity. It may hold instructions, the register operands and intermediate or final results that are stored temporarily during instruction execution, and the data that the arithmetic logic unit 701 uses to complete tasks requested by the control unit 703. The control unit 703 decodes the instructions held in the register and issues the control signals for each operation required to complete each instruction. The control unit 703 may use either of two control modes: a microprogrammed control mode centered on a microprogram store, in which the microprogram may be stored in the register 702, or a hardware control mode based mainly on hardwired logic, in which the control unit 703 may be composed of, for example, AND-OR gate arrays. The input/output interface 704 sends and receives data. There may be multiple input/output interfaces 704, used respectively to exchange data with the processor and to exchange data with external computing devices.
Optionally, the RNIC may further include a crystal oscillator, a media access controller, a physical interface transceiver, and the like, which is not limited in the embodiments of this application.
The data processing method of the embodiments of this application can be implemented on the RNIC described in the embodiments corresponding to FIG. 6 and FIG. 7.
Before the data processing method is described, some concepts used in the embodiments of this application are introduced first to aid understanding.
1. Concepts of RDMA operations
1) RDMA one-sided operations
An RDMA one-sided operation means that when an application on the local device accesses the memory of the remote device, only the processor of the local device is involved; the processor of the remote device does not participate, that is, only one side's processor is working. As long as the local device knows the source address and the destination address of the data, it can read or write data in the memory of the remote device. In the embodiments of this application, the RDMA one-sided operation mainly involved is the RDMA write operation (RDMA-Write).
2) RDMA two-sided operations
An RDMA two-sided operation means that when an application on the local device accesses the memory of the remote device, both the processor of the local device and the processor of the remote device must participate, that is, the processors of both devices are working. In the embodiments of this application, the RDMA two-sided operations mainly involved are the RDMA send operation (RDMA-Send) and the RDMA receive operation (RDMA-Receive). If the local device wants to transfer data into the memory of the remote device through an RDMA send operation, the remote device must first initiate an RDMA receive operation to accept the RDMA send operation initiated by the local device.
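The ordering requirement for two-sided operations can be illustrated with a small sketch: a send can only be accepted if the remote side has already posted a receive. This is a simplified model, not an RNIC implementation; `RemoteEndpoint`, `post_receive`, and `on_send` are hypothetical names, and raising an exception for a missing receive is an assumption made only to make the rule visible.

```python
from collections import deque


class RemoteEndpoint:
    """Toy model of the receiving side of a two-sided RDMA transfer."""

    def __init__(self):
        self.receive_queue = deque()  # buffers posted by RDMA-Receive
        self.memory = {}              # buffer name -> received bytes

    def post_receive(self, buffer_name):
        # The remote processor takes part by posting a receive first.
        self.receive_queue.append(buffer_name)

    def on_send(self, payload):
        # An incoming RDMA-Send consumes the oldest posted receive.
        if not self.receive_queue:
            raise RuntimeError("RDMA-Send arrived before any RDMA-Receive was posted")
        buffer_name = self.receive_queue.popleft()
        self.memory[buffer_name] = payload
        return buffer_name
```

A one-sided RDMA write has no counterpart to `post_receive`: the remote processor does not take part, which is why only the local side needs to know the source and destination addresses.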
2. Concepts of queues
The queues in the embodiments of this application are similar in concept to message queues in socket communication. A queue can be understood as a container that holds information or data so that it can be processed asynchronously.
1) Work queue (WQ)
In RDMA, when two computing devices need to communicate, a channel is established between the RNICs of the two devices, and the two endpoints of each channel are two queue pairs (QPs). For example, the channel between the RNICs of two computing devices may be as shown in FIG. 8. Each QP consists of a send queue (SQ) and a receive queue (RQ), which manage various types of messages. The QP is mapped directly into the virtual address space of the application (client) on the computing device, so that the application can access the RNIC directly through it. Both the SQ and the RQ may be referred to as a WQ: for a computing device that sends data, the WQ is the SQ; for a computing device that receives data, the WQ is the RQ.
An application on a computing device can create a work request (WR) and use it to notify a WQ in the QP. The WR describes the application's remote operation request (for example, a remote read request or a remote write request), so that the RNIC of the computing device can determine which operations to schedule and execute. In the WQ, the WR is converted into the WQE format and waits for the RNIC to schedule and parse it. For example, suppose an application on computing device A wants to transfer the content stored at address A to address B (address A is an address in computing device A, and address B is an address in computing device B). The application passes address A, address B, and a write instruction to the RNIC of computing device A through a WR, and that RNIC adds to the SQ a WQE that contains address A, address B, and the write instruction.
An application on a computing device may issue a WR by sending an RDMA request to the RNIC of that device. After receiving the RDMA request, the RNIC adds the WQE corresponding to the request to a WQ. For a send-type RDMA request (such as an RDMA read request, an RDMA write request, or an RDMA send request), the corresponding WQE is added to the SQ, as shown in FIG. 8; for a receive-type RDMA request (such as an RDMA receive request), the corresponding WQE is added to the RQ.
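The routing rule for work requests can be sketched as follows: send-type requests become WQEs in the SQ and receive-type requests become WQEs in the RQ. The `QueuePair` class, the `post` method, the dictionary layout of a WR, and the operation labels are hypothetical names chosen for this illustration.

```python
from collections import deque
from dataclasses import dataclass, field

SEND_TYPE_OPS = {"READ", "WRITE", "SEND"}  # send-type RDMA requests
RECEIVE_TYPE_OPS = {"RECEIVE"}             # receive-type RDMA requests


@dataclass
class QueuePair:
    """Toy QP: one send queue and one receive queue."""
    send_queue: deque = field(default_factory=deque)
    receive_queue: deque = field(default_factory=deque)

    def post(self, work_request: dict) -> str:
        # The WR is converted into a WQE (here simply a copy of the dict)
        # and appended to the queue that matches its operation type.
        wqe = dict(work_request)
        if wqe["op"] in SEND_TYPE_OPS:
            self.send_queue.append(wqe)
            return "SQ"
        if wqe["op"] in RECEIVE_TYPE_OPS:
            self.receive_queue.append(wqe)
            return "RQ"
        raise ValueError(f"unknown operation: {wqe['op']}")
```

For the address A to address B example, the application would post `{"op": "WRITE", "src": A, "dst": B}` and the resulting WQE would wait in the SQ until the RNIC schedules it.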
2) Completion queue (CQ)
Besides the QP, RDMA has another kind of queue, the completion queue (CQ). The CQ holds the completion event corresponding to an operation in order to notify the upper-layer application. For example, suppose an application on computer A wants to transfer the content stored at address A to address B (address A is an address in computer A, and address B is an address in computer B). Once the RNIC of computer A has sent the content at address A to the RNIC of computer B and has confirmed that the RNIC of computer B received it, the RNIC of computer A generates a CQE. After obtaining the CQE, the application on computer A determines that the transfer of the content from address A to address B is complete.
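The completion flow can be sketched in a few lines: the RNIC adds a CQE only after the remote RNIC has confirmed receipt, and the application learns about completion by polling the CQ. `CompletionQueue`, `finish_write`, and the CQE dictionary layout are hypothetical names used only for illustration.

```python
from collections import deque


class CompletionQueue:
    """Toy CQ holding completion events for the upper-layer application."""

    def __init__(self):
        self._entries = deque()

    def add(self, work_request_id, status="success"):
        self._entries.append({"wr_id": work_request_id, "status": status})

    def poll(self):
        # Returns the oldest CQE, or None if nothing has completed yet.
        return self._entries.popleft() if self._entries else None


def finish_write(cq, work_request_id, remote_acknowledged):
    # The local RNIC generates a CQE only once the remote RNIC has
    # confirmed that it received the content.
    if remote_acknowledged:
        cq.add(work_request_id)
    return remote_acknowledged
```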
Next, the data processing method of the embodiments of this application is described. FIG. 9 is a schematic flowchart of a data processing method provided in an embodiment of this application. The first RNIC and the first processor are the RNIC and the processor of a first device, and the first device is the remote device; the second RNIC and the second processor are the RNIC and the processor of a second device, and the second device is the local device. As shown in the figure, the method includes:
S801: The second processor sends an RDMA write persistence request to the second RNIC, and the second RNIC receives the RDMA write persistence request. The RDMA write persistence request includes a data persistence flag.
The RDMA write persistence request can be understood as a WR created by the second processor to describe the second processor's remote operation request. In the embodiments of this application, the RDMA write persistence request is used to request that the first data be stored in the non-volatile memory of the first device, so as to complete memory persistence of the first data.
The RDMA write persistence request may include an RDMA operation instruction, the source virtual memory address of the first data, and the destination virtual memory address of the first data. The source virtual memory address is the virtual memory address of the first data in the second device; the storage space corresponding to it stores the first data in the second device. The destination virtual memory address is a virtual memory address in the first device; the storage space corresponding to it stores the first data after the first data is written to the first device. The destination virtual memory address is a virtual memory address registered by the first processor through the virtual memory address registration process. For example, if the first data is stored in the storage space corresponding to virtual memory address A of the second device, and the second processor wants to store the first data in the storage space corresponding to virtual memory address B of the first device, then the source virtual memory address of the first data is virtual memory address A in the second device, and the destination virtual memory address of the first data is virtual memory address B in the first device.
Here, a virtual memory address is a logical storage address that is mapped to a physical address in the memory of the computing device. It is used to isolate programs from one another on the computing device and to ensure that they run correctly. On a computing device, compiling an application produces multiple subprograms. The addresses of these subprograms usually start from "0", and the other addresses in a subprogram are computed relative to this start address. The range formed by these addresses is called the address space, and the addresses in the address space are logical storage addresses. These addresses correspond to locations in the storage space of the computing device's memory; the range formed by the addresses of that storage space is called the memory space, and the addresses in the memory space are physical addresses. When multiple subprograms run at the same time, each of them needs to be loaded starting from address "0". Because the computer has only one physical address 0, some subprograms necessarily cannot be loaded starting from "0", so the logical storage addresses in the address space and the physical addresses in the memory space differ. For example, if subprogram A needs to be loaded from logical storage address 0 but is actually loaded starting from physical address 10, address mapping is needed to convert each logical storage address in the address space into the corresponding physical address in the memory space.
In RDMA, the first processor allocates in advance a region of memory for storing data related to RDMA operations, and then sends a target page table to the RNIC through the virtual memory address registration process. The target page table stores the correspondence between the virtual memory addresses and the physical memory addresses of that storage space. With it, the RNIC can determine the virtual memory addresses and physical memory addresses of the storage space used for RDMA-related data, and can also determine the physical memory address corresponding to a given virtual memory address. For example, suppose the processor uses the storage space with physical addresses 1001 to 3000 for data related to RDMA operations, and the corresponding virtual memory addresses are 1 to 2000. The processor then sends the RNIC a page table such as the one shown in Table 1.
Table 1
Virtual memory address | Physical address |
1 | 1001 |
2 | 1002 |
... | ... |
2000 | 3000 |
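A page-table lookup of the kind the RNIC performs can be sketched as below. The sketch assumes the mapping in the example is contiguous, so that virtual address v maps to physical address v + 1000; `build_page_table` and `translate` are hypothetical helper names, and a real page table would work on pages rather than single addresses.

```python
def build_page_table(first_virtual, first_physical, count):
    """Map `count` consecutive virtual addresses onto consecutive physical ones."""
    return {first_virtual + i: first_physical + i for i in range(count)}


def translate(page_table, virtual_address):
    """Return the physical address registered for a virtual address."""
    if virtual_address not in page_table:
        raise LookupError(f"virtual address {virtual_address} is not registered")
    return page_table[virtual_address]


# Virtual addresses 1..2000 backed by physical addresses 1001..3000.
page_table = build_page_table(first_virtual=1, first_physical=1001, count=2000)
```

An address outside the registered range raises `LookupError` in this sketch, which models the RNIC refusing to touch memory that was never registered for RDMA.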
During virtual memory address registration, besides sending the target page table to the RNIC, the processor also specifies an access flag for each registered address space. The access flag indicates the attribute of the storage space corresponding to the address space. For example, the attribute may be remote read, meaning data can be read from the storage space; remote write, meaning data can be written to the storage space; or remote read/write, meaning data can be both read from and written to the storage space; and so on. During registration, the processor sends the RNIC the starting virtual memory address, the length, and the access flag of each address space, which informs the RNIC of the attribute of the storage space corresponding to each address space and of the virtual memory addresses it contains. For example, suppose the processor uses the storage space with physical addresses 1001 to 3000 for data related to RDMA operations, and the corresponding virtual memory addresses are 1 to 2000. The processor designates the storage space for virtual memory addresses 1 to 500 as readable, the storage space for addresses 501 to 1000 as writable, and the storage space for addresses 1001 to 1500 as readable and writable. The processor may then send the information shown in Table 2 to the RNIC.
Table 2
Starting virtual memory address | Length of address space | Access flag |
1 | 500 | Readable |
501 | 500 | Writable |
1001 | 500 | Readable and writable |
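The registration data in Table 2 is enough for an access check, sketched below under the assumption that each region is described by a start address, a length, and a set of access flags. `Access`, `Region`, and `is_allowed` are hypothetical names, not part of any RNIC interface.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Access(Flag):
    REMOTE_READ = auto()
    REMOTE_WRITE = auto()


@dataclass(frozen=True)
class Region:
    start: int      # starting virtual memory address
    length: int     # length of the address space
    access: Access  # access flag registered for the region

    def contains(self, virtual_address: int) -> bool:
        return self.start <= virtual_address < self.start + self.length


# The three regions listed in Table 2.
REGIONS = [
    Region(1, 500, Access.REMOTE_READ),
    Region(501, 500, Access.REMOTE_WRITE),
    Region(1001, 500, Access.REMOTE_READ | Access.REMOTE_WRITE),
]


def is_allowed(regions, virtual_address, wanted):
    """True if the address lies in a region whose flags include `wanted`."""
    for region in regions:
        if region.contains(virtual_address):
            return (region.access & wanted) == wanted
    return False  # unregistered addresses are never accessible
```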
After the virtual memory address registration process, the first device may send information about the registered address spaces to the second device through an RDMA send operation, so that the second device learns the virtual memory addresses of the storage space in the first device that is used to store data related to RDMA operations. This information includes the starting virtual memory address, the length, and the access flag of each registered address space.
In the embodiments of this application, the data persistence flag indicates that the first data is data to be persisted. Given the characteristics of an RDMA network, the data persistence flag may take either of the following two forms:
1) In an RDMA network, the RNIC of a computing device performs the operation indicated by the RDMA operation instruction that it parses. On this basis, in a first possible implementation of the embodiments of this application, a write persistence instruction may be added to the RDMA operation instructions. The write persistence instruction indicates that the first data needs to be both written and persisted to memory; that is, the data persistence flag is the write persistence instruction.
When the data persistence flag is the write persistence instruction, the RDMA write persistence request may include the write persistence instruction, the source virtual memory address of the first data, and the destination virtual memory address of the first data.
2) In a second possible implementation of the embodiments of this application, the data persistence flag may instead be the destination storage address of the first data. The storage space corresponding to the destination storage address is used to store the first data, and the destination storage address is a persistent storage address in the first device, that is, an address whose storage space is used to store data to be persisted.
As described above, the processor of the first device uses the virtual memory address registration process to specify the address spaces of the storage space used for data related to RDMA operations, together with their access flags, and after registration the first device sends the registered information to the second device through an RDMA send operation. On this basis, a remote write persistence flag may be added to the access flags. The remote write persistence flag indicates that the storage space corresponding to the address space is used to store data to be persisted. The destination storage address may then be a destination virtual memory address, namely a logical storage address in an address space that the first processor registered, through the virtual memory address registration process, with the write persistence access flag. Because the storage space corresponding to such an address space is used to store data to be persisted, the storage space corresponding to the destination virtual memory address is used to store data to be persisted.
When the data persistence flag is the destination virtual memory address of the first data, the RDMA write persistence request may include a write instruction, the source virtual memory address of the first data, and the destination virtual memory address of the first data, where the destination virtual memory address is a logical storage address in an address space that the first processor registered, through the virtual memory address registration process, with the write persistence access flag.
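The two forms of the data persistence flag lead to a simple check on the receiving side, sketched below. The opcode strings `"WRITE"` and `"WRITE_PERSIST"`, the function name `is_to_be_persisted`, and the representation of persistent regions as `(start, length)` tuples are all assumptions made for this illustration; the application does not define concrete encodings.

```python
def is_to_be_persisted(opcode, destination_address, persistent_regions):
    """Decide whether the data in a write request must be persisted.

    persistent_regions: iterable of (start, length) tuples for address spaces
    registered with the remote write persistence access flag.
    """
    # Form 1: the operation instruction itself is the persistence flag.
    if opcode == "WRITE_PERSIST":
        return True
    # Form 2: an ordinary write whose destination address falls inside a
    # region registered for persistent data.
    if opcode == "WRITE":
        return any(
            start <= destination_address < start + length
            for start, length in persistent_regions
        )
    return False
```

With a hypothetical persistent region covering addresses 1501 to 2000, `is_to_be_persisted("WRITE", 1600, [(1501, 500)])` is true, while the same write to address 600 is not treated as persistent.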
S802: The second RNIC generates an RDMA write request based on the RDMA write persistence request. The RDMA write request includes the first data and the data persistence flag.
In a specific implementation, the second RNIC may create, in the SQ, a WQE corresponding to the RDMA write persistence request. The WQE may include the source virtual memory address of the first data, the RDMA operation instruction, and the destination virtual memory address of the first data. When this WQE is scheduled, the second RNIC obtains the first data from the storage space at the source virtual memory address, and encapsulates the first data, the destination virtual memory address of the first data, and the RDMA operation instruction into an RDMA transport packet to form the RDMA write request. An RDMA transport packet is a packet transmitted between the first RNIC and the second RNIC. The operation of generating the RDMA write request based on the RDMA write persistence request may be performed by the scheduling module of the second RNIC.
如果数据持久化标记为写持久化指令,则该RDMA写请求可以包括写持久化指令、第一数据的目的虚拟内存地址以及第一数据,该RDMA写请求也可以称之为写持久化请求;如果数据持久化标记为第一数据的目的虚拟内存地址,则该RDMA写请求可以 包括写指令、第一数据的目的虚拟内存地址以及第一数据,该第一数据的目的虚拟内存地址为第一处理器通过虚拟内存地址注册的持久化虚拟内存地址。If the data persistence is marked as a write persistence instruction, the RDMA write request may include a write persistence instruction, a destination virtual memory address of the first data, and the first data, and the RDMA write request may also be referred to as a write persistence request; If the data persistence is marked as the destination virtual memory address of the first data, the RDMA write request may include a write instruction, the destination virtual memory address of the first data, and the first data, and the destination virtual memory address of the first data is the first The persistent virtual memory address that the processor registers with the virtual memory address.
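As an illustration only, the following Python sketch models how a scheduled WQE could be turned into an RDMA write request in step S802. The names `Opcode`, `Wqe`, `RdmaWriteRequest`, and `build_rdma_write`, and the use of a `bytearray` as local memory, are hypothetical and are not taken from the application or from any RNIC interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Opcode(Enum):
    WRITE = "write"                  # plain write; persistence implied by the address
    WRITE_PERSIST = "write_persist"  # write persistence instruction


@dataclass(frozen=True)
class Wqe:
    """Hypothetical work queue element created in the SQ."""
    opcode: Opcode
    src_vaddr: int
    dst_vaddr: int


@dataclass(frozen=True)
class RdmaWriteRequest:
    """Hypothetical on-the-wire request sent to the remote RNIC."""
    opcode: Opcode
    dst_vaddr: int
    payload: bytes
    psn: Optional[int] = None  # optional sequence number of the first data
    qpn: Optional[int] = None  # optional QP number


def build_rdma_write(wqe: Wqe, local_memory: bytearray, length: int,
                     psn: Optional[int] = None,
                     qpn: Optional[int] = None) -> RdmaWriteRequest:
    # Fetch the first data from the source address, then encapsulate it
    # together with the destination address and the operation instruction.
    payload = bytes(local_memory[wqe.src_vaddr:wqe.src_vaddr + length])
    return RdmaWriteRequest(wqe.opcode, wqe.dst_vaddr, payload, psn, qpn)
```

In this model the data persistence mark is carried either by `Opcode.WRITE_PERSIST` or, with `Opcode.WRITE`, by a `dst_vaddr` that lies in a region registered as persistent.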
Optionally, the RDMA write request may further include the sequence number of the first data and the QP number of the first device (the remote device). The sequence number of the first data packet uniquely identifies the first data during transmission between the first device and the second device, which makes it easier to detect lost or duplicated packets. The QP number of the remote device identifies the unique channel between the local device and the remote device.
S803. The second RNIC sends the RDMA write request to the first RNIC, and the first RNIC receives the RDMA write request.
In this embodiment of the application, the second RNIC may send the RDMA write request to the first RNIC based on the InfiniBand (IB) protocol, the RDMA over Converged Ethernet (RoCE) protocol, or the TCP-based remote direct memory access (iWARP) protocol.
In a specific implementation, the second RNIC sends the RDMA write request to the first RNIC through the sending module of the second RNIC, and the first RNIC receives the RDMA write request through the receiving module of the first RNIC.
After receiving the RDMA write request, the receiving module of the first RNIC places it in the RQ corresponding to the first RNIC, where it waits to be scheduled by the scheduling module of the first RNIC. When the scheduling module of the first RNIC schedules the RDMA write request, it parses the request to obtain the first data, the destination virtual memory address of the first data, and the RDMA operation instruction.
When the data persistence mark is a write persistence instruction, the scheduling module of the first RNIC may send the write persistence instruction to the PM module of the first RNIC. The PM module determines from the write persistence instruction that the first data is data to be persisted, and step S805 is performed. The scheduling module of the first RNIC determines, from the destination virtual memory address of the first data and the write persistence instruction, that the first data must be written to the destination virtual memory address, and performs step S804.
When the data persistence mark is the destination virtual memory address of the first data, the scheduling module of the first RNIC may send the destination virtual memory address of the first data to the PM module. Because this address is a logical storage address in an address space that the first processor registered, through the virtual memory address registration process, with a remote-write-persistence access flag, the PM module of the first RNIC determines that the first data is data to be persisted, and step S805 is performed. The scheduling module of the first RNIC determines, from the destination virtual memory address of the first data and the write instruction, that the first data must be written to the destination virtual memory address, and performs step S804.
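The two cases above can be sketched as a single check on the receiving side. This is a minimal Python model under stated assumptions: the function name `needs_persistence`, the string opcodes, and the representation of registered persistent regions as `(start, length)` pairs are all hypothetical.

```python
from typing import Iterable, Tuple


def needs_persistence(opcode: str, dst_vaddr: int,
                      persistent_regions: Iterable[Tuple[int, int]]) -> bool:
    """Decide whether received data is to be persisted.

    Case 1: the request carries a write persistence instruction.
    Case 2: the request carries a plain write instruction, but the
            destination address lies in a region registered as persistent.
    """
    if opcode == "write_persist":
        return True
    return any(start <= dst_vaddr < start + length
               for start, length in persistent_regions)
```

For example, with one persistent region registered at `0x1000` of length `0x100`, a plain write to `0x1080` is treated as data to be persisted, while a plain write to `0x1100` is not.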
S804. The first RNIC sends a DMA write request to the first processor, and the first processor receives the DMA write request. The DMA write request includes the first data.
The operation of sending the DMA write request to the first processor may be performed by the scheduling module of the first RNIC.
Here, sending the DMA write request to the first processor means writing the first data into the non-volatile memory of the first device by DMA.
S805. The first RNIC instructs the first processor to save the first data to the non-volatile memory of the first device.
As the steps shown in FIG. 9 illustrate, after the RNIC of the remote device receives the RDMA write request, it determines from the data persistence mark in the request that the data in the request is to be made persistent in memory, and directly instructs the first processor to save the first data to the non-volatile memory of the first device. This is equivalent to merging the RDMA write request and the RDMA persistence request into one request, so the separate operation of initiating an RDMA persistence request toward the remote device is no longer needed. Because every request consumes some network bandwidth, omitting the RDMA persistence request saves the bandwidth of one request and reduces the load on the RDMA network. In addition, merging the two requests turns the data write and the memory persistence into two operations that must be performed consecutively, which ensures that data written to the remote device is saved to its non-volatile memory and avoids data inconsistency.
As described above, for data received by an RNIC to be written into the non-volatile memory of the computing device to which the RNIC belongs, the data must pass through the corresponding processor, and for reasons such as the bus being busy, the data may be held temporarily in the processor's caches at various levels. Because processor architectures differ, data can flow through a processor in two ways; that is, the first data can take one of two different data flows in the first processor. In this embodiment of the application, the way in which the first RNIC instructs the first processor to save the first data to the non-volatile memory of the first device depends on which of the data flows described with reference to FIG. 1 applies. The data processing methods for the two data flows are described below with reference to FIG. 10 and FIG. 11.
FIG. 10 is a schematic flowchart of another data processing method according to an embodiment of the application. The flow applies when the data flow is the first data flow described above. The first RNIC and the first processor are the RNIC and processor of the first device, which is the remote device; the second RNIC and the second processor are the RNIC and processor of the second device, which is the local device. As shown in the figure, the method includes the following steps:
S901. The second processor sends an RDMA write persistence request to the second RNIC, and the second RNIC receives the RDMA write persistence request. The RDMA write persistence request includes a data persistence mark.
S902. The second RNIC generates an RDMA write request according to the RDMA write persistence request. The RDMA write request includes the first data and the data persistence mark.
S903. The second RNIC sends the RDMA write request to the first RNIC, and the first RNIC receives the RDMA write request.
S904. The first RNIC sends a DMA write request to the first processor, and the first processor receives the DMA write request. The DMA write request includes the first data.
For the implementation of steps S901 to S904, refer to the description of steps S801 to S804 above; details are not repeated here. The PM module of the first RNIC determines from the data persistence mark that the first data is data to be persisted, and performs step S905.
S905. The first RNIC adds a DMA LSB read request to the RQ corresponding to the second RNIC.
Here, the PM module of the first RNIC may determine the RQ corresponding to the second RNIC according to the QP of the first device carried in the RDMA write request, and then add a DMA LSB read request to that RQ. A DMA LSB read request is the request for a single DMA read operation. Its virtual memory address, that is, the virtual memory address of the storage space that the DMA LSB read operation reads, may be a logical storage address in any address space registered by the first processor through the virtual memory address registration process.
In a possible embodiment, besides using a DMA LSB read request to instruct the first processor to persist data, the first RNIC may send another kind of DMA read request to the first processor to instruct it to write all data buffered on the peripheral bus link of the first device into the non-volatile memory of the first device. This DMA read request may read any address in the SCM, for example the start address or the end address of the address range used to store the first data, or any address other than the start and end addresses. Optionally, the read request may instead target an address segment, for example the address range used to store the first data or any other storage range in the SCM.
After obtaining the DMA LSB read request, the scheduling module of the first RNIC performs step S906.
S906. The first RNIC sends the DMA LSB read request to the first processor, and the first processor receives the DMA LSB read request.
Here, if the first processor has an architecture in which the data flow inside the processor depends on the DDIO function in the IIO and on the packets that the RNIC sends to the IIO, the first RNIC sets the NS flag to 1 in the DMA LSB read request.
Issuing a DMA LSB read request to the first processor forces all write operations that precede the read request at the first processor to complete, so that all data not yet written to the non-volatile memory is written there. This guarantees that the first data, written by the DMA write request issued before the DMA LSB read request, reaches the non-volatile memory, which completes the memory persistence of the first data.
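The ordering effect described above (a read that follows posted writes forces those writes to complete) can be illustrated with a toy Python model. The class `ToyHost`, its `pending` queue standing in for data buffered on the peripheral bus link, and the `nvm` dictionary standing in for non-volatile memory are hypothetical simplifications, not a description of real bus hardware.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class ToyHost:
    """Toy model of a host whose DMA writes may be buffered before reaching NVM."""

    def __init__(self) -> None:
        self.pending: Deque[Tuple[int, bytes]] = deque()  # buffered writes
        self.nvm: Dict[int, bytes] = {}                   # persisted contents

    def dma_write(self, addr: int, data: bytes) -> None:
        # A posted write: accepted immediately, not yet persistent.
        self.pending.append((addr, data))

    def dma_read(self, addr: int) -> Optional[bytes]:
        # A read may not pass earlier writes, so every buffered write
        # is drained to NVM before the read is served.
        while self.pending:
            write_addr, data = self.pending.popleft()
            self.nvm[write_addr] = data
        return self.nvm.get(addr)
```

In this model, a `dma_read` of any address, even one unrelated to the first data, is enough to move all earlier writes into `nvm`, which mirrors why the read address in S905 can be chosen freely.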
Further, after the memory persistence of the first data is complete, the first RNIC may notify the second RNIC of this by sending a confirmation message. The method shown in FIG. 10 may further include:
S907. The first RNIC sends a persistence confirmation message corresponding to the first data to the second RNIC, and the second RNIC receives the persistence confirmation message corresponding to the first data.
The persistence confirmation message corresponding to the first data includes a second PSN, which is the sequence number of the first data.
In a specific implementation, the first RNIC sends the persistence confirmation message corresponding to the first data to the second RNIC through the sending module of the first RNIC, and the second RNIC receives it through the receiving module of the second RNIC.
The scheduling module of the first RNIC obtains the sequence number of the first data from the RDMA write request and uses it as the second PSN, encapsulates the second PSN into an RDMA transport packet to form the confirmation message, and sends the message to the second RNIC through the sending module of the first RNIC. After receiving the confirmation message, the receiving module of the second RNIC passes it to the scheduling module of the second RNIC, which parses it to obtain the second PSN and determines from the second PSN that the message is the persistence confirmation message corresponding to the first data.
S908. The second RNIC generates a CQE corresponding to the RDMA write persistence request.
Here, the scheduling module of the second RNIC determines from the persistence confirmation message corresponding to the first data that the first data has been stored in the non-volatile memory of the first device. It then uses the second PSN to identify the WQE corresponding to the first data and generates the CQE corresponding to the RDMA write persistence request from the content of that WQE. The CQE may include the source virtual memory address of the first data, the second PSN, the destination virtual memory address, or any combination of these.
S909. The second processor obtains the CQE corresponding to the RDMA write persistence request and determines that the first data has been stored in the non-volatile memory of the first device.
Here, the second processor may determine from the content of the CQE that the first data has been stored in the non-volatile memory of the first device. For example, if the CQE contains the source virtual memory address of the first data, the second processor determines that the data stored at that source virtual memory address, that is, the first data, has been stored in the non-volatile memory of the first device. Once the first data is stored in the non-volatile memory of the first device, the memory persistence of the first data is complete.
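As a rough sketch of steps S907 to S909 on the local side, the Python function below matches a persistence confirmation message to its outstanding WQE by PSN and posts a CQE. The function name `complete_on_ack`, the dictionary layouts, and the choice to remove the WQE once the CQE is generated are assumptions made for illustration.

```python
from typing import Dict, List


def complete_on_ack(outstanding: Dict[int, dict], ack_psn: int,
                    cq: List[dict]) -> bool:
    """Match a confirmation message to a WQE by PSN and generate a CQE.

    `outstanding` maps the PSN of each in-flight write to its WQE contents.
    Returns False when no WQE matches (for example, a duplicate message).
    """
    wqe = outstanding.pop(ack_psn, None)
    if wqe is None:
        return False
    # The CQE is built from the WQE contents plus the acknowledged PSN.
    cq.append({
        "psn": ack_psn,
        "src_vaddr": wqe["src_vaddr"],
        "dst_vaddr": wqe["dst_vaddr"],
    })
    return True
```

The second processor would then poll `cq` and treat the data at `src_vaddr` as persisted on the remote device.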
As the steps shown in FIG. 10 illustrate, when data does not pass through the LLC of the processor, this embodiment of the application requires the local device to initiate only one RDMA write request in order to write the data in that request into the non-volatile memory of the remote device. Compared with the flow shown in FIG. 2, one one-sided operation is saved, which reduces the load on the processor of the local device; the bandwidth that an RDMA persistence request would occupy is also saved, which reduces the network load.
FIG. 11 is a schematic flowchart of another data processing method according to an embodiment of the application. The flow applies when the data flow is the second data flow described above. The first RNIC and the first processor are the RNIC and processor of the first device, which is the remote device; the second RNIC and the second processor are the RNIC and processor of the second device, which is the local device. As shown in the figure, the method includes the following steps:
S1001. The second processor sends an RDMA write persistence request to the second RNIC, and the second RNIC receives the RDMA write persistence request. The RDMA write persistence request includes a data persistence mark.
S1002. The second RNIC generates an RDMA write request according to the RDMA write persistence request. The RDMA write request includes the first data and the data persistence mark.
S1003. The second RNIC sends the RDMA write request to the first RNIC, and the first RNIC receives the RDMA write request.
S1004. The first RNIC sends a DMA write request to the first processor, and the first processor receives the DMA write request. The DMA write request includes the first data.
For the implementation of steps S1001 to S1004, refer to the description of steps S801 to S804 above; details are not repeated here. After the PM module of the first RNIC determines from the data persistence mark that the first data is data to be persisted, steps S1006 to S1007 are performed.
S1005. The first processor sends a first RDMA receive request to the first RNIC, and the first RNIC receives the first RDMA receive request.
Because the data passes through the LLC in the first processor, the first processor must write the data in the LLC back to the non-volatile memory; that is, the first processor must take part. Since the first processor is the processor of the remote device, writing the data in the LLC back to the non-volatile memory involves a two-sided RDMA operation. For a two-sided RDMA operation, the remote device must first initiate an RDMA receive operation, so the first processor sends the first RDMA receive request to the first RNIC in order to receive the RDMA send request sent by the second device.
S1006. The second RNIC generates, according to the first data, a WQE corresponding to the first RDMA receive request.
In one possible implementation, the WQE corresponding to the first RDMA receive request may include the destination virtual memory address of the first data. In another possible implementation, the CQE corresponding to the first RDMA receive request may include the sequence number of the first data. In yet another possible implementation, the CQE corresponding to the first RDMA receive request may include both the destination virtual memory address of the first data and the sequence number of the first data.
S1007. The second RNIC clears the WQE corresponding to the first RDMA receive request and generates a CQE corresponding to the first RDMA receive request.
In one possible implementation, the PM module of the second RNIC may read the content of the WQE corresponding to the first RDMA receive request and generate the CQE corresponding to the first RDMA receive request from that content. The CQE may include the destination virtual memory address of the first data, the sequence number of the first data, or both.
In steps S1006 to S1007, the operations performed by the PM module of the second RNIC replace the operation in which the second device initiates an RDMA send request. The effect is the same as if the second device had initiated an RDMA send request to instruct that the first data be stored in the non-volatile memory of the first device.
S1008. The first processor obtains the CQE corresponding to the first RDMA receive request and stores the first data in the non-volatile memory of the first device.
When the CQE corresponding to the first RDMA receive request includes the destination virtual memory address of the first data, the first processor may use that address to locate the first data in the LLC and then store the first data in the non-volatile memory of the first device. When the CQE includes the sequence number of the first data, the first processor may use the sequence number to locate the first data in the LLC and then store the first data in the non-volatile memory of the first device.
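The lookup in step S1008 can be sketched with a toy cache model in Python. The class `ToyLlc`, its mapping from destination address to `(psn, data)`, and the `nvm` dictionary are hypothetical; a real processor would instead use cache write-back or flush instructions, which this model does not represent.

```python
from typing import Dict, Optional, Tuple


class ToyLlc:
    """Toy last-level cache: dst_vaddr -> (psn, data)."""

    def __init__(self) -> None:
        self.lines: Dict[int, Tuple[int, bytes]] = {}

    def fill(self, dst_vaddr: int, psn: int, data: bytes) -> None:
        # Data arriving by DMA write is held in the cache first.
        self.lines[dst_vaddr] = (psn, data)

    def write_back(self, nvm: Dict[int, bytes],
                   dst_vaddr: Optional[int] = None,
                   psn: Optional[int] = None) -> bool:
        """Locate data by destination address or by PSN and store it in NVM."""
        for addr, (line_psn, data) in list(self.lines.items()):
            if addr == dst_vaddr or (psn is not None and line_psn == psn):
                nvm[addr] = data
                del self.lines[addr]
                return True
        return False
```

Either key carried in the CQE is sufficient: `write_back(nvm, dst_vaddr=...)` corresponds to the first case above and `write_back(nvm, psn=...)` to the second.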
In an optional implementation, after receiving the first data, the first RNIC may send a confirmation message to the second RNIC to inform it that the first data has been received. In that case, the following steps are further included after step S1003:
S1009. The first RNIC sends a reception confirmation message corresponding to the first data to the second RNIC, and the second RNIC receives the reception confirmation message corresponding to the first data.
Here, the reception confirmation message corresponding to the first data includes a first PSN, which is the sequence number of the first data. The message indicates that the first RNIC has received the first data.
In a specific implementation, the first RNIC sends the reception confirmation message corresponding to the first data to the second RNIC through the sending module of the first RNIC, and the second RNIC receives it through the receiving module of the second RNIC.
The scheduling module of the first RNIC obtains the sequence number of the first data from the RDMA write request and uses it as the first PSN, encapsulates the first PSN into an RDMA transport packet to form the confirmation message, and sends the message to the second RNIC through the sending module of the first RNIC. After receiving the confirmation message, the receiving module of the second RNIC passes it to the scheduling module of the second RNIC, which parses it to obtain the first PSN and determines from the first PSN that the message is the reception confirmation message corresponding to the first data.
S1010. The second RNIC generates a CQE corresponding to the RDMA write persistence request.
S1011. The second processor obtains the CQE corresponding to the RDMA write persistence request and determines that transmission of the first data is complete.
For the specific implementation of steps S1010 to S1011, refer to the description of steps S908 to S909; details are not repeated here.
In the preceding steps, the first RNIC initiates the RDMA send request operation in place of the second device. After determining that transmission of the first data is complete, the second processor may therefore initiate an RDMA receive request in order to receive the RDMA send request initiated by the first processor. After step S1011, the method may further include:
S1012. The second processor initiates a second RDMA receive request to the second RNIC, and the second RNIC receives the second RDMA receive request.
S1013. The second RNIC generates a WQE corresponding to the second RDMA receive request.
在可选的实施方式中,第一处理器在将第一数据存储至第一设备的非易失性存储器之后,第一RNIC可通过向第二RNIC发送确认消息以告知第一数据的内存持久化完成,则在步骤S1008之后还可以包括:In an optional implementation manner, after the first processor stores the first data in the non-volatile memory of the first device, the first RNIC may notify the second RNIC by sending a confirmation message to inform that the memory of the first data is persistent. After the change is completed, after step S1008, the method may further include:
S1014,第一处理器向第一RNIC发送RDMA发送请求,第一RNIC接收RDMA发送请求。S1014: The first processor sends an RDMA transmission request to the first RNIC, and the first RNIC receives the RDMA transmission request.
这里,RDMA发送请求用于指示第一数据已经存储至第一设备的非易失性存储器中,该第一RDMA发送请求可包括第一数据的序列号。Here, the RDMA transmission request is used to indicate that the first data has been stored in the non-volatile memory of the first device, and the first RDMA transmission request may include a serial number of the first data.
S1015,第一RNIC生成RDMA发送请求对应的WQE。S1015: The first RNIC generates a WQE corresponding to the RDMA transmission request.
S1016,第一RNIC向第二RNIC发送第一数据对应的持久化确认消息,第二RNIC接收第一数据对应的持久化确认消息。S1016: The first RNIC sends a persistence confirmation message corresponding to the first data to the second RNIC, and the second RNIC receives the persistence confirmation message corresponding to the first data.
这里,第一数据对应的持久化确认消息包括第二PSN,第二PSN为第一数据的序列号。Here, the persistence confirmation message corresponding to the first data includes a second PSN, and the second PSN is a sequence number of the first data.
具体实现中,第一RNIC通过第一RNIC的发送模块将第一数据对应的持久化确认消息发送给第二RNIC,第二RNIC通过第二RNIC的接收模块接收该第一数据对应的持久化确认消息。In specific implementation, the first RNIC sends a persistence confirmation message corresponding to the first data to the second RNIC through the sending module of the first RNIC, and the second RNIC receives the persistence confirmation corresponding to the first data through the receiving module of the second RNIC. Message.
第一RNIC的调度模块从第一RDMA发送请求中获取第一数据包的序列号,作为第二PSN,然后将第二PSN封装进RDMA传输报文中以形成确认消息,并通过第一RNIC的发送模块发送给第二RNIC。第二RNIC的接收模块在接收到该确认消息后,将该确认消息发送给第二RNIC的调度模块,第二RNIC的调度模块解析该确认消息得到第二PSN,第二RNIC的调度模块确定该确认消息为第一数据对应的持久化确认消息。The scheduling module of the first RNIC obtains the sequence number of the first data packet from the first RDMA transmission request as the second PSN, and then encapsulates the second PSN into the RDMA transmission message to form a confirmation message, and passes the first RNIC's The sending module sends to the second RNIC. After receiving the confirmation message, the receiving module of the second RNIC sends the confirmation message to the scheduling module of the second RNIC. The scheduling module of the second RNIC parses the confirmation message to obtain a second PSN. The scheduling module of the second RNIC determines the The confirmation message is a persistent confirmation message corresponding to the first data.
When the second RNIC receives the persistence confirmation message corresponding to the first data from the first RNIC, it performs step S1018.
S1017: The first RNIC clears the WQE corresponding to the RDMA send request and generates the CQE corresponding to the RDMA send request.
S1018: The second RNIC generates the CQE corresponding to the second RDMA receive request.
S1019: The second processor obtains the CQE corresponding to the second RDMA receive request and determines that the first data has been stored in the non-volatile memory of the first device.
Once the first data has been stored in the non-volatile memory of the first device, memory persistence of the first data is complete.
As the steps shown in FIG. 11 indicate, in this embodiment, when the data passes through the LLC of the remote device's processor, the processor of the local device needs to issue only one bilateral operation after the unilateral operation to save the data in the non-volatile memory of the remote device. Compared with the procedure shown in FIG. 3, one bilateral operation is eliminated, which reduces the load on the processor of the local device. It also saves the bandwidth that initiating an RDMA persistence request would occupy, which reduces the network load.
The procedures shown in FIG. 10 and FIG. 11 store a single piece of data in the non-volatile memory of the first device. The solution can also be used to store multiple pieces of data in the non-volatile memory of the first device. The RDMA transport protocol specifies that the RNIC of the local device sends the next piece of data to the RNIC of the remote device only after determining that transmission of the previous piece has completed. That is, the RNIC of the local device sends the next piece of data only after it receives the ACK packet that the RNIC of the remote device sends for the previous piece. In one possible implementation, the memory persistence of the earlier of two pieces of data and the writing of the later one can proceed in parallel to improve transmission efficiency.
A specific way to run the memory persistence of the earlier piece of data in parallel with the writing of the later piece is as follows. After receiving the RDMA write request, the first RNIC sends a second confirmation message, which carries the PSN of the first data, to the second RNIC. Because this is the first confirmation message the second RNIC has received whose PSN is the PSN of the first data, the second RNIC determines from that PSN that it has received the reception confirmation message corresponding to the first data, and therefore that the first RNIC has received the first data. The second RNIC then sends the first RNIC the RDMA write request for the piece of data that follows the first data. In addition, the second RNIC caches the second confirmation message. When it later receives a first confirmation message whose PSN is the PSN of the first data, the second RNIC compares the PSN in the first confirmation message with the PSN in the cached second confirmation message and determines that this is the second confirmation message it has received with the PSN of the first data. The second RNIC thereby determines that it has received the persistence confirmation corresponding to the first data and performs step S908 or S1018 described above.
In the RDMA transport protocol, a confirmation message only expresses acknowledgment. Which request it acknowledges must be inferred from the requests sent before it arrived or from the order in which confirmation messages are received. Because the write operation occurs before the memory persistence operation, the first confirmation message received that carries the sequence number of the first data must be the reception confirmation message for the first data. By caching that reception confirmation message, the RNIC of the second device can send the next piece of data to the RNIC of the first device without waiting for the persistence confirmation message for the first data. Comparing the PSNs in received confirmation messages thus lets the memory persistence of one piece of data proceed in parallel with the write of the next piece, which improves the efficiency of storing multiple pieces of data in the non-volatile memory of the remote device and reduces latency.
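The PSN comparison described above can be sketched as a small state tracker on the second RNIC's side. This is an illustrative model only: the class `ConfirmationTracker`, the enum `ConfirmationKind`, and the use of a set as the cache are assumptions, not structures named in the description.

```python
from enum import Enum


class ConfirmationKind(Enum):
    RECEIVED = "received"    # first confirmation seen for this PSN
    PERSISTED = "persisted"  # second confirmation seen for this PSN


class ConfirmationTracker:
    """Classify confirmation messages by how often their PSN has been seen."""

    def __init__(self) -> None:
        # PSNs whose reception confirmation has been cached
        self._cached_psns: set[int] = set()

    def on_confirmation(self, psn: int) -> ConfirmationKind:
        if psn not in self._cached_psns:
            # First message with this PSN: the data reached the remote RNIC,
            # so the next RDMA write request may be sent right away.
            self._cached_psns.add(psn)
            return ConfirmationKind.RECEIVED
        # Second message with the same PSN: the data has been persisted,
        # so the cached entry is no longer needed.
        self._cached_psns.discard(psn)
        return ConfirmationKind.PERSISTED
```

With this model, a confirmation for PSN 7 is first classified as `RECEIVED`, a confirmation for PSN 8 (the next write) may arrive in between, and the second confirmation for PSN 7 is classified as `PERSISTED`, which is the point at which the CQE would be generated.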
In one possible embodiment, the RDMA write request sent by the second device may omit the data persistence flag. When the first device receives the RDMA write request, the first RNIC in the first device adds a DMA read request to the receive queue corresponding to the second RNIC of the second device and sends the DMA read request to the first processor, instructing the first processor to store the first data in the SCM. The DMA read request causes the first processor to complete all write operations that precede it, so that all data not yet written to the non-volatile memory is written there. This ensures that the first data, which was written by a DMA write request issued before the read request, is written to the non-volatile memory, completing memory persistence of the first data.
The foregoing method may be implemented on the RNIC and the processor of a computing device. To facilitate implementation of the foregoing method, the embodiments of this application further provide corresponding computing devices.
Referring to FIG. 12, FIG. 12 is a schematic structural diagram of a computing device according to an embodiment of this application. The computing device 130 includes an RNIC 131, a processor 132, and a non-volatile memory 133. The structure of the RNIC 131 may be as shown in FIG. 6, and the structure of the processor 132 may be that of the processor 102 in FIG. 1. The non-volatile memory 133 may be an SCM.
The RNIC 131 is configured to perform the steps performed by the first RNIC in the method embodiments shown in FIG. 9 to FIG. 11, and the processor 132 is configured to perform the steps performed by the first processor in those method embodiments.
Referring to FIG. 13, FIG. 13 is a schematic structural diagram of a computing device according to an embodiment of this application. The computing device 140 includes an RNIC 141, a processor 142, and a non-volatile memory 143. The structure of the RNIC 141 may be as shown in FIG. 6, and the structure of the processor 142 may be that of the processor 102 in FIG. 1. The non-volatile memory 143 may be an SCM, an NVRAM, or an NVDIMM.
The RNIC 141 is configured to perform the steps performed by the second RNIC in the method embodiments shown in FIG. 9 to FIG. 11, and the processor 142 is configured to perform the steps performed by the second processor in those method embodiments.
An embodiment of this application further provides a processor. The structure of the processor may be that of the processor in FIG. 1, and the processor is configured to perform the steps performed by the second processor in the method embodiments shown in FIG. 9 to FIG. 11.
The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part as a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the procedures or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center that contains one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium. The semiconductor medium may be a solid-state drive (SSD).
It should be noted that "first", "second", "third", "fourth", and the various numerals used in the embodiments of this application are distinctions made only for ease of description and are not intended to limit the scope of the embodiments.
The foregoing is merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Claims (30)
- A data processing method, comprising: receiving, by a first remote direct memory access network interface card (RNIC), a remote direct memory access (RDMA) write request sent by a second RNIC, wherein the RDMA write request includes first data and a data persistence flag, the RDMA write request requests that the first data be written to a first device, the data persistence flag indicates that the first data is data to be persisted, the first RNIC is the RNIC of the first device, the second RNIC is the RNIC of a second device, and the first device and the second device communicate by RDMA; sending, by the first RNIC, a direct memory access (DMA) write request to a first processor, wherein the DMA write request includes the first data, the DMA write request instructs the first processor to write the first data to the first device, the first processor is the processor of the first device, and the first RNIC and the first processor communicate by DMA; and instructing, by the first RNIC according to the data persistence flag, the first processor to store the first data in a non-volatile memory of the first device.
- The method according to claim 1, wherein the data persistence flag is a write persistence instruction.
- The method according to claim 1, wherein the data persistence flag is a destination storage address corresponding to the first data, the storage space corresponding to the destination storage address is used to store the first data, the destination storage address is a persistent storage address in the first device, and the storage space corresponding to the persistent storage address is used to store data to be persisted.
- The method according to any one of claims 1 to 3, wherein the instructing, by the first RNIC according to the data persistence flag, the first processor to store the first data in the non-volatile memory of the first device comprises: adding, by the first RNIC according to the data persistence flag, a DMA least-significant-bit read request to a receive queue corresponding to the second RNIC; and sending, by the first RNIC, the DMA least-significant-bit read request to the first processor, wherein the DMA least-significant-bit read request instructs the first processor to store all data buffered in a peripheral bus link of the first device in the non-volatile memory of the first device.
- The method according to any one of claims 1 to 3, wherein the instructing, by the first RNIC according to the data persistence flag, the first processor to store the first data in the non-volatile memory of the first device comprises: clearing, by the first RNIC, a work queue entry (WQE) corresponding to an RDMA receive request after generating the WQE according to the data persistence flag and the first data, wherein the RDMA receive request is used to receive an RDMA send request initiated by the second device; and generating, by the first RNIC, a completion queue entry (CQE) corresponding to the RDMA receive request, wherein the CQE corresponding to the RDMA receive request instructs the first processor to store the first data buffered in a volatile storage medium of the first processor in the non-volatile memory of the first device.
- A data processing method, comprising: receiving, by a second remote direct memory access network interface card (RNIC), a remote direct memory access (RDMA) write persistence request sent by a second processor, wherein the RDMA write persistence request includes a data persistence flag, the RDMA write persistence request requests that first data be stored in a non-volatile memory of a first device, the data persistence flag indicates that the first data is data to be persisted, the second RNIC is the RNIC of a second device, the second processor is the processor of the second device, the first device and the second device communicate by RDMA, and the second RNIC and the second processor communicate by DMA; generating, by the second RNIC, an RDMA write request according to the RDMA write persistence request, wherein the RDMA write request includes the first data and the data persistence flag; and sending, by the second RNIC, the RDMA write request to a first RNIC, wherein the RDMA write request requests that the first data be written to the first device, and the first RNIC is the RNIC of the first device.
- The method according to claim 6, wherein the data persistence flag is a write persistence instruction.
- The method according to claim 6, wherein the data persistence flag is a destination storage address corresponding to the first data, the storage space corresponding to the destination storage address is used to store the first data, the destination storage address is a persistent storage address in the first device, and the storage space corresponding to the persistent storage address is used to store data to be persisted.
- The method according to any one of claims 6 to 8, wherein after the second RNIC sends the RDMA write request to the first RNIC, the method further comprises: when a reception confirmation message corresponding to the first data is received from the first RNIC, caching, by the second RNIC, the reception confirmation message corresponding to the first data, wherein the reception confirmation message includes a first packet sequence number (PSN), the first PSN is the sequence number of the first data, and the reception confirmation message indicates that the first RNIC has received the first data; receiving, by the second RNIC, a first confirmation message sent by the first RNIC, wherein the first confirmation message includes a second PSN; and when the first PSN is the same as the second PSN, generating, by the second RNIC, a completion queue entry (CQE) corresponding to the RDMA write persistence request, wherein the CQE corresponding to the RDMA write persistence request notifies the second processor that the first data has been stored in the non-volatile memory of the first device.
- The method according to any one of claims 6 to 8, wherein after the second RNIC sends the RDMA write request to the first RNIC, the method further comprises: when a reception confirmation message corresponding to the first data is received from the first RNIC, caching, by the second RNIC, the reception confirmation message corresponding to the first data, wherein the reception confirmation message includes a first PSN, the first PSN is the sequence number of the first data, and the reception confirmation message indicates that the first RNIC has received the first data; generating, by the second RNIC, a CQE corresponding to the RDMA write persistence request, wherein the CQE corresponding to the RDMA write persistence request notifies the second processor that the first data has been written to the first device; receiving, by the second RNIC, an RDMA receive request sent by the second processor; receiving, by the second RNIC, a first confirmation message sent by the first RNIC, wherein the first confirmation message includes a second PSN; and when the second PSN is the same as the first PSN, generating, by the second RNIC, a CQE corresponding to the RDMA receive request, wherein the CQE corresponding to the RDMA receive request notifies the second processor that the first data has been stored in the non-volatile memory of the first device.
- A remote direct memory access network interface card (RNIC), comprising: a receiving module, configured to receive a remote direct memory access (RDMA) write request sent by a second RNIC, wherein the RDMA write request includes first data and a data persistence flag, the RDMA write request requests that the first data be written to a first device, the data persistence flag indicates that the first data is data to be persisted, the RNIC is the RNIC of the first device, the second RNIC is the RNIC of a second device, and the first device and the second device communicate by RDMA; a scheduling module, configured to send a direct memory access (DMA) write request to a first processor, wherein the DMA write request includes the first data, the DMA write request instructs the first processor to write the first data to the first device, the first processor is the processor of the first device, and the RNIC and the first processor communicate by DMA; and a persistent memory module, configured to instruct, according to the data persistence flag, the first processor to store the first data in a non-volatile memory of the first device.
- The RNIC according to claim 11, wherein the data persistence flag is a write persistence instruction.
- The RNIC according to claim 11, wherein the data persistence flag is a destination storage address corresponding to the first data, the storage space corresponding to the destination storage address is used to store the first data, the destination storage address is a persistent storage address in the first device, and the storage space corresponding to the persistent storage address is used to store data to be persisted.
- The RNIC according to any one of claims 11 to 13, wherein the persistent memory module is specifically configured to: add, according to the data persistence flag, a DMA least-significant-bit read request to a receive queue corresponding to the second RNIC; and send the DMA least-significant-bit read request to the first processor, wherein the DMA least-significant-bit read request instructs the first processor to write all data buffered in a peripheral bus link of the first device to the non-volatile memory of the first device.
- The RNIC according to any one of claims 11 to 13, wherein the persistent memory module is specifically configured to: clear a work queue entry (WQE) corresponding to an RDMA receive request after the WQE is generated according to the data persistence flag and the first data, wherein the RDMA receive request is used to receive an RDMA send request initiated by the second device; and generate a completion queue entry (CQE) corresponding to the RDMA receive request, wherein the CQE corresponding to the RDMA receive request instructs the first processor to store the first data buffered in a volatile storage medium of the first processor in the non-volatile memory of the first device.
- A remote direct memory access network interface card (RNIC), comprising: a scheduling module, configured to receive a remote direct memory access (RDMA) write persistence request sent by a second processor, wherein the RDMA write persistence request includes a data persistence flag, the RDMA write persistence request requests that first data be stored in a non-volatile memory of a first device, the data persistence flag indicates that the first data is data to be persisted, the RNIC is the RNIC of a second device, the second processor is the processor of the second device, the first device and the second device communicate by RDMA, and the RNIC and the second processor communicate by DMA; wherein the scheduling module is further configured to generate an RDMA write request according to the RDMA write persistence request, and the RDMA write request includes the first data and the data persistence flag; and a sending module, configured to send the RDMA write request to a first RNIC, wherein the RDMA write request requests that the first data be written to the first device, and the first RNIC is the RNIC of the first device.
- The RNIC according to claim 16, wherein the data persistence flag is a write persistence instruction.
- The RNIC according to claim 16, wherein the data persistence flag is a destination storage address corresponding to the first data, the storage space corresponding to the destination storage address is used to store the first data, the destination storage address is a persistent storage address in the first device, and the storage space corresponding to the persistent storage address is used to store data to be persisted.
- The RNIC according to any one of claims 16 to 18, wherein the RNIC further comprises a receiving module, and the scheduling module is further configured to: when the receiving module receives a reception confirmation message corresponding to the first data from the first RNIC, cache the reception confirmation message corresponding to the first data, wherein the reception confirmation message includes a first packet sequence number (PSN), the first PSN is the sequence number of the first data, and the reception confirmation message indicates that the first RNIC has received the first data; the receiving module is further configured to receive a first confirmation message sent by the first RNIC, wherein the first confirmation message includes a second PSN; and the scheduling module is further configured to: when the first PSN is the same as the second PSN, generate a completion queue entry (CQE) corresponding to the RDMA write persistence request, wherein the CQE corresponding to the RDMA write persistence request notifies the second processor that the first data has been stored in the non-volatile memory of the first device.
- The RNIC according to any one of claims 16 to 18, wherein the RNIC further comprises a receiving module, and the scheduling module is further configured to: when the receiving module receives a reception confirmation message corresponding to the first data from the first RNIC, cache the reception confirmation message corresponding to the first data, wherein the reception confirmation message includes a first PSN, the first PSN is the sequence number of the first data, and the reception confirmation message indicates that the first RNIC has received the first data; generate a CQE corresponding to the RDMA write persistence request, wherein the CQE corresponding to the RDMA write persistence request notifies the second processor that the first data has been written to the first device; and receive an RDMA receive request sent by the second processor; the receiving module is further configured to receive a first confirmation message sent by the first RNIC, wherein the first confirmation message includes a second PSN; and the scheduling module is further configured to: when the second PSN is the same as the first PSN, generate a CQE corresponding to the RDMA receive request, wherein the CQE corresponding to the RDMA receive request notifies the second processor that the first data has been stored in the non-volatile memory of the first device.
- A data processing method, comprising: receiving, by a first remote direct memory access network interface card (RNIC), a remote direct memory access (RDMA) write request sent by a second RNIC, wherein the RDMA write request includes first data and a data persistence flag, the data persistence flag indicates that the first data is data to be persisted, the RDMA write request requests that the first data be written to a first device, the first RNIC is the RNIC of the first device, the second RNIC is the RNIC of a second device, and the first device and the second device communicate by RDMA; sending, by the first RNIC, a direct memory access (DMA) write request to a first processor, wherein the DMA write request includes the first data, the DMA write request instructs the first processor to write the first data to the first device, the first processor is the processor of the first device, and the first RNIC and the first processor communicate by DMA; and instructing, by the first RNIC according to the data persistence flag, the first processor to store the first data in a non-volatile memory of the first device.
- The method according to claim 21, wherein the instructing, by the first RNIC according to the data persistence flag, the first processor to store the first data in the non-volatile memory of the first device comprises:
  - adding a DMA read request to a receive queue corresponding to the second RNIC; and
  - sending the DMA read request to the first processor, wherein the DMA read request instructs the first processor to store all data buffered in a peripheral bus link of the first device in the non-volatile memory of the first device.
- The method according to claim 21, wherein the instructing, by the first RNIC, the first processor to store the first data in the non-volatile memory of the first device comprises:
  - clearing, by the first RNIC, a work queue entry (WQE) after generating, according to the first data, the WQE corresponding to an RDMA receive request, wherein the RDMA receive request is used to receive an RDMA send request initiated by the second device; and
  - generating, by the first RNIC, a completion queue entry (CQE) corresponding to the RDMA receive request, wherein the CQE instructs the first processor to store the first data cached in a volatile storage medium of the first processor in the non-volatile memory of the first device.
- A remote direct memory access network card (RNIC), wherein:
  - a first RNIC receives a remote direct memory access (RDMA) write request sent by a second RNIC, wherein the RDMA write request includes first data and requests that the first data be written to a first device, the first RNIC is the RNIC of the first device, the second RNIC is the RNIC of a second device, and the first device and the second device communicate by RDMA;
  - the first RNIC sends a direct memory access (DMA) write request to a first processor, wherein the DMA write request includes the first data and instructs the first processor to write the first data to the first device, the first processor is the processor of the first device, and the first RNIC and the first processor communicate by DMA; and
  - the first RNIC instructs the first processor to store the first data in a non-volatile memory of the first device.
- The method according to claim 24, wherein the first RNIC instructing the first processor to store the first data in the non-volatile memory of the first device comprises:
  - adding, by the first RNIC according to the data persistence flag, a DMA least-significant-bit read request to a receive queue corresponding to the second RNIC; and
  - sending, by the first RNIC, the DMA least-significant-bit read request to the first processor, wherein the DMA least-significant-bit read request instructs the first processor to store all data buffered in a peripheral bus link of the first device in the non-volatile memory of the first device.
- A data processing method, comprising:
  - receiving, by a first remote direct memory access network card (RNIC), a remote direct memory access (RDMA) write request sent by a second RNIC, wherein the RDMA write request includes first data and requests that the first data be written to a first device, the first RNIC is the RNIC of the first device, the second RNIC is the RNIC of a second device, and the first device and the second device communicate by RDMA;
  - sending, by the first RNIC, a direct memory access (DMA) write request to a first processor, wherein the DMA write request includes the first data and instructs the first processor to write the first data to the first device, the first processor is the processor of the first device, and the first RNIC and the first processor communicate by DMA; and
  - instructing, by the first RNIC, the first processor to store the first data in a non-volatile memory of the first device.
- The data processing method according to claim 26, wherein the instructing, by the first RNIC, the first processor to store the first data in the non-volatile memory of the first device comprises:
  - adding a DMA read request to a receive queue corresponding to the second RNIC; and
  - sending the DMA read request to the first processor, wherein the DMA read request instructs the first processor to store all data buffered in a peripheral bus link of the first device in the non-volatile memory of the first device.
- A first device, comprising a processor, a non-volatile memory, and a remote direct memory access network card, wherein the remote direct memory access network card is configured to perform the operation steps of the data processing method according to any one of claims 1-5 and 21-27.
- A second device, comprising a processor, a non-volatile memory, and a remote direct memory access network card, wherein the remote direct memory access network card is configured to perform the operation steps of the data processing method according to any one of claims 6-10.
- A computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the operation steps of the data processing method according to any one of claims 1-10 and 21-27.
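
The receiver-side flow recited in the method claims above (an RDMA write carrying a data persistence flag, a DMA write to the first device, then a DMA read queued to push buffered data into non-volatile memory) can be illustrated with a toy model. This is only a minimal sketch under stated assumptions: the class names `RdmaWriteRequest`, `FirstDevice`, and `FirstRnic`, the `"DMA_READ"` queue token, and the idea of modelling the peripheral bus link as a Python list are all hypothetical and do not come from the patent or from any RNIC driver API. Real hardware relies on bus ordering behaviour that the model merely imitates.

```python
from dataclasses import dataclass, field


@dataclass
class RdmaWriteRequest:
    """Hypothetical message: payload plus the data persistence flag."""
    address: int
    data: bytes
    persist: bool  # stands in for the "data persistence flag" of the claims


@dataclass
class FirstDevice:
    """Toy first device: a bus-link buffer in front of non-volatile memory."""
    link_buffer: list = field(default_factory=list)  # writes still in the bus link
    nvm: dict = field(default_factory=dict)          # address -> persisted bytes

    def dma_write(self, address: int, data: bytes) -> None:
        # Assumption: a DMA write may linger in the bus link after being issued.
        self.link_buffer.append((address, data))

    def dma_read(self) -> None:
        # Assumption: handling a DMA read drains every buffered write into NVM.
        for address, data in self.link_buffer:
            self.nvm[address] = data
        self.link_buffer.clear()


@dataclass
class FirstRnic:
    """Toy first RNIC that reacts to incoming RDMA write requests."""
    device: FirstDevice
    receive_queue: list = field(default_factory=list)

    def on_rdma_write(self, request: RdmaWriteRequest) -> None:
        # Step 1: forward the payload to the device as a DMA write.
        self.device.dma_write(request.address, request.data)
        # Step 2: if the flag is set, queue a DMA read behind the write.
        if request.persist:
            self.receive_queue.append("DMA_READ")
        self._issue_queued_requests()

    def _issue_queued_requests(self) -> None:
        # Step 3: send queued requests to the device in FIFO order.
        while self.receive_queue:
            if self.receive_queue.pop(0) == "DMA_READ":
                self.device.dma_read()


device = FirstDevice()
rnic = FirstRnic(device)
rnic.on_rdma_write(RdmaWriteRequest(0x10, b"log-1", persist=False))
rnic.on_rdma_write(RdmaWriteRequest(0x20, b"log-2", persist=True))
print(sorted(device.nvm))  # addresses that have reached the modelled NVM
```

In this model an unflagged write stays in `link_buffer` until a later flagged write triggers the DMA read, which mirrors the claim wording that the DMA read covers all data buffered in the peripheral bus link rather than only the most recent payload.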
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810674724.2A CN110647480B (en) | 2018-06-26 | 2018-06-26 | Data processing method, remote direct access network card and equipment |
CN201810674724.2 | 2018-06-26 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020001459A1 (en) | 2020-01-02 |
Family
ID=68984718
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/092937 WO2020001459A1 (en) | 2018-06-26 | 2019-06-26 | Data processing method, remote direct memory access network card, and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110647480B (en) |
WO (1) | WO2020001459A1 (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112597094A (en) * | 2020-12-24 | 2021-04-02 | 联想长风科技(北京)有限公司 | Device and method for improving RDMA transmission efficiency |
EP3913488A1 (en) * | 2020-05-19 | 2021-11-24 | Huawei Technologies Co., Ltd. | Data processing method and device |
CN114024874A (en) * | 2021-10-29 | 2022-02-08 | 浪潮商用机器有限公司 | RDMA (remote direct memory Access) -based data transmission method, device, equipment and storage medium |
CN114328730A (en) * | 2021-12-28 | 2022-04-12 | 浪潮卓数大数据产业发展有限公司 | Data history recording method, device and storage medium supporting high concurrency |
US11755513B2 (en) | 2020-05-19 | 2023-09-12 | Huawei Technologies Co., Ltd. | Data processing and writing method based on virtual machine memory identification field and devise |
EP4155925A4 (en) * | 2020-06-28 | 2023-12-06 | Huawei Technologies Co., Ltd. | Data transmission method, processor system, and memory access system |
EP4227816A4 (en) * | 2020-10-28 | 2024-04-10 | Huawei Technologies Co., Ltd. | Network interface card, controller, storage device, and packet transmission method |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111262917A (en) * | 2020-01-13 | 2020-06-09 | 苏州浪潮智能科技有限公司 | Remote data moving device and method based on FPGA cloud platform |
CN111367721A (en) * | 2020-03-06 | 2020-07-03 | 西安奥卡云数据科技有限公司 | Efficient remote copying system based on nonvolatile memory |
CN111381780A (en) * | 2020-03-06 | 2020-07-07 | 西安奥卡云数据科技有限公司 | Efficient byte access storage system for persistent storage |
CN111404931B (en) * | 2020-03-13 | 2021-03-30 | 清华大学 | Remote data transmission method based on persistent memory |
CN111581136B (en) * | 2020-05-08 | 2023-04-14 | 上海琪埔维半导体有限公司 | DMA controller and implementation method thereof |
CN113971138A (en) * | 2020-07-24 | 2022-01-25 | 华为技术有限公司 | Data access method and related equipment |
CN112083881B (en) * | 2020-08-24 | 2022-10-18 | 云南财经大学 | Integrated astronomical data acquisition and storage system based on persistent memory |
WO2022061763A1 (en) * | 2020-09-25 | 2022-03-31 | 华为技术有限公司 | Data storage method and apparatus |
CN114610678A (en) * | 2020-12-08 | 2022-06-10 | 华为技术有限公司 | File access method, storage node and network card |
US11994997B2 (en) | 2020-12-23 | 2024-05-28 | Intel Corporation | Memory controller to manage quality of service enforcement and migration between local and pooled memory |
CN116569154A (en) * | 2020-12-30 | 2023-08-08 | 华为技术有限公司 | Data transmission method and related device |
CN113590196B (en) * | 2021-07-29 | 2024-05-31 | 深圳宏芯宇电子股份有限公司 | Data processing method and device based on 3D xPoint memory and readable storage medium |
CN114285676B (en) * | 2021-11-24 | 2023-10-20 | 中科驭数(北京)科技有限公司 | Intelligent network card, network storage method and medium of intelligent network card |
CN114356219A (en) * | 2021-12-08 | 2022-04-15 | 阿里巴巴(中国)有限公司 | Data processing method, storage medium and processor |
CN114153401A (en) * | 2021-12-09 | 2022-03-08 | 建信金融科技有限责任公司 | Remote persistence method and remote persistence system for data of sending end or receiving end |
CN116610598A (en) * | 2022-02-08 | 2023-08-18 | 华为技术有限公司 | Data storage system, data storage method, data storage device and related equipment |
CN114979022B (en) * | 2022-05-20 | 2023-07-28 | 北京百度网讯科技有限公司 | Method, device, adapter and storage medium for realizing remote direct data access |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101021791A (en) * | 2007-03-12 | 2007-08-22 | 华为技术有限公司 | Method and apparatus for realizing distributed object persistence and compiling unit |
CN105408880A (en) * | 2013-07-31 | 2016-03-16 | 甲骨文国际公司 | Direct access to persistent memory of shared storage |
CN105938458A (en) * | 2016-04-13 | 2016-09-14 | 上海交通大学 | Software-defined heterogeneous hybrid memory management method |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100083247A1 (en) * | 2008-09-26 | 2010-04-01 | Netapp, Inc. | System And Method Of Providing Multiple Virtual Machines With Shared Access To Non-Volatile Solid-State Memory Using RDMA |
CN102681952B (en) * | 2012-05-12 | 2015-02-18 | 北京忆恒创源科技有限公司 | Method for writing data into memory equipment and memory equipment |
CN110457236B (en) * | 2014-03-28 | 2020-06-30 | 三星电子株式会社 | Storage system and method for executing and verifying write protection of storage system |
US10489158B2 (en) * | 2014-09-26 | 2019-11-26 | Intel Corporation | Processors, methods, systems, and instructions to selectively fence only persistent storage of given data relative to subsequent stores |
CN105472023B (en) * | 2014-12-31 | 2018-11-20 | 华为技术有限公司 | A kind of method and device of direct distance input and output |
CN106775434B (en) * | 2015-11-19 | 2019-11-29 | 华为技术有限公司 | A kind of implementation method, terminal, server and the system of NVMe networking storage |
- 2018-06-26: CN application CN201810674724.2A (patent CN110647480B), status: Active
- 2019-06-26: WO application PCT/CN2019/092937, status: Application Filing
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3913488A1 (en) * | 2020-05-19 | 2021-11-24 | Huawei Technologies Co., Ltd. | Data processing method and device |
US11755513B2 (en) | 2020-05-19 | 2023-09-12 | Huawei Technologies Co., Ltd. | Data processing and writing method based on virtual machine memory identification field and devise |
EP4155925A4 (en) * | 2020-06-28 | 2023-12-06 | Huawei Technologies Co., Ltd. | Data transmission method, processor system, and memory access system |
EP4227816A4 (en) * | 2020-10-28 | 2024-04-10 | Huawei Technologies Co., Ltd. | Network interface card, controller, storage device, and packet transmission method |
CN112597094A (en) * | 2020-12-24 | 2021-04-02 | 联想长风科技(北京)有限公司 | Device and method for improving RDMA transmission efficiency |
CN112597094B (en) * | 2020-12-24 | 2024-05-31 | 联想长风科技(北京)有限公司 | Device and method for improving RDMA transmission efficiency |
CN114024874A (en) * | 2021-10-29 | 2022-02-08 | 浪潮商用机器有限公司 | RDMA (remote direct memory Access) -based data transmission method, device, equipment and storage medium |
CN114024874B (en) * | 2021-10-29 | 2023-03-14 | 浪潮商用机器有限公司 | RDMA (remote direct memory Access) -based data transmission method, device, equipment and storage medium |
CN114328730A (en) * | 2021-12-28 | 2022-04-12 | 浪潮卓数大数据产业发展有限公司 | Data history recording method, device and storage medium supporting high concurrency |
Also Published As
Publication number | Publication date |
---|---|
CN110647480B (en) | 2023-10-13 |
CN110647480A (en) | 2020-01-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020001459A1 (en) | Data processing method, remote direct memory access network card, and device | |
US11025544B2 (en) | Network interface for data transport in heterogeneous computing environments | |
US10331595B2 (en) | Collaborative hardware interaction by multiple entities using a shared queue | |
US9696942B2 (en) | Accessing remote storage devices using a local bus protocol | |
CN111506534B (en) | Multi-core bus architecture with non-blocking high performance transaction credit system | |
US9727503B2 (en) | Storage system and server | |
US20220263913A1 (en) | Data center cluster architecture | |
WO2018076793A1 (en) | Nvme device, and methods for reading and writing nvme data | |
US10540306B2 (en) | Data copying method, direct memory access controller, and computer system | |
TW201905714A (en) | Method of operating computing system, computing system, vehicle and computer readable medium for direct i/o operation with co-processor memory on a storage device | |
TWI506444B (en) | Processor and method to improve mmio request handling | |
US8352656B2 (en) | Handling atomic operations for a non-coherent device | |
US20230017643A1 (en) | Composable infrastructure enabled by heterogeneous architecture, delivered by cxl based cached switch soc | |
CN115495389B (en) | Memory controller, calculation memory device, and operation method of calculation memory device | |
US10613999B2 (en) | Device, system and method to access a shared memory with field-programmable gate array circuitry without first storing data to computer node | |
CN113468090B (en) | PCIe communication method and device, electronic equipment and readable storage medium | |
CN111880925A (en) | Techniques for providing out-of-band processor telemetry | |
EP4235441A1 (en) | System, method and apparatus for peer-to-peer communication | |
WO2014202003A1 (en) | Data transmission method, device and system of data storage system | |
CN118435178A (en) | Hardware management of direct memory access commands | |
CN108234147B (en) | DMA broadcast data transmission method based on host counting in GPDSP | |
WO2020247240A1 (en) | Extended memory interface | |
CN116483259A (en) | Data processing method and related device | |
TW202008172A (en) | Memory system | |
WO2022133656A1 (en) | Data processing apparatus and method, and related device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19827324; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: PCT application non-entry in European phase | Ref document number: 19827324; Country of ref document: EP; Kind code of ref document: A1 |