WO2022151820A1 - Data transmission system, data transmission method, and network device - Google Patents

Data transmission system, data transmission method, and network device

Info

Publication number
WO2022151820A1
Authority
WO
WIPO (PCT)
Prior art keywords
network device
data
pieces
message
host
Prior art date
Application number
PCT/CN2021/129667
Other languages
French (fr)
Chinese (zh)
Inventor
卢胜文
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2022151820A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/06 Notations for structuring of protocol data, e.g. abstract syntax notation one [ASN.1]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22 Parsing or analysis of headers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45583 Memory management, e.g. access or allocation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances

Definitions

  • the present application relates to the field of network communication technologies, and in particular, to a data transmission system, a data transmission method, and a network device.
  • the Remote Direct Memory Access (RDMA) protocol is a protocol that transfers data directly from one system to another in memory over a network without operating system intervention.
  • the RDMA protocol encapsulates the data to be transmitted into one or more RDMA packets, and sends the one or more RDMA packets from the sender to the receiver.
  • RDMA transfer is to transfer data from one system to the memory of another system through a send queue (SQ), wherein the data in the SQ only includes the data of one virtual machine (VM).
  • SQ send queue
  • the present application provides a data transmission system, a data transmission method and a network device for improving transmission efficiency.
  • a first aspect of the present application provides a data transmission system. The data transmission system includes a first network device and a second network device; the first network device is set on a first host, the second network device is set on a second host, and N virtual machines (VMs) run on the first host. The first network device is used to obtain N pieces of data, where the N pieces of data come from the N VMs; encapsulate the N pieces of data and the write addresses of the N pieces of data into one packet according to the remote direct memory access (RDMA) protocol; and send the packet to the second network device, where N is an integer greater than 1. The second network device is used to receive the packet, decapsulate the packet to obtain the N pieces of data and the write addresses of the N pieces of data, and store the N pieces of data on the second host according to the write addresses of the N pieces of data.
  • N pieces of data from N VMs running on the first host need to be transferred to the second host through RDMA, where the N VMs are VMs from which data can be extracted normally. The first network device on the first host may encapsulate the N pieces of data and the write addresses of the N pieces of data according to the RDMA protocol, and then send the encapsulated packet to the second network device on the second host.
  • the second network device may decapsulate the packet, extract the N data and the write address of the N data, and store the N data in the location indicated by the write address.
  • the first network device can directly send the data of the multiple virtual machines to the second network device to improve transmission efficiency.
  • the first network device is configured to acquire the identifiers and memory addresses of the N VMs; and acquire N pieces of data according to the identifiers and memory addresses of the N VMs.
  • the first network device can directly obtain the N pieces of data that it wants to transmit to the second network device through RDMA according to the VM identifiers and VM memory addresses of the N pieces of data, and the write address received by the second network device includes the VM identifier and the VM memory address, so the second network device can directly use the write address to store the N pieces of data without going through a chip logical address (CLA), which reduces the processing flow.
  • CLA chip logical address
  • an abnormal VM exists on the first host, and data cannot be obtained according to the identifier and memory address of the abnormal VM; the first network device is configured to obtain M pieces of data according to the identifier and memory address of the abnormal VM and the identifiers and memory addresses of some of the N VMs, where M is a positive integer less than N, and to encapsulate the M pieces of data into an exception message according to the RDMA protocol.
  • the first network device cannot obtain data according to the identifier and memory address of the abnormal VM; that is, an exception message generated by encapsulation according to the identifier and memory address of the abnormal VM and the identifiers and memory addresses of some of the N VMs cannot be sent to the second network device through a queue pair (QP) message.
  • the first network device is configured to generate a packet sequence, where the packet sequence includes an abnormal packet and at least one packet.
  • when an abnormal VM exists in the N VMs, the first network device needs to generate a message sequence from the abnormal message and at least one of the above messages, so that the first network device can send the abnormal message and the messages in order according to the message sequence numbers of the message sequence, which improves the practicability of the solution.
  • the first network device is configured to modify the packet sequence, wherein modifying the packet sequence includes deleting abnormal packets, and adding padding packets to the packet sequence.
  • the present application can delete abnormal packets in the packet sequence, and then supplement the packet sequence with padding packets, which are invalid packets, so as to keep the data length unchanged and avoid retransmission due to inconsistent data lengths.
  • the second network device is configured to receive the modified packet sequence, determine the padding packet in the modified packet sequence, and delete the padding packet.
  • after receiving the modified packet sequence, the second network device can determine the invalid packets by checking, that is, determine the padding packets in the modified packet sequence, and then delete the padding packets, which improves the reliability of message transmission.
  • a second aspect of the present application provides a data transmission method, the method includes: a first network device acquires N pieces of data, where the first network device is set on a first host, N virtual machines (VMs) run on the first host, the N pieces of data come from the N VMs, and N is an integer greater than 1; the first network device encapsulates the N pieces of data and the write addresses of the N pieces of data into one message according to the remote direct memory access (RDMA) protocol; and the first network device sends the message to the second network device.
  • an abnormal VM runs on the first host, and data cannot be obtained according to the identifier and memory address of the abnormal VM. The method further includes: the first network device obtains M pieces of data according to the identifier and memory address of the abnormal VM and the identifiers and memory addresses of some of the N VMs, where M is a positive integer less than N; and the first network device encapsulates the M pieces of data into an exception message according to the RDMA protocol.
  • the method further includes: the first network device generates a packet sequence, where the packet sequence includes an abnormal packet and at least one packet.
  • the method further includes: the first network device modifies the packet sequence, wherein modifying the packet sequence includes deleting abnormal packets, and adding padding packets to the packet sequence.
  • a third aspect of the present application provides a data transmission method, the method includes: a second network device receives a message from a first network device, where the first network device is set on a first host, the second network device is set on a second host, N virtual machines (VMs) run on the first host, the message is generated by encapsulating N pieces of data and the write addresses of the N pieces of data according to the remote direct memory access (RDMA) protocol, the N pieces of data come from the N VMs, and N is an integer greater than 1; the second network device decapsulates the message to obtain the N pieces of data and the write addresses of the N pieces of data; and the second network device stores the N pieces of data on the second host according to the write addresses of the N pieces of data.
  • the method further includes: the second network device receives a modified packet sequence, where the modified packet sequence includes a padding packet; and the second network device determines the padding packet in the modified packet sequence and deletes the padding packet.
  • a fourth aspect of the present application provides a network device, comprising: an acquisition unit configured to acquire N pieces of data, where the network device is set on a first host, N virtual machines (VMs) run on the first host, the N pieces of data come from the N VMs, and N is an integer greater than 1; an encapsulation unit configured to encapsulate the N pieces of data and the write addresses of the N pieces of data into a message according to the remote direct memory access (RDMA) protocol; and a sending unit configured to send the message to the second network device.
  • the network device is configured to execute the method of the second aspect or any one of the implementation manners of the second aspect.
  • a fifth aspect of the present application provides a network device, including: a receiving unit configured to receive a message from a first network device, where the first network device is set on a first host, the network device is set on a second host, N virtual machines (VMs) run on the first host, the message is generated by encapsulating N pieces of data and the write addresses of the N pieces of data according to the remote direct memory access (RDMA) protocol, the N pieces of data come from the N VMs, and N is an integer greater than 1; a decapsulation unit configured to decapsulate the message to obtain the N pieces of data and the write addresses of the N pieces of data; and a storage unit configured to store the N pieces of data on the second host according to the write addresses of the N pieces of data.
  • the network device is configured to execute the method of the third aspect or any one of the implementation manners of the third aspect.
  • a sixth aspect of the present application provides a network device, including: a processor, a memory, and a communication interface, where the processor is configured to execute instructions stored in the memory, so that the network device executes the method provided by the second aspect or any optional manner of the second aspect, and the communication interface is used for receiving or sending an indication.
  • a seventh aspect of the present application provides a network device, including: a processor, a memory, and a communication interface, where the processor is configured to execute instructions stored in the memory, so that the network device executes the method provided by the third aspect or any optional manner of the third aspect, and the communication interface is used for receiving or sending an indication.
  • an eighth aspect of the present application provides a computer-readable storage medium, where a program is stored in the computer-readable storage medium, and when the computer executes the program, the computer executes the method provided by the second aspect or any optional manner of the second aspect.
  • a ninth aspect of the present application provides a computer-readable storage medium, where a program is stored in the computer-readable storage medium, and when the computer executes the program, the computer executes the method provided by the third aspect or any optional manner of the third aspect.
  • a tenth aspect of the present application provides a computer program product. When the computer program product is executed on a computer, the computer executes the method provided in the second aspect or any optional manner of the second aspect.
  • an eleventh aspect of the present application provides a computer program product. When the computer program product is executed on a computer, the computer executes the method provided in the third aspect or any optional manner of the third aspect.
  • FIG. 1 is a system frame diagram of a data transmission method provided by an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of a data transmission system provided by an embodiment of the present application.
  • FIG. 3 is an embodiment of a data transmission method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a CLA address provided in an embodiment of the present application.
  • FIG. 5 is another embodiment of a data transmission method provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a combined strip provided in an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a padding message provided by an embodiment of the present application.
  • FIG. 8 is another schematic diagram of filling a filling message provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a network device provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a network device provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a network device provided by an embodiment of the application.
  • FIG. 12 is a schematic structural diagram of a network device provided by an embodiment of the present application.
  • Embodiments of the present application provide a data transmission system, a data transmission method, and a network device, which are used to reduce processing flow and memory occupation.
  • the embodiments of the present application will be described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all the embodiments. Those of ordinary skill in the art know that with the development of technology and the emergence of new scenarios, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.
  • the data transmission method of the embodiment of the present application is mainly applicable to the application scenario of RDMA transmission.
  • in RDMA transmission, the system does not perform a data copy action, which reduces the number of context switches between kernel space and user space when processing network communication. Without any kernel memory participation, RDMA requests are sent from the application running in user space to the local network card, and then sent to the remote network card through the network. Therefore, RDMA transmission does not require the participation of the operating system and does not increase the system load.
  • FIG. 1 is a system frame diagram of a data transmission method according to an embodiment of the present application.
  • the figure shows a transmission scenario of RDMA transmission: the first application fetches data from the memory to generate an RDMA message, the RDMA message is sent to the local network card through the cache, and it is then transmitted to the remote network card through the network.
  • the remote network card caches the received RDMA message, and the second application program takes out the data from the cache and writes it into the memory.
  • the process of reading the data in the memory of the second application by the first application is similar to the description of the above-mentioned writing process, and details are not repeated here.
  • the network card includes an RDMA-capable network interface card (RDMA network interface card, RNIC) or a host channel adapter (host channel adapter, HCA).
  • RDMA transfer is to transfer data from one system to the memory of another system through a send queue (SQ), wherein the data in the SQ only includes the data of a virtual machine (VM), when a service
  • SQ send queue
  • VM virtual machine
  • an embodiment of the present application provides a data transmission system, the structure of which can be seen in FIG. 2. The data transmission system includes a first network device 21 and a second network device 22; the first network device 21 is set on the first host 211, the second network device 22 is set on the second host 221, and N virtual machines 2111 run on the first host 211.
  • the first network device 21 is used to obtain N pieces of data, where the N pieces of data come from the N VMs 2111; encapsulate the N pieces of data and the write addresses of the N pieces of data into a message according to the remote direct memory access (RDMA) protocol; and send the message to the second network device 22, where N is an integer greater than 1.
  • the second network device 22 is configured to receive the message, decapsulate the message to obtain the N pieces of data and the write addresses of the N pieces of data, and store the N pieces of data on the second host 221 according to the write addresses of the N pieces of data.
  • N pieces of data from the N VMs 2111 running on the first host 211 need to be transferred to the second host 221 through RDMA, where the N VMs 2111 are VMs from which data can be extracted normally. The first network device 21 on the first host 211 may encapsulate the N pieces of data and the write addresses of the N pieces of data according to the RDMA protocol, and then send the packet generated by encapsulation to the second network device 22 on the second host 221.
  • the second network device 22 may decapsulate the packet, extract the N pieces of data and the write addresses of the N pieces of data, and store the N pieces of data in the positions indicated by the write addresses.
  • the first network device 21 can directly send data of multiple virtual machines to the second network device 22, which improves transmission efficiency.
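  • The encapsulate/decapsulate/store flow described above can be sketched as follows. This is a minimal illustration only, not the packet format of the application: the entry layout (`vm_id`, `write_addr`, `length`), the function names, and the dictionary standing in for the memory of the second host are all hypothetical.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical per-entry header (NOT the patent's wire format):
# vm_id (u16) | write_addr (u64) | length (u32), network byte order.
ENTRY_HEADER = struct.Struct("!HQI")

Entry = Tuple[int, int, bytes]  # (vm_id, write_addr, data)


def encapsulate(entries: List[Entry]) -> bytes:
    """Pack N pieces of data and their write addresses into one message."""
    if len(entries) < 2:
        raise ValueError("N must be an integer greater than 1")
    parts = [struct.pack("!H", len(entries))]  # entry count N
    for vm_id, write_addr, data in entries:
        parts.append(ENTRY_HEADER.pack(vm_id, write_addr, len(data)))
        parts.append(data)
    return b"".join(parts)


def decapsulate(message: bytes) -> List[Entry]:
    """Recover the N pieces of data and their write addresses."""
    (count,) = struct.unpack_from("!H", message, 0)
    offset = 2
    entries: List[Entry] = []
    for _ in range(count):
        vm_id, write_addr, length = ENTRY_HEADER.unpack_from(message, offset)
        offset += ENTRY_HEADER.size
        entries.append((vm_id, write_addr, message[offset:offset + length]))
        offset += length
    return entries


def store(entries: List[Entry], host_memory: Dict[Tuple[int, int], bytes]) -> None:
    """Store each piece of data at the location named by its write address."""
    for vm_id, write_addr, data in entries:
        host_memory[(vm_id, write_addr)] = data
```

  • In this sketch a single call to `encapsulate` carries data from several VMs, which mirrors the stated benefit of avoiding one transfer per VM.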
  • the first network device 21 is used to acquire the identifiers and memory addresses of the N VMs 2111; and acquire N pieces of data according to the identifiers and memory addresses of the N VMs.
  • the first network device 21 can directly acquire the N pieces of data that it wants to transmit to the second network device 22 through RDMA according to the VM identifiers and VM memory addresses of the N pieces of data, and the write address received by the second network device 22 includes the VM identifier and the VM memory address, so the second network device 22 can directly use the write address to store the N pieces of data without going through a chip logical address (CLA), which reduces the processing flow.
  • CLA chip logical address
  • the first network device 21 is used to obtain M pieces of data according to the identifier and memory address of the abnormal VM and the identifiers and memory addresses of some of the N VMs 2111, where M is a positive integer less than N, and to encapsulate the M pieces of data into an exception message according to the RDMA protocol.
  • the first network device 21 cannot obtain data according to the identifier and memory address of the abnormal VM; that is, the abnormal packet generated by encapsulation according to the identifier and memory address of the abnormal VM and the identifiers and memory addresses of some of the N VMs cannot be sent to the second network device 22 through a queue pair (QP) message.
  • the first network device 21 is configured to generate a packet sequence, where the packet sequence includes an abnormal packet and at least one packet.
  • the first network device 21 needs to generate a message sequence from the abnormal message and at least one of the foregoing messages, so that the first network device 21 can send the abnormal message and the foregoing messages in order according to the message sequence numbers of the message sequence, which improves the implementability of the solution.
  • the first network device 21 is configured to modify the packet sequence, where modifying the packet sequence includes deleting abnormal packets and adding padding packets to the packet sequence.
  • the present application can delete abnormal packets in the packet sequence, and then supplement the packet sequence with padding packets, which are invalid packets, so as to keep the data length unchanged and avoid retransmission due to inconsistent data lengths.
  • the second network device 22 is configured to receive the modified packet sequence, determine the padding packet in the modified packet sequence, and delete the padding packet.
  • the second network device 22 can determine the invalid packets by checking, that is, determine the padding packets in the modified packet sequence, and then delete the padding packets, which improves the reliability of message transmission.
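  • The padding mechanism described above can be illustrated with the sketch below. The source does not say how a padding packet is marked or how the receiver's check works, so the `abnormal` and `padding` flags, the zero-filled payload, and the function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    psn: int                # packet sequence number
    payload: bytes
    abnormal: bool = False  # hypothetical flag: data could not be fetched from the VM
    padding: bool = False   # hypothetical flag: invalid filler packet


def replace_abnormal_with_padding(sequence: List[Packet]) -> List[Packet]:
    """Sender side: delete abnormal packets and fill their slots with padding.

    Each padding packet reuses the PSN and payload length of the packet it
    replaces, so the sequence and the total data length stay unchanged.
    """
    modified: List[Packet] = []
    for pkt in sequence:
        if pkt.abnormal:
            filler = bytes(len(pkt.payload))  # zero-filled, same length
            modified.append(Packet(pkt.psn, filler, padding=True))
        else:
            modified.append(pkt)
    return modified


def drop_padding(sequence: List[Packet]) -> List[Packet]:
    """Receiver side: identify the padding packets and delete them."""
    return [pkt for pkt in sequence if not pkt.padding]
```

  • Keeping the PSNs contiguous is what lets the receiver accept the modified sequence without treating the removed packet as a loss that would trigger retransmission.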
  • the first network device may directly send data of multiple VMs to the second network device, thereby improving transmission efficiency.
  • the VMs running on the first host where the first network device is set may all be VMs that can normally extract data, and may also include abnormal VMs that cannot extract data, which will be described separately below.
  • the VMs running on the first host are all VMs that can extract data normally.
  • the first network device acquires N pieces of data.
  • the N (N is an integer greater than 1) pieces of data are data that the first network device needs to transmit and write to the second network device through RDMA, and the N pieces of data are stored in the N VMs on the first host where the first network device is set.
  • the first network device may obtain the N pieces of data from the N VMs according to the identifiers and memory addresses of the N VMs where the N pieces of data are located.
  • the N VMs are all VMs that can normally extract data.
  • the first network device encapsulates the N pieces of data and the write addresses of the N pieces of data into one packet according to the RDMA protocol.
  • the above-mentioned encapsulated message may be an RDMA write message
  • the RDMA write message includes the following three types: RDMA write First, RDMA write Middle, and RDMA write Last
  • the RDMA extended transport header (RETH) is a field in the RDMA message format, which carries the destination address of the message data
  • the base transport header (BTH) is a field in the RDMA message format that includes the packet sequence number (PSN). Therefore, the RDMA write First message includes both the PSN and the write address, while the RDMA write Middle and RDMA write Last messages carry the BTH field (and thus the PSN) but not the write address.
  • PSN packet sequence number
  • BTH base transport header
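  • The relationship between the three RDMA write message types, the PSN in the BTH, and the write address in the RETH can be sketched as below. The class and function names are hypothetical, and the headers are reduced to two fields. The "Only" opcode for a payload that fits in a single packet and the 24-bit PSN width come from the InfiniBand specification rather than from this text.

```python
from dataclasses import dataclass
from typing import List, Optional

PSN_MODULUS = 1 << 24  # the PSN field in the BTH is 24 bits wide (InfiniBand)


@dataclass
class RdmaWritePacket:
    opcode: str               # "First", "Middle", "Last" or "Only"
    psn: int                  # carried in the BTH of every packet
    reth_addr: Optional[int]  # write address; present only where a RETH is carried
    payload: bytes


def segment_rdma_write(data: bytes, write_addr: int, mtu: int,
                       start_psn: int = 0) -> List[RdmaWritePacket]:
    """Split one RDMA write into MTU-sized packets with consecutive PSNs."""
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    chunks = [data[i:i + mtu] for i in range(0, len(data), mtu)] or [b""]
    packets: List[RdmaWritePacket] = []
    for index, chunk in enumerate(chunks):
        if len(chunks) == 1:
            opcode = "Only"
        elif index == 0:
            opcode = "First"
        elif index == len(chunks) - 1:
            opcode = "Last"
        else:
            opcode = "Middle"
        # Only the first packet of a write (or the only one) carries the RETH.
        carries_reth = opcode in ("First", "Only")
        packets.append(RdmaWritePacket(
            opcode=opcode,
            psn=(start_psn + index) % PSN_MODULUS,
            reth_addr=write_addr if carries_reth else None,
            payload=chunk,
        ))
    return packets
```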
  • the first network device may complete the creation of the QP for RDMA in advance; the QP may be created on a virtual machine manager (Hypervisor) of the host, or on another central processing unit (CPU) or device. In addition, the RDMA hardware device (an RNIC, an HCA, or another hardware device with RDMA capability) can itself directly access the memory space of the VM.
  • the RDMA transmission hardware is limited to the CLA
  • the first network device creates a CLA for RDMA hardware access in advance.
  • R_Key used for reading the memory of the remote device, L_key and R_Key can also use the same value
  • the corresponding CLA is unique.
  • the registry created at registration includes the L_key, the R_key, the CLA start address, and the VM identifier, VM memory address, and length, wherein there are multiple VM identifiers, and each of the multiple VM identifiers corresponds to its own VM memory address and length.
  • the first network device may query the registry based on the CLA and length of the target data to obtain the VM identifier, VM memory address, and length corresponding to that CLA and length, and then extract the target data from the position indicated by the VM identifier, the VM memory address, and the length for encapsulation.
  • the CLA address of the target data can be referred to as shown in FIG. 4 .
  • One CLA address includes addresses of data of multiple VMs.
  • VM1 includes data sg11 , sg12 and sg13
  • VM2 includes data sg21 , sg22 and sg23
  • VM3 includes data sg31, sg32 and sg33
  • the address of a CLA may include addresses adr11, adr21 and adr31, where address adr11 may indicate data sg11 in VM1, address adr21 may indicate data sg21 in VM2, and address adr31 may indicate data sg31 in VM3.
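  • The registry lookup from a CLA and length to VM identifiers, VM memory addresses, and lengths can be sketched as follows. The source does not specify how a CLA region is laid out, so this sketch assumes that the registered VM segments are concatenated in registration order starting at the CLA start address. All class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    vm_id: str
    vm_addr: int  # VM memory address of the segment
    length: int


@dataclass
class RegistryEntry:
    l_key: int
    r_key: int
    cla_start: int
    segments: List[Segment]  # assumed contiguous in CLA space, in this order


def translate(entry: RegistryEntry, cla_addr: int,
              length: int) -> List[Tuple[str, int, int]]:
    """Map a CLA range to a list of (vm_id, vm_addr, length) pieces."""
    result: List[Tuple[str, int, int]] = []
    seg_base = entry.cla_start
    end = cla_addr + length
    for seg in entry.segments:
        seg_end = seg_base + seg.length
        lo = max(cla_addr, seg_base)
        hi = min(end, seg_end)
        if lo < hi:  # the requested range overlaps this VM segment
            result.append((seg.vm_id, seg.vm_addr + (lo - seg_base), hi - lo))
        seg_base = seg_end
    if sum(piece[2] for piece in result) != length:
        raise ValueError("range falls outside the registered CLA region")
    return result
```

  • With three 16-byte segments registered for VM1, VM2 and VM3, a 48-byte read from the CLA start address resolves to one piece per VM, matching the adr11/adr21/adr31 example.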
  • the registry may only include L_key, R_key, VM identifier, VM memory address and length, and the first network device may directly use the VM identifier, VM memory address, and length of the target data to extract the target data from the corresponding VM. to encapsulate.
  • security verification needs to be performed against the registry for the VM identifier, the VM memory address, and the length, so that a read or write whose indicated position exceeds the memory area recorded in the registry for that VM identifier, VM memory address, and length cannot access VM memory outside the registered area.
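  • A minimal sketch of such a check is shown below, assuming the registry records one registered region (start address and length) per VM identifier. The registry shape and the function name are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical registry shape: vm_id -> (registered start address, registered length)
Registry = Dict[str, Tuple[int, int]]


def is_access_allowed(registry: Registry, vm_id: str, addr: int, length: int) -> bool:
    """Return True only if [addr, addr + length) lies inside the registered region."""
    region = registry.get(vm_id)
    if region is None or length <= 0:
        return False
    start, size = region
    return start <= addr and addr + length <= start + size
```

  • A request that starts inside the registered region but runs past its end is rejected as a whole rather than truncated, which is the conservative choice for a bounds check.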
  • the VM identifier in this embodiment may be a physical function (PF)/virtual function (VF) identifier based on a single-root I/O virtualization (SR-IOV) device, may be an assignable device interface (ADI) based on a Scalable I/O virtualization (Scalable-IOV) device, or may be another identifier that can distinguish different VMs or address domains, such as a process address space identifier (PASID).
  • the storage may be the memory space of the application program in the second network device, and the target data may be data stored in the memory space of the application program in the first network device.
  • the maximum transmission unit (maximum transmission unit, MTU) is the maximum data packet size that can be transmitted in each RDMA transmission in the RDMA protocol, and the number of the above-mentioned first packets can be determined according to the data size of the target data and the MTU.
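  • As a worked illustration of that relationship, assuming the target data is simply split into MTU-sized pieces with no per-packet overhead counted (the symbols $S$ and $N_{\text{pkt}}$ are introduced here for illustration):

$$N_{\text{pkt}} = \left\lceil \frac{S}{\mathrm{MTU}} \right\rceil$$

  • For example, with a target data size of $S = 10000$ bytes and an MTU of 4096 bytes, $N_{\text{pkt}} = \lceil 10000 / 4096 \rceil = 3$, giving payloads of 4096, 4096, and 1808 bytes.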
  • when the first network device transmits data to the second network device through RDMA transmission, the first network device may directly write the data to the second network device through RDMA, or the second network device may read the data from the first network device through RDMA. If the second network device reads data from the first network device through RDMA, then before step 301 the second network device also needs to send a data read request to the first network device to trigger step 301.
  • the first network device sends the packet to the second network device.
  • the first network device may directly send the packet to the second network device according to the PSN sequence of the packet.
  • the second network device receives the message.
  • the second network device decapsulates the packet to obtain N pieces of data and N pieces of data write addresses.
  • the second network device may directly decapsulate the packet, for example, disassemble the protocol packet, process the information in the packet header, and extract the N data in the payload and the write address of the N data.
  • the second network device stores N pieces of data on the second host according to the write addresses of the N pieces of data.
  • the second network device may store N pieces of data in the VM indicated by the write address.
  • if the write address is a CLA address and length, the second network device may query the registry according to the CLA address and length, match the corresponding VM identifier, VM memory address and length, and store the data accordingly; if the write address is a VM identifier, VM memory address and length, the second network device directly stores the N pieces of data according to the VM identifier, VM memory address and length.
  • the first network device encapsulates N pieces of data of N VMs running on the first host into a packet, and sends the packet to the second network device. The first network device can directly send the data of each VM to the second network device without performing multiple RDMA transmissions, which can improve transmission efficiency.
  • the VMs running on the first host include abnormal VMs from which data cannot be extracted.
  • referring to FIG. 5, another embodiment of the data transmission method provided by the embodiments of the present application includes:
  • the first network device acquires N pieces of data.
  • the first network device encapsulates the N pieces of data and the write addresses of the N pieces of data into one packet according to the RDMA protocol.
  • for steps 501 and 502, reference may be made to the relevant descriptions of steps 301 and 302 in the data transmission method shown in FIG. 3, and details are not repeated here.
  • the first network device acquires M pieces of data according to the identifier and memory address of the abnormal VM and the identifiers and memory addresses of some VMs in the N VMs.
  • the abnormal VM is a VM from which data cannot be extracted normally.
  • the abnormal VM may be a VM that has failed, shut down or restarted, etc.
  • if the data transmitted by the first network device through RDMA involves data in the abnormal VM, the RDMA command initiated by the first network device fails, for example because direct memory access (DMA) fails. Data cannot be obtained according to the identifier and memory address of the abnormal VM, and the M pieces of data can only be obtained according to the identifiers and memory addresses of some of the N VMs.
  • the first network device encapsulates the M pieces of data into an exception packet according to the RDMA protocol.
  • when the first network device encapsulates the data of the abnormal VM and the data of some of the N VMs according to the RDMA protocol, the data of the abnormal VM cannot be obtained, so the encapsulated message is an abnormal message. After directly returning command completion, the first network device can configure indication information for the abnormal message.
  • the indication information can indicate that the abnormal message is an error message; for example, the indication information can be configured in a completion queue element (CQE) of the completion queue (CQ).
  • the order of steps 501 to 502 relative to steps 503 to 504 is not limited.
  • the first network device generates a packet sequence, where the packet sequence includes an abnormal packet and at least one packet.
  • when the first network device transmits data to the second network device through RDMA, the data is transmitted sequentially according to the PSNs of the packets.
  • the first network device sorts the normally encapsulated packets and the abnormal packets to form a packet sequence, and configures PSNs for the normally encapsulated packets and the abnormal packets.
  • the first network device modifies the packet sequence, where the modification of the packet sequence includes deleting an abnormal packet and adding a padding packet to the packet sequence.
  • the first network device may handle the factors in the packet sequence that affect the reception state of the second network device.
  • the first network device sends the packet sequence to the second network device through QP messages. The first network device can modify the PSNs in the packet sequence, skipping the error message and sending the next message, so that the second network device does not find the PSNs discontinuous, conclude that an intermediate message was lost, repeatedly attempt retransmission, and after several unsuccessful attempts consider the QP faulty or disconnected.
  • exemplarily, in the schematic diagram of the combined strips, the data D000, D001 and D020 of VM0, the data D110 and D111 of VM1, the data D210, D220 and D221 of VM2, the data D300, D301 and D310 of VM3, and the data of VM4 and VM5 are each represented by boxes, with VM0 drawn as thin solid line boxes. That is, the data strips from VM0 to VM5 can be combined to obtain the messages MSG0 to MSG5 shown in FIG. 6, and the first network device can send MSG0 to MSG5 to the second network device according to their order in the send queue (SQ) in the figure.
  • MSG2 and MSG3 cannot obtain data from VM2, so MSG2 and MSG3 fail to send, and the connection is broken after multiple retransmissions, so that the other normal messages cannot be sent either. In this embodiment of the present application, the subsequent data of MSG4 and MSG5 can be moved forward to replace the data of MSG2 and MSG3, as shown in FIG. 7; that is, the data of MSG4 is now numbered MSG2', and the data of MSG5 is now numbered MSG3'.
  • padding messages are supplemented as the data of the new MSG4 and MSG5 to obtain a new SQ', ensuring that the data length is correct so that the RDMA messages can be sent to the second network device normally.
  • the padding data in the padding packet may be other VM data or blank data.
  • the embodiment of the present application may alternatively supplement the messages without reordering their PSNs: as shown in FIG. 8, the first network device directly replaces the target VM data with data of other VMs or blank data to obtain a new SQ'.
  • the first network device sends the modified packet sequence to the second network device.
  • the first network device sequentially sends the message and the padding message to the second network device according to the PSN of the modified message sequence.
  • the second network device sequentially receives the messages and the padding messages in the modified message sequence according to the PSN order.
  • the second network device determines a padding packet in the modified packet sequence, and deletes the padding packet.
  • after the second network device receives the modified packet sequence, it can perform a data integrity field (DIF) check. For example, if the data in a padding packet comes from a VM that does not belong to the N VMs, the packet can be verified as an invalid packet that does not conform to the VM type; if the padding packet includes blank data, it can be directly verified as an invalid packet.
  • the upper-layer data processing program removes the invalid packets and keeps only the above packets in the modified sequence.
  • the second network device decapsulates the packet sequence after deleting the padding packet, so as to obtain N data and N data write addresses.
  • the packet sequence after deleting the padding packet includes only the packet from step 502. The packet can be decapsulated, for example by disassembling the protocol packet, processing the information in the packet header, and taking out the N pieces of data and the write addresses of the N pieces of data from the payload.
  • the second network device stores N pieces of data on the second host according to the write addresses of the N pieces of data.
  • the second network device may store the N pieces of data in the VMs indicated by the corresponding write addresses. For example, it may query the registry according to the CLA address and length, match the corresponding VM identifier, VM memory address and length, and store the data accordingly, or it may store the data directly according to the VM identifier, VM memory address and length.
  • the first network device encapsulates N pieces of data of N VMs running on the first host into a packet, and sends the packet to the second network device. The first network device can directly send the data of each VM to the second network device without performing multiple RDMA transmissions, which can improve transmission efficiency.
  • the first network device modifies the message sequence that includes the abnormal message by deleting the abnormal message and adding a padding message, and sends the modified message sequence to the second network device. This prevents the second network device from finding the sequence numbers discontinuous, concluding that an intermediate message was lost, repeatedly attempting retransmission, and after several unsuccessful attempts considering the QP faulty or the QP link disconnected. In this way, transmission efficiency can be improved.
  • FIG. 9 is a schematic diagram of an embodiment of a network device 90 in an embodiment of the present application.
  • an embodiment of the present application provides a network device, where the network device includes:
  • the obtaining unit 901 is configured to obtain N pieces of data, where the network device is set on the first host, N virtual machines (VMs) run on the first host, and the N pieces of data come from the N VMs, where N is an integer greater than 1;
  • an encapsulation unit 902 configured to encapsulate the N data and the write addresses of the N data into a message according to the remote direct memory access RDMA protocol;
  • the sending unit 903 is configured to send the message to the second network device.
  • the obtaining unit 901 is specifically configured to: obtain identifiers and memory addresses of N VMs; and obtain N pieces of data according to the identifiers and memory addresses of N VMs.
  • the obtaining unit 901 is further configured to: obtain M pieces of data according to the identifier and memory address of the abnormal VM and the identifiers and memory addresses of some of the N VMs, where the abnormal VM runs on the first host, data cannot be obtained according to the identifier and memory address of the abnormal VM, and M is a positive integer less than N;
  • the encapsulation unit 902 is further configured to encapsulate the M pieces of data into an exception packet according to the RDMA protocol.
  • the network device 90 further includes a generating unit 904, and the generating unit 904 is specifically configured to: generate a packet sequence, where the packet sequence includes an abnormal packet and at least one packet.
  • the network device 90 further includes a modification unit 905, and the modification unit 905 is specifically configured to: modify the packet sequence, wherein modifying the packet sequence includes deleting abnormal packets and adding padding packets to the packet sequence.
  • the network device may perform the operations performed by the first network device in the foregoing embodiments shown in FIG. 3 and FIG. 5 , and details are not repeated here.
  • FIG. 10 is a schematic diagram of another embodiment of the network device 100 in the embodiment of the present application.
  • an embodiment of the present application provides a network device, where the network device includes:
  • the receiving unit 1001 is configured to receive a message from a first network device, where the first network device is set on a first host, the network device is set on a second host, N virtual machines (VMs) run on the first host, and the message is generated by encapsulating N pieces of data and the write addresses of the N pieces of data according to the remote direct memory access (RDMA) protocol, where the N pieces of data come from the N VMs and N is an integer greater than 1;
  • a decapsulation unit 1002 configured to decapsulate the message to obtain N data and N data write addresses;
  • the storage unit 1003 is configured to store N pieces of data on the second host according to the write addresses of the N pieces of data.
  • the receiving unit 1001 is further configured to: receive a modified message sequence, where the modified message sequence includes a padding message;
  • the network device 100 further includes a deletion unit 1004, and the deletion unit 1004 is specifically configured to: determine the padding message in the modified message sequence, and delete the padding message.
  • the network device may perform the operations performed by the second network device in the foregoing embodiments shown in FIG. 3 and FIG. 5 , and details are not described herein again.
  • the network device 110 includes: a processor 1101 , a communication interface 1102 , a storage system 1103 and a bus 1104 .
  • the processor 1101 , the communication interface 1102 , and the storage system 1103 are connected to each other through a bus 1104 .
  • the processor 1101 is configured to control and manage the actions of the network device 110.
  • the processor 1101 is configured to execute the steps performed by the first network device in the method embodiments of FIG. 3 and FIG. 5 .
  • the communication interface 1102 is used to support the communication of the network device 110 .
  • the storage system 1103 is used to store program codes and data of the network device 110 .
  • the processor 1101 may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array, or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof. It may implement or execute the various exemplary logical blocks, modules and circuits described in connection with this disclosure.
  • the processor 1101 may also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a digital signal processor and a microprocessor, and the like.
  • the bus 1104 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the sending unit 903 in the network device 90 is equivalent to the communication interface 1102 in the network device 110 , and the acquiring unit 901 , the encapsulating unit 902 , the generating unit 904 and the modifying unit 905 in the network device 90 are equivalent to the processor 1101 in the network device 110 .
  • the network device 110 in this embodiment may correspond to the first network device in the foregoing method embodiments in FIG. 3 and FIG. 5, and the communication interface 1102 in the network device 110 may implement the functions possessed by, and/or the steps performed by, the first network device in those method embodiments, which are not repeated here.
  • the network device 120 includes: a processor 1201 , a communication interface 1202 , a storage system 1203 and a bus 1204 .
  • the processor 1201 , the communication interface 1202 , and the storage system 1203 are connected to each other through a bus 1204 .
  • the processor 1201 is configured to control and manage the actions of the network device 120.
  • the processor 1201 is configured to execute the steps performed by the second network device in the method embodiments of FIG. 3 and FIG. 5 .
  • the communication interface 1202 is used to support the communication of the network device 120 .
  • the storage system 1203 is used to store program codes and data of the network device 120 .
  • the processor 1201 may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array, or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof. It may implement or execute the various exemplary logical blocks, modules and circuits described in connection with this disclosure.
  • the processor 1201 may also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a digital signal processor and a microprocessor, and the like.
  • the bus 1204 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the receiving unit 1001 in the network device 100 is equivalent to the communication interface 1202 in the network device 120 , and the decapsulating unit 1002 , the storage unit 1003 and the deleting unit 1004 in the network device 100 may be equivalent to the processor 1201 .
  • the network device 120 in this embodiment may correspond to the second network device in the foregoing method embodiments in FIG. 3 and FIG. 5, and the processor 1201 and the communication interface 1202 in the network device 120 may implement the functions possessed by, and/or the steps performed by, the second network device in those method embodiments, which are not repeated here.
  • a computer-readable storage medium is also provided, where computer-executable instructions are stored in the computer-readable storage medium. When the processor of a device executes the computer-executable instructions, the device executes the steps of the data transmission method performed by the first network device in FIG. 3 and FIG. 5.
  • a computer-readable storage medium is also provided, where computer-executable instructions are stored in the computer-readable storage medium. When the processor of a device executes the computer-executable instructions, the device executes the steps of the data transmission method performed by the second network device in FIG. 3 and FIG. 5.
  • a computer program product includes computer-executable instructions, and the computer-executable instructions are stored in a computer-readable storage medium; when a processor of a device executes the computer-executable instructions, the device executes the steps of the data transmission method performed by the first network device in FIG. 3 and FIG. 5.
  • a computer program product includes computer-executable instructions, and the computer-executable instructions are stored in a computer-readable storage medium; when a processor of a device executes the computer-executable instructions, the device executes the steps of the data transmission method performed by the second network device in FIG. 3 and FIG. 5.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • the technical solutions of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present application discloses a data transmission system, a data transmission method and a network device, which are used for improving transmission efficiency. Said system is applied to remote direct memory access (RDMA) data transmission. Said system comprises: N pieces of data of N VMs running on a first host need to be transmitted to a second host by means of RDMA, wherein the N VMs are VMs that can normally extract data, and a first network device in the first host may encapsulate the N pieces of data and write addresses of the N pieces of data according to an RDMA protocol, and then send a message generated by means of encapsulation to a second network device on the second host. The second network device may decapsulate the message, extract the N pieces of data and the write addresses of the N pieces of data therefrom and store the N pieces of data at locations indicated by the write addresses.

Description

Data transmission system, data transmission method and network device
This application claims priority to Chinese patent application No. 202110049814.4, filed with the China Patent Office on January 4, 2021 and entitled "Data Transmission System, Data Transmission Method and Network Device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of network communication technologies, and in particular, to a data transmission system, a data transmission method, and a network device.
Background
The remote direct memory access (RDMA) protocol is a protocol that transfers data over a network directly from one system into the memory of another system without operating system intervention. The RDMA protocol encapsulates the data to be transmitted into one or more RDMA packets and sends the one or more RDMA packets from the sender to the receiver.
An RDMA transfer moves data from one system to the memory of another system through one send queue (SQ), where the data in the SQ includes the data of only one virtual machine (VM).
When a service involves the data of multiple VMs, the data of the multiple VMs cannot be transferred to the memory of another system at one time, which affects transmission efficiency.
Summary of the Invention
The present application provides a data transmission system, a data transmission method and a network device for improving transmission efficiency.
A first aspect of the present application provides a data transmission system. The data transmission system includes a first network device and a second network device; the first network device is set on a first host, the second network device is set on a second host, and N virtual machines (VMs) run on the first host. The first network device is configured to obtain N pieces of data, where the N pieces of data come from the N VMs, encapsulate the N pieces of data and the write addresses of the N pieces of data into one packet according to the remote direct memory access (RDMA) protocol, and send the packet to the second network device, where N is an integer greater than 1. The second network device is configured to receive the packet, decapsulate the packet to obtain the N pieces of data and the write addresses of the N pieces of data, and store the N pieces of data on the second host according to the write addresses of the N pieces of data.
In the first aspect, the N pieces of data of the N VMs running on the first host need to be transferred to the second host through RDMA, where the N VMs are VMs from which data can be extracted normally. The first network device in the first host may encapsulate the N pieces of data and their write addresses according to the RDMA protocol, and then send the resulting packet to the second network device on the second host. The second network device may decapsulate the packet, extract the N pieces of data and their write addresses, and store the N pieces of data at the locations indicated by the write addresses. The first network device can thus directly send the data of multiple virtual machines to the second network device, improving transmission efficiency.
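The encapsulate and decapsulate round trip described above can be illustrated with a minimal Python sketch. The byte layout used here (an entry count followed by a VM identifier, write address, length and data per entry) is an assumption made purely for illustration; it is not the RDMA wire format and is not taken from the application, and the names `encapsulate`, `decapsulate` and `Entry` are hypothetical.

```python
import struct
from typing import List, Tuple

# Hypothetical layout for illustration only (not the RDMA wire format):
#   uint32 entry count, then per entry:
#   uint16 vm_id, uint64 write address, uint32 length, `length` bytes of data.
COUNT = struct.Struct("!I")
ENTRY_HEADER = struct.Struct("!HQI")

Entry = Tuple[int, int, bytes]  # (vm_id, write_address, data)


def encapsulate(entries: List[Entry]) -> bytes:
    """Pack N pieces of data and their write addresses into one message."""
    parts = [COUNT.pack(len(entries))]
    for vm_id, write_addr, data in entries:
        parts.append(ENTRY_HEADER.pack(vm_id, write_addr, len(data)))
        parts.append(data)
    return b"".join(parts)


def decapsulate(message: bytes) -> List[Entry]:
    """Recover the N pieces of data and their write addresses."""
    (count,) = COUNT.unpack_from(message, 0)
    offset = COUNT.size
    entries: List[Entry] = []
    for _ in range(count):
        vm_id, write_addr, length = ENTRY_HEADER.unpack_from(message, offset)
        offset += ENTRY_HEADER.size
        data = message[offset:offset + length]
        if len(data) != length:
            raise ValueError("truncated message")
        entries.append((vm_id, write_addr, data))
        offset += length
    return entries


# Three VMs contribute one piece of data each to a single message.
entries = [(1, 0x1000, b"sg11"), (2, 0x2000, b"sg21"), (3, 0x3000, b"sg31")]
message = encapsulate(entries)
```

The point of the sketch is only that one message carries the data of several VMs together with per-item write addresses, so the receiver can place each item without a separate transfer per VM.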
In a possible implementation, the first network device is configured to obtain the identifiers and memory addresses of the N VMs, and obtain the N pieces of data according to the identifiers and memory addresses of the N VMs.
In this possible implementation, the first network device can obtain the N pieces of data directly according to the VM identifiers and VM memory addresses of the N pieces of data to be transmitted to the second network device through RDMA. The write addresses received by the second network device include VM identifiers and VM memory addresses, so the second network device can directly use the write addresses to store the N pieces of data without going through a chip logical address (CLA), which reduces the processing flow.
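The difference between the two addressing paths can be sketched as follows, assuming a simple in-memory registry. The `Region` record and the functions `resolve_cla` and `check_direct` are hypothetical names introduced for illustration; the application only states that the registry relates CLA addresses to a VM identifier, VM memory address and length, and that accesses are checked against the registered area.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Region:
    cla_base: int  # start of the region in the chip logical address space
    vm_id: int     # VM identifier
    vm_addr: int   # start of the region in the VM's memory
    length: int    # size of the region in bytes


def resolve_cla(registry: List[Region], cla_addr: int, length: int) -> Tuple[int, int]:
    """CLA path: translate (CLA address, length) to (vm_id, VM memory address)."""
    for r in registry:
        if r.cla_base <= cla_addr and cla_addr + length <= r.cla_base + r.length:
            return r.vm_id, r.vm_addr + (cla_addr - r.cla_base)
    raise PermissionError("CLA range is not covered by any registered region")


def check_direct(registry: List[Region], vm_id: int, vm_addr: int, length: int) -> Tuple[int, int]:
    """Direct path: no translation, only a bounds check against the registry."""
    for r in registry:
        if (r.vm_id == vm_id and r.vm_addr <= vm_addr
                and vm_addr + length <= r.vm_addr + r.length):
            return vm_id, vm_addr
    raise PermissionError("access falls outside the registered VM memory area")


registry = [Region(0x000, 1, 0x1000, 0x100), Region(0x100, 2, 0x8000, 0x100)]
```

In this sketch the direct path skips the address translation step but still rejects any range that extends past a registered region, which mirrors the security verification mentioned in the embodiments.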
In a possible implementation, an abnormal VM exists on the first host, and data cannot be obtained according to the identifier and memory address of the abnormal VM. The first network device is configured to obtain M pieces of data according to the identifier and memory address of the abnormal VM and the identifiers and memory addresses of some of the N VMs, where M is a positive integer less than N, and encapsulate the M pieces of data into one abnormal packet according to the RDMA protocol.
In this possible implementation, if an abnormal VM exists among the N VMs, for example a VM that has failed, shut down or restarted, the first network device cannot obtain data according to the identifier and memory address of the abnormal VM. As a result, the abnormal packet generated by encapsulation according to the identifier and memory address of the abnormal VM and the identifiers and memory addresses of some of the N VMs cannot be sent to the second network device through a queue pair (QP) message.
In a possible implementation, the first network device is configured to generate a packet sequence, where the packet sequence includes the abnormal packet and at least one packet.
In this possible implementation, when an abnormal VM exists among the N VMs, the first network device needs to form a packet sequence from the abnormal packet and at least one of the above packets, so that the first network device sends the abnormal packet and the packets in order of their packet sequence numbers, which improves the practicability of the solution.
In a possible implementation, the first network device is configured to modify the packet sequence, where modifying the packet sequence includes deleting the abnormal packet and adding a padding packet to the packet sequence.
In this possible implementation, an abnormal packet exists in the packet sequence and cannot be sent to the second network device, which causes the second network device to keep requesting retransmission; repeated retransmission failures break the link between the first network device and the second network device, and the RDMA transmission fails. The present application can delete the abnormal packet from the packet sequence and then supplement the packet sequence with a padding packet, which is an invalid packet, so that the data length remains unchanged and retransmission caused by a data length mismatch is avoided.
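A minimal Python sketch of the two ways of modifying the packet sequence described in the embodiments: moving later packets forward and padding the tail, or replacing the abnormal packets in place. The `Packet` record, the helper `blank`, the fixed padding length and the starting PSN of 100 are all assumptions made for illustration and are not part of the application.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Packet:
    psn: int                # packet sequence number
    payload: bytes
    abnormal: bool = False  # data could not be fetched from its VM
    padding: bool = False   # invalid filler added by the sender


def blank(psn: int, pad_len: int) -> Packet:
    """Build a padding packet carrying blank data (hypothetical helper)."""
    return Packet(psn=psn, payload=bytes(pad_len), padding=True)


def compact_and_pad(seq: List[Packet], pad_len: int = 4) -> List[Packet]:
    """Move later packets forward over the abnormal ones, then pad the tail."""
    psns = [p.psn for p in seq]
    good = [p for p in seq if not p.abnormal]
    moved = [replace(p, psn=psn) for p, psn in zip(good, psns)]
    tail = [blank(psn, pad_len) for psn in psns[len(good):]]
    return moved + tail


def replace_in_place(seq: List[Packet], pad_len: int = 4) -> List[Packet]:
    """Keep every PSN where it is and swap abnormal packets for padding."""
    return [blank(p.psn, pad_len) if p.abnormal else p for p in seq]


# MSG0..MSG5, where MSG2 and MSG3 depend on an abnormal VM.
sq = [Packet(100 + i, f"MSG{i}".encode(), abnormal=i in (2, 3)) for i in range(6)]
sq_compact = compact_and_pad(sq)
sq_inplace = replace_in_place(sq)
```

In both variants the number of packets and the PSN range are unchanged, so the receiver sees a continuous sequence; the variants differ only in whether the surviving packets are renumbered.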
在一种可能的实施方式中,第二网络设备,用于接收修改后的报文序列,确定修改后的报文序列中的填充报文,删除填充报文。In a possible implementation manner, the second network device is configured to receive the modified packet sequence, determine the padding packet in the modified packet sequence, and delete the padding packet.
上述可能的实施方式中,第二网络设备接收到上述修改后的报文序列后,可以通过校验确定其中的无效的报文,即确定修改后的报文序列中的填充报文,然后删除该填充报文,以提高报文传输的可靠性。In the above-mentioned possible implementation manner, after receiving the above-mentioned modified packet sequence, the second network device can determine the invalid packet by checking, that is, determine the padding packet in the modified packet sequence, and then delete the modified packet. The padding message is used to improve the reliability of message transmission.
A second aspect of this application provides a data transmission method. The method includes: a first network device obtains N pieces of data, where the first network device is disposed on a first host, N virtual machines (VMs) run on the first host, the N pieces of data come from the N VMs, and N is an integer greater than 1; the first network device encapsulates the N pieces of data and the write addresses of the N pieces of data into one packet according to the remote direct memory access (RDMA) protocol; and the first network device sends the packet to a second network device.
In a possible implementation, an abnormal VM runs on the first host, and data cannot be obtained according to the identifier and memory address of the abnormal VM. The method further includes: the first network device obtains M pieces of data according to the identifier and memory address of the abnormal VM and the identifiers and memory addresses of some of the N VMs, where M is a positive integer less than N; and the first network device encapsulates the M pieces of data into one abnormal packet according to the RDMA protocol.
In a possible implementation, the method further includes: the first network device generates a packet sequence, where the packet sequence includes the abnormal packet and at least one packet.
In a possible implementation, the method further includes: the first network device modifies the packet sequence, where modifying the packet sequence includes deleting the abnormal packet and adding a padding packet to the packet sequence.
A third aspect of this application provides a data transmission method. The method includes: a second network device receives a packet from a first network device, where the first network device is disposed on a first host, the second network device is disposed on a second host, N virtual machines (VMs) run on the first host, the packet is generated by encapsulating N pieces of data and the write addresses of the N pieces of data according to the remote direct memory access (RDMA) protocol, the N pieces of data come from the N VMs, and N is an integer greater than 1; the second network device decapsulates the packet to obtain the N pieces of data and the write addresses of the N pieces of data; and the second network device stores the N pieces of data on the second host according to the write addresses of the N pieces of data.
In a possible implementation, the method further includes: the second network device receives a modified packet sequence, where the modified packet sequence includes a padding packet; and the second network device determines the padding packet in the modified packet sequence and deletes the padding packet.
A fourth aspect of this application provides a network device, including: an obtaining unit, configured to obtain N pieces of data, where the network device is disposed on a first host, N virtual machines (VMs) run on the first host, the N pieces of data come from the N VMs, and N is an integer greater than 1; an encapsulation unit, configured to encapsulate the N pieces of data and the write addresses of the N pieces of data into one packet according to the remote direct memory access (RDMA) protocol; and a sending unit, configured to send the packet to a second network device.
The network device is configured to perform the method of the second aspect or any implementation of the second aspect.
A fifth aspect of this application provides a network device, including: a receiving unit, configured to receive a packet from a first network device, where the first network device is disposed on a first host, the network device is disposed on a second host, N virtual machines (VMs) run on the first host, the packet is generated by encapsulating N pieces of data and the write addresses of the N pieces of data according to the remote direct memory access (RDMA) protocol, the N pieces of data come from the N VMs, and N is an integer greater than 1; a decapsulation unit, configured to decapsulate the packet to obtain the N pieces of data and the write addresses of the N pieces of data; and a storage unit, configured to store the N pieces of data on the second host according to the write addresses of the N pieces of data.
The network device is configured to perform the method of the third aspect or any implementation of the third aspect.
A sixth aspect of this application provides a network device, including a processor, a memory, and a communication interface. The processor is configured to execute instructions stored in the memory, so that the network device performs the method provided in the second aspect or any optional manner of the second aspect, and the communication interface is configured to receive or send indications. For details of the network device provided in the sixth aspect, refer to the second aspect or any optional manner of the second aspect; they are not repeated here.
A seventh aspect of this application provides a network device, including a processor, a memory, and a communication interface. The processor is configured to execute instructions stored in the memory, so that the network device performs the method provided in the third aspect or any optional manner of the third aspect, and the communication interface is configured to receive or send indications. For details of the network device provided in the seventh aspect, refer to the third aspect or any optional manner of the third aspect; they are not repeated here.
An eighth aspect of this application provides a computer-readable storage medium that stores a program. When a computer executes the program, the computer performs the method provided in the second aspect or any optional manner of the second aspect.
A ninth aspect of this application provides a computer-readable storage medium that stores a program. When a computer executes the program, the computer performs the method provided in the third aspect or any optional manner of the third aspect.
A tenth aspect of this application provides a computer program product. When the computer program product is executed on a computer, the computer performs the method provided in the second aspect or any optional manner of the second aspect.
An eleventh aspect of this application provides a computer program product. When the computer program product is executed on a computer, the computer performs the method provided in the third aspect or any optional manner of the third aspect.
Description of drawings
FIG. 1 is a system framework diagram of a data transmission method provided by an embodiment of this application;
FIG. 2 is a schematic structural diagram of a data transmission system provided by an embodiment of this application;
FIG. 3 shows one embodiment of a data transmission method provided by an embodiment of this application;
FIG. 4 is a schematic diagram of a CLA address provided by an embodiment of this application;
FIG. 5 shows another embodiment of a data transmission method provided by an embodiment of this application;
FIG. 6 is a schematic diagram of a combined stripe provided by an embodiment of this application;
FIG. 7 is a schematic diagram of one way of filling a padding packet provided by an embodiment of this application;
FIG. 8 is a schematic diagram of another way of filling a padding packet provided by an embodiment of this application;
FIG. 9 is a schematic structural diagram of a network device provided by an embodiment of this application;
FIG. 10 is a schematic structural diagram of a network device provided by an embodiment of this application;
FIG. 11 is a schematic structural diagram of a network device provided by an embodiment of this application;
FIG. 12 is a schematic structural diagram of a network device provided by an embodiment of this application.
Detailed description
Embodiments of this application provide a data transmission system, a data transmission method, and a network device, which are used to reduce processing steps and memory usage. The embodiments of this application are described below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this application. A person of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
The terms "first", "second", and the like in the description, the claims, and the above drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way may be interchanged where appropriate, so that the embodiments described here can be implemented in an order other than the one illustrated or described here. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
The data transmission method in the embodiments of this application is mainly applicable to RDMA transmission scenarios. When an application initiates an RDMA read/write request, the system does not copy data, which reduces the number of context switches between kernel space and user space when network communication is processed. Without any involvement of kernel memory, the RDMA request is sent from the application running in user space to the local network interface card and then transmitted over the network to the remote network interface card. RDMA transmission therefore does not require operating system involvement and does not increase system load. FIG. 1 is a system framework diagram of the data transmission method in the embodiments of this application and shows one RDMA transmission scenario: a first application takes data out of memory to generate an RDMA packet and sends the RDMA packet through a buffer to the local network interface card; the packet is then transmitted over the network to the remote network interface card, which buffers the received RDMA packet; and a second application takes the data out of the buffer and writes it into memory. Likewise, the process in which the first application reads data in the memory of the second application is similar to the write process described above, and details are not repeated here. In addition, the network interface card may be an RDMA-capable network interface card (RDMA network interface card, RNIC) or a host channel adapter (HCA).
RDMA transmission transfers data from one system to the memory of another system through one send queue (SQ). The data in the SQ includes the data of only one virtual machine (VM). When a service involves the data of multiple VMs, the data of the multiple VMs cannot be transferred to the memory of the other system in a single transfer, which reduces transmission efficiency.
To solve the above problem, an embodiment of this application provides a data transmission system whose structure is shown in FIG. 2. The data transmission system includes a first network device 21 and a second network device 22. The first network device 21 is disposed on a first host 211, the second network device 22 is disposed on a second host 221, and N virtual machines (VMs) 2111 run on the first host 211.
The first network device 21 is configured to obtain N pieces of data that come from the N VMs 2111, encapsulate the N pieces of data and the write addresses of the N pieces of data into one packet according to the remote direct memory access (RDMA) protocol, and send the packet to the second network device 22, where N is an integer greater than 1.
The second network device 22 is configured to receive the packet, decapsulate the packet to obtain the N pieces of data and the write addresses of the N pieces of data, and store the N pieces of data on the second host 221 according to the write addresses of the N pieces of data.
Specifically, the N pieces of data of the N VMs 2111 running on the first host 211 need to be transmitted to the second host 221 through RDMA, where the N VMs 2111 are VMs from which data can be extracted normally. The first network device 21 in the first host 211 can encapsulate the N pieces of data and the write addresses of the N pieces of data according to the RDMA protocol, and then send the resulting packet to the second network device 22 on the second host 221. The second network device 22 can decapsulate the packet, extract the N pieces of data and their write addresses, and store the N pieces of data at the locations indicated by the write addresses. Because the first network device 21 can send the data of multiple virtual machines directly to the second network device 22, transmission efficiency is improved.
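The encapsulation and decapsulation described above can be illustrated with a minimal Python sketch. The byte layout here (a 2-byte entry count followed by a VM identifier, write address, length, and payload per entry) is a hypothetical layout chosen for illustration only; it is not the RDMA wire format and is not taken from this application. The function names `encapsulate` and `decapsulate` are likewise assumptions.

```python
import struct

# Hypothetical layout, NOT the RDMA wire format: a 2-byte entry count, then
# for each entry a VM identifier (u16), write address (u64), length (u32),
# followed by the payload bytes.
ENTRY_HEADER = struct.Struct("!HQI")


def encapsulate(entries):
    """Pack (vm_id, write_addr, data) tuples from N VMs into one packet."""
    parts = [struct.pack("!H", len(entries))]
    for vm_id, write_addr, data in entries:
        parts.append(ENTRY_HEADER.pack(vm_id, write_addr, len(data)))
        parts.append(data)
    return b"".join(parts)


def decapsulate(packet):
    """Recover the (vm_id, write_addr, data) tuples from one packet."""
    (count,) = struct.unpack_from("!H", packet, 0)
    offset = 2
    entries = []
    for _ in range(count):
        vm_id, write_addr, length = ENTRY_HEADER.unpack_from(packet, offset)
        offset += ENTRY_HEADER.size
        entries.append((vm_id, write_addr, packet[offset:offset + length]))
        offset += length
    return entries
```

In this sketch a single call to `encapsulate` carries the data of every VM, which mirrors the idea of sending the data of multiple VMs in one packet instead of one transfer per VM.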
Optionally, the first network device 21 is configured to obtain the identifiers and memory addresses of the N VMs 2111, and obtain the N pieces of data according to the identifiers and memory addresses of the N VMs.
Specifically, the first network device 21 can obtain the N pieces of data directly according to the VM identifiers and VM memory addresses of the N pieces of data to be transmitted to the second network device 22 through RDMA. The write addresses received by the second network device 22 include the VM identifiers and VM memory addresses, so the second network device 22 can use the write addresses directly to store the N pieces of data without going through a chip logical address (CLA), which reduces processing steps.
Optionally, an abnormal VM runs on the first host 211, and data cannot be obtained according to the identifier and memory address of the abnormal VM. The first network device 21 is configured to obtain M pieces of data according to the identifier and memory address of the abnormal VM and the identifiers and memory addresses of some of the N VMs 2111, where M is a positive integer less than N, and to encapsulate the M pieces of data into one abnormal packet according to the RDMA protocol.
Specifically, if an abnormal VM exists among the N VMs 2111, for example a VM that has failed, shut down, or restarted, the first network device 21 cannot obtain data according to the identifier and memory address of the abnormal VM. As a result, the abnormal packet generated by encapsulation according to the identifier and memory address of the abnormal VM and the identifiers and memory addresses of some of the N VMs cannot be sent to the second network device 22 through a queue pair (QP) message.
Optionally, the first network device 21 is configured to generate a packet sequence, where the packet sequence includes the abnormal packet and at least one packet.
Specifically, when an abnormal VM exists among the N VMs, the first network device 21 needs to form a packet sequence from the abnormal packet and at least one of the above packets, so that the first network device 21 sends the abnormal packet and the packets in order according to the packet sequence numbers of the packet sequence, which improves the implementability of the solution.
Optionally, the first network device 21 is configured to modify the packet sequence, where modifying the packet sequence includes deleting the abnormal packet and adding a padding packet to the packet sequence.
Specifically, the packet sequence contains an abnormal packet that cannot be sent to the second network device 22. This causes the second network device 22 to repeatedly request retransmission, and repeated retransmission failures cause the link between the first network device 21 and the second network device 22 to be disconnected, so the RDMA transmission fails. This application can delete the abnormal packet from the packet sequence and then add a padding packet to the packet sequence. The padding packet is an invalid packet that keeps the data length unchanged, which avoids retransmission caused by a data length mismatch.
Optionally, the second network device 22 is configured to receive the modified packet sequence, determine the padding packet in the modified packet sequence, and delete the padding packet.
Specifically, after receiving the modified packet sequence, the second network device 22 can identify the invalid packet through verification, that is, determine the padding packet in the modified packet sequence, and then delete the padding packet, which improves the reliability of packet transmission.
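The handling of an abnormal packet on both sides can be sketched as follows. This is an illustrative model only: the `Packet` class, the `kind` marker, and the zero-filled padding payload are assumptions introduced here. The text states only that the padding packet is invalid, preserves the data length, and is recognized by the receiver through verification; it does not specify how that verification works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    psn: int            # packet sequence number
    length: int         # payload length the receiver expects
    payload: bytes
    kind: str = "data"  # hypothetical marker: "data", "abnormal" or "padding"


def replace_abnormal(sequence):
    """Sender side: swap each abnormal packet for a padding packet.

    The padding packet keeps the PSN and length of the packet it replaces,
    so the sequence numbering and the total data length are unchanged.
    """
    return [
        Packet(p.psn, p.length, bytes(p.length), "padding")
        if p.kind == "abnormal" else p
        for p in sequence
    ]


def drop_padding(sequence):
    """Receiver side: discard packets recognized as padding."""
    return [p for p in sequence if p.kind != "padding"]
```

Because every PSN is still present after `replace_abnormal`, a receiver in this model has no gap to request retransmission for, which is the effect the padding packet is meant to achieve.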
In the technical solutions of the embodiments of this application, the first network device can send the data of multiple VMs directly to the second network device, which improves transmission efficiency.
The VMs running on the first host on which the first network device is disposed may all be VMs from which data can be extracted normally, or may include abnormal VMs from which data cannot be extracted. The two cases are described separately below.
1. The VMs running on the first host are all VMs from which data can be extracted normally.
Refer to the embodiment of the data transmission method shown in FIG. 3:
301. The first network device obtains N pieces of data.
In this embodiment, the N pieces of data (N is an integer greater than 1) are the data that the first network device needs to write to the second network device through RDMA transmission. The N pieces of data are stored in N VMs on the first host on which the first network device is disposed, and the first network device can obtain the N pieces of data from the N VMs according to the identifiers and memory addresses of the N VMs in which the N pieces of data are located. In this embodiment, all N VMs are VMs from which data can be extracted normally.
302. The first network device encapsulates the N pieces of data and the write addresses of the N pieces of data into one packet according to the RDMA protocol.
In this embodiment, the encapsulated packet may be an RDMA write packet. RDMA write packets include the following three types: RDMA write First, RDMA write Middle, and RDMA write Last. The RDMA extended transport header (RETH) is a field in the RDMA packet format that carries the destination address of the packet data. The base transport header (BTH) is a field in the RDMA packet format that includes the packet sequence number (PSN). Therefore, an RDMA write First packet includes both a PSN and a write address, whereas RDMA write Middle and RDMA write Last packets contain the BTH field without a write address.
In this embodiment, the first network device can create the RDMA QP in advance. The QP can be created on the virtual machine manager (hypervisor) of the host, or on another central processing unit (CPU) or device. In addition, the RDMA hardware device (an RNIC, an HCA, or another RDMA-capable hardware device) can itself access the memory space of the VMs directly. When the RDMA transmission hardware is restricted to using a CLA, the first network device creates a CLA for RDMA hardware access in advance. To ensure that CLAs do not conflict in use, at least the CLA corresponding to one L_key (used for memory reads by the local device) or R_key (used for memory reads by the remote device; the L_key and R_key may also share the same value) must be unique. RDMA memory registration is then performed to map the memory of the VMs to be read into the CLA, after which all RDMA data operation commands can be based on the CLA. The registry created by the registration includes the L_key, the R_key, the CLA start address, and the VM identifiers, VM memory addresses, and lengths. There are multiple VM identifiers, and each VM identifier corresponds to one VM memory address and length. Based on the CLA and length of the target data, the first network device can query the registry to obtain the VM identifier, VM memory address, and length corresponding to that CLA and length, and then extract the target data from the location indicated by the VM identifier, VM memory address, and length for encapsulation. In this embodiment, the CLA address of the target data is as shown in FIG. 4: one CLA address includes the addresses of the data of multiple VMs. For example, VM1 includes data sg11, sg12, and sg13, VM2 includes data sg21, sg22, and sg23, and VM3 includes data sg31, sg32, and sg33. One CLA address may include addresses adr11, adr21, and adr31, where address adr11 indicates data sg11 in VM1, address adr21 indicates data sg21 in VM2, and address adr31 indicates data sg31 in VM3.
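The registry lookup described above, from a CLA and length to the corresponding VM identifier, VM memory address, and length, might look like the following sketch. The class names, the `resolve` function, and in particular the assumption that the registered VM regions are laid out back to back in CLA space starting at the CLA start address are illustrative choices, not details given in this application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    vm_id: int
    vm_addr: int
    length: int


@dataclass(frozen=True)
class Registration:
    l_key: int
    r_key: int
    cla_start: int
    segments: tuple  # Segment entries, assumed contiguous in CLA space


def resolve(reg, cla, length):
    """Translate a CLA range into (vm_id, vm_addr, length) pieces."""
    pieces = []
    seg_base = reg.cla_start
    end = cla + length
    for seg in reg.segments:
        seg_end = seg_base + seg.length
        lo, hi = max(cla, seg_base), min(end, seg_end)
        if lo < hi:  # the requested range overlaps this VM's region
            pieces.append((seg.vm_id, seg.vm_addr + (lo - seg_base), hi - lo))
        seg_base = seg_end
    if sum(n for _, _, n in pieces) != length:
        raise ValueError("CLA range is not fully covered by the registration")
    return pieces
```

A request that straddles two registered regions resolves to one piece per VM, so a single CLA range can address data held by several VMs.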
Optionally, the registry may include only the L_key, the R_key, and the VM identifiers, VM memory addresses, and lengths. In that case the first network device can use the VM identifier, VM memory address, and length of the target data directly to extract the target data from the corresponding VM for encapsulation. In this embodiment, the VM identifier, VM memory address, and length must pass a security check against the registry. The check prevents reads and writes to VM memory outside the registered regions when the indicated read or write location exceeds the memory region described by the VM identifier, VM memory address, and length in the registry.
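The security check described above amounts to a bounds check of the requested access against the registered regions. A minimal sketch, assuming the registry can be viewed as a list of `(vm_id, vm_addr, length)` tuples (the function name and this representation are hypothetical):

```python
def is_access_allowed(registered, vm_id, vm_addr, length):
    """Return True only if the access lies wholly inside one registered region.

    registered: iterable of (vm_id, vm_addr, length) regions from the registry.
    """
    if length <= 0:
        return False
    return any(
        rid == vm_id and raddr <= vm_addr and vm_addr + length <= raddr + rlen
        for rid, raddr, rlen in registered
    )
```

An access that starts inside a registered region but runs past its end is rejected, as is any access that names a VM identifier absent from the registry.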
In this embodiment, the VM identifier may be the identifier (ID) of a physical function (PF) or virtual function (VF) for a single-root I/O virtualization (SR-IOV) device, an assignable device interface (ADI) for a scalable I/O virtualization (Scalable IOV) device, or another identifier that can distinguish different VMs or address domains, such as a process address space identifier (PASID).
Optionally, the memory may be the memory space of an application on the second network device, and the target data may be data stored in the memory space of an application on the first network device.
The maximum transmission unit (MTU) is the maximum packet size that can be transmitted in each RDMA transfer under the RDMA protocol. The number of packets described above can be determined from the size of the target data and the MTU.
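As a hedged illustration of how the packet count follows from the data size and the MTU, the sketch below computes the number of packets by ceiling division and labels each one. The function name is hypothetical, and the "Only" label for a transfer that fits in a single packet comes from the InfiniBand opcode set rather than from the three types listed in this application.

```python
def plan_write_packets(data_len, mtu, first_psn=0):
    """Return a list of (psn, kind) tuples for an RDMA write of data_len bytes."""
    if data_len <= 0 or mtu <= 0:
        raise ValueError("data_len and mtu must be positive")
    count = -(-data_len // mtu)  # ceiling division
    plan = []
    for i in range(count):
        if count == 1:
            kind = "Only"    # assumption: single-packet case, not named in the text
        elif i == 0:
            kind = "First"   # carries the write address as well as the PSN
        elif i == count - 1:
            kind = "Last"
        else:
            kind = "Middle"
        plan.append((first_psn + i, kind))
    return plan
```

For example, 10000 bytes with an MTU of 4096 bytes needs three packets, because two packets carry only 8192 bytes.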
可选的,本实施例中,第一网络设备通过RDMA传输向第二网络设备传输数据可以是第一网络设备直接向第二网络设备RDMA写入数据,也可以是第二网络设备向第一网络设备RDMA读取数据。当第二网络设备向第一网络设备RDMA读取数据时,步骤301之前,第二网络设备还需要向第一网络设备发送数据读取请求,以触发步骤301。Optionally, in this embodiment, when the first network device transmits data to the second network device through RDMA transmission, the first network device may directly write data to the second network device RDMA, or the second network device may write data to the first network device. The network device RDMA reads data. When the second network device reads data from the first network device RDMA, before step 301 , the second network device also needs to send a data read request to the first network device to trigger step 301 .
303、第一网络设备将报文发送至第二网络设备。303. The first network device sends the packet to the second network device.
本实施例中,当上述报文封装成功时,第一网络设备可以直接根据该报文的PSN顺序向第二网络设备发送该报文。相应的,第二网络设备接收该报文。In this embodiment, when the above-mentioned packet is successfully encapsulated, the first network device may directly send the packet to the second network device according to the PSN sequence of the packet. Correspondingly, the second network device receives the message.
304、第二网络设备对报文进行解封装,以获得N个数据和N个数据的写入地址。304. The second network device decapsulates the packet to obtain N pieces of data and N pieces of data write addresses.
第二网络设备接收到上述报文后,可以直接对该报文进行解封装,如拆解协议包,处理包头中的信息,取出净荷中的N个数据和N个数据的写入地址。After receiving the above-mentioned packet, the second network device may directly decapsulate the packet, for example, disassemble the protocol packet, process the information in the packet header, and extract the N data in the payload and the write address of the N data.
305. The second network device stores the N pieces of data on the second host according to the write addresses of the N pieces of data.
After obtaining the write addresses, the second network device can store each of the N pieces of data in the VM indicated by its write address. For example, if the write address is a CLA address and length, the second network device can query the registration table with the CLA address and length to find the matching VM identifier, VM memory address, and length, and then store the data. Alternatively, if the write address is already a VM identifier, VM memory address, and length, the second network device stores the N pieces of data directly according to them.
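The registration-table lookup described above might look like the following sketch. The structure of the registration table is not specified here, so the example assumes each entry maps a contiguous CLA address range onto a region of one VM's memory; `Registration`, `VmTarget`, and `resolve` are hypothetical names.

```python
from typing import List, NamedTuple


class Registration(NamedTuple):
    """One hypothetical registration-table entry."""
    cla_base: int   # start of the registered CLA address range
    size: int       # size of the range in bytes
    vm_id: int      # VM that owns the backing memory
    vm_base: int    # VM memory address backing cla_base


class VmTarget(NamedTuple):
    vm_id: int
    vm_addr: int
    length: int


def resolve(table: List[Registration], cla_addr: int, length: int) -> VmTarget:
    """Map a (CLA address, length) write address to (VM id, VM address, length)."""
    for reg in table:
        starts_inside = reg.cla_base <= cla_addr
        ends_inside = cla_addr + length <= reg.cla_base + reg.size
        if starts_inside and ends_inside:
            offset = cla_addr - reg.cla_base
            return VmTarget(reg.vm_id, reg.vm_base + offset, length)
    raise LookupError("no registration covers the requested range")


if __name__ == "__main__":
    table = [Registration(cla_base=0x1000, size=0x1000, vm_id=1, vm_base=0x7000)]
    print(resolve(table, 0x1100, 0x20))
```

When the write address already carries the VM identifier, VM memory address, and length, no lookup is needed and the values can be used as they are.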
In this embodiment of the present application, the first network device encapsulates N pieces of data from the N VMs running on the first host into one packet and sends that packet to the second network device. The first network device can therefore send the data of multiple VMs to the second network device directly, without performing multiple RDMA transfers, which can improve transmission efficiency.
2. The VMs running on the first host include an abnormal VM from which data cannot be extracted.
Referring to FIG. 5, another embodiment of the data transmission method provided by the embodiments of the present application includes:
501. The first network device acquires N pieces of data.
502. The first network device encapsulates the N pieces of data and the write addresses of the N pieces of data into one packet according to the RDMA protocol.
For steps 501 and 502, refer to the descriptions of steps 301 and 302 in the data transmission method shown in FIG. 3; details are not repeated here.
503. The first network device acquires M pieces of data according to the identifier and memory address of the abnormal VM and the identifiers and memory addresses of some of the N VMs.
In this embodiment, an abnormal VM is a VM from which data cannot be extracted normally, for example because the VM has failed, shut down, or is restarting. When the data that the first network device is to transmit over RDMA involves data in the abnormal VM, the RDMA command it initiates, for example a direct memory access (DMA), fails. No data can be obtained using the identifier and memory address of the abnormal VM, so only M pieces of data can be obtained, using the identifiers and memory addresses of some of the N VMs.
504. The first network device encapsulates the M pieces of data into one abnormal packet according to the RDMA protocol.
When the first network device encapsulates, according to the RDMA protocol, the data of the abnormal VM together with the data of some of the N VMs, the data of the abnormal VM cannot be obtained, so the encapsulated packet is an abnormal packet. The first network device can return command completion directly and attach indication information to the abnormal packet, indicating that it is an erroneous packet. For example, the indication information can be configured in a completion queue element (CQE) of the completion queue (CQ).
In this embodiment, the order of steps 501 to 502 relative to steps 503 to 504 is not limited.
505. The first network device generates a packet sequence that includes the abnormal packet and at least one packet.
When the first network device transmits data to the second network device over RDMA, packets are transmitted one after another in PSN order. In this embodiment, the first network device arranges the normally encapsulated packets and the abnormal packet into a packet sequence and assigns PSNs to them.
506. The first network device modifies the packet sequence, where modifying the packet sequence includes deleting the abnormal packet and adding a padding packet to the packet sequence.
In this embodiment, the first network device can handle the factors in the packet sequence that affect the receive-related state of the second network device. For example, when the first network device sends the packet sequence to the second network device as QP messages, it can modify the PSNs in the packet sequence so that the erroneous message is skipped and the next message is sent. This prevents the second network device from detecting a PSN gap, concluding that an intermediate message was lost, attempting retransmission repeatedly, and, after several unsuccessful attempts, treating the QP as faulty or disconnected.
As an example, FIG. 6 is a schematic diagram of combined stripes for VM0, VM1, VM2, VM3, VM4 (not shown in the figure), and VM5 (not shown in the figure). VM0 includes data D000, D001, and D020, drawn as thin solid-line boxes; VM1 includes data D110 and D111; VM2 includes data D210, D220, and D221; VM3 includes data D300, D301, and D310; the data of VM4 and VM5 is drawn as solid boxes. Combining the data of VM0 to VM5 into stripes yields the messages MSG0 to MSG5 shown in FIG. 6, and the first network device can send MSG0 to MSG5 to the second network device in the order of the send queue (SQ) in the figure.
When VM2 fails, its data becomes unavailable, so MSG2 and MSG3 cannot obtain data directly from VM2. As a result, MSG2 and MSG3 fail to send; after multiple retransmissions the connection is torn down, and the other, normal messages can no longer be sent. In this embodiment of the present application, the data of the subsequent MSG4 and MSG5 can be moved forward to replace the data of MSG2 and MSG3, as shown in FIG. 7: the data originally in MSG4 is now numbered MSG2', and the data originally in MSG5 is now numbered MSG3'. Padding packets are then added as the data of the new MSG4 and MSG5 to obtain a new SQ', which keeps the data length correct so that the RDMA messages can be sent to the second network device normally. Specifically, the padding data in a padding packet can be data of other VMs or blank data.
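The FIG. 7 style of modification (drop the failed messages, move later messages forward, append padding, renumber) can be sketched as below. The sketch simplifies by assuming one packet per message, so that the position in the queue stands in for the PSN; `compact_with_padding` and the `"PAD"` marker are hypothetical names.

```python
PAD = "PAD"  # hypothetical marker for a padding packet


def compact_with_padding(send_queue, failed, pad_payload=b""):
    """Remove failed messages, shift later ones forward, pad to the old length.

    send_queue: list of (name, payload) in send order.
    failed:     set of message names whose data could not be fetched.
    Returns a list of (psn, name, payload) with consecutive PSNs.
    """
    kept = [(name, payload) for name, payload in send_queue if name not in failed]
    # One padding packet per removed message keeps the sequence length unchanged.
    padding = [(PAD, pad_payload)] * (len(send_queue) - len(kept))
    return [(psn, name, payload) for psn, (name, payload) in enumerate(kept + padding)]


if __name__ == "__main__":
    sq = [(f"MSG{i}", bytes([i])) for i in range(6)]
    for psn, name, _ in compact_with_padding(sq, {"MSG2", "MSG3"}):
        print(psn, name)
```

In the example, the former MSG4 and MSG5 take positions 2 and 3 (MSG2' and MSG3' in the text), and two padding packets fill positions 4 and 5.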
The padding approach above requires the PSNs of the packets to be reordered. Optionally, this embodiment of the present application can instead add padding in a way that does not require reordering the PSNs: as shown in FIG. 8, the first network device directly replaces the data of the target VM with data of other VMs or with blank data to obtain a new SQ'.
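The in-place variant of FIG. 8 can be sketched in the same style. The sketch assumes that a message whose data could not be fetched is represented by a payload of `None` and that every message has a fixed, known length; both are assumptions made for illustration, as is the name `fill_in_place`.

```python
def fill_in_place(send_queue, message_length):
    """Replace unavailable payloads with blank data, leaving PSNs untouched.

    send_queue: list of (psn, name, payload); payload is None when the data
                could not be fetched from an abnormal VM.
    """
    filled = []
    for psn, name, payload in send_queue:
        if payload is None:
            payload = bytes(message_length)  # blank data: all-zero bytes
        filled.append((psn, name, payload))
    return filled


if __name__ == "__main__":
    sq = [(0, "MSG0", b"aaaa"), (1, "MSG1", None), (2, "MSG2", b"cccc")]
    for entry in fill_in_place(sq, 4):
        print(entry)
```

Because every message keeps its original position, no PSN has to be reassigned; the trade-off is that padding now sits in the middle of the sequence instead of at its end.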
507. The first network device sends the modified packet sequence to the second network device.
The first network device sends the packets and padding packets to the second network device one after another, according to the PSNs of the modified packet sequence. Correspondingly, the second network device receives the packets and padding packets of the modified packet sequence in PSN order.
508. The second network device identifies the padding packet in the modified packet sequence and deletes it.
After receiving the modified packet sequence, the second network device can check it using the data integrity field (DIF). For example, if the data in a padding packet comes from a VM other than the N VMs, the check identifies the packet as an invalid packet that does not match the VM type; if a padding packet contains blank data, it can be identified as invalid directly. The upper-layer data processing program discards the invalid packets and keeps only the packets described above from the modified packet sequence.
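The filtering step on the second network device can be sketched as follows. The check below is only a stand-in for the DIF check mentioned above and does not implement a real data integrity field: it assumes each received packet is tagged with the identifier of the VM its data came from, and it treats empty or all-zero data as blank. `drop_padding` is a hypothetical name.

```python
def drop_padding(sequence, valid_vm_ids):
    """Keep only packets whose data belongs to an expected VM and is not blank.

    sequence:     list of (psn, vm_id, data) in PSN order.
    valid_vm_ids: identifiers of the N VMs whose data is expected.
    """
    kept = []
    for psn, vm_id, data in sequence:
        from_other_vm = vm_id not in valid_vm_ids
        is_blank = not any(data)  # empty or all-zero bytes
        if from_other_vm or is_blank:
            continue  # treated as a padding packet and discarded
        kept.append((psn, vm_id, data))
    return kept


if __name__ == "__main__":
    received = [(0, 0, b"D000"), (1, 9, b"other"), (2, 1, b"\x00\x00"), (3, 1, b"D110")]
    print(drop_padding(received, {0, 1}))
```

A limitation of this stand-in is that legitimate data consisting only of zero bytes would also be discarded; a real integrity check would rely on dedicated check fields instead of inspecting the data itself.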
509. The second network device decapsulates the packet sequence from which the padding packet has been deleted, to obtain the N pieces of data and the write addresses of the N pieces of data.
In this embodiment, once the padding packet has been deleted, the packet sequence contains only the packet from step 502. That packet can be decapsulated: for example, the protocol packet is disassembled, the information in the packet header is processed, and the N pieces of data and their write addresses are extracted from the payload.
510. The second network device stores the N pieces of data on the second host according to the write addresses of the N pieces of data.
After obtaining the write addresses, the second network device can store each of the N pieces of data in the VM indicated by its write address. For example, it can query the registration table with the CLA address and length to find the matching VM identifier, VM memory address, and length, and then store the data; or it can store the data directly according to the VM identifier, VM memory address, and length.
In this embodiment of the present application, the first network device encapsulates N pieces of data from the N VMs running on the first host into one packet and sends that packet to the second network device. The first network device can therefore send the data of multiple VMs to the second network device directly, without performing multiple RDMA transfers, which can improve transmission efficiency.
Further, the first network device modifies the packet sequence that includes the abnormal packet by deleting the abnormal packet and adding a padding packet, and sends the modified packet sequence to the second network device. This prevents the second network device from detecting a sequence-number gap, concluding that an intermediate message was lost, attempting retransmission repeatedly, and, after several unsuccessful attempts, treating the QP as faulty or disconnected. Transmission efficiency can thereby be improved.
The data transmission method has been described above. The network device of the embodiments of the present application is described below with reference to the accompanying drawings.
FIG. 9 is a schematic diagram of an embodiment of the network device 90 in the embodiments of the present application.
As shown in FIG. 9, an embodiment of the present application provides a network device, which includes:
an acquiring unit 901, configured to acquire N pieces of data, where the network device is disposed on a first host, N virtual machines (VMs) run on the first host, the N pieces of data come from the N VMs, and N is an integer greater than 1;
an encapsulation unit 902, configured to encapsulate the N pieces of data and the write addresses of the N pieces of data into one packet according to the remote direct memory access (RDMA) protocol; and
a sending unit 903, configured to send the packet to a second network device.
Optionally, the acquiring unit 901 is specifically configured to: acquire the identifiers and memory addresses of the N VMs; and acquire the N pieces of data according to the identifiers and memory addresses of the N VMs.
Optionally, the acquiring unit 901 is further configured to: acquire M pieces of data according to the identifier and memory address of an abnormal VM and the identifiers and memory addresses of some of the N VMs, where the abnormal VM runs on the first host, no data can be obtained according to the identifier and memory address of the abnormal VM, and M is a positive integer less than N;
the encapsulation unit 902 is further configured to encapsulate the M pieces of data into one abnormal packet according to the RDMA protocol.
Optionally, the network device 90 further includes a generating unit 904, which is specifically configured to generate a packet sequence, where the packet sequence includes the abnormal packet and at least one packet.
Optionally, the network device 90 further includes a modification unit 905, which is specifically configured to modify the packet sequence, where modifying the packet sequence includes deleting the abnormal packet and adding a padding packet to the packet sequence.
In this embodiment, the network device can perform the operations performed by the first network device in the embodiments shown in FIG. 3 and FIG. 5; details are not repeated here.
FIG. 10 is a schematic diagram of another embodiment of the network device 100 in the embodiments of the present application.
As shown in FIG. 10, an embodiment of the present application provides a network device, which includes:
a receiving unit 1001, configured to receive a packet from a first network device, where the first network device is disposed on a first host, the network device is disposed on a second host, N virtual machines (VMs) run on the first host, the packet is generated by encapsulating N pieces of data and the write addresses of the N pieces of data according to the remote direct memory access (RDMA) protocol, the N pieces of data come from the N VMs, and N is an integer greater than 1;
a decapsulation unit 1002, configured to decapsulate the packet to obtain the N pieces of data and the write addresses of the N pieces of data; and
a storage unit 1003, configured to store the N pieces of data on the second host according to the write addresses of the N pieces of data.
Optionally, the receiving unit 1001 is further configured to receive a modified packet sequence, where the modified packet sequence includes a padding packet;
the network device 100 further includes a deletion unit 1004, which is specifically configured to identify the padding packet in the modified packet sequence and delete the padding packet.
In this embodiment, the network device can perform the operations performed by the second network device in the embodiments shown in FIG. 3 and FIG. 5; details are not repeated here.
FIG. 11 is a schematic diagram of a possible logical structure of the network device 110 provided by an embodiment of the present application. The network device 110 includes a processor 1101, a communication interface 1102, a storage system 1103, and a bus 1104. The processor 1101, the communication interface 1102, and the storage system 1103 are connected to one another through the bus 1104. In this embodiment of the present application, the processor 1101 is configured to control and manage the actions of the network device 110; for example, the processor 1101 is configured to perform the steps performed by the first network device in the method embodiments of FIG. 3 and FIG. 5. The communication interface 1102 is configured to support communication by the network device 110. The storage system 1103 is configured to store the program code and data of the network device 110.
The processor 1101 may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the disclosure of this application. The processor 1101 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor. The bus 1104 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses can be classified as address buses, data buses, control buses, and so on. For ease of illustration, only one thick line is drawn in FIG. 11, but this does not mean that there is only one bus or only one type of bus.
The sending unit 903 in the network device 90 corresponds to the communication interface 1102 in the network device 110, and the acquiring unit 901, the encapsulation unit 902, the generating unit 904, and the modification unit 905 in the network device 90 correspond to the processor 1101 in the network device 110.
The network device 110 of this embodiment may correspond to the first network device in the method embodiments of FIG. 3 and FIG. 5, and the communication interface 1102 in the network device 110 can implement the functions of, and/or the steps performed by, the first network device in the method embodiments of FIG. 3 and FIG. 5. For brevity, details are not repeated here.
FIG. 12 is a schematic diagram of a possible logical structure of the network device 120 provided by an embodiment of the present application. The network device 120 includes a processor 1201, a communication interface 1202, a storage system 1203, and a bus 1204. The processor 1201, the communication interface 1202, and the storage system 1203 are connected to one another through the bus 1204. In this embodiment of the present application, the processor 1201 is configured to control and manage the actions of the network device 120; for example, the processor 1201 is configured to perform the steps performed by the second network device in the method embodiments of FIG. 3 and FIG. 5. The communication interface 1202 is configured to support communication by the network device 120. The storage system 1203 is configured to store the program code and data of the network device 120.
The processor 1201 may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the disclosure of this application. The processor 1201 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor. The bus 1204 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses can be classified as address buses, data buses, control buses, and so on. For ease of illustration, only one thick line is drawn in FIG. 12, but this does not mean that there is only one bus or only one type of bus.
The receiving unit 1001 in the network device 100 corresponds to the communication interface 1202 in the network device 120, and the decapsulation unit 1002, the storage unit 1003, and the deletion unit 1004 in the network device 100 may correspond to the processor 1201.
The network device 120 of this embodiment may correspond to the second network device in the method embodiments of FIG. 3 and FIG. 5, and the processor 1201 and the communication interface 1202 in the network device 120 can implement the functions of, and/or the steps performed by, the second network device in the method embodiments of FIG. 3 and FIG. 5. For brevity, details are not repeated here.
Another embodiment of the present application further provides a computer-readable storage medium that stores computer-executable instructions. When a processor of a device executes the computer-executable instructions, the device performs the steps of the data transmission method performed by the first network device in FIG. 3 and FIG. 5.
Another embodiment of the present application further provides a computer-readable storage medium that stores computer-executable instructions. When a processor of a device executes the computer-executable instructions, the device performs the steps of the data transmission method performed by the second network device in FIG. 3 and FIG. 5.
Another embodiment of the present application further provides a computer program product that includes computer-executable instructions stored in a computer-readable storage medium. When a processor of a device executes the computer-executable instructions, the device performs the steps of the data transmission method performed by the first network device in FIG. 3 and FIG. 5.
Another embodiment of the present application further provides a computer program product that includes computer-executable instructions stored in a computer-readable storage medium. When a processor of a device executes the computer-executable instructions, the device performs the steps of the data transmission method performed by the second network device in FIG. 3 and FIG. 5.
A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above can be found in the corresponding processes of the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function; in an actual implementation there may be other ways of dividing them. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Claims (22)

  1. 一种数据传输系统,其特征在于,所述数据传输系统包括第一网络设备和第二网络设备,所述第一网络设备设置于第一主机上,所述第二网络设备设置于第二主机上,所述第一主机上运行有N个虚拟机VM;A data transmission system, characterized in that the data transmission system includes a first network device and a second network device, the first network device is set on a first host, and the second network device is set on a second host On the first host, there are N virtual machines running on the first host;
    所述第一网络设备,用于获取N个数据,所述N个数据来自所述N个VM,根据远程直接存储器存取RDMA协议将所述N个数据和所述N个数据的写入地址封装成一个报文,将所述报文发送至所述第二网络设备,其中,N为大于1的整数;The first network device is configured to acquire N pieces of data, the N pieces of data are from the N pieces of VM, and write the N pieces of data and the write addresses of the N pieces of data according to the remote direct memory access (RDMA) protocol Encapsulate into a packet, and send the packet to the second network device, where N is an integer greater than 1;
    所述第二网络设备,用于接收所述报文,对所述报文进行解封装,以获得所述N个数据和所述N个数据的写入地址,根据所述N个数据的写入地址在所述第二主机上存储所述N个数据。The second network device is configured to receive the packet, and decapsulate the packet to obtain the N pieces of data and the write addresses of the N pieces of data, according to the write address of the N pieces of data The N pieces of data are stored on the second host at the incoming address.
  2. 根据权利要求1所述的数据传输系统,其特征在于,The data transmission system according to claim 1, wherein,
    所述第一网络设备,用于获取所述N个VM的标识和内存地址;根据所述N个VM的标识和内存地址,获取所述N个数据。The first network device is configured to acquire the identifiers and memory addresses of the N VMs; and acquire the N pieces of data according to the identifiers and memory addresses of the N VMs.
  3. The data transmission system according to claim 2, characterized in that an abnormal VM runs on the first host, and data cannot be acquired according to an identifier and a memory address of the abnormal VM;
    the first network device is configured to acquire M pieces of data according to the identifier and memory address of the abnormal VM and the identifiers and memory addresses of some of the N VMs, wherein M is a positive integer less than N, and to encapsulate the M pieces of data into one abnormal message according to the RDMA protocol.
  4. The data transmission system according to claim 3, characterized in that
    the first network device is configured to generate a message sequence, the message sequence comprising the abnormal message and at least one said message.
  5. The data transmission system according to claim 4, characterized in that
    the first network device is configured to modify the message sequence, wherein modifying the message sequence comprises deleting the abnormal message and adding a padding message to the message sequence.
  6. The data transmission system according to claim 5, characterized in that
    the second network device is configured to receive the modified message sequence, determine the padding message in the modified message sequence, and delete the padding message.
  7. A data transmission method, characterized by comprising:
    acquiring, by a first network device, N pieces of data, wherein the first network device is disposed on a first host, N virtual machines (VMs) run on the first host, the N pieces of data come from the N VMs, and N is an integer greater than 1;
    encapsulating, by the first network device, the N pieces of data and write addresses of the N pieces of data into one message according to the remote direct memory access (RDMA) protocol;
    sending, by the first network device, the message to the second network device.
  8. The data transmission method according to claim 7, characterized in that acquiring, by the first network device, the N pieces of data comprises:
    acquiring, by the first network device, identifiers and memory addresses of the N VMs;
    acquiring, by the first network device, the N pieces of data according to the identifiers and memory addresses of the N VMs.
  9. The data transmission method according to claim 7, characterized in that an abnormal VM runs on the first host, data cannot be acquired according to an identifier and a memory address of the abnormal VM, and the method further comprises:
    acquiring, by the first network device, M pieces of data according to the identifier and memory address of the abnormal VM and the identifiers and memory addresses of some of the N VMs, wherein M is a positive integer less than N;
    encapsulating, by the first network device, the M pieces of data into one abnormal message according to the RDMA protocol.
  10. The data transmission method according to claim 9, characterized in that the method further comprises:
    generating, by the first network device, a message sequence, the message sequence comprising the abnormal message and at least one said message.
  11. The data transmission method according to claim 10, characterized in that the method further comprises:
    modifying, by the first network device, the message sequence, wherein modifying the message sequence comprises deleting the abnormal message and adding a padding message to the message sequence.
  12. A data transmission method, characterized by comprising:
    receiving, by a second network device, a message from a first network device, wherein the first network device is disposed on a first host, the second network device is disposed on a second host, N virtual machines (VMs) run on the first host, the message is generated by encapsulating N pieces of data and write addresses of the N pieces of data according to the remote direct memory access (RDMA) protocol, the N pieces of data come from the N VMs, and N is an integer greater than 1;
    decapsulating, by the second network device, the message to obtain the N pieces of data and the write addresses of the N pieces of data;
    storing, by the second network device, the N pieces of data on the second host according to the write addresses of the N pieces of data.
  13. The data transmission method according to claim 12, characterized in that the method further comprises:
    receiving, by the second network device, a modified message sequence, the modified message sequence comprising a padding message;
    determining, by the second network device, the padding message in the modified message sequence, and deleting the padding message.
  14. A network device, characterized by comprising:
    an acquisition unit, configured to acquire N pieces of data, wherein the network device is disposed on a first host, N virtual machines (VMs) run on the first host, the N pieces of data come from the N VMs, and N is an integer greater than 1;
    an encapsulation unit, configured to encapsulate the N pieces of data and write addresses of the N pieces of data into one message according to the remote direct memory access (RDMA) protocol;
    a sending unit, configured to send the message to the second network device.
  15. The network device according to claim 14, characterized in that the acquisition unit is specifically configured to:
    acquire identifiers and memory addresses of the N VMs;
    acquire the N pieces of data according to the identifiers and memory addresses of the N VMs.
  16. The network device according to claim 14, characterized in that the acquisition unit is further configured to:
    acquire M pieces of data according to an identifier and a memory address of an abnormal VM and the identifiers and memory addresses of some of the N VMs, wherein the abnormal VM runs on the first host, data cannot be acquired according to the identifier and memory address of the abnormal VM, and M is a positive integer less than N;
    the encapsulation unit is further configured to:
    encapsulate the M pieces of data into one abnormal message according to the RDMA protocol.
  17. The network device according to claim 16, characterized in that the network device further comprises a generation unit, and the generation unit is specifically configured to:
    generate a message sequence, the message sequence comprising the abnormal message and at least one said message.
  18. The network device according to claim 17, characterized in that the network device further comprises a modification unit, and the modification unit is specifically configured to:
    modify the message sequence, wherein modifying the message sequence comprises deleting the abnormal message and adding a padding message to the message sequence.
  19. A network device, characterized by comprising:
    a receiving unit, configured to receive a message from a first network device, wherein the first network device is disposed on a first host, the network device is disposed on a second host, N virtual machines (VMs) run on the first host, the message is generated by encapsulating N pieces of data and write addresses of the N pieces of data according to the remote direct memory access (RDMA) protocol, the N pieces of data come from the N VMs, and N is an integer greater than 1;
    a decapsulation unit, configured to decapsulate the message to obtain the N pieces of data and the write addresses of the N pieces of data;
    a storage unit, configured to store the N pieces of data on the second host according to the write addresses of the N pieces of data.
  20. The network device according to claim 19, characterized in that the receiving unit is further configured to:
    receive a modified message sequence, the modified message sequence comprising a padding message;
    the network device further comprises a deletion unit, and the deletion unit is specifically configured to:
    determine the padding message in the modified message sequence, and delete the padding message.
  21. A network device, characterized by comprising a processor and a memory,
    wherein the processor is configured to execute instructions stored in the memory, so that the network device performs the method according to any one of claims 7 to 11.
  22. A network device, characterized by comprising a processor and a memory,
    wherein the processor is configured to execute instructions stored in the memory, so that the network device performs the method according to any one of claims 12 to 13.
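
As an illustration only, the flow recited in claims 1, 5 and 6 can be sketched in Python. The sketch is not the implementation described in the application: the wire layout (a 16-bit count followed by address/length/payload entries), the function names, the dictionary standing in for memory on the second host, and the choice to put the padding message in the position of the deleted abnormal message are all assumptions made for the example. No real RDMA library is used.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical wire layout (an assumption, not defined by the claims):
#   header: number of entries (uint16, network byte order)
#   entry : write address (uint64), payload length (uint32), payload bytes
HEADER = struct.Struct("!H")
ENTRY = struct.Struct("!QI")

Item = Tuple[int, bytes]      # (write address, data from one VM)
Message = Tuple[str, bytes]   # (kind, body); kind is "normal", "abnormal" or "padding"


def encapsulate(items: List[Item]) -> bytes:
    """First network device: pack N pieces of data and their write addresses into one message."""
    if len(items) < 2:
        raise ValueError("N must be an integer greater than 1")
    parts = [HEADER.pack(len(items))]
    for address, data in items:
        parts.append(ENTRY.pack(address, len(data)))
        parts.append(data)
    return b"".join(parts)


def decapsulate(message: bytes) -> List[Item]:
    """Second network device: recover the N pieces of data and their write addresses."""
    (count,) = HEADER.unpack_from(message, 0)
    offset = HEADER.size
    items: List[Item] = []
    for _ in range(count):
        address, length = ENTRY.unpack_from(message, offset)
        offset += ENTRY.size
        items.append((address, message[offset:offset + length]))
        offset += length
    return items


def store(memory: Dict[int, bytes], items: List[Item]) -> None:
    """Second network device: store each piece of data at its write address.

    The dict is a stand-in for memory on the second host."""
    for address, data in items:
        memory[address] = data


def replace_abnormal(sequence: List[Message]) -> List[Message]:
    """Sender side: delete abnormal messages and add padding messages.

    Placing the padding where the abnormal message was is an assumption."""
    return [("padding", b"") if kind == "abnormal" else (kind, body)
            for kind, body in sequence]


def drop_padding(sequence: List[Message]) -> List[Message]:
    """Receiver side: identify and delete the padding messages."""
    return [(kind, body) for kind, body in sequence if kind != "padding"]
```

A possible use: call `encapsulate([(0x1000, b"vm0"), (0x2000, b"vm1")])` on the sending side, pass the result through `decapsulate` on the receiving side, and hand the recovered pairs to `store`. A message sequence containing an abnormal message can be passed through `replace_abnormal` before sending and `drop_padding` after receiving.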
PCT/CN2021/129667 2021-01-14 2021-11-10 Data transmission system, data transmission method, and network device WO2022151820A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110049814.4 2021-01-14
CN202110049814.4A CN114765631A (en) 2021-01-14 2021-01-14 Data transmission system, data transmission method and network device

Publications (1)

Publication Number Publication Date
WO2022151820A1 true WO2022151820A1 (en) 2022-07-21

Family

ID=82363926

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/129667 WO2022151820A1 (en) 2021-01-14 2021-11-10 Data transmission system, data transmission method, and network device

Country Status (2)

Country Link
CN (1) CN114765631A (en)
WO (1) WO2022151820A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116150078A (en) * 2023-04-19 2023-05-23 湖南恩智绿源电子技术有限公司 Inter-board data communication transmission method, electronic device, and computer-readable storage medium
CN117278504A (en) * 2023-09-21 2023-12-22 中科驭数(北京)科技有限公司 Message data forwarding method and device

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117424849A (en) * 2022-07-26 2024-01-19 中兴智能科技南京有限公司 Data transmission method, device, computer equipment and readable medium
CN117707861A (en) * 2022-09-07 2024-03-15 华为技术有限公司 Data access method, device, network interface card, readable medium and electronic equipment

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102801729A (en) * 2012-08-13 2012-11-28 福建星网锐捷网络有限公司 Virtual machine message forwarding method, network switching equipment and communication system
CN103248467A (en) * 2013-05-14 2013-08-14 中国人民解放军国防科学技术大学 In-chip connection management-based RDMA communication method
CN106897106A (en) * 2017-01-12 2017-06-27 北京三未信安科技发展有限公司 The sequential scheduling method and system of the concurrent DMA of multi-dummy machine under a kind of SR IOV environment
WO2018000195A1 (en) * 2016-06-28 2018-01-04 华为技术有限公司 Packet transmission method, virtual switch, and server
CN108228309A (en) * 2016-12-21 2018-06-29 腾讯科技(深圳)有限公司 Data packet method of sending and receiving and device based on virtual machine
US20190079896A1 (en) * 2017-09-14 2019-03-14 Vmware, Inc. Virtualizing connection management for virtual remote direct memory access (rdma) devices
CN109983439A (en) * 2016-12-28 2019-07-05 英特尔公司 Virtualize Remote Direct Memory access
US20200110626A1 (en) * 2018-10-08 2020-04-09 Microsoft Technology Licensing, Llc Rdma with virtual address space
CN111193653A (en) * 2019-12-31 2020-05-22 腾讯科技(深圳)有限公司 Data transmission method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN114765631A (en) 2022-07-19

Similar Documents

Publication Publication Date Title
WO2022151820A1 (en) Data transmission system, data transmission method, and network device
CN111512603B (en) Data transmission method and first equipment
EP3042298B1 (en) Universal pci express port
EP3660686B1 (en) Method and device for transmitting data processing request
US20200218688A1 (en) Data validation method and apparatus, and network interface card
CN101827072A (en) Virtual memory protocol segmentation offloading
TW200846914A (en) Facilitating input/output processing of one or more guest processing systems
US11726666B2 (en) Network adapter with efficient storage-protocol emulation
WO2014086219A1 (en) Content searching chip and system based on peripheral component interconnect bus
WO2019190859A1 (en) Efficient and reliable message channel between a host system and an integrated circuit acceleration system
US11231983B2 (en) Fault tolerance processing method, apparatus, and server
US12088688B2 (en) Packet processing method, network device, and related device
KR101559089B1 (en) Communication protocol for sharing memory resources between components of a device
CN112866206A (en) Unidirectional data transmission method and device
US9769093B2 (en) Apparatus and method for performing InfiniBand communication between user programs in different apparatuses
CN113098780B (en) Message processing method of virtual network, electronic equipment and storage medium
US9787805B2 (en) Communication control system and communication control method
WO2023040330A1 (en) Data processing method, device, and system
CN101674219B (en) Communication method and device in tunnel mode of Internet security protocol intelligent card
CN117544579A (en) Data transmission method, device, equipment and medium
CN115987609A (en) Identification method of trusted virtual host, electronic device and storage medium

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21919011

Country of ref document: EP

Kind code of ref document: A1