WO2023005723A1 - Packet transmission method and communication apparatus - Google Patents
Packet transmission method and communication apparatus
- Publication number
- WO2023005723A1 (PCT/CN2022/106368)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- network device
- message
- address
- header
- lid
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/66—Arrangements for connecting between networks having differing types of switching systems, e.g. gateways
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/26—Flow control; Congestion control using explicit feedback to the source, e.g. choke packets
- H04L47/263—Rate modification at the source after receiving feedback
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/30—Flow control; Congestion control in combination with information about buffer occupancy at either end or at transit nodes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/10—Packet switching elements characterised by the switching fabric construction
- H04L49/111—Switch interfaces, e.g. port details
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/22—Parsing or analysis of headers
Definitions
- the embodiments of the present application relate to the communication field, and in particular, to a message transmission method and a communication device.
- Ethernet and InfiniBand (IB) network clusters account for 78%, far exceeding other interconnection networks.
- Remote direct memory access (RDMA) technology first appeared on the IB network. IB has been the first choice for supercomputing interconnection since its inception because of its high performance and low latency. However, the RDMA over Converged Ethernet version 2 (RoCEv2) network has been adopted by more and more supercomputing interconnections because it is fully compatible with the Ethernet Internet protocol (IP) network and supports the RDMA protocol.
- The Ethernet IP message is transmitted on the IB network through the gateway device using the Internet protocol over InfiniBand (IPoIB) technology running on the IB network.
- Because the Ethernet IP message is directly encapsulated in the IB message through tunneling, the message still passes through the kernel copy and the software protocol stack, so the remote direct access effect of the IB network cannot be realized and transmission efficiency suffers.
- Embodiments of the present application provide a message transmission method and a communication device, which are used to increase a message transmission rate.
- The first aspect of the embodiments of the present application provides a message transmission method. The method includes: the gateway receives a first message from a first network device, where the first message header of the first message includes the Internet protocol (IP) address of a second network device, and the second network device is the destination network device to which the first network device transmits the message; the gateway determines the local identifier (LID) of the second network device according to the IP address of the second network device in combination with a lookup table, where the lookup table includes the association between IP addresses and LIDs; the gateway strips the first packet header of the first packet and encapsulates a second packet header to obtain a second packet, where the second packet header includes a local routing header and the local routing header includes the LID of the second network device; and the gateway sends the second packet according to the LID of the second network device.
- The gateway parses the first message from the first network device to obtain the IP address of the second network device, matches the LID corresponding to that IP address in the lookup table, encapsulates a second packet header that includes the LID of the second network device onto the first packet after the first packet header has been stripped to generate the second packet, and then sends the second packet according to the LID of the second network device.
- The second packet is an ordinary IB-format message, and the LID of the second network device is stored in its local routing header. Transmission of the second packet on the IB network therefore does not require a kernel copy, so the remote direct access effect of the IB network can be realized and message transmission efficiency is improved.
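- The translation described in the first aspect can be sketched as follows. This is a minimal illustration, not the patented implementation: the class names `RoCEPacket` and `IBPacket`, the function `eth_to_ib`, and the sample addresses are all assumptions made for the example, and the packets are modelled as simple records rather than wire formats.

```python
from dataclasses import dataclass


@dataclass
class RoCEPacket:
    """Hypothetical model of the first (Ethernet) packet."""
    dst_mac: str
    dst_ip: str
    udp_port: int
    ib_transport: bytes  # IB transport header, carried through unchanged
    payload: bytes


@dataclass
class IBPacket:
    """Hypothetical model of the second (IB) packet."""
    dlid: int            # destination LID placed in the local routing header
    ib_transport: bytes
    payload: bytes


def eth_to_ib(pkt: RoCEPacket, lookup: dict) -> IBPacket:
    """Drop the ETH/IP/UDP headers and add a local routing header with the LID."""
    dlid = lookup[pkt.dst_ip]  # raises KeyError if the destination was never learned
    return IBPacket(dlid=dlid, ib_transport=pkt.ib_transport, payload=pkt.payload)


# Lookup table: IP address -> LID (sample values only)
lookup_table = {"192.0.2.20": 0x0012}
first = RoCEPacket("02:00:00:00:00:02", "192.0.2.20", 4791, b"BTH", b"data")
second = eth_to_ib(first, lookup_table)
```

- Only the outer headers change in this sketch; the IB transport header and payload are passed through untouched, which mirrors the idea that the gateway re-addresses the packet instead of tunnelling it.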
- The step in which the gateway receives the first message from the first network device includes: the gateway receives the first message in a large-capacity buffer according to the credit flow control mechanism; and the gateway feeds back the status information of the large-capacity buffer to the first network device by means of a pause message, so that the first network device adjusts packet transmission.
- The gateway can use the credit flow control mechanism of the IB network together with a large-capacity buffer to receive messages from the first network device, and the large-capacity buffer can hold more in-flight messages. The gateway can also indicate the buffer status to the first network device through a pause message, so that the first network device can adjust the transmission of subsequent messages, for example by reducing or stopping sending, which avoids transmission congestion.
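- A minimal sketch of buffer-based feedback is shown below. The class `GatewayBuffer`, the 80% pause threshold, and the string feedback values are assumptions for illustration; the source does not specify how credits are counted or when the pause message is sent.

```python
from collections import deque


class GatewayBuffer:
    """Hypothetical gateway receive buffer with credit accounting."""

    def __init__(self, capacity: int, pause_threshold: float = 0.8):
        self.capacity = capacity
        self.pause_threshold = pause_threshold  # assumed fill ratio that triggers a pause
        self.queue = deque()

    def credits(self) -> int:
        """Free slots that the sender may still use."""
        return self.capacity - len(self.queue)

    def receive(self, packet: bytes) -> str:
        """Store a packet and return the feedback sent to the sender."""
        if self.credits() == 0:
            return "PAUSE"  # nothing is stored; the sender must stop
        self.queue.append(packet)
        if len(self.queue) >= self.capacity * self.pause_threshold:
            return "PAUSE"  # buffer is nearly full; ask the sender to slow down
        return "OK"

    def forward(self) -> bytes:
        """Hand the oldest packet to the egress side, freeing one credit."""
        return self.queue.popleft()


buf = GatewayBuffer(capacity=5)
feedback = [buf.receive(b"p%d" % i) for i in range(5)]
```

- With a capacity of 5, the fourth and fifth packets are still accepted but already trigger the pause feedback, so the sender has time to react before the buffer overflows.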
- Before the gateway receives the first message from the first network device, the method further includes: the gateway applies for a LID from the subnet manager according to a route change, where the route change indicates that the first network device has joined the network; the gateway receives the LID of the first network device; the gateway obtains a response message from the second network device, where the response message includes the LID and IP address of the second network device; and the gateway updates the lookup table according to the IP address and LID of the first network device and the IP address and LID of the second network device.
- The gateway can apply to the subnet manager on the IB side for a LID for the first network device that is joining the network, obtain the response message that the second network device returns in reply to the broadcast message of the first network device, and generate or update the lookup table according to the IP address and LID of the second network device in the response message, which improves the feasibility of the solution.
- the gateway in the above steps applying for a LID from the subnet manager according to the route change includes: the gateway receives an address resolution protocol ARP message from the first network device, and the ARP message includes the IP address of the first network device and The IP address of the second network device; the gateway applies to the subnet manager for the LID of the first network device according to the ARP message.
- The broadcast message can be an ARP message, and the route change can also be determined from the ARP message. The gateway converts the ARP message into an ARP message in IB format and sends it to the second network device. The second network device feeds back an ARP response message that includes the IP address and LID of the second network device, which improves the feasibility of the solution.
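- The ARP-driven learning of the lookup table can be sketched as below. Everything named here is hypothetical: `SubnetManager` is a stand-in that hands out sequential LIDs (the real subnet manager may allocate them differently), and `ib_side` is a plain dictionary that plays the role of the ARP responses coming back from devices on the IB network.

```python
import itertools


class SubnetManager:
    """Hypothetical stand-in for the IB subnet manager."""

    def __init__(self, first_lid: int = 1):
        self._next = itertools.count(first_lid)

    def allocate_lid(self) -> int:
        return next(self._next)


class Gateway:
    def __init__(self, sm: SubnetManager):
        self.sm = sm
        self.lookup = {}  # IP address -> LID

    def on_arp_request(self, sender_ip: str, target_ip: str, ib_side: dict):
        """Handle an ARP request arriving from the Ethernet side."""
        # A previously unseen sender is treated as a route change:
        # request a LID for it from the subnet manager.
        if sender_ip not in self.lookup:
            self.lookup[sender_ip] = self.sm.allocate_lid()
        # Forward the request to the IB side; the reply carries the target's LID.
        target_lid = ib_side.get(target_ip)
        if target_lid is not None:
            self.lookup[target_ip] = target_lid
        return target_lid


gw = Gateway(SubnetManager(first_lid=100))
ib_devices = {"192.0.2.20": 18}  # IP -> LID known to devices on the IB side
reply = gw.on_arp_request("192.0.2.10", "192.0.2.20", ib_devices)
```

- After this exchange the table holds both endpoints, so later data packets in either direction can be translated without another broadcast.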
- After the gateway strips the first packet header of the first packet and encapsulates the second packet header to obtain the second packet, the method further includes: the gateway updates the invariant cyclic redundancy check (ICRC) code and the variant cyclic redundancy check (VCRC) code of the second packet.
- The gateway needs to update the ICRC and VCRC of the second message to ensure the error checking capability.
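- The need to recompute both checks after re-encapsulation can be illustrated with the sketch below. It is deliberately simplified and rests on stated assumptions: `zlib.crc32` and `binascii.crc_hqx` are used only as stand-ins for the 32-bit ICRC and 16-bit VCRC (the actual IB polynomials and the exact set of covered and masked fields are not reproduced here), and the helper names `seal` and `verify` are hypothetical.

```python
import binascii
import zlib


def icrc_standin(invariant: bytes) -> int:
    """32-bit check over the fields that do not change in transit (stand-in)."""
    return zlib.crc32(invariant) & 0xFFFFFFFF


def vcrc_standin(body: bytes) -> int:
    """16-bit check over the whole packet, including the routing header (stand-in)."""
    return binascii.crc_hqx(body, 0xFFFF)


def seal(lrh: bytes, transport: bytes, payload: bytes) -> bytes:
    """Append both checks after a new local routing header has been attached."""
    invariant = transport + payload
    body = lrh + invariant + icrc_standin(invariant).to_bytes(4, "big")
    return body + vcrc_standin(body).to_bytes(2, "big")


def verify(packet: bytes, lrh_len: int) -> bool:
    """Recompute both checks and compare them with the stored values."""
    body, vcrc = packet[:-2], packet[-2:]
    invariant, icrc = body[lrh_len:-4], body[-4:]
    return (vcrc_standin(body).to_bytes(2, "big") == vcrc
            and icrc_standin(invariant).to_bytes(4, "big") == icrc)


packet = seal(b"\x00" * 8, b"BTH-bytes---", b"hello")
tampered = bytearray(packet)
tampered[20] ^= 0x01  # flip one bit in the first payload byte
```

- In this sketch the outer check covers the routing header while the inner check does not, which is why a gateway that swaps headers has to refresh the checks instead of copying them from the incoming packet.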
- the first message is an Ethernet message
- the second message is an IB message
- The Ethernet packet includes an Ethernet header, an IP header, a UDP header, an IB transport header, an IB payload, an ICRC, and a cyclic redundancy check (CRC) code.
- the IB packet includes a local routing header, an IB transport header, an IB payload, an ICRC, and a VCRC.
- The second aspect of the embodiments of the present application provides a message transmission method, including: the gateway receives a third message from the first network device, where the third message header of the third message includes a local routing header, the local routing header includes the local identifier (LID) of the second network device, and the second network device is the destination network device to which the first network device transmits the message; the gateway determines the Internet protocol (IP) address of the second network device according to the LID of the second network device and a lookup table, where the lookup table includes the association between IP addresses and LIDs; the gateway strips the third packet header of the third packet and encapsulates a fourth packet header to obtain a fourth packet, where the fourth packet header includes the IP address of the second network device; and the gateway sends the fourth message according to the IP address of the second network device.
- The gateway parses the third packet from the first network device to obtain the LID of the second network device, matches the IP address corresponding to that LID in the lookup table, encapsulates a fourth packet header that includes the IP address of the second network device onto the third packet after the third packet header has been stripped to generate the fourth packet, and then sends the fourth packet according to the IP address of the second network device.
- Transmission of the fourth packet on the Ethernet network does not require a kernel copy, so the effect of remote direct access on the Ethernet network can be realized and the efficiency of message transmission is improved.
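- The reverse direction (IB to Ethernet) can be sketched by inverting the same lookup table. The function names `invert` and `ib_to_eth`, the dictionary-based packet model, and the sample values are assumptions for this example only.

```python
def invert(ip_to_lid: dict) -> dict:
    """Build the LID -> IP view of the lookup table."""
    return {lid: ip for ip, lid in ip_to_lid.items()}


def ib_to_eth(ib_packet: dict, lid_to_ip: dict) -> dict:
    """Strip the local routing header fields and address the packet by IP."""
    dst_ip = lid_to_ip[ib_packet["dlid"]]  # KeyError if the LID is unknown
    # Everything except the routing-header fields is carried through unchanged.
    inner = {k: v for k, v in ib_packet.items() if k not in ("dlid", "slid")}
    return {"dst_ip": dst_ip, **inner}


ip_to_lid = {"192.0.2.10": 0x0011, "192.0.2.20": 0x0012}
lid_to_ip = invert(ip_to_lid)
ib_packet = {"dlid": 0x0012, "slid": 0x0011, "ib_transport": b"BTH", "payload": b"data"}
eth_packet = ib_to_eth(ib_packet, lid_to_ip)
```

- Keeping one table and deriving the inverse view avoids the two directions drifting apart; this assumes each LID maps to exactly one IP address.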
- The method also includes: the gateway obtains the queue pair number (QPN) of the first network device and the QPN of the second network device according to the link establishment message; and the gateway obtains the user datagram protocol (UDP) port number of the second network device according to the QPN of the first network device and the QPN of the second network device. The lookup table also includes the association among the QPN, the UDP port number, the IP address, and the LID. The fourth packet header also includes the media access control (MAC) address and UDP port number of the second network device, where the MAC address of the second network device is obtained by broadcasting the IP address of the second network device.
- When the first network device and the second network device establish a transmission link, the gateway can also record the QPNs of the first network device and the second network device and calculate the UDP port number of the second network device from these QPNs. The gateway can also send a broadcast message to the RoCE network according to the IP address of the second network device, so that the second network device feeds back its MAC address. The gateway encapsulates the MAC address and UDP port number of the second network device into the fourth message, so that the fourth message can be transmitted according to the IP address, MAC address, and UDP port number of the second network device, thereby improving the reliability of message transmission.
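- The source does not state how the UDP port number is calculated from the two QPNs. The sketch below shows one plausible mapping purely as an assumption: the 24-bit QPNs are combined with XOR, folded to 14 bits, and placed in the dynamic port range starting at 0xC000. The function name `udp_port_from_qpns` is hypothetical.

```python
def udp_port_from_qpns(src_qpn: int, dst_qpn: int) -> int:
    """Derive a UDP port from a pair of 24-bit queue pair numbers (assumed scheme)."""
    for qpn in (src_qpn, dst_qpn):
        if not 0 <= qpn < (1 << 24):
            raise ValueError("QPN must fit in 24 bits")
    mixed = src_qpn ^ dst_qpn
    folded = (mixed ^ (mixed >> 14)) & 0x3FFF  # fold 24 bits down to 14 bits
    return 0xC000 | folded                     # keep the result in 49152..65535


port = udp_port_from_qpns(0x11, 0x22)
```

- Because the mapping is deterministic, the gateway could store the result next to the QPN, IP address, and LID in the lookup table and reuse it for every packet of the same connection.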
- the above steps where the gateway sends the fourth message according to the IP address of the second network device include: the gateway sends the fourth message in a large-capacity buffer according to the credit flow control mechanism and the IP address of the second network device.
- The gateway feeds back the status information of the large-capacity buffer to the second network device by means of a pause message, so that the second network device adjusts message transmission.
- Before the gateway receives the third message from the first network device, the method further includes: the gateway applies for a LID from the subnet manager according to a route change, where the route change indicates that the second network device has joined the network; the gateway receives the LID of the second network device; the gateway obtains a response message from the first network device, where the response message includes the LID and IP address of the first network device; and the gateway updates the lookup table according to the IP address and LID of the first network device and the IP address and LID of the second network device.
- the gateway in the above steps applying for a LID from the subnet manager according to the route change includes: the gateway receives an address resolution protocol ARP message from the second network device, and the ARP message includes the IP address of the first network device and The IP address of the second network device; the gateway applies to the subnet manager for the LID of the second network device according to the ARP message.
- After the gateway strips the third packet header of the third packet and encapsulates the fourth packet header to obtain the fourth packet, the method further includes: the gateway updates the ICRC and CRC of the fourth packet.
- the third message is an IB message
- the fourth message is an Ethernet message
- the Ethernet packet includes an Ethernet header, an IP header, a UDP header, an IB transmission header, an IB payload, an ICRC, and a CRC.
- the IB message includes a local routing header, an IB transmission header, an IB payload, an ICRC, and a VCRC.
- The third aspect of the embodiments of the present application provides a communication device, including: a receiving unit, configured to receive a first message from a first network device, where the first message header of the first message includes the Internet protocol (IP) address of a second network device, and the second network device is the destination network device to which the first network device transmits the message; a determination unit, configured to determine the local identifier (LID) of the second network device according to the IP address of the second network device in combination with a lookup table, where the lookup table includes the association between IP addresses and LIDs; an encapsulation unit, configured to strip the first packet header of the first packet and encapsulate a second packet header to obtain a second packet, where the second packet header includes a local routing header and the local routing header includes the LID of the second network device; and a sending unit, configured to send the second packet according to the LID of the second network device.
- the communication device is configured to execute the method of the foregoing first aspect or any implementation manner of the first aspect.
- The fourth aspect of the embodiments of the present application provides a communication device, including: a receiving unit, configured to receive a third message from the first network device, where the third message header of the third message includes a local routing header, the local routing header includes the local identifier (LID) of the second network device, and the second network device is the destination network device to which the first network device transmits the message; a determination unit, configured to determine the Internet protocol (IP) address of the second network device according to the LID of the second network device in combination with a lookup table, where the lookup table includes the association between IP addresses and LIDs; an encapsulation unit, configured to strip the third message header of the third message and encapsulate a fourth message header to obtain a fourth message, where the fourth message header includes the IP address of the second network device; and a sending unit, configured to send the fourth message according to the IP address of the second network device.
- the communication device is configured to execute the method of the foregoing second aspect or any implementation manner of the second aspect.
- the fifth aspect of the embodiment of the present application provides a communication device, including: a processor, a memory, and a communication interface.
- The processor is used to execute instructions stored in the memory, so that the communication device performs the method provided in the first aspect or any optional manner of the first aspect, and the communication interface is used for receiving or sending indications.
- For specific details of the communication device provided in the fifth aspect refer to the first aspect or any optional manner of the first aspect, and details are not repeated here.
- The sixth aspect of the embodiments of the present application provides a communication device, including: a processor, a memory, and a communication interface. The processor is used to execute instructions stored in the memory, so that the communication device performs the method provided in the second aspect or any optional manner of the second aspect.
- The seventh aspect of the embodiments of the present application provides a computer-readable storage medium in which a program is stored; when a computer executes the program, the method provided in the first aspect or any optional manner of the first aspect is executed.
- The eighth aspect of the embodiments of the present application provides a computer-readable storage medium in which a program is stored; when a computer executes the program, the method provided in the second aspect or any optional manner of the second aspect is executed.
- The ninth aspect of the embodiments of the present application provides a computer program product. When the computer program product is executed on a computer, the computer executes the method provided in the foregoing first aspect or any optional manner of the first aspect.
- The tenth aspect of the embodiments of the present application provides a computer program product. When the computer program product is executed on a computer, the computer executes the method provided in the foregoing second aspect or any optional manner of the second aspect.
- Fig. 1 is the HPC system block diagram that the embodiment of the present application provides
- FIG. 2 is a schematic diagram of a short-distance scene inside a data center provided by an embodiment of the present application
- FIG. 3 is a schematic diagram of a long-distance scenario of interconnection between supercomputing centers provided by an embodiment of the present application
- FIG. 4 is a schematic diagram of an embodiment of a message transmission method provided by an embodiment of the present application.
- FIG. 5 is a schematic diagram of another embodiment of the message transmission method provided by the embodiment of the present application.
- FIG. 6 is a schematic structural diagram of a gateway provided by an embodiment of the present application.
- FIG. 7 is a schematic diagram of another structure of the gateway provided by the embodiment of the present application.
- FIG. 8 is a schematic diagram of another structure of the gateway provided by the embodiment of the present application.
- FIG. 9 is a schematic structural diagram of a communication device provided by an embodiment of the present application.
- FIG. 10 is another schematic structural diagram of a communication device provided by an embodiment of the present application.
- FIG. 11 is a schematic structural diagram of a communication device provided by an embodiment of the present application.
- FIG. 12 is a schematic diagram of another structure of a communication device provided by an embodiment of the present application.
- HPC: high-performance computing
- The performance of an HPC system is related not only to the computing power and storage performance of the computing nodes but also closely to the performance of the node interconnection network.
- IB: InfiniBand technology
- The Ethernet IP network supports the remote direct memory access (RDMA) protocol and is adopted by more and more supercomputing interconnections.
- the Ethernet message is a message transmitted in the Ethernet network, and is transmitted in the Ethernet network based on the IP address in the IP header.
- the IB message is a message transmitted in the IB network.
- the IB network does not perceive the IP address.
- Each network device in the IB network is assigned a local identifier (LID).
- The IB message is transmitted in the IB network based on the LID in the local routing header.
- FIG. 1 is a block diagram of the HPC system provided by the embodiment of this application.
- As shown in FIG. 1, high-performance RDMA network interconnection is used between computing nodes and storage nodes.
- In the IB and RoCEv2 network interconnection, IB and RoCE are interconnected in a mixed manner, and Ethernet packets need to be converted into IB packets.
- The IB and Ethernet RoCE conversion method in the embodiments of the present application can be used in efficient data interaction scenarios across the IB network and the RoCE network, mainly short-distance scenarios within a data center and long-distance scenarios between supercomputing centers.
- FIG. 2 is a schematic diagram of the short-distance scenario inside a data center provided in an embodiment of the present application. The IB network and the RoCE network communicate through a gateway: an IB message of the IB network is converted by the gateway into an Ethernet message for transmission on the RoCE network, or an Ethernet packet of the RoCE network is converted by the gateway into an IB packet for transmission on the IB network.
- FIG. 3 is a schematic diagram of the long-distance scenario of interconnection between supercomputing centers provided by an embodiment of the present application. The scenario includes IB network 1, gateway 1, IB network 2, and gateway 2. IB network 1 converts IB messages into Ethernet messages through gateway 1, the Ethernet messages are transmitted to gateway 2, and gateway 2 converts the Ethernet messages into IB messages for transmission in IB network 2.
- the embodiment of the present application takes a short-distance scene as an example.
- The Ethernet IP message is transmitted on the IB network through a gateway device using the Internet protocol over InfiniBand (IPoIB) technology running on the IB network. The TCP/IP message is directly encapsulated in the IB message for transmission through tunneling, and the message still passes through the kernel copy and the software protocol stack. This cannot take advantage of the kernel bypass and zero copy of the IB network, and it results in high latency and high CPU usage.
- Traditional packets are encapsulated by the IB network, which cannot take advantage of the high efficiency of the IB network. Especially for small packets, the encapsulation efficiency is significantly reduced.
- The embodiment of the present application provides a message transmission method, as follows.
- FIG. 4 shows an embodiment of a message transmission method provided by the embodiment of the present application.
- the method includes:
- the first network device sends the first packet to the gateway.
- the first network device is a network device on the RoCE network side
- the second network device is a network device on the IB network side.
- the network device on the RoCE network side needs to send data to the network device on the IB network side.
- the first network device can send the first message including the data to the second network device through the gateway.
- The first message is an Ethernet packet.
- The first message includes the media access control (MAC) address, Internet protocol (IP) address, and user datagram protocol (UDP) port number of the first network device, and the MAC address and IP address of the second network device.
- The message format of the first message is shown in Table 1 below. The first message includes an Ethernet (ETH) header, an IP header, a UDP header, an IB transport header, an IB payload, an invariant cyclic redundancy check (ICRC) field, and a cyclic redundancy check (CRC) field. The ETH header stores the MAC addresses of the first network device and the second network device, the IP header stores the IP addresses of the first network device and the second network device, the UDP header stores the UDP port number of the first network device, and the IB transport header field stores the QPN of the first network device and the QPN of the second network device. The ICRC field and the CRC field are used to check the data in the frame to ensure the correctness of data transmission.
- Table 1
- ETH header | IP header | UDP header | IB transport header | IB payload | ICRC | CRC
- the gateway determines the LID of the second network device according to the IP address of the second network device in combination with a lookup table.
- After the gateway receives the first message, it can parse the first message to obtain the IP address of the second network device from the IP header, and then match that IP address against the association between IP addresses and LIDs in the lookup table to obtain the LID of the second network device.
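- At the byte level, the parsing step can be sketched as follows. The sketch assumes an untagged Ethernet frame (no VLAN tag) carrying IPv4 with a 20-byte header; the function name `destination_ip` and the sample frame are illustrative only.

```python
import socket
import struct

ETH_HEADER_LEN = 14       # destination MAC (6) + source MAC (6) + EtherType (2)
ETHERTYPE_IPV4 = 0x0800


def destination_ip(frame: bytes) -> str:
    """Return the IPv4 destination address carried in an Ethernet frame."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        raise ValueError("not an IPv4 frame")
    # The destination address occupies bytes 16..19 of the IPv4 header.
    start = ETH_HEADER_LEN + 16
    return socket.inet_ntoa(frame[start:start + 4])


# Build a sample frame: Ethernet header + minimal IPv4 header + empty UDP header.
eth = b"\x02" * 6 + b"\x04" * 6 + struct.pack("!H", ETHERTYPE_IPV4)
ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, 17, 0,
                 socket.inet_aton("192.0.2.10"), socket.inet_aton("192.0.2.20"))
frame = eth + ip + b"\x00" * 8

lookup_table = {"192.0.2.20": 0x0012}   # IP address -> LID (sample values)
dlid = lookup_table[destination_ip(frame)]
```

- A hardware gateway would likely perform this match in a table keyed by the raw 32-bit address; the dotted-string key is used here only for readability.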
- The gateway applies for a LID from the subnet manager according to the route change; the gateway receives the LID of the first network device; the gateway obtains a response message from the second network device; and the gateway updates the lookup table according to the IP address and LID of the first network device and the IP address and LID of the second network device.
- When the first network device joins the network, the gateway needs to add a routing path; that is, the gateway can determine that the route has changed and can apply for a LID from the subnet manager, which is located in the IB network. The subnet manager may allocate a LID randomly, and the gateway may associate that LID with the IP address of the first network device.
- The gateway can also receive a broadcast message from the first network device, convert it into an IB message, and forward it to the second network device, so that the second network device feeds back a response message that includes the IP address and LID of the second network device. The gateway can associate the IP address and LID of the first network device with the IP address and LID of the second network device and store them in the lookup table, or update the lookup table.
- the gateway may apply for a LID from the subnet manager according to the route change as follows: the gateway receives an address resolution protocol (ARP) message from the first network device and, according to the ARP message, applies to the subnet manager for a LID for the first network device. Specifically, the gateway can recognize that there is a newly added terminal on the Ethernet side by receiving an ARP message, or a message of another protocol, from the Ethernet side.
- the gateway can obtain the IP address in the ARP message and apply for a LID from the subnet manager for the first network device.
- the gateway converts the ARP message into an ARP message on the IB side, and sends it to the second network device, and the response message sent by the second network device is an ARP response message.
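The learning procedure above (an ARP message from the Ethernet side triggers a LID request, and the ARP response from the IB side completes the table) might be sketched as follows. The classes `FakeSubnetManager` and `LookupTable` and both handler functions are hypothetical stand-ins introduced only for illustration; a real subnet manager is reached over IB management datagrams, which this sketch does not model.

```python
from itertools import count


class FakeSubnetManager:
    """Hypothetical stand-in for the IB subnet manager: hands out unused LIDs."""

    def __init__(self, first_lid: int = 1):
        self._next = count(first_lid)

    def allocate_lid(self) -> int:
        return next(self._next)


class LookupTable:
    """Bidirectional association between IP addresses and LIDs."""

    def __init__(self):
        self.ip_to_lid = {}
        self.lid_to_ip = {}

    def bind(self, ip: str, lid: int) -> None:
        self.ip_to_lid[ip] = lid
        self.lid_to_ip[lid] = ip


def on_arp_from_ethernet(table: LookupTable, sm: FakeSubnetManager, sender_ip: str) -> int:
    """A new Ethernet-side terminal was seen: request a LID for it if it has none."""
    if sender_ip not in table.ip_to_lid:
        table.bind(sender_ip, sm.allocate_lid())
    return table.ip_to_lid[sender_ip]


def on_arp_response_from_ib(table: LookupTable, responder_ip: str, responder_lid: int) -> None:
    """The IB-side device answered: record its own IP address and LID."""
    table.bind(responder_ip, responder_lid)
```

Keeping both directions of the mapping in one structure lets the same table serve the Ethernet-to-IB path (IP address to LID) and the reverse path (LID to IP address).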
- aging settings can also be performed, and the source IP address and destination IP address in the lookup table can be used as the basis for judgment.
- the association relationship between the source IP address and the destination IP address in the lookup table may be deleted; or, if no message carrying the source IP address and destination IP address is received within a set time, the lookup table enters the aging counting process, and when the aging counter reaches a preset value, the association between the source IP address and the destination IP address in the lookup table is deleted.
- the way of deleting the association relationship between the source IP address and the destination IP address in the lookup table is not limited. Specifically, if an Ethernet message containing the specific IP address is received during the counting period of the aging counting process, the aging counting process is restarted.
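One way to picture the aging behaviour is a table whose entries carry a last-seen time and a counter. The sketch below is an assumption-laden illustration: the class name `AgingTable`, the parameters `idle_timeout` and `max_age_ticks`, and the periodic `tick` call are all hypothetical, since the text does not fix how the set time or the preset counter value are configured.

```python
class AgingTable:
    """Entries keyed by (source IP, destination IP) that expire after idle ticks."""

    def __init__(self, idle_timeout: float, max_age_ticks: int):
        self.idle_timeout = idle_timeout    # the "set time" with no traffic
        self.max_age_ticks = max_age_ticks  # the preset aging counter value
        self.entries = {}                   # key -> {"last_seen": t, "age": n}

    def touch(self, src_ip: str, dst_ip: str, now: float) -> None:
        """A message for this pair arrived: (re)start the aging count."""
        self.entries[(src_ip, dst_ip)] = {"last_seen": now, "age": 0}

    def tick(self, now: float) -> None:
        """Periodic maintenance: count idle entries and delete expired ones."""
        for key in list(self.entries):
            entry = self.entries[key]
            if now - entry["last_seen"] < self.idle_timeout:
                continue  # traffic seen recently, not aging yet
            entry["age"] += 1
            if entry["age"] >= self.max_age_ticks:
                del self.entries[key]
```

A message received while an entry is being counted calls `touch`, which resets the counter, matching the restart behaviour described above.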
- the gateway strips the first packet header of the first packet, and encapsulates the second packet header, to obtain the second packet.
- after the gateway obtains the LID of the second network device from the lookup table according to the IP address of the second network device, it can strip the first message header of the first message, where the first message header includes the ETH header, IP header, and UDP header in Table 1. The gateway can then encapsulate the second packet header onto the first packet from which the first packet header has been stripped, to form the second packet.
- the second packet header includes the LID of the second network device, the second message is an IB message, and the conversion between the IB message and the Ethernet message is realized by using a hardware lookup table.
- the format of the second message is shown in Table 2. The second message header includes the Local Route Header, and the Local Route Header field includes the LIDs of the first network device and the second network device.
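To make the Local Route Header concrete, the following sketch packs and unpacks the 8-byte LRH as defined by the InfiniBand architecture (virtual lane, link version, service level, link next header, destination LID, packet length, source LID). The helper names `pack_lrh` and `unpack_lids` are hypothetical, and the sketch ignores the rest of the packet.

```python
import struct

LNH_IBA_LOCAL = 0b10  # link next header: a base transport header follows, no GRH


def pack_lrh(dlid: int, slid: int, pkt_len_words: int,
             sl: int = 0, vl: int = 0, lver: int = 0,
             lnh: int = LNH_IBA_LOCAL) -> bytes:
    """Build an 8-byte Local Route Header."""
    byte0 = ((vl & 0xF) << 4) | (lver & 0xF)  # VL (4 bits) | LVer (4 bits)
    byte1 = ((sl & 0xF) << 4) | (lnh & 0x3)   # SL (4) | reserved (2) | LNH (2)
    return struct.pack("!BBHHH", byte0, byte1,
                       dlid & 0xFFFF,           # destination LID
                       pkt_len_words & 0x7FF,   # reserved (5) | packet length (11)
                       slid & 0xFFFF)           # source LID


def unpack_lids(lrh: bytes) -> tuple:
    """Return (source LID, destination LID) from a packed LRH."""
    _, _, dlid, _, slid = struct.unpack("!BBHHH", lrh)
    return slid, dlid
```

The destination LID sits in bytes 2 and 3, which is why a switch in the IB subnet can forward on it without looking any deeper into the packet.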
- after the gateway strips the first message header and encapsulates the second message header, it can also record the UDP port number in the first message header and the QPN in the IB transport header in the lookup table.
- Src represents the source device
- Dst represents the destination device
- Src UDP port1 represents the UDP port number of the second network device.
- the gateway may also update the invariant cyclic redundancy check (ICRC) and the variant cyclic redundancy check (VCRC) of the second message. Specifically, after the gateway converts the first message into the second message, the gateway needs to modify the ICRC and VCRC of the second message to increase the code distance and the error detection and correction capability of the entire coding system.
- the service level (SL) field of the IB message carries quality of service (QoS) information.
- DSCP uses 6 bits of the type of service (TOS) byte in the IP header of each data packet, leaving the other 2 bits unused, and distinguishes priorities by the encoded value. 8 of the 16 virtual ports on the IB side are selected and mapped to the 8 sending queues on the RoCE network.
- the value range of the DSCP field on the Ethernet side and the mapping to the SL field on the IB side are shown in Table 4.
- Table 4: RoCEv2 side DSCP (6 bits) | IB side SL (4 bits) | Priority
- 0-7 | 0 | Priority 0
- 8-15 | 1 | Priority 1
- 16-23 | 2 | Priority 2
- 24-31 | 3 | Priority 3
- 32-39 | 4 | Priority 4
- 40-47 | 5 | Priority 5
- 48-55 | 6 | Priority 6
- 56-63 | 7 | Priority 7
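The mapping in Table 4 groups every eight consecutive DSCP values into one SL, which is equivalent to keeping the top three bits of the 6-bit DSCP value. A minimal sketch, with the hypothetical function name `dscp_to_sl`:

```python
def dscp_to_sl(dscp: int) -> int:
    """Map a 6-bit DSCP value to the IB service level per Table 4."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp >> 3  # 0-7 -> 0, 8-15 -> 1, ..., 56-63 -> 7
```

Because only SL values 0 to 7 are produced, half of the 4-bit SL range stays unused, which is consistent with selecting 8 of the 16 virtual ports.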
- the gateway sends the second packet according to the LID of the second network device.
- after the gateway generates the second message, it can send the second message to the second network device in the IB network according to the LID of the second network device.
- the local gateway and the remote gateway need to negotiate a common subnet manager, that is, the LID allocated by the subnet manager needs to be uniquely identified in the IB network where the local gateway and the remote gateway are located.
- the local gateway converts IB packets into Ethernet packets, and the IP address and MAC address can be configured on the local gateway.
- the local gateway adopts a two-level flow control method for receiving packets from the remote gateway:
- the gateway receiving the first message from the first network device may be as follows: the gateway receives the first message into a large-capacity buffer according to a credit flow control mechanism; the gateway feeds back the status information of the large-capacity buffer to the first network device, so that the first network device adjusts packet transmission.
- the gateway adopts the original credit flow control mechanism of the IB network and interfaces with it through large-capacity first-in-first-out (FIFO) buffers at virtual lane (VL) granularity, with a configurable FIFO waterline. While the internal occupancy of the FIFO is monitored in real time, the large-capacity buffer can receive more in-flight messages.
- the local gateway transmits the FIFO status information to the Ethernet port of the remote gateway through the Ethernet flow control pause message, and at the same time analyzes the Pause message combined with the waterline setting to determine the congestion situation of the peer end.
- the originator makes adjustments to ensure efficient transmission and no congestion at the peer end.
- PFC priority-based flow control
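The waterline mechanism described above can be sketched as a FIFO that emits a pause notification when its depth crosses a high waterline and a resume notification when it drains to a low waterline. This is an illustrative model only: the class `WatermarkFifo`, its parameters, and the `events` list (standing in for Ethernet pause frames sent to the peer) are assumptions, and IB credit accounting is not modelled.

```python
from collections import deque


class WatermarkFifo:
    """Per-virtual-lane buffer that signals the sender via waterlines."""

    def __init__(self, capacity: int, high_watermark: int, low_watermark: int):
        self.capacity = capacity
        self.high = high_watermark
        self.low = low_watermark
        self.queue = deque()
        self.paused = False
        self.events = []  # stand-in for pause messages sent to the peer

    def push(self, packet) -> bool:
        """Store an in-flight packet; return False if the buffer is full."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(packet)
        self._update_state()
        return True

    def pop(self):
        """Forward the oldest packet and re-evaluate the waterlines."""
        packet = self.queue.popleft()
        self._update_state()
        return packet

    def _update_state(self) -> None:
        depth = len(self.queue)
        if not self.paused and depth >= self.high:
            self.paused = True
            self.events.append("PAUSE")
        elif self.paused and depth <= self.low:
            self.paused = False
            self.events.append("RESUME")
```

Using two waterlines rather than one avoids toggling between pause and resume on every packet when the depth hovers around a single threshold.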
- the gateway parses the first message of the first network device to obtain the IP address of the second network device, matches the LID corresponding to the IP address of the second network device in the lookup table, encapsulates a second packet header including the LID of the second network device onto the first packet from which the first packet header has been stripped, to generate a second packet, and then sends the second packet according to the LID of the second network device. The transmission of the second packet in the IB network does not need to be copied by the kernel, which achieves the effect of remote direct access on the IB network and improves the efficiency of message transmission.
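The whole Ethernet-to-IB path summarized above can be sketched at the level of header fields. Messages are modelled as plain dictionaries purely for illustration; the function name `ethernet_to_ib` and the key names are hypothetical, and recomputing the ICRC and VCRC is left out.

```python
def ethernet_to_ib(first_msg: dict, ip_to_lid: dict) -> dict:
    """Strip the ETH/IP/UDP headers and prepend a Local Route Header."""
    src_ip = first_msg["ip"]["src"]
    dst_ip = first_msg["ip"]["dst"]
    # A KeyError here means the lookup table has no entry yet, so the
    # gateway would first have to learn the mapping (for example via ARP).
    dlid = ip_to_lid[dst_ip]
    slid = ip_to_lid[src_ip]
    return {
        "lrh": {"slid": slid, "dlid": dlid},  # second packet header
        "bth": first_msg["bth"],              # IB transport header, unchanged
        "payload": first_msg["payload"],      # IB payload, unchanged
    }
```

The IB transport header and payload pass through untouched, which is what allows the receiving side to keep treating the flow as an RDMA transfer rather than as tunnelled IP traffic.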
- the first network device is a network device of the IB network
- the second network device is a network device of the Ethernet network.
- FIG. 5 is a schematic diagram of another embodiment of a message transmission method provided by the embodiment of the present application, and the method is as follows.
- the first network device sends the third packet to the gateway.
- the third message is an IB message, and the format of the IB message can refer to the format of the IB message in step 403.
- the third message header of the third message includes a Local Route Header, and the Local Route Header includes the LID of the first network device and the LID of the second network device, and the QPN of the first network device and the QPN of the second network device.
- the gateway determines the Internet Protocol IP address of the second network device according to the LID of the second network device in combination with a lookup table.
- in step 502, for the manner of searching for the IP address according to the LID and the manner of updating the lookup table, reference may be made to the description of searching for the LID according to the IP address and updating the lookup table in step 402, which is not repeated here.
- the gateway strips the third packet header of the third packet, and encapsulates the fourth packet header, to obtain the fourth packet.
- the fourth message header includes ETH header, IP header, UDP header, IB transport header, IB payload and CRC fields
- the gateway receives the IB message and parses the source LID and destination LID in the LRH field, and the destination QPN, inside the IB message. By matching the source LID, destination LID, and destination QPN of the message against the lookup table, the gateway obtains the source IP address, destination IP address, and UDP port number for the IP header and UDP header of the Ethernet message, and at the same time encapsulates the IB transport header and the IB payload directly into the corresponding fields of the RoCEv2 message. The gateway can also send a broadcast message to the RoCE network according to the destination IP address, so that the second network device feeds back its MAC address; that MAC address is the destination MAC address and is stored in the ETH header field.
- the MAC address of the Ethernet side interface of the gateway can be used as the source MAC address, and the ICRC and CRC fields of the message are updated at the same time and encapsulated into the corresponding check fields.
- the gateway implements mutual conversion between IB packets on the IB side and Ethernet packets on the Ethernet side.
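The reverse direction can be sketched in the same dictionary style. Everything named here is hypothetical: `ib_to_ethernet`, the `qpn_to_udp_port` table, and the `resolve_mac` callable, which stands in for the broadcast exchange that returns the destination MAC address. Placing the looked-up port in the UDP source port field and using 4791 as the destination port follows common RoCEv2 practice and is an assumption, not something the text states.

```python
ROCEV2_UDP_PORT = 4791  # UDP destination port assigned to RoCEv2


def ib_to_ethernet(third_msg: dict, lid_to_ip: dict, qpn_to_udp_port: dict,
                   gateway_mac: str, resolve_mac) -> dict:
    """Strip the Local Route Header and prepend ETH/IP/UDP headers."""
    slid = third_msg["lrh"]["slid"]
    dlid = third_msg["lrh"]["dlid"]
    dest_qpn = third_msg["bth"]["dest_qpn"]
    dst_ip = lid_to_ip[dlid]
    return {
        # Source MAC is the gateway's Ethernet-side interface.
        "eth": {"src": gateway_mac, "dst": resolve_mac(dst_ip)},
        "ip": {"src": lid_to_ip[slid], "dst": dst_ip},
        "udp": {"src_port": qpn_to_udp_port[dest_qpn],
                "dst_port": ROCEV2_UDP_PORT},
        "bth": third_msg["bth"],          # IB transport header, unchanged
        "payload": third_msg["payload"],  # IB payload, unchanged
    }
```

Passing the MAC resolver in as a callable keeps the conversion itself free of network I/O, so it can be exercised with a stub.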
- the gateway obtains the QPN of the first network device and the QPN of the second network device according to the link establishment message; the gateway obtains the UDP port number of the second network device according to the QPN of the first network device and the QPN of the second network device .
- the first network device may send a link establishment message to the gateway; the gateway converts the link establishment message into the IB format and sends it to the second network device, and the second network device then feeds back a link establishment response message. Alternatively, the second network device may send a link establishment message to the gateway; the gateway converts it into the Ethernet format and sends it to the first network device, and the first network device feeds back a link establishment response message.
- the gateway can obtain the QPN of the first network device and the QPN of the second network device from the link establishment message, and calculate the UDP port number of the second network device from the two QPNs according to the mapping relationship between QPNs and UDP port numbers. The gateway can store the correspondence among the QPN of the first network device, the QPN of the second network device, and the UDP port number of the second network device in the lookup table. After the gateway has allocated a LID for the second network device, the lookup table may also include the association relationship among IP address, LID, QPN, and UDP port number.
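The text does not specify the mapping between QPNs and UDP port numbers. Purely as an assumed example, the sketch below folds the two 24-bit QPNs into 14 bits and places the result in the range 0xC000 to 0xFFFF, so that the same QPN pair always yields the same port. The function name `qpn_pair_to_udp_port`, the constant `PORT_BASE`, and the folding scheme are all hypothetical.

```python
PORT_BASE = 0xC000  # assumed lower bound of the port range used for the mapping


def qpn_pair_to_udp_port(src_qpn: int, dst_qpn: int) -> int:
    """Derive a deterministic UDP port from a pair of 24-bit QPNs (illustrative)."""
    mixed = (src_qpn ^ dst_qpn) & 0xFFFFFF
    folded = (mixed ^ (mixed >> 14)) & 0x3FFF  # fold 24 bits down to 14
    return PORT_BASE | folded


def record_connection(conn_table: dict, src_qpn: int, dst_qpn: int) -> int:
    """Store the QPN pair and its derived port, as the lookup table would."""
    port = qpn_pair_to_udp_port(src_qpn, dst_qpn)
    conn_table[(src_qpn, dst_qpn)] = port
    return port
```

Any deterministic function would serve the same purpose; the only property the gateway relies on is that it can recover the port for a given QPN pair from the table.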
- the gateway can determine the MAC address of the first network device and the MAC address of the second network device, the IP address of the first network device and the IP address of the second network device required by the fourth message header according to the third message, The QPN of the first network device, the QPN of the second network device, and the UDP port number of the second network device.
- the gateway sends the fourth packet according to the IP address of the second network device.
- the gateway may transmit the fourth message in the manner of Ethernet message transmission, according to the IP address of the second network device in the fourth message.
- when the gateway sends the fourth message according to the IP address of the second network device, two-level flow control transmission can be used.
- the gateway sends the fourth message in a large-capacity buffer according to the credit flow control mechanism and the IP address of the second network device.
- the gateway feeds back the status information of the large-capacity buffer to the second network device through a pause message, so that the second network device adjusts message transmission.
- for the two-level flow control transmission, reference may be made to the relevant description in step 304, which is not repeated here.
- the gateway parses the first message of the first network device to obtain the LID of the second network device, matches the IP address corresponding to the LID in the lookup table, encapsulates a second message header including the IP address of the second network device onto the first message from which the first message header has been stripped, to generate the second message, and then sends the second message according to the LID of the second network device. The transmission of the second message on the Ethernet network does not need to be copied by the kernel, which achieves the effect of remote direct access on the Ethernet network and improves the efficiency of message transmission.
- the structure of the gateway in the embodiment of the present application can be a schematic diagram of a gateway structure as shown in Figure 6.
- the gateway includes a switching chip and a processing chip. The switching chip mainly implements basic forwarding functions. The processing chip can be a CPU, an FPGA, or the like, and is responsible for establishing and maintaining the lookup tables required for converting messages between different protocols, and for performing that conversion.
- the structure of the gateway in the embodiment of the present application may be another schematic structural diagram of the gateway as shown in FIG.
- the processing module is responsible for the establishment and maintenance of the lookup tables required for the conversion of different protocol messages and the conversion of different protocol messages.
- the structure of the processing chip can refer to the schematic structural diagram of the processing chip shown in FIG. and management mods.
- the Ethernet interface is used to receive Ethernet messages or output Ethernet messages
- the encapsulation/decapsulation module is used to perform conversion between Ethernet packets and IB packets.
- the buffer module is used to store in-flight messages, and send the status information of the buffer module to the peer end through the RoCE network.
- the IB credit flow control module is used to adjust the message transmission on the IB side.
- the QoS module is used to determine the virtual port for transmission between the Ethernet side and the IB side.
- the management module is used to manage LIDs, for example, applying for LIDs from the subnet manager and assigning LIDs to nodes.
- the message transmission method has been described above, and the communication device that can implement the message transmission method will be described below.
- FIG. 9 is a schematic structural diagram of a communication device provided by the embodiment of the present application.
- the communication device 90 includes:
- the receiving unit 901 is configured to receive a first message from the first network device, where the first message header of the first message includes the Internet Protocol (IP) address of the second network device, and the second network device is the destination network device to which the first network device transmits the message;
- the determining unit 902 is configured to determine the local identifier LID of the second network device according to the IP address of the second network device in combination with a lookup table, and the lookup table includes an association relationship between the IP address and the LID;
- the encapsulation unit 903 is configured to strip off the first packet header of the first packet and encapsulate the second packet header to obtain the second packet, where the second packet header includes a local routing header, and the local routing header includes the LID of the second network device;
- a sending unit 904 configured to send the second packet according to the LID of the second network device.
- the receiving unit 901 is specifically used for:
- the status information of the large-capacity buffer is fed back to the first network device through a pause message, so that the first network device adjusts message transmission.
- the sending unit 904 is also used to:
- the receiving unit 901 is also used for:
- the obtaining unit 905 is also used for:
- the communication device further includes an updating unit 906, and the updating unit 906 is specifically used for:
- the lookup table is updated according to the IP address and LID of the first network device, and the IP address and LID of the second network device.
- the sending unit 904 is also used to:
- receive an address resolution protocol (ARP) message from the first network device, where the ARP message includes the IP address of the first network device and the IP address of the second network device;
- the update unit 906 is also used to:
- the invariant cyclic redundancy check (ICRC) and the variant cyclic redundancy check (VCRC) of the second packet are updated.
- the first message is an Ethernet message
- the second message is an IB message
- the Ethernet packet includes an Ethernet header, an IP header, a UDP header, an IB transport header, an IB payload, an ICRC, and a CRC.
- the IB packet includes a local routing header, an IB transport header, an IB payload, an ICRC, and a VCRC.
- FIG. 10 is a schematic diagram of another structure of the communication device provided by the embodiment of the present application.
- the communication device 100 includes:
- the receiving unit 1001 is configured to receive a third message from the first network device, where the third message header of the third message includes a local routing header, the local routing header includes a local identifier (LID) of the second network device, and the second network device is the destination network device to which the first network device transmits the message;
- the determining unit 1002 is configured to determine the Internet protocol IP address of the second network device according to the LID of the second network device in combination with a lookup table, and the lookup table includes an association relationship between the IP address and the LID;
- the encapsulation unit 1003 is configured to strip off the third packet header of the third packet, and encapsulate the fourth packet header to obtain the fourth packet, where the fourth packet header includes the IP address of the second network device;
- a sending unit 1004 configured to send a fourth packet according to the IP address of the second network device.
- the communication device 100 further includes an obtaining unit 1005, and the obtaining unit 1005 is specifically used for:
- the lookup table also includes the association relationship among QPN, UDP port number, IP address, and LID; the fourth message header also includes the media access control (MAC) address of the second network device and the UDP port number, and the MAC address of the second network device is obtained by broadcasting the IP address of the second network device.
- the sending unit 1004 is specifically used for:
- the status information of the large-capacity buffer is fed back to the second network device through a pause message, so that the second network device adjusts message transmission.
- the sending unit 1004 is also used to:
- the receiving unit 1001 is also used for:
- the acquiring unit 1005 is also used for:
- the response message includes the LID and IP address of the first network device
- the communication device 100 also includes an updating unit 1006, and the updating unit 1006 is specifically used for:
- the lookup table is updated according to the IP address and LID of the first network device, and the IP address and LID of the second network device.
- the sending unit 1004 is also used to:
- receive an address resolution protocol (ARP) message from the second network device, where the ARP message includes the IP address of the first network device and the IP address of the second network device;
- the updating unit 1006 is also used for:
- the invariant cyclic redundancy check ICRC and the cyclic redundancy check CRC of the second packet are updated.
- the first message is an Ethernet message
- the second message is an IB message
- the Ethernet packet includes an Ethernet header, an IP header, a UDP header, an IB transport header, an IB payload, an ICRC, and a CRC.
- the IB packet includes a local routing header, an IB transport header, an IB payload, an ICRC, and a VCRC.
- FIG. 11 is a schematic diagram of a possible logical structure of a communication device 110 provided by an embodiment of the present application.
- the communication device 110 includes: a processor 1101 , a communication interface 1102 , a storage system 1103 and a bus 1104 .
- the processor 1101 , the communication interface 1102 and the storage system 1103 are connected to each other through a bus 1104 .
- the processor 1101 is used to control and manage the actions of the communication device 110, for example, the processor 1101 is used to execute the steps performed by the gateway in the method embodiment in FIG. 4 .
- the communication interface 1102 is used to support the communication device 110 to communicate.
- the storage system 1103 is configured to store program codes and data of the communication device 110 .
- the processor 1101 may be a central processing unit, a general purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic devices, transistor logic devices, hardware components or any combination thereof. It can implement or execute the various illustrative logical blocks, modules and circuits described in connection with the present disclosure.
- the processor 1101 may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, a combination of a digital signal processor and a microprocessor, and the like.
- the bus 1104 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
- the receiving unit 901 and the sending unit 904 in the communication device 90 are equivalent to the communication interface 1102 in the communication device 110, and the determination unit 902, encapsulation unit 903, acquisition unit 905 and update unit 906 in the communication device 90 are equivalent to the communication device 110.
- the communication device 110 in this embodiment may correspond to the gateway in the above-mentioned embodiment of the method in FIG. Various steps are not repeated here for the sake of brevity.
- FIG. 12 is a schematic diagram of a possible logical structure of a communication device 120 provided by an embodiment of the present application.
- the communication device 120 includes: a processor 1201 , a communication interface 1202 , a storage system 1203 and a bus 1204 .
- the processor 1201 , the communication interface 1202 and the storage system 1203 are connected to each other through the bus 1204 .
- the processor 1201 is used to control and manage the actions of the communication device 120, for example, the processor 1201 is used to execute the steps performed by the gateway in the method embodiment in FIG. 5 .
- the communication interface 1202 is used to support the communication device 120 to communicate.
- the storage system 1203 is used for storing program codes and data of the communication device 120 .
- the processor 1201 may be a central processing unit, a general processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic devices, transistor logic devices, hardware components or any combination thereof. It can implement or execute the various illustrative logical blocks, modules and circuits described in connection with the present disclosure.
- the processor 1201 may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, a combination of a digital signal processor and a microprocessor, and the like.
- the bus 1204 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
- the receiving unit 1001 and the sending unit 1004 in the communication device 100 are equivalent to the communication interface 1202 in the communication device 120, and the determination unit 1002, encapsulation unit 1003, acquisition unit 1005 and update unit 1006 in the communication device 100 are equivalent to the communication interface 1202 in the communication device 120.
- the communication device 120 in this embodiment may correspond to the gateway in the above-mentioned method embodiment in FIG. Various steps are not repeated here for the sake of brevity.
- a computer-readable storage medium in which computer-executable instructions are stored; when the processor of the device executes the computer-executable instructions, the device executes the steps of the message transmission method executed by the gateway in the method embodiment in FIG. 4 above.
- a computer-readable storage medium in which computer-executable instructions are stored; when the processor of the device executes the computer-executable instructions, the device executes the steps of the message transmission method executed by the gateway in the method embodiment in FIG. 5 above.
- a computer program product includes computer-executable instructions stored in a computer-readable storage medium; when the processor of the device executes the computer-executable instructions , the device executes the steps of the packet transmission method executed by the gateway in the method embodiment in FIG. 4 above.
- a computer program product includes computer-executable instructions stored in a computer-readable storage medium; when the processor of the device executes the computer-executable instructions , the device executes the steps of the packet transmission method executed by the gateway in the method embodiment in FIG. 5 above.
- the disclosed system, device and method can be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the units is only a logical function division. In actual implementation, there may be other division methods.
- multiple units or components can be combined or integrated into another system, or some features may be ignored or not implemented.
- the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
- if the integrated unit is realized in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
- the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application.
- the aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
Disclosed in embodiments of the present application are a packet transmission method and a communication apparatus, for use in improving a packet transmission rate. The method in the embodiments of the present application comprises: a gateway parses a first packet of a first network device to obtain an IP address of a second network device; matches in a lookup table an LID corresponding to the IP address of the second network device; encapsulates a second packet header comprising the LID of the second network device to the first packet from which a first packet header is stripped, so as to form a second packet, the LID being stored in a local routing header of the second packet; and then sends the second packet according to the LID of the second network device.
Description
本申请要求与2021年7月30日提交中国国家知识产权局,申请号为202110872533.9,发明名称为“一种报文传输方法以及通信装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of the Chinese patent application filed with the State Intellectual Property Office of China on July 30, 2021, with the application number 202110872533.9, and the title of the invention is "A Message Transmission Method and Communication Device", the entire contents of which are incorporated by reference in this application.
本申请实施例涉及通信领域,尤其涉及一种报文传输方法以及通信装置。The embodiments of the present application relate to the communication field, and in particular, to a message transmission method and a communication device.
随着数据规模急剧增大,应用性能对处理系统的算力需求指数型扩大,高性能计算(high performance computing,HPC)机群应用需求剧增。HPC互联网络中采用以太和无限带宽技术(infiniband,IB)网络集群占比78%,远超其他互联网络。远程直接内存访问(remote direct memory access,RDMA)技术最早出现在IB网络,IB因其高性能低时延自诞生以来一直是超算互联的首选,但基于以太的远程直接访问(RDMA over Converged Ethernet version2,RoCEv2)网络出现后,因其完全兼容以太互联网协议(internet protocol,IP)网络并且支持RDMA协议而被越来越多的超算互联采用。With the rapid increase of data scale, the computing power demand of application performance on processing system has expanded exponentially, and the demand for high performance computing (high performance computing, HPC) cluster applications has increased dramatically. In the HPC interconnection network, Ethernet and infiniband technology (infiniband, IB) network clusters account for 78%, far exceeding other interconnection networks. Remote direct memory access (RDMA) technology first appeared on the IB network. IB has been the first choice for supercomputing interconnection since its inception due to its high performance and low latency, but the remote direct access (RDMA over Converged Ethernet) based on Ethernet Version2, RoCEv2) network has been adopted by more and more supercomputing interconnections because it is fully compatible with the Ethernet protocol (internet protocol, IP) network and supports the RDMA protocol.
Ethernet IP packets are carried over an IB network by a gateway device that uses Internet Protocol over InfiniBand (IPoIB) technology.
However, because the Ethernet IP packets are encapsulated directly in IB packets by tunneling, they still pass through kernel copies and the software protocol stack. The remote direct access benefit of the IB network is therefore lost, and transmission efficiency suffers.
Summary of the invention
Embodiments of this application provide a packet transmission method and a communication apparatus, for use in improving the packet transmission rate.
A first aspect of the embodiments of this application provides a packet transmission method. The method includes: a gateway receives a first packet from a first network device, where a first packet header of the first packet includes an Internet Protocol (IP) address of a second network device, and the second network device is the destination network device of packets sent by the first network device; the gateway determines a local identifier (LID) of the second network device based on the IP address of the second network device and a lookup table, where the lookup table includes associations between IP addresses and LIDs; the gateway strips the first packet header from the first packet and encapsulates a second packet header to obtain a second packet, where the second packet header includes a local routing header and the local routing header includes the LID of the second network device; and the gateway sends the second packet based on the LID of the second network device.
In the first aspect, the gateway parses the first packet from the first network device to obtain the IP address of the second network device, matches that IP address against the lookup table to find the corresponding LID, and encapsulates a second packet header that includes the LID of the second network device onto the first packet from which the first packet header has been stripped, thereby generating the second packet. The gateway then sends the second packet based on the LID of the second network device. The second packet is an ordinary IB-format packet, with the LID of the second network device stored in its local routing header. Transmission of the second packet in the IB network requires no kernel copy, so the remote direct access benefit of the IB network is preserved and packet transmission efficiency is improved.
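The conversion described in the first aspect can be illustrated with a minimal sketch. It is not the claimed implementation: the class names, field names, and the in-memory dictionary used as the lookup table are all assumptions made for illustration, and the headers are reduced to the fields the text mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EthPacket:
    dst_ip: str          # destination IP carried in the first (Ethernet/IP/UDP) header
    ib_transport: bytes  # IB transport header, passed through unchanged
    payload: bytes


@dataclass(frozen=True)
class IbPacket:
    dlid: int            # destination LID carried in the local routing header
    ib_transport: bytes
    payload: bytes


def eth_to_ib(pkt: EthPacket, ip_to_lid: dict) -> IbPacket:
    """Strip the first header and encapsulate a header that carries the LID."""
    if pkt.dst_ip not in ip_to_lid:
        raise LookupError(f"no LID known for {pkt.dst_ip}")
    # Only the outer header changes; transport header and payload are reused.
    return IbPacket(ip_to_lid[pkt.dst_ip], pkt.ib_transport, pkt.payload)


# Hypothetical table entry and packet contents, for illustration only.
table = {"10.0.0.2": 0x0012}
out = eth_to_ib(EthPacket("10.0.0.2", b"BTH", b"data"), table)
```

The point of the sketch is that the gateway rewrites only the outer addressing, which is why the converted packet can travel as an ordinary IB packet.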
In a possible implementation, the step in which the gateway receives the first packet from the first network device includes: the gateway receives the first packet in a large-capacity buffer according to a credit-based flow control mechanism; and the gateway feeds back status information of the large-capacity buffer to the first network device by using a pause frame, so that the first network device adjusts its packet transmission.
In this implementation, the gateway can combine the credit-based flow control mechanism of the IB network with a large-capacity buffer to receive packets from the first network device. The large-capacity buffer can hold more in-flight packets. The gateway can also use a pause frame to inform the first network device of the buffer's status, so that the first network device can adjust transmission of subsequent packets, for example by reducing or stopping sending, which avoids congestion.
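The buffer feedback described above might be sketched as follows. The text does not say when the pause frame is sent, so the high and low watermarks, the `"PAUSE"` and `"RESUME"` signal names, and the class itself are assumptions chosen only to show the idea.

```python
from collections import deque
from typing import Optional


class IngressBuffer:
    """Receive buffer that signals the sender when it fills up (illustrative)."""

    def __init__(self, capacity: int, high: int, low: int) -> None:
        assert 0 <= low < high <= capacity
        self.capacity, self.high, self.low = capacity, high, low
        self.queue: deque = deque()
        self.paused = False

    def enqueue(self, packet: bytes) -> Optional[str]:
        if len(self.queue) >= self.capacity:
            raise OverflowError("buffer full")
        self.queue.append(packet)
        # Crossing the high watermark triggers one pause signal.
        if not self.paused and len(self.queue) >= self.high:
            self.paused = True
            return "PAUSE"
        return None

    def dequeue(self) -> Optional[str]:
        self.queue.popleft()
        # Draining to the low watermark lets the sender resume.
        if self.paused and len(self.queue) <= self.low:
            self.paused = False
            return "RESUME"
        return None


buf = IngressBuffer(capacity=8, high=3, low=1)
fill_signals = [buf.enqueue(b"x") for _ in range(3)]
drain_signals = [buf.dequeue() for _ in range(2)]
```

Using two thresholds rather than one avoids toggling the sender on every packet; a real gateway would tune them to the link's bandwidth-delay product.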
In a possible implementation, before the gateway receives the first packet from the first network device, the method further includes: the gateway applies to a subnet manager for an LID based on a route change, where the route change indicates that the first network device has joined the network; the gateway receives the LID of the first network device; the gateway obtains a response packet from the second network device, where the response packet includes the LID and the IP address of the second network device; and the gateway updates the lookup table based on the IP address and LID of the first network device and the IP address and LID of the second network device.
In this implementation, the gateway can apply to the subnet manager on the IB side for an LID on behalf of the first network device that joins the network, obtain the response packet that the second network device returns in reply to the broadcast packet of the first network device, and generate or update the lookup table based on the IP address and LID of the second network device in the response packet, which improves the feasibility of the solution.
In a possible implementation, the step in which the gateway applies to the subnet manager for an LID based on the route change includes: the gateway receives an Address Resolution Protocol (ARP) packet from the first network device, where the ARP packet includes the IP address of the first network device and the IP address of the second network device; and the gateway applies to the subnet manager for the LID of the first network device based on the ARP packet.
In this implementation, the broadcast packet may be an ARP packet, and the route change may also be determined from the ARP packet. The gateway converts the ARP packet into an IB-format ARP packet and sends it to the second network device. The second network device returns an ARP response packet that includes its IP address and LID, which improves the feasibility of the solution.
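How the lookup table could be populated from ARP traffic is sketched below. The subnet manager is replaced by a stub that hands out sequential LIDs, and the dictionary-shaped ARP messages are simplifications; both are assumptions for illustration rather than real IB management interfaces.

```python
from itertools import count


class SubnetManagerStub:
    """Stand-in for the IB subnet manager: hands out unused LIDs."""

    def __init__(self, first_lid: int = 1) -> None:
        self._next = count(first_lid)

    def allocate_lid(self) -> int:
        return next(self._next)


class Gateway:
    def __init__(self, sm: SubnetManagerStub) -> None:
        self.sm = sm
        self.ip_to_lid: dict = {}

    def on_arp_request(self, sender_ip: str, target_ip: str) -> dict:
        # A new sender is treated as a route change: request a LID for it.
        if sender_ip not in self.ip_to_lid:
            self.ip_to_lid[sender_ip] = self.sm.allocate_lid()
        # IB-format ARP request forwarded toward the target.
        return {"sender_ip": sender_ip,
                "sender_lid": self.ip_to_lid[sender_ip],
                "target_ip": target_ip}

    def on_arp_response(self, responder_ip: str, responder_lid: int) -> None:
        # The response carries the responder's IP address and LID.
        self.ip_to_lid[responder_ip] = responder_lid


gw = Gateway(SubnetManagerStub(first_lid=0x20))
forwarded = gw.on_arp_request("192.168.1.10", "10.0.0.2")
gw.on_arp_response("10.0.0.2", 0x12)
```

After one request and response the table holds both directions' endpoints, which is the state the conversion step relies on.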
In a possible implementation, after the gateway strips the first packet header from the first packet and encapsulates the second packet header to obtain the second packet, the method further includes: the gateway updates the invariant cyclic redundancy check (ICRC) and the variant cyclic redundancy check (VCRC) of the second packet.
In this implementation, after converting the first packet into the second packet, the gateway needs to update the ICRC and VCRC of the second packet to strengthen error detection.
In a possible implementation, the first packet is an Ethernet packet and the second packet is an IB packet.
In a possible implementation, the Ethernet packet includes an Ethernet header, an IP header, a UDP header, an IB transport header, an IB payload, an ICRC, and a cyclic redundancy check (CRC).
In a possible implementation, the IB packet includes a local routing header, an IB transport header, an IB payload, an ICRC, and a VCRC.
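Comparing the two packet layouts listed above shows which parts the gateway touches. The short sketch below only manipulates the field names given in the text; the variable names are illustrative.

```python
ETH_LAYOUT = ["Ethernet header", "IP header", "UDP header",
              "IB transport header", "IB payload", "ICRC", "CRC"]
IB_LAYOUT = ["local routing header", "IB transport header",
             "IB payload", "ICRC", "VCRC"]

# Fields present in both formats survive the conversion in place.
kept = [f for f in ETH_LAYOUT if f in IB_LAYOUT]
# Fields only in the Ethernet format are stripped by the gateway.
stripped = [f for f in ETH_LAYOUT if f not in IB_LAYOUT]
# Fields only in the IB format are added by the gateway.
added = [f for f in IB_LAYOUT if f not in ETH_LAYOUT]

print("kept:", kept)
print("stripped:", stripped)
print("added:", added)
```

The ICRC field exists in both layouts, but its value still has to be recomputed after conversion, as the ICRC and VCRC update step states.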
A second aspect of the embodiments of this application provides a packet transmission method. The method includes: a gateway receives a third packet from a first network device, where a third packet header of the third packet includes a local routing header, the local routing header includes a local identifier (LID) of a second network device, and the second network device is the destination network device of packets sent by the first network device; the gateway determines an Internet Protocol (IP) address of the second network device based on the LID of the second network device and a lookup table, where the lookup table includes associations between IP addresses and LIDs; the gateway strips the third packet header from the third packet and encapsulates a fourth packet header to obtain a fourth packet, where the fourth packet header includes the IP address of the second network device; and the gateway sends the fourth packet based on the IP address of the second network device.
In the second aspect, the gateway parses the third packet from the first network device to obtain the LID of the second network device, matches that LID against the lookup table to find the corresponding IP address, and encapsulates a fourth packet header that includes the IP address of the second network device onto the third packet from which the third packet header has been stripped, thereby generating the fourth packet. The gateway then sends the fourth packet based on the IP address of the second network device. Transmission of the fourth packet in the Ethernet network requires no kernel copy, so the remote direct access benefit is available in the Ethernet network and packet transmission efficiency is improved.
In a possible implementation, the method further includes: the gateway obtains a queue pair number (QPN) of the first network device and a QPN of the second network device based on a connection establishment packet; and the gateway obtains a User Datagram Protocol (UDP) port number of the second network device based on the QPN of the first network device and the QPN of the second network device. The lookup table further includes associations among QPNs, UDP port numbers, IP addresses, and LIDs. The fourth packet header further includes a media access control (MAC) address and the UDP port number of the second network device, where the MAC address of the second network device is obtained by broadcasting based on the IP address of the second network device.
In this implementation, when the first network device and the second network device establish a transmission link, the gateway can also record the QPNs of the two devices and use them to calculate the UDP port number of the second network device. The gateway can further send a broadcast packet to the RoCE network based on the IP address of the second network device, so that the second network device returns its MAC address. The gateway encapsulates the MAC address and UDP port number of the second network device into the fourth packet, so that the fourth packet can be transmitted based on the IP address, MAC address, and UDP port number of the second network device, which improves the reliability of packet transmission.
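The IB-to-Ethernet direction with the extended lookup table might look like the sketch below. Keying the table by the pair (LID, destination QPN) is an assumption, as are all class and field names. The UDP port value is a placeholder: the text says it is calculated from the two QPNs but does not give the calculation, so none is invented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    ip: str
    udp_port: int               # placeholder; derivation from QPNs not shown
    mac: Optional[str] = None   # filled in once the broadcast reply arrives


@dataclass(frozen=True)
class IbPacket:
    dlid: int        # destination LID from the local routing header
    dest_qpn: int    # destination queue pair number
    body: bytes      # IB transport header and payload, passed through


@dataclass(frozen=True)
class EthPacket:
    dst_mac: str
    dst_ip: str
    dst_udp_port: int
    body: bytes


def ib_to_eth(pkt: IbPacket, table: dict) -> EthPacket:
    """Strip the local routing header and build the Ethernet/IP/UDP header."""
    entry = table.get((pkt.dlid, pkt.dest_qpn))
    if entry is None or entry.mac is None:
        raise LookupError("destination not fully resolved")
    return EthPacket(entry.mac, entry.ip, entry.udp_port, pkt.body)


# Hypothetical table contents, for illustration only.
table = {(0x12, 7): Endpoint("192.168.1.10", 49152, "02:00:00:00:00:0a")}
eth = ib_to_eth(IbPacket(dlid=0x12, dest_qpn=7, body=b"BTH+data"), table)
```

Refusing to forward until the MAC address is known mirrors the text's requirement that the MAC address be obtained by broadcast before it is encapsulated.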
In a possible implementation, the step in which the gateway sends the fourth packet based on the IP address of the second network device includes: the gateway sends the fourth packet from a large-capacity buffer according to a credit-based flow control mechanism and the IP address of the second network device; and the gateway feeds back status information of the large-capacity buffer to the second network device by using a pause frame, so that the second network device adjusts its packet transmission.
In a possible implementation, before the gateway receives the third packet from the first network device, the method further includes: the gateway applies to a subnet manager for an LID based on a route change, where the route change indicates that the second network device has joined the network; the gateway receives the LID of the second network device; the gateway obtains a response packet from the first network device, where the response packet includes the LID and the IP address of the first network device; and the gateway updates the lookup table based on the IP address and LID of the first network device and the IP address and LID of the second network device.
In a possible implementation, the step in which the gateway applies to the subnet manager for an LID based on the route change includes: the gateway receives an Address Resolution Protocol (ARP) packet from the second network device, where the ARP packet includes the IP address of the first network device and the IP address of the second network device; and the gateway applies to the subnet manager for the LID of the second network device based on the ARP packet.
In a possible implementation, after the gateway strips the third packet header from the third packet and encapsulates the fourth packet header to obtain the fourth packet, the method further includes: the gateway updates the ICRC and CRC of the fourth packet.
In a possible implementation, the third packet is an IB packet and the fourth packet is an Ethernet packet.
In a possible implementation, the Ethernet packet includes an Ethernet header, an IP header, a UDP header, an IB transport header, an IB payload, an ICRC, and a CRC.
In a possible implementation, the IB packet includes a local routing header, an IB transport header, an IB payload, an ICRC, and a VCRC.
A third aspect of the embodiments of this application provides a communication apparatus, including: a receiving unit, configured to receive a first packet from a first network device, where a first packet header of the first packet includes an Internet Protocol (IP) address of a second network device, and the second network device is the destination network device of packets sent by the first network device; a determining unit, configured to determine a local identifier (LID) of the second network device based on the IP address of the second network device and a lookup table, where the lookup table includes associations between IP addresses and LIDs; an encapsulation unit, configured to strip the first packet header from the first packet and encapsulate a second packet header to obtain a second packet, where the second packet header includes a local routing header and the local routing header includes the LID of the second network device; and a sending unit, configured to send the second packet based on the LID of the second network device.
The communication apparatus is configured to perform the method of the first aspect or any implementation of the first aspect.
A fourth aspect of the embodiments of this application provides a communication apparatus, including: a receiving unit, configured to receive a third packet from a first network device, where a third packet header of the third packet includes a local routing header, the local routing header includes a local identifier (LID) of a second network device, and the second network device is the destination network device of packets sent by the first network device; a determining unit, configured to determine an Internet Protocol (IP) address of the second network device based on the LID of the second network device and a lookup table, where the lookup table includes associations between IP addresses and LIDs; an encapsulation unit, configured to strip the third packet header from the third packet and encapsulate a fourth packet header to obtain a fourth packet, where the fourth packet header includes the IP address of the second network device; and a sending unit, configured to send the fourth packet based on the IP address of the second network device.
The communication apparatus is configured to perform the method of the second aspect or any implementation of the second aspect.
A fifth aspect of the embodiments of this application provides a communication device, including a processor, a memory, and a communication interface. The processor is configured to execute instructions stored in the memory, so that the communication device performs the method provided in the first aspect or any optional manner of the first aspect, and the communication interface is configured to receive or send indications. For details of the communication device provided in the fifth aspect, refer to the first aspect or any optional manner of the first aspect; they are not repeated here.
A sixth aspect of the embodiments of this application provides a communication device, including a processor, a memory, and a communication interface. The processor is configured to execute instructions stored in the memory, so that the communication device performs the method provided in the second aspect or any optional manner of the second aspect, and the communication interface is configured to receive or send indications. For details of the communication device provided in the sixth aspect, refer to the second aspect or any optional manner of the second aspect; they are not repeated here.
A seventh aspect of the embodiments of this application provides a computer-readable storage medium that stores a program. When a computer executes the program, the method provided in the first aspect or any optional manner of the first aspect is performed.
An eighth aspect of the embodiments of this application provides a computer-readable storage medium that stores a program. When a computer executes the program, the method provided in the second aspect or any optional manner of the second aspect is performed.
A ninth aspect of the embodiments of this application provides a computer program product. When the computer program product is executed on a computer, the computer performs the method provided in the first aspect or any optional manner of the first aspect.
A tenth aspect of the embodiments of this application provides a computer program product. When the computer program product is executed on a computer, the computer performs the method provided in the second aspect or any optional manner of the second aspect.
FIG. 1 is a block diagram of an HPC system according to an embodiment of this application;
FIG. 2 is a schematic diagram of a short-distance scenario inside a data center according to an embodiment of this application;
FIG. 3 is a schematic diagram of a long-distance scenario of interconnection between supercomputing centers according to an embodiment of this application;
FIG. 4 is a schematic diagram of an embodiment of a packet transmission method according to an embodiment of this application;
FIG. 5 is a schematic diagram of another embodiment of the packet transmission method according to an embodiment of this application;
FIG. 6 is a schematic structural diagram of a gateway according to an embodiment of this application;
FIG. 7 is another schematic structural diagram of the gateway according to an embodiment of this application;
FIG. 8 is another schematic structural diagram of the gateway according to an embodiment of this application;
FIG. 9 is a schematic structural diagram of a communication apparatus according to an embodiment of this application;
FIG. 10 is another schematic structural diagram of the communication apparatus according to an embodiment of this application;
FIG. 11 is a schematic structural diagram of a communication device according to an embodiment of this application;
FIG. 12 is another schematic structural diagram of the communication device according to an embodiment of this application.
Embodiments of this application provide a packet transmission method and a communication apparatus, for use in improving the packet transmission rate.
The following describes the embodiments of this application with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this application. A person of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
The terms "first", "second", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish between similar objects, and do not necessarily describe a specific order or sequence. It should be understood that terms used in this way are interchangeable in appropriate circumstances, so that the embodiments described herein can be implemented in an order other than the one illustrated or described herein. In addition, the terms "include" and "have" and any variants thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred over or better than other embodiments.
The tables in this application may be split or merged and are not limited to the form shown; only one example is given here.
In addition, numerous specific details are given in the following description of embodiments to better explain this application. A person skilled in the art should understand that this application can also be implemented without some of these details. In some instances, methods, means, components, and circuits that are well known to a person skilled in the art are not described in detail, so as to highlight the subject matter of this application.
As data volumes grow rapidly, the computing power that applications demand from processing systems grows exponentially, and demand for high performance computing (HPC) applications has surged. HPC is used mainly in scientific research institutions such as universities and research institutes, and in fields such as the petroleum industry, biomedicine, computational chemistry, automotive and aerospace design, architectural structure design, and three-dimensional graphics computing. HPC refers to computing systems and environments that use many processors (as part of a single machine) or several computers organized in a cluster (operating as a single computing resource). For a large-scale computing task, a parallel algorithm splits the task and distributes it to different nodes in the cluster for parallel computation, and the partial results are then aggregated to obtain the final result quickly. The performance of an HPC system therefore depends not only on the computing power and storage performance of the compute nodes but also closely on the performance of the network that interconnects the nodes. InfiniBand (IB) has been the first choice for supercomputing interconnects since its inception because of its high performance and low latency. However, since the emergence of RDMA over Converged Ethernet version 2 (RoCEv2), which is fully compatible with Ethernet IP networks and supports the remote direct memory access (RDMA) protocol, RoCEv2 networks have been adopted by more and more supercomputing interconnects.
An Ethernet packet is a packet transmitted in an Ethernet network, and it is forwarded in the Ethernet network based on the IP address in its IP header.
An IB packet is a packet transmitted in an IB network. The IB network is not aware of IP addresses. Each network device in the IB network is assigned a local identifier (LID), and an IB packet is forwarded in the IB network based on the LID in its local routing header.
Data centers today contain scenarios in which IB networks and RoCE networks coexist and need to interact. For example, when storage and computing use different high-speed interconnects, devices and apparatuses that support interconnection between IB and RoCE networks are required. FIG. 1 is a block diagram of an HPC system according to an embodiment of this application. In an HPC cluster, compute nodes and storage nodes are interconnected by a high-performance RDMA network, and the mainstream choices today are IB and RoCEv2 networks. Where IB and RoCE are mixed, as shown in FIG. 1, Ethernet packets need to be converted into IB packets.
Most HPC clusters are built on IB and RoCE networks, so data must be transmitted across the two network types, and efficient interworking between the IB network and the RoCE network is required.
The method for converting between IB and Ethernet RoCE provided in the embodiments of this application can be used for efficient data exchange across IB and RoCE networks, mainly in short-distance scenarios inside a data center and long-distance scenarios between supercomputing centers. FIG. 2 is a schematic diagram of the short-distance scenario inside a data center according to an embodiment of this application. The IB network and the RoCE network communicate through a gateway: IB packets from the IB network are converted by the gateway into Ethernet packets for transmission in the RoCE network, and Ethernet packets from the RoCE network are converted by the gateway into IB packets for transmission in the IB network. FIG. 3 is a schematic diagram of the long-distance scenario of interconnection between supercomputing centers according to an embodiment of this application, which includes IB network 1, gateway 1, IB network 2, and gateway 2. IB packets from IB network 1 are converted by gateway 1 into Ethernet packets and transmitted to gateway 2, which converts them back into IB packets for transmission in IB network 2. The embodiments of this application use the short-distance scenario as an example.
现有技术中,以太网络IP报文通过网关设备(GateWay)采用运行在IB上的互联网协议(internet protocol over infiniband,IPoIB)技术实现以太IP报文在IB网络上传输,其中,TCP/IP报文通过隧道技术直接封装在IB报文传输,报文仍然经过内核拷贝和软件协议栈,不能发挥IB网络内核旁路(kernel bypass)、零复制(zero copy)优势,时延大CPU占用率高,传统报文经过IB网络封装,不能发挥IB网络承载效率高优势,特别对于小消息报文,报文封装效率降低明显。In the prior art, the Ethernet IP message is transmitted on the IB network through a gateway device (GateWay) using the Internet protocol (internet protocol over infiniband, IPoIB) technology running on the IB, wherein the TCP/IP message The text is directly encapsulated in the IB message transmission through the tunnel technology, and the message still passes through the kernel copy and software protocol stack, which cannot take advantage of the IB network kernel bypass (kernel bypass) and zero copy (zero copy), and the delay is large and the CPU usage is high , Traditional packets are encapsulated by the IB network, which cannot take advantage of the high efficiency of the IB network. Especially for small message packets, the packet encapsulation efficiency is significantly reduced.
To solve the foregoing problems, an embodiment of this application provides a packet transmission method, described below.
Referring to FIG. 4, FIG. 4 shows an embodiment of a packet transmission method according to an embodiment of this application. The method includes the following steps.
401. The first network device sends a first packet to the gateway.
In this embodiment of this application, the first network device is a network device on the RoCE network side, and the second network device is a network device on the IB network side. The network device on the RoCE network side needs to send data to the network device on the IB network side. Specifically, the first network device may send a first packet that includes the data to the second network device through the gateway, and the first packet is an Ethernet packet. The first packet includes the media access control (MAC) address, Internet Protocol (IP) address, and User Datagram Protocol (UDP) port number of the first network device, and the MAC address and IP address of the second network device.
The format of the first packet is shown in Table 1. The first packet includes an Ethernet (ETH) header, an IP header, a UDP header, an IB transport header, an IB payload, an invariant cyclic redundancy check (ICRC) field, and a cyclic redundancy check (CRC) field. The ETH header stores the MAC addresses of the first and second network devices, the IP header stores their IP addresses, the UDP header stores the UDP port number of the first network device, and the IB transport header stores the queue pair number (QPN) of the first network device and the QPN of the second network device. The ICRC and CRC fields are used to check the data in the frame to ensure correct transmission.
Table 1
402. The gateway determines the LID of the second network device based on the IP address of the second network device and a lookup table.
In this embodiment of this application, after receiving the first packet, the gateway may parse it to obtain the IP address of the second network device from the IP header. The gateway then matches this IP address against the lookup table, which records associations between IP addresses and LIDs, to obtain the LID of the second network device.
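The lookup in step 402 can be illustrated with a minimal sketch. The table contents, the addresses, and the name `resolve_lid` are hypothetical and chosen only for illustration; the embodiment itself uses a hardware lookup table, which this software dictionary merely stands in for.

```python
from typing import Dict, Optional

# Hypothetical lookup table: destination IP address -> IB local identifier (LID).
# The addresses and LID values are illustrative, not taken from the application.
ip_to_lid: Dict[str, int] = {
    "192.0.2.10": 0x0011,
    "192.0.2.11": 0x0012,
}


def resolve_lid(dst_ip: str, table: Dict[str, int]) -> Optional[int]:
    """Return the LID associated with dst_ip, or None if no entry exists."""
    return table.get(dst_ip)
```

A miss (a return value of `None`) would correspond to the case in which the gateway has not yet learned the destination and must first populate the table.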
Optionally, before step 401, the gateway applies to a subnet manager for a LID based on a route change; the gateway receives the LID of the first network device; the gateway obtains a response packet from the second network device; and the gateway updates the lookup table based on the IP address and LID of the first network device and the IP address and LID of the second network device. Specifically, when the first network device joins the network, the gateway needs to add a routing path; that is, the gateway can detect the route change. The gateway can then apply for a LID to the subnet manager, which is located in the IB network. The subnet manager may assign a LID at random, and the gateway associates that LID with the IP address of the first network device. The gateway may also receive a broadcast packet from the first network device, convert it into an IB packet, and forward it to the second network device so that the second network device returns a response packet that includes its IP address and LID. The gateway can then associate the IP address and LID of the first network device with the IP address and LID of the second network device and store the association in the lookup table, or update the lookup table accordingly.
Optionally, the gateway may apply to the subnet manager for a LID based on a route change as follows: the gateway receives an Address Resolution Protocol (ARP) packet from the first network device and, based on the ARP packet, applies to the subnet manager for a LID for the first network device. Specifically, the gateway may identify a newly joined terminal on the Ethernet side by receiving an ARP packet, or a packet of another protocol, from the Ethernet side. On receiving the ARP packet, the gateway obtains the IP address it carries and applies to the subnet manager for a LID on behalf of the first network device. Correspondingly, the gateway converts the ARP packet into an IB-side ARP packet and sends it to the second network device; the response packet sent by the second network device is then an ARP response packet.
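The ARP-driven learning described above might be sketched as follows. Everything here is an assumption made for illustration: `make_lid_allocator` is a stand-in for the subnet manager (it hands out sequential LIDs, whereas the application says the subnet manager may assign them at random), and `GatewayTable` is a hypothetical name for the gateway's table-maintenance logic.

```python
import itertools
from typing import Callable, Dict


def make_lid_allocator(start: int = 1) -> Callable[[], int]:
    """Stand-in for the subnet manager: returns a new unused LID on each call."""
    counter = itertools.count(start)
    return lambda: next(counter)


class GatewayTable:
    """Hypothetical IP-to-LID table maintained by the gateway."""

    def __init__(self, request_lid: Callable[[], int]) -> None:
        self.request_lid = request_lid
        self.ip_to_lid: Dict[str, int] = {}

    def on_ethernet_arp(self, sender_ip: str) -> int:
        """A new Ethernet-side sender was seen: request a LID for it once."""
        if sender_ip not in self.ip_to_lid:
            self.ip_to_lid[sender_ip] = self.request_lid()
        return self.ip_to_lid[sender_ip]

    def on_ib_arp_response(self, responder_ip: str, responder_lid: int) -> None:
        """The IB-side device answered: record its IP address and LID."""
        self.ip_to_lid[responder_ip] = responder_lid
```

Requesting a LID only on the first sighting of an IP address keeps repeated ARP packets from the same terminal from consuming additional LIDs.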
Optionally, after the lookup table is generated or updated, aging may be configured for it, using the source IP address and destination IP address in the lookup table as the criterion. If no packet containing the source IP address and destination IP address is received within a set time, the association for that source IP address and destination IP address is deleted from the lookup table. Alternatively, if no such packet is received within a set time, the entry enters an aging count procedure, and the association is deleted only when the aging counter reaches a preset value. This embodiment does not limit how the association between the source IP address and destination IP address is deleted from the lookup table. Specifically, if an Ethernet packet containing the specific IP address is received during the aging count, the aging count procedure restarts.
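The first aging variant (delete an entry when no matching packet arrives within a set time) can be sketched as below. The class name `AgingTable`, the use of explicit timestamps passed in by the caller, and the timeout values are assumptions for illustration; the counter-based variant is not shown.

```python
from typing import Dict, List, Tuple

FlowKey = Tuple[str, str]  # (source IP address, destination IP address)


class AgingTable:
    """Hypothetical tracker of when each (source IP, destination IP) pair was last seen."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[FlowKey, float] = {}

    def touch(self, src_ip: str, dst_ip: str, now: float) -> None:
        """Record that a packet for this pair arrived at time `now` (restarts aging)."""
        self.last_seen[(src_ip, dst_ip)] = now

    def expire(self, now: float) -> List[FlowKey]:
        """Delete and return every pair idle for longer than the timeout."""
        stale = [k for k, t in self.last_seen.items() if now - t > self.timeout_s]
        for key in stale:
            del self.last_seen[key]
        return stale
```

Passing the current time as an argument, rather than reading a clock inside the class, keeps the sketch deterministic and easy to check.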
403. The gateway strips the first packet header from the first packet and encapsulates a second packet header to obtain a second packet.
In this embodiment of this application, after obtaining the LID of the second network device from the lookup table based on its IP address, the gateway strips the first packet header from the first packet. The first packet header includes the ETH header, IP header, and UDP header shown in Table 1. The gateway then encapsulates the second packet header onto the remainder of the first packet to form the second packet. The second packet header includes the LID of the second network device, and the second packet is an IB packet. Conversion between IB packets and Ethernet packets is implemented by means of a hardware lookup table. The format of the second packet is shown in Table 2. The second packet header contains a Local Route Header, and the Local Route Header field includes the LIDs of the first and second network devices.
Table 2
After stripping the first packet header and encapsulating the second packet header, the gateway may also record the UDP port number from the first packet header and the QPNs from the IB transport header in the lookup table.
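The strip-and-encapsulate operation of step 403 can be modelled at the field level as follows. This is a sketch under stated assumptions: the dataclasses `RoceV2Packet` and `IbPacket` and the function `ethernet_to_ib` are hypothetical, packets are represented as structured fields rather than wire bytes, and recomputation of the ICRC and VCRC is deliberately omitted.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RoceV2Packet:
    """Field-level model of the first packet (Ethernet side)."""
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    src_udp_port: int
    ib_transport: bytes  # IB transport header, carried over unchanged
    payload: bytes       # IB payload, carried over unchanged


@dataclass
class IbPacket:
    """Field-level model of the second packet (IB side)."""
    src_lid: int         # Local Route Header: source LID
    dst_lid: int         # Local Route Header: destination LID
    ib_transport: bytes
    payload: bytes


def ethernet_to_ib(pkt: RoceV2Packet, ip_to_lid: Dict[str, int]) -> IbPacket:
    """Drop the ETH/IP/UDP headers and prepend a Local Route Header.

    Raises KeyError if either IP address has no LID in the table.
    Checksum updates (ICRC, VCRC) are not modelled here.
    """
    return IbPacket(
        src_lid=ip_to_lid[pkt.src_ip],
        dst_lid=ip_to_lid[pkt.dst_ip],
        ib_transport=pkt.ib_transport,
        payload=pkt.payload,
    )
```

The IB transport header and payload pass through untouched, which reflects the point of the method: only the outer addressing headers are exchanged.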
The lookup table may have the format shown in Table 3, where Src denotes the source device, Dst denotes the destination device, and Src UDP port1 denotes the UDP port number of the second network device.
Table 3
Src IP1 | Dst IP1 | Src LID1 | Dst LID1 | Dst QPN1 | Src QPN1 | Src UDP port1 |
Src IP2 | Dst IP2 | Src LID2 | Dst LID2 | Dst QPN2 | Src QPN2 | Src UDP port2 |
Src IP3 | Dst IP3 | Src LID3 | Dst LID3 | Dst QPN3 | Src QPN3 | Src UDP port3 |
…… | …… | …… | …… | …… | …… | …… |
Src IPn | Dst IPn | Src LIDn | Dst LIDn | Dst QPNn | Src QPNn | Src UDP portn |
Optionally, before sending the second packet to the second network device, the gateway may also update the ICRC and the variant cyclic redundancy check (VCRC) of the second packet. Specifically, after converting the first packet into the second packet, the gateway needs to modify the ICRC and VCRC of the second packet, so as to increase the code distance and the error detection and correction capability of the overall coding system.
In this way, the Ethernet packet is converted into an IB packet, the ICRC and VCRC of the packet are updated, and the UDP port number is recorded. In addition, the type of service (TOS)/differentiated services code point (DSCP) of RoCEv2 is mapped to the service level (SL) field of the IB packet so that quality of service (QoS) information is carried across. DSCP uses the TOS byte in the IP header of each packet, consisting of 6 used bits and 2 unused bits, and distinguishes priorities by the encoded value. Eight of the 16 virtual ports on the IB side are selected and mapped to the eight send queues of the RoCE network. Table 4 shows how value ranges of the Ethernet-side DSCP field map to the IB-side SL field.
Table 4
RoCEv2-side DSCP (6 bits) | IB-side SL (4 bits) | Priority |
0-7 | 0 | Priority 0 |
8-15 | 1 | Priority 1 |
16-23 | 2 | Priority 2 |
24-31 | 3 | Priority 3 |
32-39 | 4 | Priority 4 |
40-47 | 5 | Priority 5 |
48-55 | 6 | Priority 6 |
56-63 | 7 | Priority 7 |
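Because each row of Table 4 covers eight consecutive DSCP values, the mapping is equivalent to taking the upper three bits of the 6-bit DSCP. A minimal sketch, with the hypothetical function name `dscp_to_sl`:

```python
def dscp_to_sl(dscp: int) -> int:
    """Map a 6-bit DSCP value (0-63) to an IB service level (0-7) as in Table 4."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    # Each group of eight DSCP values shares one SL, so drop the low three bits.
    return dscp >> 3
```

For example, DSCP 46 falls in the 40-47 row and therefore maps to SL 5.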
404. The gateway sends the second packet based on the LID of the second network device.
In this embodiment of this application, after generating the second packet, the gateway sends it to the second network device in the IB network based on the LID of the second network device.
For the long-distance scenario of interconnecting supercomputing centers, the local gateway and the remote gateway need to negotiate a common subnet manager; that is, the LIDs assigned by the subnet manager must be unique across the IB networks in which the local gateway and the remote gateway are located. The local gateway converts IB packets into Ethernet packets, and the IP address and MAC address can be configured on the local gateway.
The local gateway uses two-level flow control when receiving packets from the remote gateway:
Optionally, the gateway may receive the first packet from the first network device as follows: the gateway receives the first packet into a large-capacity buffer according to a credit flow control mechanism, and the gateway feeds back status information about the large-capacity buffer to the first network device through a pause packet so that the first network device adjusts its packet transmission. Specifically, the gateway uses the native credit flow control mechanism of the IB network and, at virtual lane (VL) granularity, interfaces through a large-capacity first-in first-out (FIFO) buffer with configurable FIFO watermarks while monitoring FIFO occupancy in real time. The large-capacity buffer can absorb more in-flight packets. When the remote Ethernet side transmits a large amount of data to the local gateway, the local gateway passes the FIFO status information to the Ethernet port of the remote gateway in Ethernet flow-control pause packets. It also parses received pause packets and, together with the watermark settings, judges congestion at the peer and regulates the sender, so that transmission to the peer is efficient and free of congestion. Specifically, the priority-based flow control (PFC) function of the port can be used to apply flow control to packets based on 802.1p priority, with the pause time in the PFC frame changed to carry the buffer occupancy.
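The watermark behaviour of the FIFO buffer can be sketched as follows. The class `WatermarkFifo`, its parameters, and the two-threshold (high/low) hysteresis are assumptions for illustration; the application states only that the FIFO watermarks are configurable and that buffer status is reported to the peer in pause packets.

```python
from collections import deque
from typing import Deque


class WatermarkFifo:
    """Hypothetical per-VL FIFO that raises a pause flag between two watermarks."""

    def __init__(self, capacity: int, high_watermark: int, low_watermark: int) -> None:
        if not 0 <= low_watermark < high_watermark <= capacity:
            raise ValueError("require 0 <= low < high <= capacity")
        self.capacity = capacity
        self.high = high_watermark
        self.low = low_watermark
        self.queue: Deque[bytes] = deque()
        self.paused = False  # True means the peer should be asked to pause

    def push(self, pkt: bytes) -> bool:
        """Enqueue a packet; return False if the buffer is already full."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(pkt)
        if len(self.queue) >= self.high:
            self.paused = True
        return True

    def pop(self) -> bytes:
        """Dequeue the oldest packet and clear the pause flag once drained enough."""
        pkt = self.queue.popleft()
        if len(self.queue) <= self.low:
            self.paused = False
        return pkt
```

Using separate high and low thresholds avoids toggling the pause state on every packet when occupancy hovers near a single threshold.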
In the technical solution of this embodiment of this application, the gateway parses the first packet from the first network device to obtain the IP address of the second network device, matches that IP address against the lookup table to obtain the corresponding LID, encapsulates a second packet header that includes the LID of the second network device onto the first packet from which the first packet header has been stripped to generate the second packet, and then sends the second packet based on the LID of the second network device. Transmission of the second packet in the IB network does not require kernel copies, so the remote direct access capability of the IB network is preserved and packet transmission efficiency is improved.
The foregoing describes how the gateway converts an Ethernet packet into an IB packet. The following describes how the gateway converts an IB packet into an Ethernet packet. Here the first network device is a network device in the IB network, and the second network device is a network device in the Ethernet network.
Referring to FIG. 5, FIG. 5 is a schematic diagram of another embodiment of a packet transmission method according to an embodiment of this application. The method is as follows.
501. The first network device sends a third packet to the gateway.
In this embodiment of this application, the third packet is an IB packet whose format is the same as that of the IB packet in step 403. The third packet header of the third packet includes a Local Route Header, which includes the LID of the first network device and the LID of the second network device, as well as the QPN of the first network device and the QPN of the second network device.
502. The gateway determines the Internet Protocol (IP) address of the second network device based on the LID of the second network device and the lookup table.
For how the IP address is looked up from the LID in step 502 and how the lookup table is updated, refer to the description in step 402 of looking up the LID from the IP address and updating the lookup table. Details are not repeated here.
503. The gateway strips the third packet header from the third packet and encapsulates a fourth packet header to obtain a fourth packet.
In this embodiment of this application, the fourth packet includes an ETH header, an IP header, a UDP header, an IB transport header, an IB payload, and a CRC field. On receiving the IB packet, the gateway parses the source LID and destination LID in the LRH field of the IB packet, together with the destination QPN. It matches the source LID, destination LID, and destination QPN against the lookup table to obtain the source IP address and destination IP address for the IP header and the UDP port number for the UDP header of the Ethernet packet. The IB transport header and IB payload are placed directly into the corresponding fields of the RoCEv2 packet. The gateway may also send a broadcast packet to the RoCE network based on the destination IP address so that the second network device returns its MAC address; this MAC address is the destination MAC address and is stored in the ETH header. Because MAC addresses in the ETH header of an Ethernet packet are hop-by-hop, the MAC address of the gateway's Ethernet-side interface can be used as the source MAC address. The ICRC and CRC of the packet are also updated and written into the corresponding check fields. The gateway thereby converts between IB packets on the IB side and Ethernet packets on the Ethernet side. For the format of the fourth packet, refer to the format of the first packet in the embodiment of FIG. 4; details are not repeated here.
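The reverse lookup used when building the fourth packet header can be sketched as a table keyed by the three IB-side values. The names `EthernetFields`, `IbKey`, and `resolve_ethernet_fields` are hypothetical, and a software dictionary again stands in for the hardware lookup table; MAC address resolution is not modelled.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class EthernetFields(NamedTuple):
    """Values needed for the IP and UDP headers of the Ethernet packet."""
    src_ip: str
    dst_ip: str
    src_udp_port: int


IbKey = Tuple[int, int, int]  # (source LID, destination LID, destination QPN)


def resolve_ethernet_fields(
    src_lid: int,
    dst_lid: int,
    dst_qpn: int,
    table: Dict[IbKey, EthernetFields],
) -> Optional[EthernetFields]:
    """Return the Ethernet-side fields for an IB packet, or None on a miss."""
    return table.get((src_lid, dst_lid, dst_qpn))
```

Including the destination QPN in the key allows two connections between the same pair of LIDs to carry different UDP port numbers.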
Optionally, the gateway obtains the QPN of the first network device and the QPN of the second network device from a connection establishment packet, and obtains the UDP port number of the second network device based on these two QPNs. Specifically, when the first network device and the second network device establish a connection, the first network device may send a connection establishment packet to the gateway; the gateway converts it into IB format and sends it to the second network device, which returns a connection establishment response packet. Alternatively, the second network device may send a connection establishment packet to the gateway; the gateway converts it into Ethernet format and sends it to the first network device, which returns a connection establishment response packet. The gateway can obtain the QPN of the first network device and the QPN of the second network device from the connection establishment packet and, using a mapping between QPNs and UDP port numbers, compute the UDP port number of the second network device from the two QPNs. The gateway may store the correspondence among the QPN of the first network device, the QPN of the second network device, and the UDP port number of the second network device in the lookup table. After the gateway assigns a LID to the second network device, the lookup table may also hold the association among IP address, LID, QPN, and UDP port number. Based on the third packet, the gateway can then determine the fields needed for the fourth packet header: the MAC addresses of the first and second network devices, their IP addresses, their QPNs, and the UDP port number of the second network device.
504. The gateway sends the fourth packet based on the IP address of the second network device.
In this embodiment of this application, the gateway may transmit the fourth packet in the manner used for Ethernet packets, based on the IP address of the second network device carried in the fourth packet.
Optionally, the gateway may use two-level flow control when sending the fourth packet based on the IP address of the second network device: the gateway sends the fourth packet from a large-capacity buffer according to the credit flow control mechanism and the IP address of the second network device, and the gateway feeds back status information about the large-capacity buffer to the second network device through a pause packet so that the second network device adjusts its packet transmission. For the two-level flow control, refer to the related description in step 404; details are not repeated here.
In the technical solution of this embodiment of this application, the gateway parses the third packet from the first network device to obtain the LID of the second network device, matches that LID against the lookup table to obtain the corresponding IP address, encapsulates a fourth packet header that includes the IP address of the second network device onto the third packet from which the third packet header has been stripped to generate the fourth packet, and then sends the fourth packet based on the IP address of the second network device. Transmission of the fourth packet in the Ethernet network does not require kernel copies, so the remote direct access capability of the Ethernet network is preserved and packet transmission efficiency is improved.
The gateway in the embodiments of this application may have the structure shown in FIG. 6. The gateway includes a switching chip and a processing chip. The switching chip mainly implements basic forwarding. The processing chip may be a CPU, an FPGA, or the like, and is responsible for building and maintaining the lookup tables required for converting packets between protocols, and for performing the conversion.
Alternatively, the gateway may have the structure shown in FIG. 7. The gateway includes a switching chip that comprises a receiving module, a processing module, and a sending module. The receiving module and sending module implement basic forwarding, and the processing module is responsible for building and maintaining the lookup tables required for converting packets between protocols, and for performing the conversion.
In the embodiments of this application, the processing chip may have the structure shown in FIG. 8. The processing chip includes an Ethernet interface, an encapsulation/decapsulation module, a buffer module, an IB credit flow control module, an IB interface, a quality of service module, and a management module.
The Ethernet interface is used to receive or output Ethernet packets.
The encapsulation/decapsulation module is used to convert between Ethernet packets and IB packets.
The buffer module is used to store in-flight packets and to send its status information to the peer through the RoCE network.
The IB credit flow control module is used to regulate packet transmission on the IB side.
The quality of service module is used to determine the virtual port used for transmission between the Ethernet side and the IB side.
The management module is used to manage LIDs; for example, it may apply to the subnet manager for LIDs and assign LIDs to nodes.
The foregoing describes the packet transmission method. The following describes communication apparatuses that can perform the method.
请参阅图9,如图9所示为本申请实施例提供的通信装置一结构示意图,该通信装置90包括:Please refer to FIG. 9, as shown in FIG. 9 is a schematic structural diagram of a communication device provided by the embodiment of the present application. The communication device 90 includes:
接收单元901,用于接收来自第一网络设备的第一报文,第一报文的第一报文头包括第二网络设备的互联网协议IP地址,第二网络设备为第一网络设备传输报文的目的网络设备;The receiving unit 901 is configured to receive a first message from the first network device, the first message header of the first message includes the Internet Protocol IP address of the second network device, and the second network device transmits the message for the first network device The destination network device of the document;
确定单元902,用于根据第二网络设备的IP地址,结合查找表确定第二网络设备的本地标识符LID,查找表包括IP地址和LID的关联关系;The determining unit 902 is configured to determine the local identifier LID of the second network device according to the IP address of the second network device in combination with a lookup table, and the lookup table includes an association relationship between the IP address and the LID;
封装单元903,用于将第一报文的第一报文头剥离,并封装第二报文头,以获得第二报文,第二报文头包括本地路由头,本地路由头包括第二网络设备的LID;The encapsulation unit 903 is configured to strip off the first packet header of the first packet and encapsulate the second packet header to obtain the second packet, the second packet header includes a local routing header, and the local routing header includes a second The LID of the network device;
发送单元904,用于根据第二网络设备的LID发送第二报文。A sending unit 904, configured to send the second packet according to the LID of the second network device.
可选的,接收单元901具体用于:Optionally, the receiving unit 901 is specifically used for:
根据信用流控机制在大容量的缓冲区接收第一报文;Receive the first message in a large-capacity buffer according to the credit flow control mechanism;
通过暂停报文向第一网络设备反馈大容量的缓冲区的状态信息,以使得第一网络设备调整报文传输。The status information of the large-capacity buffer is fed back to the first network device by suspending the message, so that the first network device adjusts message transmission.
可选的,发送单元904还用于:Optionally, the sending unit 904 is also used to:
根据路由变化向子网管理器申请LID,路由变化指示第一网络设备加入网络;Apply for a LID from the subnet manager according to the route change, and the route change instructs the first network device to join the network;
接收单元901还用于:The receiving unit 901 is also used for:
接收第一网络设备的LID;receiving the LID of the first network device;
获取单元905还用于:The obtaining unit 905 is also used for:
获取来自第二网络设备的响应报文,响应报文包括第二网络设备的LID和IP地址;Obtain a response message from the second network device, where the response message includes the LID and IP address of the second network device;
通信装置还包括更新单元906,更新单元906具体用于:The communication device further includes an updating unit 906, and the updating unit 906 is specifically used for:
根据第一网络设备的IP地址和LID,以及第二网络设备的IP地址和LID更新查找表。The lookup table is updated according to the IP address and LID of the first network device, and the IP address and LID of the second network device.
Optionally, the sending unit 904 is further configured to:
Receive an Address Resolution Protocol (ARP) packet from the first network device, where the ARP packet includes the IP address of the first network device and the IP address of the second network device;
Apply to the subnet manager for the LID of the first network device according to the ARP packet.
Optionally, the updating unit 906 is further configured to:
Update the invariant cyclic redundancy check (ICRC) and the variant cyclic redundancy check (VCRC) of the second packet.
Optionally, the first packet is an Ethernet packet, and the second packet is an IB packet.
Optionally, the Ethernet packet includes an Ethernet header, an IP header, a UDP header, an IB transport header, an IB payload, an ICRC, and a CRC.
Optionally, the IB packet includes a local routing header, an IB transport header, an IB payload, an ICRC, and a VCRC.
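Comparing the two layouts field by field shows which parts the gateway strips, keeps, and adds. The short Python sketch below only restates the field lists given above; the variable names are illustrative.

```python
# Field order of the two packet formats as listed in the text.
ETHERNET_LAYOUT = ["Ethernet header", "IP header", "UDP header",
                   "IB transport header", "IB payload", "ICRC", "CRC"]
IB_LAYOUT = ["local routing header", "IB transport header",
             "IB payload", "ICRC", "VCRC"]

# Fields present in both formats (the ICRC field is kept, but its value is updated).
kept = [f for f in ETHERNET_LAYOUT if f in IB_LAYOUT]
# Fields removed when converting an Ethernet packet to an IB packet.
stripped = [f for f in ETHERNET_LAYOUT if f not in IB_LAYOUT]
# Fields the gateway has to generate for the IB packet.
added = [f for f in IB_LAYOUT if f not in ETHERNET_LAYOUT]

print(kept)      # ['IB transport header', 'IB payload', 'ICRC']
print(stripped)  # ['Ethernet header', 'IP header', 'UDP header', 'CRC']
print(added)     # ['local routing header', 'VCRC']
```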
Referring to FIG. 10, which is another schematic structural diagram of a communication apparatus provided by an embodiment of this application, the communication apparatus 100 includes:
A receiving unit 1001, configured to receive a third packet from a first network device, where a third packet header of the third packet includes a local routing header, the local routing header includes the local identifier (LID) of a second network device, and the second network device is the destination network device of packets transmitted by the first network device;
A determining unit 1002, configured to determine the Internet Protocol (IP) address of the second network device according to the LID of the second network device in combination with a lookup table, where the lookup table includes associations between IP addresses and LIDs;
An encapsulation unit 1003, configured to strip the third packet header from the third packet and encapsulate a fourth packet header to obtain a fourth packet, where the fourth packet header includes the IP address of the second network device;
A sending unit 1004, configured to send the fourth packet according to the IP address of the second network device.
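The reverse direction can be sketched in the same spirit: the destination LID taken from the local routing header is translated back to an IP address. The type names, field names, and table contents in this Python sketch are hypothetical.

```python
from typing import Dict, NamedTuple


class IBPacket(NamedTuple):
    dlid: int             # destination LID from the local routing header
    ib_transport: bytes   # IB transport header, carried through unchanged
    payload: bytes


class EthernetPacket(NamedTuple):
    dst_ip: str           # destination IP address placed in the fourth packet header
    ib_transport: bytes
    payload: bytes


def ib_to_ethernet(pkt: IBPacket, lid_to_ip: Dict[int, str]) -> EthernetPacket:
    """Strip the local routing header and encapsulate an IP-addressed header."""
    dst_ip = lid_to_ip[pkt.dlid]  # determining unit: LID -> IP address
    return EthernetPacket(dst_ip, pkt.ib_transport, pkt.payload)


# Hypothetical lookup table associating LIDs with IP addresses.
lid_to_ip = {0x0014: "192.0.2.20"}
fourth = ib_to_ethernet(IBPacket(0x0014, b"BTH", b"data"), lid_to_ip)
print(fourth.dst_ip)  # 192.0.2.20
```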
Optionally, the communication apparatus 100 further includes an obtaining unit 1005, and the obtaining unit 1005 is specifically configured to:
Obtain the queue pair number (QPN) of the first network device and the QPN of the second network device according to a connection establishment packet;
Obtain the User Datagram Protocol (UDP) port number of the second network device according to the QPN of the first network device and the QPN of the second network device, where the lookup table further includes associations among QPNs, UDP port numbers, IP addresses, and LIDs, the fourth packet header further includes the media access control (MAC) address and the UDP port number of the second network device, and the MAC address of the second network device is obtained by broadcasting according to the IP address of the second network device.
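The additional header fields could be assembled as in the following sketch. The function name, the dictionary keyed by a QPN pair, and all values are hypothetical; `resolved_macs` stands in for whatever the broadcast-based address resolution returned and is not an interface defined by this application.

```python
from typing import Dict, NamedTuple, Tuple


class FourthHeader(NamedTuple):
    dst_mac: str
    dst_ip: str
    dst_udp_port: int


def build_fourth_header(
    dlid: int,
    src_qpn: int,
    dst_qpn: int,
    lid_to_ip: Dict[int, str],
    qpn_pair_to_udp: Dict[Tuple[int, int], int],
    resolved_macs: Dict[str, str],
) -> FourthHeader:
    """Collect the addressing fields for the Ethernet-side header."""
    dst_ip = lid_to_ip[dlid]                          # LID -> IP address
    udp_port = qpn_pair_to_udp[(src_qpn, dst_qpn)]    # QPN pair -> UDP port
    dst_mac = resolved_macs[dst_ip]                   # result of the broadcast lookup
    return FourthHeader(dst_mac, dst_ip, udp_port)


header = build_fourth_header(
    dlid=0x0014,
    src_qpn=0x11,
    dst_qpn=0x22,
    lid_to_ip={0x0014: "192.0.2.20"},
    qpn_pair_to_udp={(0x11, 0x22): 49152},            # illustrative port value
    resolved_macs={"192.0.2.20": "02:00:00:00:00:20"},
)
```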
Optionally, the sending unit 1004 is specifically configured to:
Send the fourth packet through a large-capacity buffer according to a credit-based flow control mechanism and the IP address of the second network device;
Feed back status information of the large-capacity buffer to the second network device through a pause packet, so that the second network device adjusts its packet transmission.
Optionally, the sending unit 1004 is further configured to:
Apply to a subnet manager for a LID according to a route change, where the route change indicates that the second network device has joined the network;
The receiving unit 1001 is further configured to:
Receive the LID of the second network device;
The obtaining unit 1005 is further configured to:
Obtain a response packet from the first network device, where the response packet includes the LID and IP address of the first network device;
The communication apparatus 100 further includes an updating unit 1006, and the updating unit 1006 is specifically configured to:
Update the lookup table according to the IP address and LID of the first network device and the IP address and LID of the second network device.
Optionally, the sending unit 1004 is further configured to:
Receive an Address Resolution Protocol (ARP) packet from the second network device, where the ARP packet includes the IP address of the first network device and the IP address of the second network device;
Apply to the subnet manager for the LID of the second network device according to the ARP packet.
Optionally, the updating unit 1006 is further configured to:
Update the invariant cyclic redundancy check (ICRC) and the cyclic redundancy check (CRC) of the second packet.
Optionally, the first packet is an Ethernet packet, and the second packet is an IB packet.
Optionally, the Ethernet packet includes an Ethernet header, an IP header, a UDP header, an IB transport header, an IB payload, an ICRC, and a CRC.
Optionally, the IB packet includes a local routing header, an IB transport header, an IB payload, an ICRC, and a VCRC.
FIG. 11 is a schematic diagram of a possible logical structure of a communication device 110 provided by an embodiment of this application. The communication device 110 includes a processor 1101, a communication interface 1102, a storage system 1103, and a bus 1104. The processor 1101, the communication interface 1102, and the storage system 1103 are connected to one another through the bus 1104. In this embodiment, the processor 1101 controls and manages the actions of the communication device 110; for example, the processor 1101 performs the steps performed by the gateway in the method embodiment of FIG. 4. The communication interface 1102 supports communication by the communication device 110. The storage system 1103 stores the program code and data of the communication device 110.
The processor 1101 may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the disclosure of this application. The processor 1101 may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor. The bus 1104 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses may be classified into address buses, data buses, control buses, and so on. For ease of illustration, the bus is drawn as a single thick line in FIG. 11, but this does not mean that there is only one bus or only one type of bus.
The receiving unit 901 and the sending unit 904 in the communication apparatus 90 correspond to the communication interface 1102 in the communication device 110, and the determining unit 902, the encapsulation unit 903, the obtaining unit 905, and the updating unit 906 in the communication apparatus 90 correspond to the processor 1101 in the communication device 110.
The communication device 110 of this embodiment may correspond to the gateway in the method embodiment of FIG. 4 above, and the communication interface 1102 in the communication device 110 may implement the functions of, and/or the steps performed by, the gateway in the method embodiment of FIG. 4 above. For brevity, details are not repeated here.
FIG. 12 is a schematic diagram of a possible logical structure of a communication device 120 provided by an embodiment of this application. The communication device 120 includes a processor 1201, a communication interface 1202, a storage system 1203, and a bus 1204. The processor 1201, the communication interface 1202, and the storage system 1203 are connected to one another through the bus 1204. In this embodiment, the processor 1201 controls and manages the actions of the communication device 120; for example, the processor 1201 performs the steps performed by the gateway in the method embodiment of FIG. 5. The communication interface 1202 supports communication by the communication device 120. The storage system 1203 stores the program code and data of the communication device 120.
The processor 1201 may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the disclosure of this application. The processor 1201 may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor. The bus 1204 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses may be classified into address buses, data buses, control buses, and so on. For ease of illustration, the bus is drawn as a single thick line in FIG. 12, but this does not mean that there is only one bus or only one type of bus.
The receiving unit 1001 and the sending unit 1004 in the communication apparatus 100 correspond to the communication interface 1202 in the communication device 120, and the determining unit 1002, the encapsulation unit 1003, the obtaining unit 1005, and the updating unit 1006 in the communication apparatus 100 correspond to the processor 1201 in the communication device 120.
The communication device 120 of this embodiment may correspond to the gateway in the method embodiment of FIG. 5 above, and the communication interface 1202 in the communication device 120 may implement the functions of, and/or the steps performed by, the gateway in the method embodiment of FIG. 5 above. For brevity, details are not repeated here.
In another embodiment of this application, a computer-readable storage medium is further provided. The computer-readable storage medium stores computer-executable instructions, and when a processor of a device executes the computer-executable instructions, the device performs the steps of the packet transmission method performed by the gateway in the method embodiment of FIG. 4 above.
In another embodiment of this application, a computer-readable storage medium is further provided. The computer-readable storage medium stores computer-executable instructions, and when a processor of a device executes the computer-executable instructions, the device performs the steps of the packet transmission method performed by the gateway in the method embodiment of FIG. 5 above.
In another embodiment of this application, a computer program product is further provided. The computer program product includes computer-executable instructions stored in a computer-readable storage medium, and when a processor of a device executes the computer-executable instructions, the device performs the steps of the packet transmission method performed by the gateway in the method embodiment of FIG. 4 above.
In another embodiment of this application, a computer program product is further provided. The computer program product includes computer-executable instructions stored in a computer-readable storage medium, and when a processor of a device executes the computer-executable instructions, the device performs the steps of the packet transmission method performed by the gateway in the method embodiment of FIG. 5 above.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist separately as a physical unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Claims (38)
- A packet transmission method, characterized by comprising: receiving, by a gateway, a first packet from a first network device, wherein a first packet header of the first packet includes an Internet Protocol (IP) address of a second network device, and the second network device is the destination network device of packets transmitted by the first network device; determining, by the gateway, a local identifier (LID) of the second network device according to the IP address of the second network device in combination with a lookup table, wherein the lookup table includes associations between IP addresses and LIDs; stripping, by the gateway, the first packet header from the first packet and encapsulating a second packet header to obtain a second packet, wherein the second packet header includes a local routing header, and the local routing header includes the LID of the second network device; and sending, by the gateway, the second packet according to the LID of the second network device.
- The packet transmission method according to claim 1, wherein the receiving, by the gateway, of the first packet from the first network device comprises: receiving, by the gateway, the first packet in a large-capacity buffer according to a credit-based flow control mechanism; and feeding back, by the gateway, status information of the large-capacity buffer to the first network device through a pause packet, so that the first network device adjusts its packet transmission.
- The packet transmission method according to any one of claims 1-2, wherein before the gateway receives the first packet from the first network device, the method further comprises: applying, by the gateway, to a subnet manager for a LID according to a route change, wherein the route change indicates that the first network device has joined the network; receiving, by the gateway, the LID of the first network device; obtaining, by the gateway, a response packet from the second network device, wherein the response packet includes the LID and IP address of the second network device; and updating, by the gateway, the lookup table according to the IP address and LID of the first network device and the IP address and LID of the second network device.
- The packet transmission method according to claim 3, wherein the applying, by the gateway, to the subnet manager for a LID according to the route change comprises: receiving, by the gateway, an Address Resolution Protocol (ARP) packet from the first network device, wherein the ARP packet includes the IP address of the first network device and the IP address of the second network device; and applying, by the gateway, to the subnet manager for the LID of the first network device according to the ARP packet.
- The packet transmission method according to any one of claims 1-4, wherein after the gateway strips the first packet header from the first packet and encapsulates the second packet header to obtain the second packet, the method further comprises: updating, by the gateway, the invariant cyclic redundancy check (ICRC) and the variant cyclic redundancy check (VCRC) of the second packet.
- The packet transmission method according to any one of claims 1-5, wherein the first packet is an Ethernet packet, and the second packet is an IB packet.
- The packet transmission method according to claim 6, wherein the Ethernet packet includes an Ethernet header, an IP header, a UDP header, an IB transport header, an IB payload, an ICRC, and a cyclic redundancy check (CRC).
- The packet transmission method according to claim 6, wherein the IB packet includes the local routing header, an IB transport header, an IB payload, an ICRC, and a VCRC.
- A packet transmission method, characterized by comprising: receiving, by a gateway, a third packet from a first network device, wherein a third packet header of the third packet includes a local routing header, the local routing header includes a local identifier (LID) of a second network device, and the second network device is the destination network device of packets transmitted by the first network device; determining, by the gateway, an Internet Protocol (IP) address of the second network device according to the LID of the second network device in combination with a lookup table, wherein the lookup table includes associations between IP addresses and LIDs; stripping, by the gateway, the third packet header from the third packet and encapsulating a fourth packet header to obtain a fourth packet, wherein the fourth packet header includes the IP address of the second network device; and sending, by the gateway, the fourth packet according to the IP address of the second network device.
- The packet transmission method according to claim 9, wherein the method further comprises: obtaining, by the gateway, a queue pair number (QPN) of the first network device and a QPN of the second network device according to a connection establishment packet; and obtaining, by the gateway, a User Datagram Protocol (UDP) port number of the second network device according to the QPN of the first network device and the QPN of the second network device, wherein the lookup table further includes associations among QPNs, UDP port numbers, IP addresses, and LIDs, the fourth packet header further includes a media access control (MAC) address and the UDP port number of the second network device, and the MAC address of the second network device is obtained by broadcasting according to the IP address of the second network device.
- The packet transmission method according to claim 9 or 10, wherein the sending, by the gateway, of the fourth packet according to the IP address of the second network device comprises: sending, by the gateway, the fourth packet through a large-capacity buffer according to a credit-based flow control mechanism and the IP address of the second network device; and feeding back, by the gateway, status information of the large-capacity buffer to the second network device through a pause packet, so that the second network device adjusts its packet transmission.
- The packet transmission method according to any one of claims 9-11, wherein before the gateway receives the third packet from the first network device, the method further comprises: applying, by the gateway, to a subnet manager for a LID according to a route change, wherein the route change indicates that the second network device has joined the network; receiving, by the gateway, the LID of the second network device; obtaining, by the gateway, a response packet from the first network device, wherein the response packet includes the LID and IP address of the first network device; and updating, by the gateway, the lookup table according to the IP address and LID of the first network device and the IP address and LID of the second network device.
- The packet transmission method according to claim 12, wherein the applying, by the gateway, to the subnet manager for a LID according to the route change comprises: receiving, by the gateway, an Address Resolution Protocol (ARP) packet from the second network device, wherein the ARP packet includes the IP address of the first network device and the IP address of the second network device; and applying, by the gateway, to the subnet manager for the LID of the second network device according to the ARP packet.
- The packet transmission method according to any one of claims 9-13, wherein after the gateway strips the first packet header from the first packet and encapsulates the second packet header to obtain the second packet, the method further comprises: updating, by the gateway, the invariant cyclic redundancy check (ICRC) and the cyclic redundancy check (CRC) of the second packet.
- The packet transmission method according to any one of claims 9-14, wherein the first packet is an Ethernet packet, and the second packet is an IB packet.
- The packet transmission method according to claim 15, wherein the Ethernet packet includes an Ethernet header, an IP header, a UDP header, an IB transport header, an IB payload, an ICRC, and a CRC.
- The packet transmission method according to claim 15, wherein the IB packet includes the local routing header, an IB transport header, an IB payload, an ICRC, and a variant cyclic redundancy check (VCRC).
- 一种通信装置,其特征在于,包括:A communication device, characterized by comprising:接收单元,用于接收来自第一网络设备的第一报文,所述第一报文的第一报文头包括第二网络设备的互联网协议IP地址,所述第二网络设备为第一网络设备传输报文的目的网络设备;A receiving unit, configured to receive a first message from a first network device, the first message header of the first message includes an Internet Protocol IP address of a second network device, and the second network device is the first network device The destination network device for the device to transmit the message;确定单元,用于根据所述第二网络设备的IP地址,结合查找表确定所述第二网络设备的本地标识符LID,所述查找表包括IP地址和LID的关联关系;A determining unit, configured to determine the local identifier LID of the second network device according to the IP address of the second network device in combination with a lookup table, the lookup table including an association relationship between the IP address and the LID;封装单元,用于将第一报文的第一报文头剥离,并封装第二报文头,以获得第二报文,所述第二报文头包括本地路由头,所述本地路由头包括所述第二网络设备的LID;An encapsulation unit, configured to strip off the first packet header of the first packet, and encapsulate the second packet header to obtain the second packet, the second packet header includes a local routing header, and the local routing header including the LID of the second network device;发送单元,用于根据所述第二网络设备的LID发送所述第二报文。A sending unit, configured to send the second packet according to the LID of the second network device.
- 根据权利要求18所述的通信装置,其特征在于,所述接收单元具体用于:The communication device according to claim 18, wherein the receiving unit is specifically used for:根据信用流控机制在大容量的缓冲区接收所述第一报文;receiving the first message in a large-capacity buffer according to a credit flow control mechanism;通过暂停报文向所述第一网络设备反馈所述大容量的缓冲区的状态信息,以使得所述第一网络设备调整报文传输。The status information of the large-capacity buffer is fed back to the first network device by suspending the message, so that the first network device adjusts message transmission.
- 根据权利要求18-19任一项所述的通信装置,其特征在于,所述发送单元还用于:The communication device according to any one of claims 18-19, wherein the sending unit is further configured to:根据路由变化向子网管理器申请LID,所述路由变化指示所述第一网络设备加入网络;Applying for a LID from the subnet manager according to a route change, the route change instructing the first network device to join the network;所述接收单元还用于:The receiving unit is also used for:接收所述第一网络设备的LID;receiving the LID of the first network device;所述获取单元还用于:The acquisition unit is also used for:获取来自所述第二网络设备的响应报文,所述响应报文包括第二网络设备的LID和IP地址;Obtain a response message from the second network device, where the response message includes the LID and IP address of the second network device;所述通信装置还包括更新单元,所述更新单元具体用于:The communication device also includes an updating unit, and the updating unit is specifically used for:根据所述第一网络设备的IP地址和LID,以及所述第二网络设备的IP地址和LID更新所述查找表。updating the lookup table according to the IP address and LID of the first network device and the IP address and LID of the second network device.
- 根据权利要求20所述的通信装置,其特征在于,所述发送单元还用于:The communication device according to claim 20, wherein the sending unit is further used for:接收来自所述第一网络设备的地址解析协议ARP报文,所述ARP报文包括所述第一网络设备的IP地址和所述第二网络设备的IP地址;receiving an Address Resolution Protocol ARP message from the first network device, where the ARP message includes the IP address of the first network device and the IP address of the second network device;根据所述ARP报文向子网管理器申请所述第一网络设备的LID。Applying for the LID of the first network device from the subnet manager according to the ARP message.
- The communication device according to any one of claims 18 to 21, wherein the updating unit is further configured to: update the invariant cyclic redundancy check code (ICRC) and the variable cyclic redundancy check code (VCRC) of the second message.
- The communication device according to any one of claims 18 to 22, wherein the first message is an Ethernet message and the second message is an IB message.
- The communication device according to claim 23, wherein the Ethernet message includes an Ethernet header, an IP header, a UDP header, an IB transport header, an IB payload, an ICRC, and a cyclic redundancy check code (CRC).
- The communication device according to claim 23, wherein the IB message includes the local routing header, an IB transport header, an IB payload, an ICRC, and a VCRC.
- A communication device, comprising: a receiving unit, configured to receive a third message from a first network device, wherein a third message header of the third message includes a local routing header, the local routing header includes a local identifier (LID) of a second network device, and the second network device is the destination network device of messages transmitted by the first network device; a determining unit, configured to determine an Internet Protocol (IP) address of the second network device according to the LID of the second network device in combination with a lookup table, the lookup table including an association between IP addresses and LIDs; an encapsulation unit, configured to strip the third message header from the third message and encapsulate a fourth message header to obtain a fourth message, the fourth message header including the IP address of the second network device; and a sending unit, configured to send the fourth message according to the IP address of the second network device.
- The communication device according to claim 26, wherein the communication device further includes an acquisition unit, the acquisition unit being specifically configured to: obtain a queue pair number (QPN) of the first network device and a QPN of the second network device according to a link establishment message; and obtain a User Datagram Protocol (UDP) port number of the second network device according to the QPN of the first network device and the QPN of the second network device, wherein the lookup table further includes an association among QPNs, UDP port numbers, IP addresses, and LIDs, the fourth message header further includes a media access control (MAC) address and the UDP port number of the second network device, and the MAC address of the second network device is obtained by broadcasting according to the IP address of the second network device.
- The communication device according to claim 26 or 27, wherein the sending unit is specifically configured to: send the fourth message in a large-capacity buffer according to a credit flow control mechanism and the IP address of the second network device; and feed back status information of the large-capacity buffer to the second network device by means of a pause message, so that the second network device adjusts message transmission.
- The communication device according to any one of claims 26 to 28, wherein the sending unit is further configured to: apply to a subnet manager for a LID according to a route change, the route change indicating that the second network device has joined the network; the receiving unit is further configured to: receive the LID of the second network device; the acquisition unit is further configured to: obtain a response message from the first network device, the response message including the LID and the IP address of the first network device; and the communication device further includes an updating unit, the updating unit being specifically configured to: update the lookup table according to the IP address and LID of the first network device and the IP address and LID of the second network device.
- The communication device according to claim 29, wherein the sending unit is further configured to: receive an Address Resolution Protocol (ARP) message from the second network device, the ARP message including the IP address of the first network device and the IP address of the second network device; and apply to the subnet manager for the LID of the second network device according to the ARP message.
- The communication device according to any one of claims 26 to 30, wherein the updating unit is further configured to: update the invariant cyclic redundancy check (ICRC) and the cyclic redundancy check (CRC) of the second message.
- The communication device according to any one of claims 26 to 31, wherein the first message is an Ethernet message and the second message is an IB message.
- The communication device according to claim 32, wherein the Ethernet message includes an Ethernet header, an IP header, a UDP header, an IB transport header, an IB payload, an ICRC, and a CRC.
- The communication device according to claim 32, wherein the IB message includes the local routing header, an IB transport header, an IB payload, an ICRC, and a variable cyclic redundancy check code (VCRC).
- A communication device, comprising a processor and a memory, wherein the processor is configured to execute instructions stored in the memory, so that the communication device performs the method according to any one of claims 1 to 8.
- A communication device, comprising a processor and a memory, wherein the processor is configured to execute instructions stored in the memory, so that the communication device performs the method according to any one of claims 9 to 17.
- A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and when the computer program is run on a computer, the computer is caused to perform the method according to any one of claims 1 to 17.
- A computer program product, wherein when the computer program product is executed on a computer, the computer performs the method according to any one of claims 1 to 17.
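
The header conversion recited in the device claims (looking up an IP address from a LID, stripping the local routing header, and encapsulating a new header that carries the IP address and UDP port) can be illustrated with a minimal Python sketch. It is not the claimed implementation. All names here (`LookupTable`, `IBPacket`, `EthPacket`, `ib_to_eth`, `eth_to_ib`, `bind_port`) and the example port value are assumptions made for illustration only, headers are reduced to a few fields, and the ICRC, VCRC, and CRC updates, MAC resolution, and flow control are omitted.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class IBPacket:
    dlid: int   # destination LID taken from the local routing header
    body: bytes  # IB transport header + payload, passed through unchanged


@dataclass(frozen=True)
class EthPacket:
    dst_ip: str        # destination IP address placed in the new IP header
    dst_udp_port: int  # destination UDP port placed in the new UDP header
    body: bytes


class LookupTable:
    """Hypothetical table associating IP addresses, LIDs, and QPN pairs."""

    def __init__(self) -> None:
        self._ip_by_lid: Dict[int, str] = {}
        self._lid_by_ip: Dict[str, int] = {}
        self._port_by_qpns: Dict[Tuple[int, int], int] = {}

    def update(self, ip: str, lid: int) -> None:
        # Drop stale entries first so both directions stay consistent.
        stale_ip = self._ip_by_lid.pop(lid, None)
        if stale_ip is not None:
            self._lid_by_ip.pop(stale_ip, None)
        stale_lid = self._lid_by_ip.pop(ip, None)
        if stale_lid is not None:
            self._ip_by_lid.pop(stale_lid, None)
        self._ip_by_lid[lid] = ip
        self._lid_by_ip[ip] = lid

    def bind_port(self, src_qpn: int, dst_qpn: int, udp_port: int) -> None:
        # Record the UDP port associated with a pair of queue pair numbers.
        self._port_by_qpns[(src_qpn, dst_qpn)] = udp_port

    def ip_for_lid(self, lid: int) -> str:
        return self._ip_by_lid[lid]      # raises KeyError if unknown

    def lid_for_ip(self, ip: str) -> int:
        return self._lid_by_ip[ip]       # raises KeyError if unknown

    def port_for(self, src_qpn: int, dst_qpn: int) -> int:
        return self._port_by_qpns[(src_qpn, dst_qpn)]


def ib_to_eth(pkt: IBPacket, table: LookupTable,
              src_qpn: int, dst_qpn: int) -> EthPacket:
    """Strip the local routing header and add an IP/UDP-style header."""
    return EthPacket(
        dst_ip=table.ip_for_lid(pkt.dlid),
        dst_udp_port=table.port_for(src_qpn, dst_qpn),
        body=pkt.body,
    )


def eth_to_ib(pkt: EthPacket, table: LookupTable) -> IBPacket:
    """Reverse direction: replace the IP/UDP-style header with a LID."""
    return IBPacket(dlid=table.lid_for_ip(pkt.dst_ip), body=pkt.body)


# Usage: register one destination, then convert in both directions.
table = LookupTable()
table.update(ip="10.0.0.2", lid=0x12)
table.bind_port(src_qpn=7, dst_qpn=9, udp_port=49152)  # example values only

eth = ib_to_eth(IBPacket(dlid=0x12, body=b"bth+payload"), table, 7, 9)
ib = eth_to_ib(eth, table)
```

Keeping the table bidirectional lets the same structure serve both conversion directions, and replacing stale entries in `update` mirrors the claimed step of refreshing the lookup table when a device joins the network.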
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110872533.9A CN115701063A (en) | 2021-07-30 | 2021-07-30 | Message transmission method and communication device |
CN202110872533.9 | 2021-07-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023005723A1 (en) | 2023-02-02 |
Family
ID=85086271
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/106368 WO2023005723A1 (en) | 2021-07-30 | 2022-07-19 | Packet transmission method and communication apparatus |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115701063A (en) |
WO (1) | WO2023005723A1 (en) |
- 2021-07-30: CN application CN202110872533.9A, patent CN115701063A (en), active, Pending
- 2022-07-19: WO application PCT/CN2022/106368, patent WO2023005723A1 (en), active, Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090141727A1 (en) * | 2007-11-30 | 2009-06-04 | Brown Aaron C | Method and System for Infiniband Over Ethernet by Mapping an Ethernet Media Access Control (MAC) Address to an Infiniband Local Identifier (LID) |
US20090141734A1 (en) * | 2007-12-04 | 2009-06-04 | Brown Aaron C | Method and system for a converged infiniband over ethernet network |
US20140226659A1 (en) * | 2013-02-13 | 2014-08-14 | Red Hat Israel, Ltd. | Systems and Methods for Ethernet Frame Translation to Internet Protocol over Infiniband |
CN103368959A (en) * | 2013-07-05 | 2013-10-23 | 华为技术有限公司 | Method and device for conversion between RapidIO message and InfiniBand message |
US20190327345A1 (en) * | 2018-04-23 | 2019-10-24 | Tianjin Chip Sea Innovation Technology Co. Ltd. | Method and apparatus for forwarding heterogeneous protocol message and network switching device |
Also Published As
Publication number | Publication date |
---|---|
CN115701063A (en) | 2023-02-07 |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22848328; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: PCT application non-entry in European phase | Ref document number: 22848328; Country of ref document: EP; Kind code of ref document: A1 |