CN114745255A - Hardware chip, DPU, server, communication method and related device - Google Patents

Publication number
CN114745255A
CN114745255A CN202210381313.0A CN202210381313A CN114745255A CN 114745255 A CN114745255 A CN 114745255A CN 202210381313 A CN202210381313 A CN 202210381313A CN 114745255 A CN114745255 A CN 114745255A
Authority
CN
China
Prior art keywords
port
bond
message
rep
forwarding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210381313.0A
Other languages
Chinese (zh)
Other versions
CN114745255B (en
Inventor
李吉
孙路遥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xingyun Zhilian Technology Co ltd
Original Assignee
Shenzhen Xingyun Zhilian Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xingyun Zhilian Technology Co ltd filed Critical Shenzhen Xingyun Zhilian Technology Co ltd
Priority to CN202210381313.0A priority Critical patent/CN114745255B/en
Publication of CN114745255A publication Critical patent/CN114745255A/en
Application granted granted Critical
Publication of CN114745255B publication Critical patent/CN114745255B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663 Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/22 Alternate routing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/54 Organization of routing tables
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Abstract

The embodiments of the application disclose a hardware chip, a DPU, a server, a communication method and a related device. The hardware chip comprises a first port and a second port. The first port is a first proxy port, on the data plane development kit (DPDK), of an actual physical port of the hardware chip, and user-mode packet transmission and reception for the actual physical port are completed through the first port. The second port is a second proxy port, on the data plane development kit, of an aggregation port that has the first port as a member port. The second proxy port is a virtual proxy port on the data plane development kit side that stands in for the actual physical port, and user-mode packet transmission and reception are performed through the second proxy port. The embodiments of the application can improve the flexibility of dual-homing network access.

Description

Hardware chip, DPU, server, communication method and related device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a hardware chip, a DPU, a server, a communication method, and a related apparatus.
Background
In network planning and design, various redundancy designs, such as link redundancy and node redundancy, must be considered to ensure the reliability and continuity of services; for the same reason, a server usually connects to the network through dual-homing access. Network device stacking technology emerged against this background, but stacking has problems such as a single merged control plane, difficulty of smooth upgrades, inability to mix devices from multiple vendors, and high operation and maintenance difficulty. Therefore, how to improve the flexibility of dual-homing network access is a problem that urgently needs to be solved.
Disclosure of Invention
The embodiment of the application provides a hardware chip, a DPU, a server, a communication method and a related device, which can improve the flexibility of dual-homing network access application.
In a first aspect, embodiments of the present application provide a hardware chip including a first port and a second port, wherein,
the first port is a first proxy port, on the data plane development kit (DPDK), of an actual physical port of the hardware chip, and user-mode packet transmission and reception for the actual physical port are completed through the first port;
the second port is a second proxy port, on the data plane development kit, of an aggregation port that has the first port as a member port;
the second proxy port is a virtual proxy port on the data plane development kit side that stands in for the actual physical port, and user-mode packet transmission and reception are performed through the second proxy port.
In a second aspect, an embodiment of the present application provides a server, which includes a hardware chip as described in the first aspect.
In a third aspect, an embodiment of the present application provides a communication method, which is applied to an electronic device including a hardware chip as described in the first aspect or the second aspect, where the method includes:
receiving a packet through an eth port;
performing lookup and forwarding on the received packet in the hardware chip, uploading the packet on a miss, and changing the ingress port to the bond port according to the LAG Table;
encapsulating a packet-information extension header for the packet received on the bond port, and uploading the packet to a designated function;
parsing the packet-information extension header in the designated function to determine whether the packet was received by the rep;
if the received packet is an LACP packet, processing it through LACP negotiation, sending the received packet to the hardware chip, and installing a flow table entry whose ingress port is the bond port, so that subsequent packets are looked up and forwarded directly.
In a fourth aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the program includes instructions for executing the steps in the third aspect of the embodiment of the present application.
In a fifth aspect, the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, where the computer program makes a computer perform part or all of the steps as described in the third aspect of the present application.
In a sixth aspect, embodiments of the present application provide a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, where the computer program is operable to cause a computer to perform some or all of the steps as described in the third aspect of embodiments of the present application. The computer program product may be a software installation package.
The embodiment of the application has the following beneficial effects:
It can be seen that, in the hardware chip, DPU, server, communication method and related device described in the embodiments of the present application, the hardware chip includes a first port and a second port. The first port is a first proxy port, on the data plane development kit (DPDK), of an actual physical port of the hardware chip, and user-mode packet transmission and reception for the actual physical port are completed through the first port. The second port is a second proxy port, on the data plane development kit, of an aggregation port that has the first port as a member port. The second proxy port is a virtual proxy port on the data plane development kit side that stands in for the actual physical port, and user-mode packet transmission and reception are performed through it. By implementing a user-mode bond rep transmit and receive mechanism and offloading the data plane with flow tables, the data plane is separated from the control plane, which gives better flexibility and forwarding performance.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a networking diagram of "single-homing" access of a server-side network card to a data center according to an embodiment of the present application;
Fig. 2 is a networking diagram of "dual-homing" access of a server-side network card to a data center according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a communication system according to an embodiment of the present application;
Fig. 4 is a schematic flow chart of bond rep creation according to an embodiment of the present application;
Fig. 5 is a schematic flow chart of bond rep initialization according to an embodiment of the present application;
Fig. 6 is a schematic diagram of the functional L0 structure of the bond rep pmd driver according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of the data plane under forwarding-control separation according to an embodiment of the present application;
Fig. 8 is a timing diagram of the packet sending process under forwarding-control separation according to an embodiment of the present application;
Fig. 9 is a timing diagram of the packet receiving process under forwarding-control separation according to an embodiment of the present application;
Fig. 10 is a timing diagram of software and hardware interaction according to an embodiment of the present application;
Fig. 11 is a flowchart of a communication method according to an embodiment of the present application;
Fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may alternatively include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The electronic device described in the embodiments of the present application may include a smart phone (e.g., an Android phone, an iOS phone, a Windows Phone), a tablet computer, a palmtop computer, a driving recorder, a notebook computer, a Mobile Internet Device (MID), a wearable device (e.g., a smart watch, a Bluetooth headset), a data-centric Data Processing Unit (DPU), and the like. These are merely examples and are not exhaustive. The electronic device may further include a server, for example a cloud server. The server may include a hardware chip; the hardware chip may include a network card, and the network card may include, for example, a Field-Programmable Gate Array (FPGA). When the hardware chip is a network card, it may be inserted into the server; for example, the network card may include a DPU network card.
In the embodiments of the present application, the DPU may be a data-centric dedicated processor that follows a software-defined approach to support virtualization of infrastructure-layer resources, and thereby supports infrastructure-layer services such as storage, security, and quality-of-service management.
In the embodiments of the present application, user mode refers to two similar concepts in computer architecture. In CPU design, user mode is a non-privileged state. In this state the executed code is restricted by hardware and cannot perform certain operations, such as writing into the memory space of other processes, which prevents it from creating security risks for the operating system. In operating system design, user mode similarly refers to an unprivileged execution state. The kernel prohibits code in this state from performing potentially dangerous operations, such as writing system configuration files, killing other users' processes, or restarting the system.
In the embodiments of the present application, the flow table may be regarded as OpenFlow's abstraction of the data forwarding function of a network device. In a conventional network device, data forwarding by a switch or router depends on a Layer 2 MAC address forwarding table or a Layer 3 IP address routing table stored in the device. The flow table used in an OpenFlow switch plays the same role, but its entries integrate network configuration information from every layer of the network, so richer rules can be used when forwarding data.
In the embodiments of the present application, bond rep refers to a link aggregation proxy port, and OVS refers to Open vSwitch.
The following describes embodiments of the present application in detail.
In the related art, to overcome the shortcomings of stacking, MLAG technology is used in scenarios with high networking reliability requirements. However, MLAG also has problems: devices from multiple vendors cannot be mixed, configuration must be synchronized, and additional peer-link TOR port resources are required. To better address these issues, more and more internet companies are adopting a "de-stacking" dual-homing scheme, which addresses almost all of the shortcomings of virtualized stacking and MLAG. To enable the server network card to meet the requirement of "dual-homing" access to data center networking, the following solutions are mainly adopted:
Scheme one: a bond port is created through the add-bond mechanism provided by OVS, with actual physical ports as member ports; ARP packets are sent on multiple member ports by configuring an ARP flow table, so that table entries are synchronized across independent TORs;
Scheme two: based on the bond mechanism provided by the DPDK, ARP packets are sent on multiple member ports by modifying the bond tx burst function, so that table entries are synchronized across independent TORs.
In the related art, as shown in Fig. 1, only the scenario in which the server network card has "single-homing" access to a TOR can be satisfied, where stacking and MLAG are equivalent to "single-homing". The technologies for implementing link aggregation (bond) on the server side are roughly classified into the "add-bond" mechanism provided by OVS and the bond mechanism provided by the Data Plane Development Kit (DPDK). These two mechanisms (schemes) only provide single-homing access networking between the server and the TOR, and cannot by themselves meet the networking requirement of dual-homing access of the server-side network card to the data center.
Furthermore, the two schemes have the following drawbacks:
For scheme one: OVS performs forwarding on its own and depends on the kernel datapath; forwarding-control separation cannot be achieved, and forwarding performance and flexibility are poor;
For scheme two: the bond mechanism implemented by the DPDK is created from EAL parameters at startup, so creating and modifying it is cumbersome, and it depends on the application layer to issue commands, so flexibility is poor; forwarding-control separation cannot be achieved, and forwarding performance is poor.
In view of the above drawbacks, refer to Fig. 2, which is a networking diagram of dual-homing access of a server-side network card to a data center provided in the present application. The server connects to two different access switches through interface 0 and interface 1 respectively, and the access switches connect to the cloud infrastructure (Spine-Leaf Fabric) through Equal-Cost Multi-Path routing (ECMP).
Optionally, the hardware chip may include a first port and a second port, wherein,
the first port is a first proxy port, on the data plane development kit (DPDK), of an actual physical port of the hardware chip, and user-mode packet transmission and reception for the actual physical port are completed through the first port;
the second port is a second proxy port, on the data plane development kit, of an aggregation port that has the first port as a member port;
the second proxy port is a virtual proxy port on the data plane development kit side that stands in for the actual physical port, and user-mode packet transmission and reception are performed through the second proxy port.
For example, the hardware chip may include a server or a chip in the server, where the server may include a first port and a second port, the first port may be an eth port (eth proxy port), and the second port may be a bond rep port (bond proxy port). For example, interface 0 is an eth rep port, which may also be referred to as an eth port; the interface 1 is a bond rep port, which may also be referred to as a bond port. The two interfaces can be accessed into the corresponding access switch, and then the access switch is accessed into the cloud framework. The communication modes corresponding to the interface 0 and the interface 1 can be realized by link aggregation.
Further, as shown in Fig. 3, an embodiment of the present application provides a forwarding framework based on OVS + DPDK, together with a technical solution in which a user-mode bond rep port combined with forwarding-control separation enables "dual-homing" access of a server DPU network card to data center networking. The solution introduces the concepts of a bond rep port and an eth rep port. The eth rep is a proxy (representor) port, on the DPDK, of an actual physical port of the network card, and user-mode packet transmission and reception for the actual physical port are completed through the eth rep port. The bond rep is a proxy port, on the DPDK, of an aggregation port that has eth reps as member ports. This port is mainly used for packet transmission and reception on the user-mode aggregation port, which reduces hardware dependence. A representor port is a virtual proxy port on the DPDK side that stands in for an actual physical port, and user-mode packet transmission and reception are performed through the representor port.
By implementing a user-mode bond rep transmit and receive mechanism and offloading the data plane with flow tables, the embodiments of the application separate the data plane from the control plane (that is, payload encapsulation on the data plane and payload transmission on the control plane), which gives better flexibility and forwarding performance.
Optionally, the hardware chip includes a packet match-action processing module and the data plane development kit. The packet match-action processing module includes a match lookup engine and an action processing engine; the action processing engine includes a packet descriptor processor and a bond forwarding processing module, and the packet descriptor processor is connected to the bond forwarding processing module. The data plane development kit includes a bond rep user-mode driver;
the bond forwarding processing module is configured to maintain a LAG Table, which is issued to the hardware chip after the second port is successfully created;
the bond forwarding processing module is further configured to, after the data plane is offloaded, perform hash-based path selection and member port mapping according to the LAG Table of the bond port for traffic whose output port is the bond port;
the bond forwarding processing module is further configured to match received packets: for a packet whose destination MAC address (DMAC) is the MAC of the bond port, it encapsulates the corresponding packet extension header and sends the packet to the flow forwarding engine for lookup and forwarding.
In a specific implementation, as shown in Fig. 3, the hardware chip includes a packet match-action processing module and a data plane development kit (OVS_DPDK). The packet match-action processing module includes a match lookup engine and an action processing engine (action engine); the action processing engine includes a packet descriptor processor and a bond forwarding processing module (bond). The packet descriptor processor is connected to the bond forwarding processing module and may also be connected to a Receive Side Scaling (RSS) module. The data plane development kit includes a bond rep user-mode driver and further includes the following modules: control plane (offset) dpi (Control plane), hardware offload agent (Hardware offload agent), and data plane (OVS _ dp _ update Data plane).
The bond forwarding processing module operates on packet descriptors: it first performs the bond LAG hash computation (bond LAG hash) and then looks up the port mapping (Lookup port Map). The bond forwarding processing module implements the following functions:
1. maintaining the LAG Table, whose entries are issued to the chip by software after the bond rep is successfully created; the table includes, but is not limited to, the following information: bond port and member port information, member port link information, and key information such as bond mode, MAC, IP, and xmit policy;
2. after the data plane is offloaded, performing hash-based path selection and member port mapping according to the LAG Table entry of the bond port for traffic whose output port is the bond port;
3. matching received packets: for a packet whose destination MAC address (DMAC) is the bond port MAC, encapsulating the corresponding packet extension header and sending the packet to the flow forwarding engine for table lookup and forwarding. The extension header information includes, but is not limited to, port information, queue information, packet parsing information, and device information. The information in the extension header may also be exchanged between software and hardware in other forms, such as modified descriptors or custom packets. The device information may include at least one of the following: device name, device model, IP address, MAC address, user name, and the like, without limitation.
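The hash-based member selection in function 2 can be illustrated with a minimal sketch. The `LagEntry` layout, the field names, the CRC32 hash, and the exact fields covered by each xmit policy are assumptions made for illustration only; the source does not specify the hash function or the table format.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class LagEntry:
    """Hypothetical LAG Table entry: one bond port and its member ports."""
    bond_port: str
    bond_mac: str
    xmit_policy: str = "l23"
    members: dict = field(default_factory=dict)  # member port id -> link is up

    def active_members(self):
        # Only members whose link is up may carry traffic.
        return sorted(port for port, link_up in self.members.items() if link_up)


def flow_key(pkt: dict, policy: str) -> bytes:
    """Build the hash input for a packet (field sets per policy are assumed)."""
    if policy == "l2":
        names = ("smac", "dmac")
    elif policy == "l23":
        names = ("smac", "dmac", "sip", "dip")
    elif policy == "l34":
        names = ("sip", "dip", "sport", "dport")
    else:
        raise ValueError(f"unknown xmit policy: {policy}")
    return "|".join(str(pkt[name]) for name in names).encode()


def select_member(entry: LagEntry, pkt: dict):
    """Map a flow to one active member port; None if no member is up."""
    active = entry.active_members()
    if not active:
        return None
    # CRC32 is a stand-in hash; real hardware may use a different function.
    return active[zlib.crc32(flow_key(pkt, entry.xmit_policy)) % len(active)]
```

Because the hash depends only on flow fields, packets of one flow stay on one member port while both links are up, and the flow moves to the surviving member when a link goes down.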
Optionally, the bond rep user-mode driver is used to create and delete a bond rep port and to configure its attributes;
the bond rep user-mode driver supports the LACP protocol and is used for dynamic LAG negotiation;
the bond rep user-mode driver is used to send ARP packets and externally advertised routing information on multiple member ports;
the bond rep user-mode driver supports physical link detection and synchronization between the physical link state and the rep state, and refreshes the forwarding table entries corresponding to the bond rep when the rep state changes.
In a specific implementation, as shown in Fig. 3, the bond rep module may support the following functions: rtt, the Link Aggregation Control Protocol (LACP), the Address Resolution Protocol (ARP), bond slave link check, and the bond rep user-mode driver (bond rep pmd driver). The port mapping table is updated (update port map) between the bond forwarding processing module and the bond rep module by means of an up call (update).
In a specific implementation, the bond rep pmd driver of the network card must be implemented on the DPDK side, and it must provide the following functions:
1. creating and deleting a bond rep port and configuring its attributes;
2. supporting LACP protocol negotiation, used for dynamic LAG negotiation with the external TOR access networking in the "de-stacking" scenario;
3. sending ARP packets and externally advertised routing and similar information on multiple member ports, used to synchronize table entry information between TORs in the "de-stacking" scenario;
4. detecting the physical link, synchronizing the physical link state with the rep state, and refreshing the forwarding table entries corresponding to the bond rep when the rep state changes;
5. issuing the chip LAG Table and maintaining its entries.
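Function 3 above (sending ARP on every member port so that each independent TOR learns the same entries) can be sketched as follows. The function name and arguments are hypothetical, and the flow hash is assumed to be computed elsewhere; only the ARP EtherType value 0x0806 is a standard constant.

```python
ETHERTYPE_ARP = 0x0806  # standard EtherType for ARP


def tx_member_ports(ethertype: int, flow_hash: int, active_members: list) -> list:
    """Return the member ports a packet should be transmitted on.

    ARP packets are replicated to every active member so that each
    independent TOR sees them; all other traffic uses one hashed member.
    """
    if not active_members:
        return []
    if ethertype == ETHERTYPE_ARP:
        return list(active_members)
    return [active_members[flow_hash % len(active_members)]]
```

For example, with members 2048 and 2049 both up, an ARP packet goes out on both ports, while an IPv4 packet with flow hash 7 goes out only on the member at index 1.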
Further, the bond rep user-mode driver includes an external function interface layer, a custom scheme function layer, and a user-mode driver basic function layer;
the external function interface layer is used to create a bond port and set the attributes of the bond port;
the custom scheme function layer implements at least one of the following functions: ARP protocol packet processing with dual-send replies, the physical link monitoring mechanism, the LACP protocol service, and FPGA table entry issuing and maintenance;
the user-mode driver basic function layer implements at least one of the following: bond transmit and receive, network card driver registration information, and user-mode driver basic configuration parameters.
In a specific implementation, the attribute setting may refer to the following settings:
"slave=<ifc>": member port id; currently only the two ids 2048 and 2049 are supported, representing the eth rep ports;
"primary=<ifc>": custom primary member port; by default the first member port added is the primary member port;
"mode=[0-6]": bond port mode; currently only mode 4 can be configured;
"xmit_policy=[l2|l23|l34]": hash policy used by the bond port to select a member port; the recommended configuration is l23;
"agg_mode=[count|stable|bandwidth]": bond port aggregation id selection mode; the default is stable;
"bond_mac=<mac addr>": MAC setting of the bond port, used as the system_id in LACP negotiation packets;
"lsc_poll_period_ms=<int>": link status change (lsc) polling detection period;
"up_delay=<int>": delay time for a link down -> up state change;
"down_delay=<int>": delay time for a link up -> down state change.
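A minimal sketch of how such an attribute string might be parsed and validated is shown below. It only mirrors the constraints stated above (member ids 2048 and 2049, mode 4 only, first member as default primary, stable as default agg_mode); the function name, the comma-separated `key=value` syntax, and the use of l23 as a default are assumptions, and this is not the driver's actual parser.

```python
ALLOWED_SLAVES = {2048, 2049}            # eth rep ids named in the text
XMIT_POLICIES = {"l2", "l23", "l34"}
AGG_MODES = {"count", "stable", "bandwidth"}
INT_KEYS = {"lsc_poll_period_ms", "up_delay", "down_delay"}


def parse_bond_args(args: str) -> dict:
    """Parse 'key=value,key=value' bond attributes into a validated dict."""
    # Defaults: mode 4 and agg_mode stable follow the text; l23 is the
    # recommended policy and is used as a default here by assumption.
    cfg = {"slave": [], "mode": 4, "xmit_policy": "l23", "agg_mode": "stable"}
    for item in filter(None, (part.strip() for part in args.split(","))):
        key, sep, value = item.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not value:
            raise ValueError(f"malformed attribute: {item!r}")
        if key == "slave":
            port = int(value)
            if port not in ALLOWED_SLAVES:
                raise ValueError(f"unsupported member port id: {port}")
            cfg["slave"].append(port)
        elif key == "primary":
            cfg["primary"] = int(value)
        elif key == "mode":
            if int(value) != 4:
                raise ValueError("only mode 4 is supported")
        elif key == "xmit_policy":
            if value not in XMIT_POLICIES:
                raise ValueError(f"unknown xmit_policy: {value}")
            cfg["xmit_policy"] = value
        elif key == "agg_mode":
            if value not in AGG_MODES:
                raise ValueError(f"unknown agg_mode: {value}")
            cfg["agg_mode"] = value
        elif key == "bond_mac":
            cfg["bond_mac"] = value.lower()
        elif key in INT_KEYS:
            cfg[key] = int(value)
        else:
            raise ValueError(f"unknown attribute: {key}")
    if not cfg["slave"]:
        raise ValueError("at least one member port is required")
    # The first member added becomes the primary unless one is given.
    cfg.setdefault("primary", cfg["slave"][0])
    if cfg["primary"] not in cfg["slave"]:
        raise ValueError("primary must be one of the member ports")
    return cfg
```

Rejecting unsupported values at parse time keeps a bad attribute from reaching the later creation and table-issuing steps.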
In a specific implementation, as shown in Fig. 4, creating the bond rep port may include the following steps: parsing the creation parameters with rte_kvargs_parse; checking the parameters; if the check passes, creating the bond rte_eth_dev through bond_alloc; setting the private data parameters of the bond rte_eth_dev; implementing the bond LACP timer callback; setting the TX and RX callback functions of the bond port according to the mode parameter; and implementing and hooking the eth_dev_ops callback functions. If the check fails, the process ends directly.
In a specific implementation, as shown in Fig. 5, the complete bond rep initialization process may include the following steps: bond_create; bond dev_configure, which calls the dev_configure callback; bond slave add, which completes slave port configuration (slave_configure); setting the bond slave link detection mechanism callback, which must be refreshed in sync with the FPGA table entries; bond tx rx queue setup; bond dev start; bond slave active; and issuing the FPGA table entries.
Optionally, the bond rep user-mode driver includes an external function interface layer, a custom scheme function layer, and a user-mode driver basic function layer;
the external function interface layer is used to create a bond port and set the attributes of the bond port;
the custom scheme function layer implements at least one of the following functions: ARP protocol packet processing with dual-send replies, the physical link monitoring mechanism, the LACP protocol service, and FPGA table entry issuing and maintenance;
the user-mode driver basic function layer implements at least one of the following: bond transmit and receive, network card driver registration information, and user-mode driver basic configuration parameters.
In a specific implementation, refer to Fig. 6, which shows the functional L0 structure of the bond rep user-mode driver. The external function interface layer is used to create a bond port and set the attributes of the bond port. The custom scheme function layer implements at least one of the following functions: ARP protocol packet processing with dual-send replies, the physical link monitoring mechanism, the LACP protocol service, and FPGA table entry issuing and maintenance. The user-mode driver basic function layer implements at least one of the following: bond transmit and receive, network card driver registration information (such as eth_dev_ops), and user-mode driver basic configuration parameters (pmd driver basic configuration).
Optionally, the bond forwarding processing module is located between the data channel flow forwarding engine and the Ethernet MAC. For transmit traffic, it calculates a hash value and selects an egress port according to the LAG Table; for receive traffic whose DMAC is the MAC of the bond0 port, it sets the ingress port to the bond0 port before the traffic enters the flow forwarding engine for flow table lookup.
In a specific implementation, in the data forwarding plane, the logical bond forwarding plane sits between the data channel flow forwarding engine and the Ethernet MAC. For Tx traffic, a hash is computed and an egress port is selected according to the LAG Table; for Rx traffic whose DMAC is the MAC of bond0, the ingress port is set to bond0 before the traffic enters the flow forwarding engine for flow table lookup.
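The Tx and Rx behaviour of the bond forwarding plane can be illustrated with a minimal sketch. The source does not say which hash the chip uses or how the LAG Table is laid out, so the CRC32 hash, the list-of-dicts table, and the function names below are assumptions chosen only to make the idea concrete.

```python
import zlib
from typing import List, Optional, Tuple

BOND0 = "bond0"


def select_egress_port(flow: Tuple, lag_table: List[dict]) -> Optional[str]:
    """Tx: hash the flow key and pick one active member from the LAG Table."""
    active = [entry["port"] for entry in lag_table if entry["active"]]
    if not active:
        return None  # no usable member port
    # Assumption: CRC32 over the flow fields stands in for the chip's hash.
    key = "|".join(str(field) for field in flow).encode()
    return active[zlib.crc32(key) % len(active)]


def rx_ingress_port(dmac: str, phys_port: str, bond_mac: str) -> str:
    """Rx: report bond0 as the ingress port when the DMAC is bond0's MAC."""
    return BOND0 if dmac.lower() == bond_mac.lower() else phys_port
```

Because the hash depends only on the flow fields, packets of one flow keep using the same member port as long as the set of active members does not change, which avoids reordering within a flow.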
Optionally, the bond rep user mode driver is responsible for processing the LACP protocol, monitoring the link state, and updating the chip LAG Table through the bond driver. For messages that miss in the flow forwarding engine and are sent up, it extracts the extension header and updates the ingress port in the memory buffer; for messages sent to the bond0 port, it adds an extension header whose egress port is the bond0 port and sends the message to the logical data channel.
In a specific implementation, in the protocol control plane, the bond driver of the bond control plane on OVS-DPDK is responsible for processing the LACP protocol, monitoring the link state, and updating the chip LAG Table. For messages that miss in the flow forwarding engine and are sent up, it extracts the extension header and updates the ingress port in the memory buffer (mbuf); for messages sent to bond0, it adds an extension header whose egress port is bond0 and sends the message to the logical data channel.
Further, fig. 7 is a structural diagram of the data plane under forwarding-control separation, where PPE is the flow forwarding processing engine in the chip. For example, the hardware chip may include a DPU; a message is sent to the bond0 port through the DPU, the LAG hash is calculated, and an egress port is selected according to the LAG Table.
The following description takes an FPGA as the hardware chip, referred to as the chip/FPGA.
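The extension header handling can be sketched as below. The source does not define the header layout, so the 4-byte format (magic, direction, port id), the constant values, and the dictionary standing in for an mbuf are all hypothetical and serve only to show the add and strip operations.

```python
import struct

# Hypothetical layout: 2-byte magic, 1-byte direction, 1-byte port id.
EXT_HDR = struct.Struct("!HBB")
EXT_MAGIC = 0xB0ED     # hypothetical marker value
DIR_RX, DIR_TX = 0, 1
BOND0_PORT_ID = 0x80   # hypothetical id for the bond0 port


def add_tx_ext_header(frame: bytes, out_port: int = BOND0_PORT_ID) -> bytes:
    """Prefix a frame sent to bond0 with an extension header naming the egress port."""
    return EXT_HDR.pack(EXT_MAGIC, DIR_TX, out_port) + frame


def strip_rx_ext_header(buf: bytes) -> dict:
    """Extract the extension header of a miss packet and record the ingress port."""
    if len(buf) < EXT_HDR.size:
        raise ValueError("buffer shorter than extension header")
    magic, direction, port = EXT_HDR.unpack_from(buf)
    if magic != EXT_MAGIC:
        raise ValueError("not an extension-header packet")
    # The dict plays the role of the mbuf: metadata plus the remaining frame.
    return {"in_port": port, "direction": direction, "data": buf[EXT_HDR.size:]}
```

Carrying the port in a small prefix keeps the frame itself untouched, so the driver can hand the stripped frame to OVS-DPDK as if it had arrived on the bond port.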
For example, fig. 8 is a timing diagram of the message-sending process under forwarding-control separation. In a specific implementation, a user side (user) sends a message; for messages sent through OVS soft forwarding, the egress port is the bond rep. After the bond rep user mode driver receives the message, it replicates service messages such as ARP and routing protocol messages to the member ports and sends them on all members; otherwise it selects the corresponding member port according to the mode and policy and sends the message through the member port transmit path (slave port tx). The sending port encapsulates the transmit extension header, and the PF packet-sending function is reused to send the message to the chip/FPGA. After data/control separation, data messages sent by the application are delivered down to the chip/FPGA; the chip selects an eth port for transmission according to the flow table and the hash over the LAG Table; and the sending result is returned.
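The member port choice made by the bond rep user mode driver on transmit can be sketched as follows. Only ARP is recognised here (EtherType 0x0806, untagged frames); routing protocol detection, the real mode and policy handling, and the function names are left out or assumed for brevity.

```python
import struct
from typing import List

ETHERTYPE_ARP = 0x0806


def ethertype(frame: bytes) -> int:
    """Read the EtherType of an untagged Ethernet frame."""
    return struct.unpack_from("!H", frame, 12)[0]


def tx_member_ports(frame: bytes, members: List[str], flow_hash: int) -> List[str]:
    """Return the member ports a frame is sent on."""
    if not members:
        return []
    if ethertype(frame) == ETHERTYPE_ARP:
        # ARP is replicated so that every uplink learns the address.
        return list(members)
    # Assumption: other traffic uses a simple hash policy over the members.
    return [members[flow_hash % len(members)]]
```

Replicating ARP on every member matters for dual-homed access, since each upstream switch must see the server's address on its own link.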
For another example, fig. 9 is a timing diagram of the message-receiving process under forwarding-control separation. In a specific implementation, a user side (user) sends a message that is received through an eth port. After the chip/FPGA receives the message, it performs table lookup and forwarding; on a miss the message is sent up, and the ingress port information is modified to the bond port according to the LAG Table. The bond port received-message information extension header is encapsulated, and the message is sent up over the PF channel to a parsing function (PF rx patch, the PCIe PF packet-receiving hook function). PF rx patch parses the extension header information and determines whether the message is a bond received message. After the extension header is stripped, the message is sent to the bond rep user mode driver (bond rep pmd/bond rep sw_ring). The bond rep receives the packet on a member port according to the mode and hash policy; if the message is an LACP message, it enters the rx_ring of the corresponding slave port and is left for the LACP cb function to process on a timer for LACP negotiation. A flow table entry whose ingress port is the bond port is then issued to the chip/FPGA so that subsequent messages are looked up and forwarded directly.
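The LACP branch of the receive path can be illustrated with a short sketch. LACP frames use the Slow Protocols EtherType 0x8809 with subtype 0x01; the `SlavePort` class, the queue objects, and the `bond_rep_rx` function are hypothetical stand-ins for the driver's rx_ring and receive logic.

```python
import struct
from collections import deque

ETHERTYPE_SLOW = 0x8809  # IEEE 802.3 Slow Protocols
SUBTYPE_LACP = 0x01


def is_lacp(frame: bytes) -> bool:
    """True when an untagged Ethernet frame carries an LACPDU."""
    if len(frame) < 15:
        return False
    (etype,) = struct.unpack_from("!H", frame, 12)
    return etype == ETHERTYPE_SLOW and frame[14] == SUBTYPE_LACP


class SlavePort:
    """Hypothetical member port with its own rx_ring for LACP frames."""

    def __init__(self, name: str):
        self.name = name
        self.rx_ring = deque()


def bond_rep_rx(frame: bytes, slave: SlavePort, bond_rx_queue: deque) -> str:
    """Queue LACP frames for the timer callback; pass the rest to the bond port."""
    if is_lacp(frame):
        slave.rx_ring.append(frame)  # consumed later by the LACP callback
        return "lacp"
    bond_rx_queue.append(frame)
    return "data"
```

Keeping LACP frames in a per-member ring lets the timer-driven callback run the negotiation state machine without blocking the data receive path.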
To further illustrate, fig. 10 is a sequence diagram of software and hardware interaction. A user side (user) creates a bond rep; the bond rep pmd carries out the bond rep creation, configuration, and start flow until it executes successfully; the LAG Table corresponding to the bond rep is issued to the chip/FPGA; the chip/FPGA generates the hardware table entry and returns the creation result to the bond rep. When the bond rep needs to be deleted or reconfigured, the bond port is deleted and its resources are released, or the table entries are refreshed, and the corresponding hardware table entries are deleted or refreshed. The table entries are also refreshed when the state of a bond rep member port changes.
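The table lifecycle in this interaction can be modelled with a small sketch. `ChipLagTable` and `BondRep` are hypothetical classes, and the sketch assumes every member link starts in the up state; the real exchange goes through the driver and the chip/FPGA rather than in-process calls.

```python
from typing import Dict, List


class ChipLagTable:
    """Stand-in for the LAG Table held in the chip/FPGA."""

    def __init__(self):
        self.entries: Dict[str, List[str]] = {}

    def issue(self, bond: str, active_members: List[str]) -> bool:
        self.entries[bond] = list(active_members)
        return True  # creation or refresh result returned to the bond rep

    def release(self, bond: str) -> None:
        self.entries.pop(bond, None)


class BondRep:
    """Hypothetical bond rep that keeps the chip table in sync with link state."""

    def __init__(self, name: str, members: List[str], chip: ChipLagTable):
        self.name = name
        self.chip = chip
        # Assumption: all member links are up at creation time.
        self.link_up = {member: True for member in members}
        self.created = chip.issue(name, self._active())

    def _active(self) -> List[str]:
        return [member for member, up in self.link_up.items() if up]

    def on_member_link_change(self, member: str, up: bool) -> None:
        # A member state change refreshes the hardware table entry.
        self.link_up[member] = up
        self.chip.issue(self.name, self._active())

    def delete(self) -> None:
        self.chip.release(self.name)
```

The point of the refresh is that hardware only hashes across members the control plane currently considers usable.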
The embodiment of the application provides a technical scheme, based on a user mode bond rep port plus forwarding-control separation, for dual-homed access of a server DPU network card to a data center network. By implementing the user mode bond rep pmd driver together with forwarding-control separation, the scheme addresses the dual-homed access networking problem for the server DPU network card. Compared with the related art, it can also provide better forwarding performance, scheme flexibility, and service stability.
It can be seen that the hardware chip described in the embodiment of the present application includes a first port and a second port. The first port is a first proxy port, on the data plane development kit, of an actual physical port of the hardware chip, and user mode message transmission and reception for the actual physical port are completed through the first port. The second port is a second proxy port, on the data plane development kit, of the aggregation port that has the first port as a member port; in physical terms, the second proxy port is a virtual proxy port on the data plane development kit side, and user mode messages are received and transmitted through it. By implementing the user mode bond rep transceiving mechanism and offloading the data plane through the flow table, the data plane is separated from the control plane, which gives better flexibility and forwarding performance.
Referring to fig. 11, fig. 11 is a schematic flowchart of a communication method provided in the embodiment of the present application, applied to the hardware chip or the server. As shown in the figure, the communication method includes:
101. Receive a message through an eth port.
102. Through the hardware chip, perform lookup and forwarding and miss upload according to the received message, and modify the ingress port to the bond port according to the LAG Table.
103. Encapsulate the received-message information extension header of the bond port and upload it to a specified function.
104. Parse the received-message information extension header through the specified function to determine whether the received message is a rep received message.
105. If the received message is an LACP message, process the received message through LACP negotiation, send the received message to the hardware chip, and set a flow table entry whose ingress port is the bond port for direct table lookup and forwarding of subsequent messages.
The specified function may be set by the user or by system default; for example, the specified function may include PF rx patch.
In a specific implementation, a user side (user) sends a message that is received through an eth port. After the chip/FPGA receives the message, it performs table lookup and forwarding; on a miss the message is sent up, and the ingress port information is modified to the bond port according to the LAG Table. The bond port received-message information extension header is encapsulated, and the message is sent up over the PF channel to a parsing function (PF rx patch, the PCIe PF packet-receiving hook function). PF rx patch parses the extension header information and determines whether the message is a bond rep received message. After the extension header is stripped, the message is sent to the bond rep user mode driver (bond rep pmd/bond rep sw_ring). The bond rep receives the packet on a member port according to the mode and hash policy; if the message is an LACP message, it enters the rx_ring of the corresponding slave port and is left for the LACP cb function to process on a timer for LACP negotiation. A flow table entry whose ingress port is the bond port is then issued to the chip/FPGA so that subsequent messages are looked up and forwarded directly.
It can be seen that, in the communication method described in the embodiment of the present application, a message is received through an eth port; lookup and forwarding and miss upload are performed through the hardware chip according to the received message, and the ingress port is modified to the bond port according to the LAG Table; the received-message information extension header of the bond port is encapsulated and uploaded to a specified function; the extension header is parsed through the specified function to determine whether the received message is a rep received message; and if the received message is an LACP message, it is processed through LACP negotiation, the received message is sent to the hardware chip, and a flow table entry whose ingress port is the bond port is set for direct table lookup and forwarding of subsequent messages. By implementing the user mode bond rep transceiving mechanism and offloading the data plane through the flow table, the data plane is separated from the control plane, which gives better flexibility and forwarding performance.
Referring to fig. 12, fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown, the electronic device includes a processor, a memory, a communication interface, and one or more programs; the one or more programs are stored in the memory and configured to be executed by the processor. In an embodiment of the present application, the programs include instructions for performing the following steps:
receiving a message through an eth port;
performing lookup and forwarding and miss upload through the hardware chip according to the received message, and modifying the ingress port to the bond port according to the LAG Table;
encapsulating the received-message information extension header of the bond port and uploading it to a specified function;
parsing the received-message information extension header through the specified function to determine whether the received message is a rep received message;
and if so, stripping the extension header from the received message and sending the message to the bond rep user mode driver, receiving the packet through the bond rep user mode driver, and, if the received message is an LACP message, processing the received message through LACP negotiation, sending the received message to the hardware chip, and setting a flow table entry whose ingress port is the bond port for direct table lookup and forwarding of subsequent messages.
It can be seen that the electronic device described in the embodiment of the present application receives a message through an eth port; performs lookup and forwarding and miss upload through the hardware chip according to the received message, and modifies the ingress port to the bond port according to the LAG Table; encapsulates the received-message information extension header of the bond port and uploads it to a specified function; parses the extension header through the specified function to determine whether the received message is a rep received message; and, if the received message is an LACP message, processes it through LACP negotiation, sends the received message to the hardware chip, and sets a flow table entry whose ingress port is the bond port for direct table lookup and forwarding of subsequent messages. By implementing the user mode bond rep transceiving mechanism and offloading the data plane through the flow table, the data plane is separated from the control plane, which gives better flexibility and forwarding performance.
Embodiments of the present application also provide a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, the computer program enabling a computer to execute part or all of the steps of any one of the methods described in the above method embodiments, and the computer includes an electronic device.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any of the methods as described in the above method embodiments. The computer program product may be a software installation package, the computer comprising an electronic device.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the above-described division of the units is only one type of division of logical functions, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be through some interfaces, indirect coupling or communication connection between devices or units, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit may be stored in a computer readable memory if it is implemented in the form of a software functional unit and sold or used as a stand-alone product. Based on such understanding, the technical solution of the present application may be substantially implemented or a part of or all or part of the technical solution contributing to the prior art may be embodied in the form of a software product stored in a memory, and including several instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device) to execute all or part of the steps of the above-mentioned method of the embodiments of the present application. And the aforementioned memory comprises: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash Memory disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A hardware chip comprising a first port and a second port, wherein,
the first port is a first proxy port of an actual physical port of the hardware chip on a data plane development kit, and user mode message receiving and sending of the actual physical port are completed through the first port;
the second port is a second proxy port, on the data plane development kit, of an aggregation port that has the first port as a member port;
the second proxy port is, in physical terms, a virtual proxy port on the data plane development kit side, and user mode messages are received and sent through the second proxy port.
2. The hardware chip of claim 1, wherein the hardware chip comprises: a message matching action processing module and the data plane development kit; the message matching action processing module comprises a matching search engine and an action processing engine, the action processing engine comprises a data packet descriptor processor and a bond forwarding processing module, the data packet descriptor processor is connected with the bond forwarding processing module, and the data plane development kit comprises a bond rep user state driver;
the bond forwarding processing module is configured to maintain an LAG table, and the LAG table is issued to the hardware chip after the second port is successfully created;
the bond forwarding processing module is further configured, after the data plane is offloaded, to perform hash-based path selection and member port information mapping, according to the LAG table of the bond port, for traffic whose egress port is the bond port;
the bond forwarding processing module is further configured to match received messages and, for a message whose destination MAC address is the MAC corresponding to the bond port, to encapsulate the corresponding message extension header and send the message to the stream forwarding engine for lookup and forwarding.
3. The hardware chip of claim 2, wherein the bond rep user state driver is configured to implement creation, deletion and attribute configuration of a bond rep port;
the bond rep user mode driver supports LACP protocol and is used for realizing LAG dynamic negotiation;
the bond rep user state driver is configured to send ARP messages on multiple member ports and to advertise routing information externally;
and the bond rep user state driver is used for supporting physical link detection and physical link and rep state synchronization, and refreshing a forwarding table entry corresponding to the bond rep according to the rep state change.
4. The hardware chip of claim 3, wherein the bond rep user state driver comprises an external function interface layer, a custom scheme function layer, a user state driver basic function layer;
the external function interface layer is used for creating a bond port and setting the attribute of the bond port;
the custom scheme function layer is used for implementing at least one of the following functions: ARP protocol message processing and dual-send reply, a physical link monitoring mechanism, an LACP protocol service function, and FPGA table entry issuing and maintenance;
the user state driver basic function layer is used for implementing at least one of the following functions: bond packet transmission and reception, network card driver registration information, and user state driver basic configuration parameters.
5. The hardware chip of claim 2, wherein the bond forwarding processing module is located between a data channel stream forwarding engine and an Ethernet MAC; for transmit traffic, a hash value is calculated and an egress port is selected according to the LAG Table; and for receive traffic whose DMAC is the MAC of the bond0 port, the ingress port is set to the bond0 port before the traffic is sent to the stream forwarding engine for flow table lookup.
6. The hardware chip of claim 5, wherein the bond rep user state driver is responsible for processing the link state monitored by the LACP protocol and updating the LAG Table through the bond driver, and is further responsible for extracting the extension header and updating the ingress port to the memory cache for the packet sent by the miss on the stream forwarding engine; and for the message sent to the bond0 port, the message is responsible for adding an expansion header with an egress port being the bond0 port, and sending the expansion header to a logical data channel.
7. A server, characterized in that it comprises a hardware chip according to any one of claims 1 to 7.
8. A communication method, applied to an electronic device comprising a hardware chip according to any one of claims 1 to 7, the method comprising:
receiving a message through an eth port;
searching and forwarding and miss uploading are carried out through the hardware chip according to the received message, and an input port is modified into a bond port according to the LAG Table;
packaging the received message information expansion header of the bond port, and realizing uploading to a designated function;
analyzing the received message information expansion header through the specified function to judge whether the received message is a rep received message;
if the received message is an LACP message, the received message is processed through LACP negotiation, the received message is sent to the hardware chip, and a flow table entry with an inlet end being a bond port is set for directly looking up and forwarding a subsequent message.
9. An electronic device comprising a processor, a memory for storing one or more programs and configured to be executed by the processor, the programs comprising instructions for performing the steps in the method of claim 8, the electronic device comprising a DPU.
10. A computer-readable storage medium, in which a computer program for electronic data exchange is stored, wherein the computer program causes a computer to carry out the method according to claim 8.
CN202210381313.0A 2022-04-12 2022-04-12 Hardware chip, DPU, server, communication method and related device Active CN114745255B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210381313.0A CN114745255B (en) 2022-04-12 2022-04-12 Hardware chip, DPU, server, communication method and related device

Publications (2)

Publication Number Publication Date
CN114745255A true CN114745255A (en) 2022-07-12
CN114745255B CN114745255B (en) 2023-11-10

Family

ID=82280749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210381313.0A Active CN114745255B (en) 2022-04-12 2022-04-12 Hardware chip, DPU, server, communication method and related device

Country Status (1)

Country Link
CN (1) CN114745255B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103401797A (en) * 2013-07-24 2013-11-20 杭州华三通信技术有限公司 Message processing method and equipment
CN105991439A (en) * 2015-02-06 2016-10-05 杭州华三通信技术有限公司 Management method and device of data center server (DC server)
CN106533943A (en) * 2016-12-06 2017-03-22 中国电子科技集团公司第三十二研究所 Method for realizing microcode and flow table based on network switching chip
CN106961363A (en) * 2017-03-29 2017-07-18 云络动力(北京)科技有限公司 A kind of method and system for capturing virtual switch User space data plane data message
CN107947977A (en) * 2017-11-21 2018-04-20 北京邮电大学 A kind of collocation method of interchanger, device, electronic equipment and storage medium
CN109861839A (en) * 2017-11-30 2019-06-07 华为技术有限公司 The unbroken virtual switch upgrade method of business and relevant device
US20200007405A1 (en) * 2018-06-29 2020-01-02 Juniper Networks, Inc. Monitoring and policy control of distributed data and control planes for virtual nodes
CN110875844A (en) * 2018-08-30 2020-03-10 丛林网络公司 Multiple virtual network interface support for virtual execution elements
CN112671869A (en) * 2020-12-15 2021-04-16 北京天融信网络安全技术有限公司 Network bridge transparent proxy method, device, electronic equipment and storage medium
CN113821310A (en) * 2021-11-19 2021-12-21 阿里云计算有限公司 Data processing method, programmable network card device, physical server and storage medium
CN113923158A (en) * 2020-07-07 2022-01-11 华为技术有限公司 Message forwarding, routing sending and receiving method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115114219A (en) * 2022-07-22 2022-09-27 深圳星云智联科技有限公司 PCI-E topological method, device, equipment and storage medium
CN115114219B (en) * 2022-07-22 2023-10-20 深圳星云智联科技有限公司 PCI-E topology method, device, equipment and storage medium
WO2024061179A1 (en) * 2022-09-24 2024-03-28 华为技术有限公司 Logic bonding port management method, apparatus and system, and storage medium
CN117692382A (en) * 2024-02-04 2024-03-12 珠海星云智联科技有限公司 Link aggregation method, network card, equipment and medium

Also Published As

Publication number Publication date
CN114745255B (en) 2023-11-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant