CN114745255B - Hardware chip, DPU, server, communication method and related device - Google Patents

Info

Publication number
CN114745255B
Authority
CN
China
Prior art keywords
port
bond
message
rep
forwarding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210381313.0A
Other languages
Chinese (zh)
Other versions
CN114745255A (en)
Inventor
李吉
孙路遥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xingyun Zhilian Technology Co ltd
Original Assignee
Shenzhen Xingyun Zhilian Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663 Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/22 Alternate routing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/54 Organization of routing tables
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The embodiments of the application disclose a hardware chip, a DPU, a server, a communication method and a related device. The hardware chip comprises a first port and a second port. The first port is a first proxy port, on a data plane development kit, of an actual physical port of the hardware chip, and user-mode message receiving and sending for the actual physical port is completed through the first port. The second port is a second proxy port, on the data plane development kit, of an aggregation port that takes the first port as a member port. The second proxy port is in fact a virtual proxy port on the data plane development kit side, and messages are received and sent through the second proxy port in user mode. The embodiments of the application can improve the flexibility of dual-homing network access applications.

Description

Hardware chip, DPU, server, communication method and related device
Technical Field
The application relates to the technical field of computers, in particular to a hardware chip, a DPU, a server, a communication method and related devices.
Background
In network planning and design, various redundancy designs, such as link redundancy and node redundancy, must be considered to ensure the reliability and continuity of services, and servers therefore usually adopt dual-homing access to the network. Against this background, network device stacking technology emerged, but stacking has problems such as a unified control plane, difficulty in upgrading smoothly, inability to mix devices from multiple vendors, and high operation and maintenance difficulty. How to improve the flexibility of dual-homing network access applications is therefore a problem to be solved.
Disclosure of Invention
The embodiment of the application provides a hardware chip, a DPU, a server, a communication method and a related device, which can improve the flexibility of dual-homing network access application.
In a first aspect, embodiments of the present application provide a hardware chip including a first port and a second port, where,
the first port is a first proxy port, on a data plane development kit, of an actual physical port of the hardware chip, and user-mode message receiving and sending for the actual physical port is completed through the first port;
the second port is a second proxy port, on the data plane development kit, of an aggregation port that takes the first port as a member port;
the second proxy port is in fact a virtual proxy port on the data plane development kit side, and messages are received and sent through the second proxy port in user mode.
In a second aspect, embodiments of the present application provide a server comprising a hardware chip as described in the first aspect.
In a third aspect, an embodiment of the present application provides a communication method, applied to an electronic device including the hardware chip according to the first aspect or the second aspect, where the method includes:
receiving a message through an eth port;
performing, by the hardware chip, a forwarding lookup on the received message, sending the message up on a lookup MISS, and changing the ingress port to the bond port according to a LAG Table;
encapsulating an information extension header for the message received on the bond port, and sending the message up to a designated function;
parsing, by the designated function, the information extension header of the received message to determine whether the received message is a rep received message;
if so, stripping the extension header and delivering the received message to a bond rep user-mode driver, which receives the packet; if the received message is an LACP message, processing it through LACP negotiation; and issuing, to the hardware chip, a flow table entry whose ingress port is the bond port, for direct table-lookup forwarding of subsequent messages.
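The receive path described in the third aspect can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `ExtHeader`, `Chip` and `handle_upcall` names, the string return values, and the rule that any port whose name starts with "bond" counts as a rep port are all hypothetical. The only external fact used is that LACP frames carry the IEEE Slow Protocols EtherType 0x8809.

```python
from dataclasses import dataclass, field

LACP_ETHERTYPE = 0x8809  # IEEE "Slow Protocols" EtherType, which carries LACP


@dataclass
class ExtHeader:
    """Hypothetical information extension header added by the hardware chip."""
    in_port: str
    is_rep: bool


@dataclass
class Chip:
    """Toy stand-in for the chip's LAG Table and flow table (assumed layout)."""
    lag_table: dict                      # member eth port -> bond port
    flow_table: set = field(default_factory=set)

    def receive(self, eth_port, ethertype, flow_key):
        # The ingress port is rewritten to the bond port according to the LAG Table.
        in_port = self.lag_table.get(eth_port, eth_port)
        if (in_port, flow_key) in self.flow_table:
            return "hw-forward"          # hit: forwarded directly by table lookup
        # Miss: encapsulate the extension header and send the message up.
        hdr = ExtHeader(in_port=in_port, is_rep=in_port.startswith("bond"))
        return handle_upcall(self, hdr, ethertype, flow_key)


def handle_upcall(chip, hdr, ethertype, flow_key):
    """Designated function: parse the extension header, then dispatch."""
    if not hdr.is_rep:
        return "not-rep"
    # The header is stripped here and the bond rep user-mode driver gets the message.
    if ethertype == LACP_ETHERTYPE:
        return "lacp"                    # handled by LACP negotiation
    # Issue a flow entry whose ingress port is the bond port.
    chip.flow_table.add((hdr.in_port, flow_key))
    return "slow-path"
```

In this sketch the first message of a flow takes the slow path and installs a flow entry, so later messages of the same flow arriving on either member port are forwarded by table lookup.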
In a fourth aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, the programs including instructions for performing the steps in the third aspect of the embodiment of the present application.
In a fifth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to execute some or all of the steps described in the third aspect of the embodiments of the present application.
In a sixth aspect, embodiments of the present application provide a computer program product, wherein the computer program product comprises a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform part or all of the steps described in the third aspect of the embodiments of the present application. The computer program product may be a software installation package.
The embodiment of the application has the following beneficial effects:
As can be seen, the embodiments of the present application describe a hardware chip, a DPU, a server, a communication method and a related device. The hardware chip comprises a first port and a second port. The first port is a first proxy port, on a data plane development kit, of an actual physical port of the hardware chip, and user-mode message receiving and sending for the actual physical port is completed through the first port. The second port is a second proxy port, on the data plane development kit, of an aggregation port that takes the first port as a member port. The second proxy port is in fact a virtual proxy port on the data plane development kit side, and messages are received and sent through the second proxy port in user mode. By implementing a user-mode bond rep transceiving mechanism and combining it with flow tables to offload the data plane, the data plane and the control plane are separated, which gives better flexibility and forwarding performance.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a network diagram of a server-side network card "single access" data center provided by an embodiment of the present application;
FIG. 2 is a network diagram of a server-side network card "dual homing" access data center according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a communication system according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of bond rep creation according to an embodiment of the present application;
FIG. 5 is a schematic flow chart of bond rep initialization according to an embodiment of the present application;
FIG. 6 is a schematic diagram of the functional L0 structure of a bond rep pmd driver according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a forwarding-control separated data plane according to an embodiment of the present application;
FIG. 8 is a timing chart of a message forwarding process with forwarding-control separation according to an embodiment of the present application;
FIG. 9 is a timing chart of a packet processing and separating operation according to an embodiment of the present application;
FIG. 10 is a timing diagram of a software/hardware interaction provided by an embodiment of the present application;
FIG. 11 is a schematic flow chart of a communication method according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art will better understand the present application, a technical solution in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms first, second and the like in the description and in the claims and in the above-described figures are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The electronic device described in the embodiments of the present application may include a smartphone (such as an Android phone, an iOS phone, or a Windows Phone device), a tablet computer, a palmtop computer, a dashboard camera, a notebook computer, a mobile internet device (MID), a wearable device (such as a smart watch or a Bluetooth headset), a data-centric special-purpose processor (data processing unit, DPU), and the like. These are examples rather than an exhaustive list; the electronic device may also include a server, such as a cloud server. The server may include a hardware chip, and the hardware chip may include a network card or a field-programmable gate array (FPGA). The hardware chip may be plugged into the server, for example when the hardware chip is a network card, which may be, for example, a DPU network card.
In an embodiment of the present application, the DPU may be a data-centric special-purpose processor that uses software-defined technology to support infrastructure-layer resource virtualization and infrastructure-layer services such as storage, security, and quality-of-service management.
In the embodiments of the application, user mode (also called user state) refers to two similar concepts in computer architecture. In CPU design, user mode is a non-privileged state in which the executing code is restricted by hardware and cannot perform certain operations, such as writing into the memory space of other processes, which prevents security risks to the operating system. In operating system design, user mode similarly refers to a non-privileged execution state. The kernel prohibits code in this state from performing potentially dangerous operations, such as writing system configuration files, killing other users' processes, or restarting the system.
In the embodiments of the application, the flow table can be regarded as OpenFlow's abstraction of the data forwarding function of a network device. In conventional network devices, data forwarding by switches and routers relies on a layer-2 MAC address forwarding table or a layer-3 IP routing table stored in the device. The flow table used in an OpenFlow switch serves the same purpose, but its entries integrate network configuration information from every layer of the network, so richer rules can be used when forwarding data.
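To make the difference from a plain MAC or routing table concrete, the following sketch shows a flow table whose entries can match on fields from several layers at once. It is an illustration only: the `Match`, `FlowEntry` and `lookup` names, the chosen fields, and the action strings are assumptions, not OpenFlow's actual data structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Match:
    """Match fields spanning several layers; None means wildcard."""
    in_port: Optional[str] = None
    eth_dst: Optional[str] = None     # layer 2
    ipv4_dst: Optional[str] = None    # layer 3
    tcp_dst: Optional[int] = None     # layer 4

    def matches(self, pkt: dict) -> bool:
        # Every non-wildcard field must equal the packet's value.
        return all(
            want is None or pkt.get(name) == want
            for name, want in vars(self).items()
        )


@dataclass
class FlowEntry:
    priority: int
    match: Match
    action: str


def lookup(table, pkt):
    """Return the action of the highest-priority matching entry, or None on a miss."""
    hits = [e for e in table if e.match.matches(pkt)]
    return max(hits, key=lambda e: e.priority).action if hits else None
```

A lookup that returns `None` corresponds to the MISS case, in which the message has to be sent up for slow-path processing.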
In an embodiment of the present application, the OVS may include a link aggregation proxy port or a virtual switch (Open vSwitch).
Embodiments of the present application are described in detail below.
In the related art, MLAG technology was proposed to address the shortcomings of stacking in scenarios with high networking reliability requirements. However, MLAG also has problems: devices from multiple vendors cannot be mixed, configurations must be synchronized, and additional TOR port resources are needed for the peer link. To better address these issues, more and more internet companies now adopt an "unstacking" dual-homing scheme, which largely addresses the shortcomings of virtualized stacking and MLAG. To enable the server network card to meet the networking requirements of dual-homing access to the data center, the following solutions are mainly adopted:
scheme one: creating a bond port based on the add-bond mechanism provided by OVS, where the member ports are actual physical ports, and sending ARP messages on multiple ports by configuring an ARP message flow table, thereby synchronizing table entries on the independent TORs;
scheme two: based on the bond mechanism provided by DPDK, modifying the bond tx burst function to send ARP messages on multiple ports, thereby synchronizing table entries on the independent TORs.
As shown in fig. 1, the related art can only satisfy the scenario of "single access" of the server network card to a TOR, where stacking and MLAG are both equivalent to "single access". The technologies for implementing link aggregation (bond) on the server side fall roughly into the "add-bond" mechanism provided by OVS and the bond mechanism provided by the data plane development kit (DPDK). Both mechanisms (schemes) only provide single-homing networking between the server and the TOR, and neither can by itself meet the networking requirement of dual-homing access of the server-side network card to the data center.
Furthermore, the two solutions described above have the following drawbacks:
for scheme one: OVS forwards on its own and depends on the kernel datapath, so forwarding-control separation cannot be achieved, and forwarding performance and flexibility are poor;
for scheme two: the bond mechanism implemented by DPDK depends on EAL parameters at startup, so creation and modification are cumbersome, and it depends on the application layer to issue commands, so flexibility is poor; forwarding-control separation also cannot be achieved, and forwarding performance is poor.
Referring to fig. 2, fig. 2 is a network diagram of a server-side network card "dual homing" access data center, where the server accesses two different access switches through interface 0 and interface 1 respectively, and the access switches access the cloud architecture (Spine-Leaf Fabric) through equal-cost multi-path routing (ECMP).
Alternatively, the hardware chip may include a first port and a second port, wherein,
the first port is a first proxy port, on a data plane development kit, of an actual physical port of the hardware chip, and user-mode message receiving and sending for the actual physical port is completed through the first port;
the second port is a second proxy port, on the data plane development kit, of an aggregation port that takes the first port as a member port;
the second proxy port is in fact a virtual proxy port on the data plane development kit side, and messages are received and sent through the second proxy port in user mode.
By way of illustration, the hardware chip may comprise a server or one of the chips in the server, the server may comprise a first port, which may be an eth rep port (eth proxy port), and a second port, which may be a bond rep port (bond proxy port). For example, interface 0 is an eth rep port, which may also be referred to as an eth port; interface 1 is a bond rep port, which may also be referred to as a bond port. The two interfaces can be connected to the corresponding access switch, and then the access switch is connected to the cloud framework. The communication modes corresponding to the interface 0 and the interface 1 can be realized through link aggregation.
Furthermore, as shown in fig. 3, the embodiments of the application provide a forwarding framework based on OVS+DPDK and a technical scheme in which user-mode bond rep ports plus forwarding-control separation realize "dual homing" data center access networking for a server DPU network card. The scheme introduces the concepts of the bond rep port and the eth rep port. An eth rep is a proxy (representor) port, on DPDK, of an actual physical port of the network card, and user-mode message receiving and sending for the actual physical port is completed through the eth rep port. A bond rep is a proxy port, on DPDK, of an aggregation port that takes eth reps as member ports; it is mainly used to implement message receiving and sending for the user-mode aggregation port, so as to reduce hardware dependence. A representor port is in fact a virtual proxy port on the DPDK side, and messages are received and sent through the representor port in user mode.
In the embodiments of the application, the data plane is offloaded by implementing a user-mode bond rep transceiving mechanism combined with flow tables, so that the data plane and the control plane are separated (the encapsulation of the load (the data plane) and the transmission of the load (the control plane)), giving better flexibility and forwarding performance.
Optionally, the hardware chip includes: the message matching action processing module and the data plane development suite; the message matching action processing module comprises a matching search engine and an action processing engine, the action processing engine comprises a data packet descriptor processor and a bond forwarding processing module, the data packet descriptor processor is connected with the bond forwarding processing module, and the data plane development suite comprises a bond rep user mode driver;
the bond forwarding processing module is configured to maintain a LAG table, where the LAG table is issued to the hardware chip after the second port is successfully created;
the bond forwarding processing module is further configured to, after the data plane is offloaded, perform hash routing and member port information mapping according to the LAG table of the bond port for traffic whose egress port is the bond port;
the bond forwarding processing module is further configured to match received messages and, for a message whose destination MAC address is the MAC of the bond port, encapsulate the corresponding message extension header and send the message to a flow forwarding engine for lookup and forwarding.
In a specific implementation, as shown in fig. 3, the hardware chip includes a message matching action processing module and a data plane development kit (ovs_dpdk). The message matching action processing module includes a matching lookup engine (match lookup engine) and an action processing engine (action engine). The action processing engine includes a packet descriptor processor and a bond forwarding processing module (bond); the packet descriptor processor is connected to the bond forwarding processing module and may also be connected to a receive side scaling (RSS) module. The data plane development kit includes the bond rep user-mode driver and further includes a control plane (ofproto) dpif operation module, a hardware offload agent (hardware offloading agent), and a data plane (ovs_dp_upcall data plane).
The bond forwarding processing module may process packet descriptors, for example, by first performing the bond LAG hash operation (bond LAG hash) and then looking up the port map (look up port map). The bond forwarding processing module may implement the following functions:
1. maintaining a LAG Table, which software issues to the chip after the bond rep is created successfully and which includes but is not limited to: bond port and member port information, member port link information, and key information such as bond mode, MAC, IP, and xmit policy;
2. after the data plane is offloaded, for traffic whose egress port is a bond port, performing hash routing and member port information mapping according to the LAG Table entry of the bond port;
3. matching received messages and, for a message whose destination MAC address (DMAC) is the bond port MAC, encapsulating the corresponding message extension header and sending the message to the flow forwarding engine for table lookup and forwarding. The extension header information includes but is not limited to port information, queue information, message parsing information, and device information; the information in the extension header may also be exchanged between software and hardware by modifying descriptors or through custom messages. The device information may include at least one of the following, without limitation: device name, device model, IP address, MAC address, user name, etc.
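Functions 1 and 2 above can be sketched as a LAG Table entry plus a hash-based member port selection. The sketch below is an assumption-laden illustration: the `LagEntry` layout, the `select_member` name, and the use of CRC32 as the hash are all hypothetical, since the source does not specify the chip's table format or hash function.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class LagEntry:
    """Hypothetical LAG Table entry for one bond port."""
    bond_port: str
    mode: int
    mac: str
    xmit_policy: str
    members: dict = field(default_factory=dict)  # member port -> link is up


def select_member(entry, src_mac, dst_mac, src_ip="", dst_ip=""):
    """Hash the flow and map it onto an active member port (None if all are down)."""
    active = sorted(p for p, up in entry.members.items() if up)
    if not active:
        return None
    key = f"{src_mac}|{dst_mac}"
    if entry.xmit_policy == "l23":
        key += f"|{src_ip}|{dst_ip}"   # layer 2+3 policy also hashes the IP pair
    # CRC32 stands in for whatever hash the hardware really uses.
    return active[zlib.crc32(key.encode()) % len(active)]
```

Because the hash depends only on the flow's addresses, all messages of one flow keep using the same member port while it is up, and the flow moves to a surviving member when that link goes down.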
Optionally, the bond rep user mode driver is used for realizing creation, deletion and attribute configuration of a bond rep port;
the bond rep user state driver supports LACP protocol and is used for realizing LAG dynamic negotiation;
the bond rep user mode driver is used for sending ARP messages and externally advertised routing information on multiple member ports;
the bond rep user state driver is used for supporting physical link detection and synchronization of a physical link and a rep state, and refreshing forwarding table entries corresponding to the bond rep according to the rep state change.
In a specific implementation, as shown in fig. 3, the bond rep module may support the following functions: rte eth dev, the link aggregation control protocol (LACP), dual sending of address resolution protocol (ARP) messages, member link checking (bond slave link check), and the bond rep user mode driver (bond rep pmd driver). The bond forwarding processing module and the bond rep module update the port mapping table (update port map) by means of an upcall.
In a specific implementation, bond rep pmd driver of the network card on the DPDK side needs to be implemented, and the following functions need to be implemented in this part:
1. creating and deleting bond rep ports and configuring their attributes;
2. supporting LACP protocol negotiation, used for LAG dynamic negotiation when connecting to TORs in the external "unstacking" networking scenario;
3. sending ARP messages, externally advertised routes and similar information on multiple member ports, used to synchronize table entry information between TORs in the "unstacking" scenario;
4. detecting the physical link and synchronizing the physical link state with the rep state, and refreshing the forwarding table entries corresponding to the bond rep according to rep state changes;
5. issuing a chip LAG Table and maintaining Table entries.
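Function 3 above, sending ARP on every member port so that both TORs learn the same entries, can be sketched as a small transmit-side decision. The `tx_ports` name and the fallback to the first surviving member are assumptions made for illustration; 0x0806 is the standard EtherType for ARP.

```python
ARP_ETHERTYPE = 0x0806  # standard EtherType for ARP


def tx_ports(ethertype, members, hashed_port):
    """Return the member ports a frame is sent on.

    members: dict of member port -> link is up (bool)
    hashed_port: the port the normal hash policy chose
    """
    active = [p for p, up in members.items() if up]
    if ethertype == ARP_ETHERTYPE:
        return active                  # replicate ARP on every active member
    if hashed_port in active:
        return [hashed_port]
    return active[:1]                  # assumed fallback: any surviving member
```

The same link-state dictionary also illustrates function 4: when a member's link goes down, it drops out of the set of ports used for both replicated and hashed traffic.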
Further, the bond rep user mode driver comprises an external function interface layer, a user-defined scheme function layer and a user mode driver basic function layer;
the external function interface layer is used for creating a bond port and setting the attribute of the bond port;
the custom scheme functional layer is used for realizing at least one of the following functions: ARP protocol message processing and dual-sending reply, physical link monitoring mechanism, LACP protocol service function, and FPGA table entry issuing and maintenance;
the user mode driving basic function layer is used for realizing at least one of the following functions: bond transceiver, network card driver registration information, user mode driver basic configuration parameters.
In a specific implementation, the attribute settings may be as follows:
"slave=<ifc>" - member port id; only the two ids 2048 and 2049 are currently supported, representing eth rep;
"primary=<ifc>" - custom primary member port; by default, the first member port added is the primary member port;
"mode=[0-6]" - bond port mode; only mode=4 is currently supported;
"xmit_policy=[l2|l23|l34]" - hash policy the bond port uses to select a member port; l23 is recommended;
"agg_mode=[count|stable|bandwidth]" - bond port aggregation id selection mode; the default is stable;
"bond_mac=<mac addr>" - MAC setting of the bond rep port, used as the system_id in LACP negotiation messages;
"lsc_poll_period_ms=<int>" - link status change (lsc) polling period;
"up_delay=<int>" - delay for the link down -> up state change;
"down_delay=<int>" - delay for the link up -> down state change.
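As an illustration of how such attributes might be parsed and validated, the sketch below handles a comma-separated `key=value` string and enforces the restrictions stated above (member ids 2048 and 2049, mode 4 only, default primary and agg_mode). The `parse_bond_args` name and the comma-separated string format are assumptions; the actual driver's parsing is not shown in the source.

```python
ALLOWED = {
    "mode": {"4"},                               # only mode=4 is supported
    "xmit_policy": {"l2", "l23", "l34"},
    "agg_mode": {"count", "stable", "bandwidth"},
}
SUPPORTED_SLAVES = {"2048", "2049"}              # ids representing eth rep


def parse_bond_args(text):
    """Parse 'key=value,key=value' into a config dict, validating known keys."""
    cfg = {"slave": [], "agg_mode": "stable"}    # agg_mode defaults to stable
    for item in text.split(","):
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {item!r}")
        if key == "slave":
            if value not in SUPPORTED_SLAVES:
                raise ValueError(f"unsupported member port id {value}")
            cfg["slave"].append(value)
        elif key in ALLOWED and value not in ALLOWED[key]:
            raise ValueError(f"unsupported {key}={value}")
        else:
            cfg[key] = value
    # The first member port added is the primary unless one is given explicitly.
    if cfg["slave"]:
        cfg.setdefault("primary", cfg["slave"][0])
    return cfg
```

For example, `parse_bond_args("slave=2048,slave=2049,mode=4,xmit_policy=l23")` yields a configuration with both member ports, 2048 as primary, and the default stable aggregation mode, while `mode=1` is rejected.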
In a specific implementation, as shown in fig. 4, creating a bond rep port may include the following steps: parsing the creation parameters (rte_kvargs_parameter); checking the parameters; if the check passes, performing bond_alloc and creating the bond rte_eth_dev; setting the private data parameters of the bond rte_eth_dev; implementing the bond LACP timing mechanism through a callback; setting the bond port TX and RX callback functions according to the mode parameter; and implementing and hooking the eth_dev_ops callback functions. If the check fails, the operation ends directly.
In a specific implementation, as shown in fig. 5, the complete bond rep initialization process may include the following steps: bond_create; bond dev_configuration, which calls the dev_configuration callback; bond slave add, which completes the slave port configuration (slave_configuration); setting the bond slave link detection mechanism callback, which must be refreshed synchronously with the FPGA table entries; bond tx rx queue setup; bond dev start; bond slave active; and issuing the FPGA table entries.
In a specific implementation, referring to fig. 6, fig. 6 shows the functional L0 structure of the bond rep user mode driver. The external function interface layer is used for creating a bond port and setting the attributes of the bond port. The custom scheme functional layer is used for realizing at least one of the following functions: ARP protocol message processing and dual-sending reply, physical link monitoring mechanism, LACP protocol service function, and FPGA table entry issuing and maintenance. The user mode driving basic function layer is used for realizing at least one of the following functions: bond transceiver, network card driver registration information (e.g., eth_dev_ops), and user mode driver basic configuration parameters (pmd driver basic configuration).
Optionally, the bond forwarding processing module is located between the data channel flow forwarding engine and the Ethernet MAC; for transmitted traffic, a hash value is first calculated, and a port is then selected according to the LAG Table; for received traffic, when the DMAC is the MAC of the bond0 port, the ingress port is set to the bond0 port and the flow forwarding engine looks up the flow table.
In a specific implementation, on the data forwarding plane, the bond forwarding plane is logically located between the data channel flow forwarding engine and the Ethernet MAC; for Tx traffic, a hash is first calculated, and a port is then selected according to the LAG Table; for Rx traffic, when the DMAC is the MAC of bond0, the ingress port is set to bond0 and the flow forwarding engine looks up the flow table.
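A minimal sketch of the two data-plane decisions described above is given below. The application does not specify the hash function or the hashed fields, so the CRC32 over a 5-tuple, the `FlowKey` fields, and the function names are all assumptions made for illustration; the LAG Table is modeled as a plain list of active member ports.

```python
import zlib
from dataclasses import dataclass

BOND0 = "bond0"


@dataclass(frozen=True)
class FlowKey:
    """Fields assumed to feed the LAG hash (hypothetical choice)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int


def lag_hash(key):
    """Deterministic hash of the flow key; CRC32 is an illustrative stand-in."""
    data = f"{key.src_ip}|{key.dst_ip}|{key.src_port}|{key.dst_port}|{key.proto}"
    return zlib.crc32(data.encode())


def select_tx_port(key, lag_table):
    """Tx: compute the hash, then pick a member port from the LAG Table."""
    if not lag_table:
        raise ValueError("LAG Table has no active member port")
    return lag_table[lag_hash(key) % len(lag_table)]


def classify_rx(dmac, phys_port, bond_mac):
    """Rx: if the DMAC is the bond0 MAC, the ingress port becomes bond0."""
    return BOND0 if dmac.lower() == bond_mac.lower() else phys_port
```

Because the hash is computed per flow rather than per packet, all packets of one flow leave through the same member port as long as the LAG Table is unchanged.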
Optionally, the bond rep user-mode driver is responsible for processing the LACP protocol to monitor the link state and for updating the LAG Table of the chip through the bond driver; for a message sent up on a MISS in the flow forwarding engine, it is also responsible for extracting the extension header and updating the ingress port into the memory buffer; and for a message sent to the bond0 port, it adds an extension header whose output port is the bond0 port and sends the message to the logical data channel.
In a specific implementation, on the protocol control plane, the bond driver of the bond control plane on OVS_DPDK is responsible for processing the LACP protocol, monitoring the link state and updating the chip LAG Table; for a message sent up on a MISS in the flow forwarding engine, the bond driver is responsible for extracting the extension header and updating the ingress port into the memory buffer (mbuf); and for a message sent to bond0, it adds an extension header whose output port is bond0 and sends the message to the logical data channel.
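The extension header handling described above can be illustrated as follows. The application does not disclose the header layout, so the 5-byte format used here (a magic value, a direction byte and a 16-bit port identifier) and all names are purely hypothetical; the sketch only shows the add-on-send and strip-on-receive pattern.

```python
import struct

# Hypothetical layout: 16-bit magic, 8-bit direction, 16-bit port id (network order).
EXT_FMT = "!HBH"
EXT_LEN = struct.calcsize(EXT_FMT)
EXT_MAGIC = 0xB0D0  # arbitrary marker chosen for this sketch
DIR_RX = 0
DIR_TX = 1


def add_ext_header(frame, direction, port_id):
    """Prepend the extension header, e.g. with port_id naming bond0 on transmit."""
    return struct.pack(EXT_FMT, EXT_MAGIC, direction, port_id) + frame


def strip_ext_header(buf):
    """Extract (direction, port_id) and return them with the bare frame."""
    if len(buf) < EXT_LEN:
        raise ValueError("buffer shorter than extension header")
    magic, direction, port_id = struct.unpack_from(EXT_FMT, buf)
    if magic != EXT_MAGIC:
        raise ValueError("not an extension-header message")
    return direction, port_id, buf[EXT_LEN:]
```

On the receive side, the extracted port identifier is what a driver would record as the ingress port in the packet buffer metadata before handing the frame upward.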
Further, as shown in fig. 7, fig. 7 is a diagram of the data plane under forwarding-control separation, in which the PPE is the flow forwarding processing engine in the chip; for example, the hardware chip may include a DPU, a message is sent to the bond0 port through the DPU, a LAG hash is then calculated, and the port is selected according to the LAG Table.
The following takes an FPGA as an example of the hardware chip, i.e., chip/FPGA.
For example, as shown in fig. 8, fig. 8 is a timing chart of the packet sending process under forwarding-control separation. In a specific implementation, a user side (user) sends a packet and OVS forwards it in software, with the bond rep as the output port; after the bond rep user-mode driver receives the packet, service messages such as ARP and routing protocol messages are copied to the member ports and sent on all of them, while for other messages the corresponding member port is selected according to the mode and policy and the packet is handed to the member port transmit function (slave port tx); the extension header for packet sending is encapsulated at the sending port, and the PF packet sending function is reused to issue the packet to the chip/FPGA; after forwarding-control separation, a data message sent by the application is issued to the chip/FPGA; the chip performs hash selection according to the flow Table and the LAG Table and sends the packet through the eth port; and the sending result is returned.
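The member port choice made by the user-mode driver in the sending process above can be sketched as follows. Only the ARP EtherType (0x0806) is a standard value; the function names, the idea of keying the policy hash on the two MAC addresses, and the use of CRC32 are assumptions for illustration, and frames are assumed to be untagged Ethernet.

```python
import struct
import zlib

ETH_P_ARP = 0x0806  # standard EtherType for ARP


def ethertype(frame):
    """Read the EtherType of an untagged Ethernet frame (bytes 12..13)."""
    return struct.unpack_from("!H", frame, 12)[0]


def tx_member_ports(frame, active_members, replicate_types=(ETH_P_ARP,)):
    """Return the member ports a frame should be sent on.

    Messages of a replicated type are copied to every active member port;
    any other frame goes to a single member chosen by a hash policy.
    """
    if not active_members:
        return []
    if ethertype(frame) in replicate_types:
        return list(active_members)
    index = zlib.crc32(frame[:12]) % len(active_members)  # hash on DMAC + SMAC
    return [active_members[index]]
```

Routing protocol messages, which the text also replicates, would need their own classification rule; that rule is not specified in the source and is left out of this sketch.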
For further illustration, as shown in fig. 9, fig. 9 is a timing chart of the packet receiving process under forwarding-control separation. In a specific implementation, a user side (user) receives a message through an eth port; after the chip/FPGA receives the message, it performs a table lookup for forwarding, sends the message up on a MISS, and modifies the ingress port information to the bond port according to the LAG Table; the extension header carrying the bond port receive information is encapsulated and sent up through the PF channel to an analysis function (PF rx patch, the pcie PF packet receiving hook function), and the PF rx patch analyzes the extension header information to judge whether the message is a bond rep received message; after the extension header is stripped, the message is sent to the bond rep user-mode driver (bond rep pmd/bond rep sw_ring); the bond rep polls the member ports according to the mode and hash to receive packets; if the message is an LACP message, it enters the rx_ring of the corresponding slave port and is left for the LACP cb function to process on a timer for LACP negotiation; and a flow table entry whose ingress port is the bond port is issued to the chip/FPGA for the corresponding message, so that subsequent messages are forwarded directly by table lookup.
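The demultiplexing performed by the PF receive hook in the sequence above can be sketched as follows. The Slow Protocols EtherType (0x8809) and the LACP subtype (0x01) are standard values; the class `BondRepRx`, the boolean `for_bond_rep` standing in for the parsed extension header, and the ring names are hypothetical simplifications.

```python
import struct
from collections import deque

ETH_P_SLOW = 0x8809   # Slow Protocols EtherType, used by LACP
LACP_SUBTYPE = 0x01   # Slow Protocols subtype identifying LACP


def is_lacp(frame):
    """True if an untagged Ethernet frame carries an LACPDU."""
    if len(frame) < 15:
        return False
    return (struct.unpack_from("!H", frame, 12)[0] == ETH_P_SLOW
            and frame[14] == LACP_SUBTYPE)


class BondRepRx:
    """Receive-side queues of a bond rep with the given member ports."""

    def __init__(self, members):
        self.sw_ring = deque()                                # data messages
        self.slave_rx_ring = {m: deque() for m in members}    # LACP per member

    def pf_rx_hook(self, for_bond_rep, member, frame):
        """Route a frame whose extension header has already been stripped."""
        if not for_bond_rep:
            return "pf"  # ordinary PF traffic, not handled by the bond rep
        if is_lacp(frame):
            self.slave_rx_ring[member].append(frame)
            return "slave_rx_ring"
        self.sw_ring.append(frame)
        return "sw_ring"
```

Keeping LACP frames on the per-member ring preserves the information about which physical link they arrived on, which the negotiation needs.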
For further illustration, as shown in fig. 10, fig. 10 is a timing diagram of software and hardware interaction: a user side (user) creates the bond rep; the bond rep pmd completes the create, configure and start flow for the bond rep; the LAG Table corresponding to the bond rep is issued to the chip/FPGA; the chip/FPGA generates the hardware table entries and returns the creation result to the bond rep; when the bond rep needs to be deleted or reconfigured, the bond port is deleted and its resources are released, or its table entries are refreshed, and the corresponding hardware table entries are deleted or refreshed accordingly; and the table entries are also refreshed when the state of a bond rep member port changes.
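The software and hardware interaction above amounts to keeping the hardware LAG Table in step with the bond rep. A minimal sketch under stated assumptions: `Chip` and `BondRep` are hypothetical classes, the LAG Table is reduced to a list of member ports whose link is up, and the hardware is modeled as a dictionary.

```python
class Chip:
    """Stand-in for the chip/FPGA side holding one LAG Table per bond."""

    def __init__(self):
        self.lag_tables = {}

    def issue_lag_table(self, bond, ports):
        self.lag_tables[bond] = list(ports)
        return True  # creation result returned to the bond rep

    def delete_lag_table(self, bond):
        self.lag_tables.pop(bond, None)


class BondRep:
    """Bond rep that issues, refreshes and deletes its hardware LAG Table."""

    def __init__(self, name, members, chip):
        self.name = name
        self.chip = chip
        self.link_up = {m: True for m in members}
        self.created = chip.issue_lag_table(name, self._active())

    def _active(self):
        return [m for m, up in self.link_up.items() if up]

    def set_link(self, member, up):
        """Member port state change: refresh the hardware entries."""
        self.link_up[member] = up
        self.chip.issue_lag_table(self.name, self._active())

    def delete(self):
        """Bond deletion: release the corresponding hardware entries."""
        self.chip.delete_lag_table(self.name)
```

With this shape, a failed link is removed from the hash selection set as soon as the driver detects it, without touching the flow table entries that name the bond port as their output.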
The embodiment of the application provides a technical scheme for dual-homed access of a server DPU network card to a data center network, based on a user-mode bond rep port and forwarding-control separation. By implementing the user-mode bond rep pmd driver together with forwarding-control separation, the scheme solves the problem of dual-homed access of the server DPU network card to the data center network. On that basis, the scheme can provide better forwarding performance, flexibility and service stability than the related technology.
It can be seen that the hardware chip described in the embodiment of the present application includes a first port and a second port, where the first port is a first proxy port, on the data plane development kit, of an actual physical port of the hardware chip, and user-mode message sending and receiving for the actual physical port is completed through the first port; the second port is a second proxy port, on the data plane development kit, of the aggregation port that takes the first port as a member port; the second proxy port is a virtual proxy port on the data plane development kit side for the actual physical port, and messages are received and transmitted in user mode through the second proxy port. By implementing the receiving and transmitting mechanism of the user-mode bond rep and offloading the data plane in combination with the flow table, the data plane is separated from the control plane, which provides better flexibility and forwarding performance.
Referring to fig. 11, fig. 11 is a flow chart of a communication method according to an embodiment of the present application, which is applied to the above hardware chip or server; as shown in the figure, the communication method includes:
101. Receive a message through an eth port.
102. Perform a forwarding lookup for the received message through the hardware chip, send the message up on a MISS, and modify the ingress port to the bond port according to the LAG Table.
103. Encapsulate the extension header carrying the receive information of the bond port, and send the message up to a specified function.
104. Analyze the extension header of the received message through the specified function to judge whether the received message is a bond rep received message.
105. If yes, strip the extension header from the received message and send the message to the bond rep user-mode driver, which receives the packet; if the received message is an LACP message, process it through LACP negotiation; and issue to the hardware chip a flow table entry whose ingress port is the bond port, for direct table lookup forwarding of subsequent messages.
The specified function may be set by the user or by system default; for example, the specified function may include the PF rx patch.
In a specific implementation, a user side (user) receives a message through an eth port; after the chip/FPGA receives the message, it performs a table lookup for forwarding, sends the message up on a MISS, and modifies the ingress port information to the bond port according to the LAG Table; the extension header carrying the bond port receive information is encapsulated and sent up through the PF channel to an analysis function (PF rx patch, the pcie PF packet receiving hook function), and the PF rx patch analyzes the extension header information to judge whether the message is a bond rep received message; after the extension header is stripped, the message is sent to the bond rep user-mode driver (bond rep pmd/bond rep sw_ring); the bond rep polls the member ports according to the mode and hash to receive packets; if the message is an LACP message, it enters the rx_ring of the corresponding slave port and is left for the LACP cb function to process on a timer for LACP negotiation; and a flow table entry whose ingress port is the bond port is issued to the chip/FPGA for the corresponding message, so that subsequent messages are forwarded directly by table lookup.
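The effect of the final step, in which a flow table entry is issued so that later messages no longer need the software path, can be sketched as follows. The `FlowTable` class, the `(in_port, dmac)` match key and the `slow_path` callback are assumptions for illustration; the application does not state which fields the flow table matches on.

```python
class FlowTable:
    """Stand-in for the hardware flow table, keyed on (ingress port, DMAC)."""

    def __init__(self):
        self.entries = {}
        self.upcalls = 0  # number of messages sent up on a MISS

    def lookup(self, in_port, dmac):
        return self.entries.get((in_port, dmac))

    def install(self, in_port, dmac, action):
        self.entries[(in_port, dmac)] = action


def receive(table, in_port, dmac, slow_path):
    """Forward by table lookup; on a MISS, ask software and install the result."""
    action = table.lookup(in_port, dmac)
    if action is None:
        table.upcalls += 1
        action = slow_path(in_port, dmac)      # decision made in user mode
        table.install(in_port, dmac, action)   # entry issued to the hardware
    return action
```

Only the first message of a flow pays the cost of the software path; every later message with the same key is handled by the lookup alone.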
It can be seen that, in the communication method described in the embodiment of the present application, a message is received through an eth port; the hardware chip performs a forwarding lookup for the received message, sends the message up on a MISS, and modifies the ingress port to the bond port according to the LAG Table; the extension header carrying the receive information of the bond port is encapsulated and the message is sent up to a specified function; the specified function analyzes the extension header of the received message to judge whether the received message is a bond rep received message; if yes, the extension header is stripped and the message is sent to the bond rep user-mode driver, which receives the packet; if the received message is an LACP message, it is processed through LACP negotiation; and a flow table entry whose ingress port is the bond port is issued to the hardware chip for direct table lookup forwarding of subsequent messages. By implementing the receiving and transmitting mechanism of the user-mode bond rep and offloading the data plane in combination with the flow table, the data plane is separated from the control plane, which provides better flexibility and forwarding performance.
In accordance with the above embodiment, referring to fig. 12, fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application; as shown in the figure, the electronic device includes a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and in the embodiment of the present application, the programs include instructions for executing the following steps:
receiving a message through an eth port;
performing a forwarding lookup for the received message through the hardware chip, sending the message up on a MISS, and modifying the ingress port to the bond port according to the LAG Table;
encapsulating the extension header carrying the receive information of the bond port, and sending the message up to a specified function;
analyzing the extension header of the received message through the specified function to judge whether the received message is a bond rep received message;
if yes, stripping the extension header from the received message and sending the message to the bond rep user-mode driver, which receives the packet; if the received message is an LACP message, processing it through LACP negotiation; and issuing to the hardware chip a flow table entry whose ingress port is the bond port, for direct table lookup forwarding of subsequent messages.
It can be seen that the electronic device described in the embodiment of the present application receives a message through the eth port; the hardware chip performs a forwarding lookup for the received message, sends the message up on a MISS, and modifies the ingress port to the bond port according to the LAG Table; the extension header carrying the receive information of the bond port is encapsulated and the message is sent up to a specified function; the specified function analyzes the extension header of the received message to judge whether the received message is a bond rep received message; if yes, the extension header is stripped and the message is sent to the bond rep user-mode driver, which receives the packet; if the received message is an LACP message, it is processed through LACP negotiation; and a flow table entry whose ingress port is the bond port is issued to the hardware chip for direct table lookup forwarding of subsequent messages. By implementing the receiving and transmitting mechanism of the user-mode bond rep and offloading the data plane in combination with the flow table, the data plane is separated from the control plane, which provides better flexibility and forwarding performance.
The embodiment of the application also provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program makes a computer execute part or all of the steps of any one of the above method embodiments, and the computer includes an electronic device.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program operable to cause a computer to perform part or all of the steps of any one of the methods described in the method embodiments above. The computer program product may be a software installation package, said computer comprising an electronic device.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of units is merely a division of logical functions, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical or in other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units described above, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, comprising several instructions for causing a computer device (which may be a personal computer, an electronic device, a network device, etc.) to perform all or part of the steps of the methods of the various embodiments of the present application. The aforementioned memory includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media capable of storing program code.
Those of ordinary skill in the art will appreciate that all or a portion of the steps in the various methods of the above embodiments may be implemented by a program that instructs associated hardware, and the program may be stored in a computer-readable memory, which may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The embodiments of the application have been described in detail above, and specific examples have been used to explain the principles and implementations of the application; the above examples are provided solely to facilitate understanding of the method and core concepts of the application. Meanwhile, those skilled in the art may, in accordance with the ideas of the present application, make changes to the specific embodiments and the scope of application; in view of the above, this description should not be construed as limiting the present application.

Claims (9)

1. A hardware chip, characterized in that the hardware chip comprises a first port and a second port, wherein,
the first port is a first proxy port of an actual physical port of the hardware chip on a data plane development kit, and user state message receiving and transmitting of the actual physical port is completed through the first port;
the second port is a second agent port of the aggregation port taking the first port as a member port on the data plane development kit;
the second proxy port is a virtual proxy port on the data plane development kit side for the actual physical port, and is used for receiving and transmitting messages in user mode through the second proxy port;
wherein the hardware chip includes: a message matching action processing module and the data plane development kit; the message matching action processing module comprises a matching search engine and an action processing engine, the action processing engine comprises a data packet descriptor processor and a bond forwarding processing module, the data packet descriptor processor is connected with the bond forwarding processing module, and the data plane development kit comprises a bond rep user-mode driver;
the bond forwarding processing module is configured to maintain a LAG table, where the LAG table is issued to the hardware chip after the second port is successfully created;
the bond forwarding processing module is further configured to perform hash-based port selection and member port information mapping, according to the LAG table of the bond port, for flows whose corresponding output port is the bond port after the data plane is offloaded;
the bond forwarding processing module is further used for matching received messages; and for a message whose destination MAC address is the MAC corresponding to the bond port, encapsulating the corresponding message extension header and sending the message to a flow forwarding engine for lookup and forwarding.
2. The hardware chip of claim 1, wherein the bond rep user-mode driver is configured to implement creation, deletion, and attribute configuration of bond rep ports;
the bond rep user-mode driver supports the LACP protocol and is used for realizing LAG dynamic negotiation;
the bond rep user-mode driver is used for sending ARP messages and externally advertised routing information on multiple member ports;
the bond rep user-mode driver is used for supporting physical link detection and synchronization of the physical link state with the rep state, and for refreshing the forwarding table entries corresponding to the bond rep according to changes of the rep state.
3. The hardware chip of claim 2, wherein the bond rep user-mode driver comprises an external function interface layer, a custom scheme function layer, and a user-mode driver basic function layer;
the external function interface layer is used for creating a bond port and setting the attributes of the bond port;
the custom scheme function layer is used for realizing at least one of the following functions: ARP protocol message processing, dual-send reply, a physical link monitoring mechanism, LACP protocol service functions, and FPGA table entry issuing and maintenance;
the user-mode driver basic function layer is used for realizing at least one of the following functions: bond packet sending and receiving, network card driver registration information, and user-mode driver basic configuration parameters.
4. The hardware chip of claim 1, wherein the bond forwarding processing module is located between a data channel flow forwarding engine and an Ethernet MAC; for transmitted traffic, a hash value is first calculated, and a port is then selected according to the LAG Table; for received traffic, when the DMAC is the MAC of the bond0 port, the ingress port is set to the bond0 port and the flow forwarding engine looks up the flow table.
5. The hardware chip of claim 4, wherein the bond rep user-mode driver is responsible for processing the LACP protocol to monitor the link state and for updating the chip LAG Table through the bond driver; for a message sent up on a MISS in the flow forwarding engine, it is also responsible for extracting the extension header and updating the ingress port into the memory buffer; and for a message sent to the bond0 port, it adds an extension header whose output port is the bond0 port and sends the message to a logical data channel.
6. A server comprising a hardware chip as claimed in any one of claims 1-5.
7. A communication method, characterized in that it is applied to an electronic device comprising a hardware chip according to any of claims 1-6, said method comprising:
receiving a message through an eth port;
performing a forwarding lookup for the received message through the hardware chip, sending the message up on a MISS, and modifying the ingress port to the bond port according to a LAG Table;
encapsulating the extension header carrying the receive information of the bond port, and sending the message up to a specified function;
analyzing the extension header of the received message through the specified function to judge whether the received message is a rep received message;
if yes, stripping the extension header from the received message and sending the message to a bond rep user-mode driver, which receives the packet; if the received message is an LACP message, processing it through LACP negotiation; and issuing to the hardware chip a flow table entry whose ingress port is the bond port, for direct table lookup forwarding of subsequent messages.
8. An electronic device comprising a processor and a memory, the memory storing one or more programs configured to be executed by the processor, the programs comprising instructions for performing the steps in the method of claim 7, the electronic device comprising a DPU.
9. A computer-readable storage medium, characterized in that a computer program for electronic data exchange is stored, wherein the computer program causes a computer to perform the method according to claim 7.
CN202210381313.0A 2022-04-12 2022-04-12 Hardware chip, DPU, server, communication method and related device Active CN114745255B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210381313.0A CN114745255B (en) 2022-04-12 2022-04-12 Hardware chip, DPU, server, communication method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210381313.0A CN114745255B (en) 2022-04-12 2022-04-12 Hardware chip, DPU, server, communication method and related device

Publications (2)

Publication Number Publication Date
CN114745255A CN114745255A (en) 2022-07-12
CN114745255B true CN114745255B (en) 2023-11-10

Family

ID=82280749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210381313.0A Active CN114745255B (en) 2022-04-12 2022-04-12 Hardware chip, DPU, server, communication method and related device

Country Status (1)

Country Link
CN (1) CN114745255B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115114219B (en) * 2022-07-22 2023-10-20 深圳星云智联科技有限公司 PCI-E topology method, device, equipment and storage medium
CN117768292A (en) * 2022-09-24 2024-03-26 华为技术有限公司 Management method, device and system of logic binding port and storage medium
CN117278620A (en) * 2023-09-21 2023-12-22 中科驭数(北京)科技有限公司 Configuration method and system of data plane forwarding rule of DPU
CN117692382B (en) * 2024-02-04 2024-06-07 珠海星云智联科技有限公司 Link aggregation method, network card, equipment and medium
CN117978758B (en) * 2024-03-29 2024-06-07 珠海星云智联科技有限公司 Adaptation method for data processing unit, computer device and medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103401797A (en) * 2013-07-24 2013-11-20 杭州华三通信技术有限公司 Message processing method and equipment
CN105991439A (en) * 2015-02-06 2016-10-05 杭州华三通信技术有限公司 Management method and device of data center server (DC server)
CN106533943A (en) * 2016-12-06 2017-03-22 中国电子科技集团公司第三十二研究所 Method for realizing microcode and flow table based on network switching chip
CN106961363A (en) * 2017-03-29 2017-07-18 云络动力(北京)科技有限公司 A kind of method and system for capturing virtual switch User space data plane data message
CN107947977A (en) * 2017-11-21 2018-04-20 北京邮电大学 A kind of collocation method of interchanger, device, electronic equipment and storage medium
CN109861839A (en) * 2017-11-30 2019-06-07 华为技术有限公司 The unbroken virtual switch upgrade method of business and relevant device
CN110875844A (en) * 2018-08-30 2020-03-10 丛林网络公司 Multiple virtual network interface support for virtual execution elements
CN112671869A (en) * 2020-12-15 2021-04-16 北京天融信网络安全技术有限公司 Network bridge transparent proxy method, device, electronic equipment and storage medium
CN113821310A (en) * 2021-11-19 2021-12-21 阿里云计算有限公司 Data processing method, programmable network card device, physical server and storage medium
CN113923158A (en) * 2020-07-07 2022-01-11 华为技术有限公司 Message forwarding, routing sending and receiving method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11706099B2 (en) * 2018-06-29 2023-07-18 Juniper Networks, Inc. Monitoring and policy control of distributed data and control planes for virtual nodes

Also Published As

Publication number Publication date
CN114745255A (en) 2022-07-12

Similar Documents

Publication Publication Date Title
CN114745255B (en) Hardware chip, DPU, server, communication method and related device
JP7079866B2 (en) Packet processing method and device
CN107070691B (en) Cross-host communication method and system of Docker container
US9515890B2 (en) Method, system and controlling bridge for obtaining port extension topology information
WO2018108033A1 (en) Database migration method and device, terminal, system and storage medium
CN106789526B (en) method and device for connecting multiple system networks
EP2747381B1 (en) Method, network device and system for implementing network card offloading function
CN108632145B (en) Message forwarding method and leaf node equipment
WO2015000362A1 (en) Service node configuration method, service node pond register and system
US20160330167A1 (en) Arp Implementation Method, Switch Device, and Control Device
JP2020520612A (en) Packet transmission method, edge device, and machine-readable storage medium
WO2017128953A1 (en) Server virtualization network sharing apparatus and method
EP2924926B1 (en) Lookup table creation method and query method, and controller, forwarding device and system therefor
CN113839862B (en) Method, system, terminal and storage medium for synchronizing ARP information between MCLAG neighbors
CN114501593B (en) Network slice access method, device, system and storage medium
WO2023236858A1 (en) Flow table rule management method, traffic management method and system, and storage medium
CN113965521B (en) Data packet transmission method, server and storage medium
WO2015074537A1 (en) Method and apparatus for controlling communication protocol in smart tv device
US12003417B2 (en) Communication method and apparatus
WO2015188331A1 (en) Forwarding control method, driver and SDN network
CN108512737B (en) Data center IP layer interconnection method and SDN controller
CN109413118B (en) Method, device, storage medium and program product for realizing session synchronization
EP4383674A1 (en) Message processing method and related apparatus
WO2018121443A1 (en) Message transmission method and device
WO2018028592A1 (en) Method and device for receiving and sending messages

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant