CN116438787A - Low-delay software-defined wide area network architecture - Google Patents

Low-delay software-defined wide area network architecture

Info

Publication number
CN116438787A
Authority
CN
China
Prior art keywords
core
network node
qos
rate
virtual queue
Prior art date
Legal status
Pending
Application number
CN202080106997.2A
Other languages
Chinese (zh)
Inventor
Sun Yan (孙岩)
Wang Xilei (王锡磊)
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Publication of CN116438787A

Classifications

    • H04L 47/6215 Queue scheduling characterised by scheduling criteria; individual queue per QoS, rate or priority
    • H04L 47/527 Queue scheduling by attributing bandwidth to queues; quantum based scheduling, e.g. credit or deficit based scheduling or token bank
    • H04L 47/2441 Traffic characterised by specific attributes, e.g. priority or QoS, relying on flow classification, e.g. using integrated services [IntServ]
    • H04L 49/3045 Packet switching elements; peripheral units, e.g. input or output ports; virtual queuing


Abstract

The application discloses a network node comprising a memory and a processor coupled to the memory. The processor includes a first core and a second core and is configured to receive instructions from the memory which, when executed by the processor, cause the network node to: create a first virtual queue associated with the first core and a first quality of service (QoS); create a second virtual queue associated with the second core and the first QoS; write a first data packet associated with the first QoS to the first virtual queue through the first core at a first time; and write a second data packet associated with the first QoS to the second virtual queue through the second core at substantially the first time.

Description

Low-delay software-defined wide area network architecture
Technical Field
The present disclosure relates to the technical field of software-defined wide area networks, and in particular to a low-latency software-defined wide area network architecture.
Background
Businesses and other entities distributed across different geographic locations often communicate over a telecommunications network. Some networks are wide area networks (WANs), which allow computers in different geographical locations to communicate. Some wide area networks operate over leased telecommunication circuits dedicated to the wide area network. On these leased circuits, data may be transported using multiprotocol label switching (MPLS). MPLS provides defined routing through telecommunication circuits. Other data may be transmitted over the internet; such data typically does not follow a defined route. Data transmission over the internet is generally less expensive than data transmission over MPLS circuits, but is typically slower and less reliable. A software defined wide area network (SD-WAN) provides a cheaper alternative to a traditional wide area network. SD-WANs typically depend on the internet, and thus may provide less predictable performance than leased telecommunication circuits using MPLS.
Disclosure of Invention
In a first aspect, a network node includes a memory and a processor coupled to the memory. The processor includes a first core and a second core, and is configured to receive instructions from the memory which, when executed by the processor, cause the network node to: create a first virtual queue associated with the first core and a first quality of service (QoS); create a second virtual queue associated with the second core and the first QoS; write a first data packet associated with the first QoS to the first virtual queue through the first core at a first time; and write a second data packet associated with the first QoS to the second virtual queue through the second core at substantially the first time.
By writing data to virtual queues having the same QoS at substantially the same time, the delay associated with locking a shared queue for writing is reduced.
In one possible implementation, the instructions further cause the network node to determine the supply rates of the first virtual queue and the second virtual queue based on the first QoS and one or more of: a number of data packets associated with the first QoS, or an amount of data associated with the first QoS.
In another possible implementation, the instructions further cause the network node to transmit data associated with the first QoS according to the supply rate.
In another possible implementation, the instructions further cause the network node to determine a first demand rate for the first virtual queue through the first core and a second demand rate for the second virtual queue through the second core.
In another possible implementation, the first demand rate is based on a first number of packets of the first QoS of the first core and the second demand rate is based on a second number of packets of the first QoS of the second core.
In another possible implementation, the processor further includes a scheduler, the instructions further causing the network node to: the scheduler determining a total demand rate comprising the first demand rate and the second demand rate; the scheduler determining a product of the first demand rate and a rate limit allocated to the first QoS; the scheduler determining a supply rate of the first virtual queue as a quotient of the product and the total demand rate; the supply rate is transmitted to the first core.
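The scheduler's calculation in this implementation can be sketched as follows; the function and variable names are illustrative only, and the per-core demand rates are assumed to be simple numeric values:

```python
def supply_rate(demand_rates, core_index, rate_limit):
    """Split a per-QoS rate limit among cores in proportion to demand.

    demand_rates: demand rate reported by each core's virtual queue.
    rate_limit:   total rate allocated to this QoS class.
    Returns the supply rate for the virtual queue of `core_index`.
    """
    total_demand = sum(demand_rates)
    if total_demand == 0:
        return 0.0  # no core currently has traffic for this QoS class
    # Quotient of (demand rate x rate limit) and the total demand rate.
    return demand_rates[core_index] * rate_limit / total_demand
```

For example, under these assumptions, if two cores report demand rates of 20 and 30 against a rate limit of 100, the first core's virtual queue would be granted a supply rate of 40.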
In another possible implementation, the first core has write permission for the first demand rate and read permission for the supply rate.
In another possible implementation, the instructions further cause the network node to: the first core determining a third number of tokens according to the supply rate; when the third number is greater than a threshold, the first core sends the first data packet to a destination.
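A token-count check of this kind might be sketched as below. The accrual model and all names are assumptions, since the text only states that tokens are determined from the supply rate and compared against a threshold:

```python
class TokenCounter:
    """Hypothetical per-core token accounting for one virtual queue."""

    def __init__(self, supply_rate, threshold):
        self.supply_rate = supply_rate  # tokens accrued per second
        self.threshold = threshold
        self.tokens = 0.0
        self.last_time = 0.0

    def may_send(self, now, packet_cost):
        """Accrue tokens since the last check; allow transmission only
        when the accumulated token count exceeds the threshold."""
        self.tokens += (now - self.last_time) * self.supply_rate
        self.last_time = now
        if self.tokens > self.threshold:
            self.tokens -= packet_cost  # spend tokens on this packet
            return True                 # core may send the packet
        return False                    # hold the packet in the queue
```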
In another possible implementation, the instructions further cause the network node to write the first data packet without locking the first virtual queue.
In a second aspect, a method in a network node comprising a processor includes: creating a first virtual queue associated with a first core of the processor and a first quality of service (QoS); creating a second virtual queue associated with a second core of the processor and the first QoS; writing a first data packet associated with the first QoS to the first virtual queue through the first core at a first time; and writing a second data packet associated with the first QoS to the second virtual queue through the second core at substantially the first time.
The method can improve queue throughput by writing to virtual queues with the same QoS at substantially the same time, avoiding delays caused by locking a shared queue.
In one possible implementation, the method further includes: determining the supply rates of the first virtual queue and the second virtual queue based on the first QoS and one or more of: a number of data packets associated with the first QoS, or an amount of data associated with the first QoS.
In another possible implementation, the method further includes transmitting data associated with the first QoS according to the supply rate.
In another possible implementation, the method further includes: the first core determining a first demand rate for the first virtual queue; the second core determines a second demand rate for the second virtual queue.
In another possible implementation, the first demand rate is based on a first number of packets of the first QoS of the first core and the second demand rate is based on a second number of packets of the first QoS of the second core.
In another possible implementation, the method further includes: the scheduler determining a total demand rate comprising the first demand rate and the second demand rate; the scheduler determining a product of the first demand rate and a rate limit allocated to the first QoS; the scheduler determining a supply rate of the first virtual queue as a quotient of the product and the total demand rate; the supply rate is transmitted to the first core.
In another possible implementation, the first core has write permission for the first demand rate and read permission for the supply rate.
In another possible implementation, the method further includes: the first core determining a third number of tokens according to the supply rate; when the third number is greater than a threshold, the first core sends the first data packet to a destination.
In another possible implementation manner, the writing the first data packet includes: the first data packet is written without locking the first virtual queue.
In a third aspect, a computer program product comprises instructions embodied on a computer readable medium which, when executed by a network node comprising a processor, cause the network node to: create a first virtual queue associated with a first core of the processor and a first quality of service (QoS); create a second virtual queue associated with a second core of the processor and the first QoS; write a first data packet associated with the first QoS to the first virtual queue through the first core at a first time; and write a second data packet associated with the first QoS to the second virtual queue through the second core at substantially the first time.
The computer program product reduces queueing delay by writing to virtual queues having the same QoS at substantially the same time, avoiding delay due to locking a shared queue.
In one possible implementation, the instructions further cause the network node to determine the supply rates of the first virtual queue and the second virtual queue based on the first QoS and one or more of: a number of data packets associated with the first QoS, or an amount of data associated with the first QoS.
In another possible implementation, the instructions further cause the network node to transmit data associated with the first QoS according to the supply rate.
In another possible implementation, the instructions further cause the network node to determine a first demand rate for the first virtual queue through the first core and a second demand rate for the second virtual queue through the second core.
In another possible implementation, the first demand rate is based on a first number of packets of the first QoS of the first core and the second demand rate is based on a second number of packets of the first QoS of the second core.
In another possible implementation, the instructions further cause the network node to: the scheduler determining a total demand rate comprising the first demand rate and the second demand rate; the scheduler determining a product of the first demand rate and a rate limit allocated to the first QoS; the scheduler determining a supply rate of the first virtual queue as a quotient of the product and the total demand rate; the supply rate is transmitted to the first core.
In another possible implementation, the first core has write permission for the first demand rate and read permission for the supply rate.
In another possible implementation, the instructions further cause the network node to: the first core determining a third number of tokens according to the supply rate; when the third number is greater than a threshold, the first core sends the first data packet to a destination.
In another possible implementation, the instructions further cause the network node to write the first data packet without locking the first virtual queue.
A fourth aspect relates to a software defined wide area network (SD-WAN) comprising: a first network node configured to receive network status reports and encryption information from a plurality of network nodes, generate a first tunnel through the SD-WAN according to the network status reports, and transmit a configuration file to the plurality of network nodes, the configuration file including information of the first tunnel, encryption information of the plurality of network nodes, and network state information of the plurality of network nodes; and a second network node of the plurality of network nodes, the second network node configured to: set a service level objective (SLO) for an application; set a priority for the SLO; transmit a data packet of the application using the first tunnel when the first tunnel satisfies the SLO of the application; and identify a second tunnel for the data packet when the first tunnel does not satisfy the SLO of the application.
In a first implementation manner of the SD-WAN of the fourth aspect, the first tunnel comprises an internet-based tunnel, and the second tunnel comprises a multiprotocol label switching (MPLS) tunnel.
In a second implementation form of the SD-WAN of the fourth aspect, each of the network status reports comprises a bandwidth available on one of the plurality of network nodes, a delay of the one of the plurality of network nodes, and a packet drop rate of the one of the plurality of network nodes.
In a third implementation manner of the SD-WAN of the fourth aspect, the second network node is further configured to: identify a third network node in a down state; and report the down state of the third network node to the first network node.
In a fourth implementation manner of the SD-WAN of the fourth aspect, the first network node is further configured to transmit an updated configuration file to the plurality of network nodes in response to the down state of the third network node.
In a fifth implementation manner of the SD-WAN of the fourth aspect, the second network node is further configured to: create a first virtual queue associated with a first quality of service (QoS) and a first core of a processor of the second network node; create a second virtual queue associated with a second core of the processor and the first QoS; write a first data packet associated with the first QoS to the first virtual queue through the first core at a first time; and write a second data packet associated with the first QoS to the second virtual queue through the second core at substantially the first time.
A sixth aspect relates to a method in a software defined wide area network (SD-WAN), the method comprising: a first network node receiving network status reports and encryption information from a plurality of network nodes; the first network node generating a first tunnel through the SD-WAN according to the network status reports; the first network node transmitting a configuration file to the plurality of network nodes, the configuration file comprising information of the first tunnel, encryption information of the plurality of network nodes, and network state information of the plurality of network nodes; a second network node of the plurality of network nodes setting a service level objective (SLO) for an application; the second network node setting a priority for the SLO; the second network node transmitting a data packet of the application using the first tunnel when the first tunnel satisfies the SLO of the application; and the second network node identifying a second tunnel for the data packet when the first tunnel does not satisfy the SLO of the application.
In a first implementation manner of the method of the sixth aspect, the first tunnel comprises an internet-based tunnel, and the second tunnel comprises a multiprotocol label switching (MPLS) tunnel.
In a second implementation of the method of the sixth aspect, each of the network status reports comprises bandwidth available on one of the plurality of network nodes, delay of one of the plurality of network nodes and packet drop rate of the one of the plurality of network nodes.
In a third implementation manner of the method of the sixth aspect, the method further includes: the second network node identifying a third network node in a down state; and the second network node reporting the down state of the third network node to the first network node.
In a fourth implementation manner of the method of the sixth aspect, the method further includes: in response to the down state of the third network node, the first network node transmitting an updated configuration file to the plurality of network nodes.
In a fifth implementation manner of the method of the sixth aspect, the method further includes: the second network node creating a first virtual queue associated with a first quality of service (QoS) and a first core of a processor of the second network node; the second network node creating a second virtual queue associated with a second core of the processor and the first QoS; the second network node writing a first data packet associated with the first QoS to the first virtual queue through the first core at a first time; and the second network node writing a second data packet associated with the first QoS to the second virtual queue through the second core at substantially the first time.
Drawings
For a more complete understanding of the present invention, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.
FIG. 1 is a schematic diagram of an embodiment of a low latency SD-WAN architecture;
FIG. 2 is a schematic diagram of an embodiment of a controller and proxy for a low latency SD-WAN;
FIG. 3 is a schematic diagram of an embodiment of a traffic flow control technique;
FIG. 4 is a schematic diagram of an embodiment of a single core rate limiting engine;
FIG. 5 is a schematic diagram of an embodiment of a multi-core rate limiting engine;
FIG. 6 is a schematic diagram of an embodiment of a multi-core lockless rate limiting engine;
FIG. 7 is a schematic diagram of an embodiment of a lock-free rate limiting method;
FIG. 8 is a schematic diagram of an electronic device provided in accordance with an embodiment of the present invention;
FIG. 9 is a schematic diagram of a lock-free rate limiting device.
Detailed Description
It should be understood at the outset that although illustrative implementations of one or more embodiments are provided below, the disclosed systems and/or methods may be implemented using any number of techniques, whether currently known or in existence. The invention is in no way limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.
Described herein is a low latency SD-WAN architecture for implementing a lock-free rate limiting mechanism. SD-WANs have certain advantages over traditional hardware-based WANs. An SD-WAN may provide network redundancy using software-based routing: when one route fails, another route may be determined and used for data transmission, whereas a failure in a hardware-based WAN can result in communication loss and require reconfiguration or repair of the hardware. An SD-WAN may be used to transfer traffic from more expensive MPLS circuits to less expensive internet-based circuits. In addition, higher priority traffic may be transported over MPLS circuits while lower priority traffic is transported over internet-based circuits. Traffic priority may be determined based on the type of data carried by the traffic. The data type may be determined based on several factors, including the source of the data, the destination of the data, the application generating or receiving the data, and so on. Finally, an SD-WAN provides a more secure solution, utilizing various encryption standards to secure data transmissions. However, SD-WANs may have unpredictable performance, particularly for delay-sensitive applications.
The low latency SD-WAN architecture described herein provides a high performance software defined client-server architecture for routing data packets to encrypted endpoints over an overlay network. Furthermore, a lock-free rate limiting method is described herein to provide delay sensitive data packets with higher bandwidth and shorter delay. Controllers and agents may be deployed in the network to enable low latency SD-WANs. The agent may reside on a network node of the SD-WAN and one or more controllers may be deployed to control the behavior of the agent and the SD-WAN at different locations in the network. A network node may comprise any device on a network capable of creating, receiving, or transmitting information over an SD-WAN. Each agent gathers network information such as latency and bandwidth between network devices and sends the network information to the controller. The controller generates a configuration file according to the network information and distributes the configuration file to the agents. The proxy may also act as an endpoint or router. If the agent acts as a router, the agent will decrypt the outer encrypted header of the packet and forward it to the next agent in the route.
The controller has a global view of all agents in the network and gathers network status, such as bandwidth, latency, and other communication characteristics, from the agents. Although this example provides a single controller, multiple controllers may be used in a network, e.g., for redundancy. The controller coordinates creation of encrypted tunnels for communication between the controller and the agents. A tunnel is a communication channel between network nodes and may be encrypted. The controller creates a tunnel by identifying the tunnel's ingress and egress points and, in some cases, the intermediate network nodes used to transmit data packets over the network. Tunneled data packets may be encapsulated with an identifier that identifies the tunnel to the network nodes transporting them. The controller generates configuration files for all agents according to the network state and distributes the configuration files to the agents. Each configuration file includes the encryption information (e.g., public key) of the peer agents and the network status (e.g., latency, bandwidth, and other communication characteristics) of the peer agents.
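A configuration file of the kind described might look like the following sketch; every field name here is an assumption, since the text only specifies that the file carries tunnel information, peer public keys, and peer network status:

```python
# Hypothetical shape of one agent's configuration file (all field names
# are illustrative, not taken from the patent).
profile = {
    "tunnels": [
        {"id": "tunnel-1",
         "ingress": "agent-a",   # tunnel entry point
         "egress": "agent-c",    # tunnel exit point
         "via": ["agent-b"]},    # intermediate agents acting as routers
    ],
    "peers": {
        "agent-b": {
            "public_key": "<peer public key>",
            "status": {"latency_ms": 12.5,
                       "bandwidth_mbps": 800.0,
                       "drop_rate": 0.001},
        },
    },
}
```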
The agent may select a route for transmitting a data packet from among a plurality of routes to satisfy a service-level objective (SLO). An SLO may be implemented using routes that provide a quality of service (QoS) meeting the SLO requirements. The agent may set SLOs for different applications, and different priorities for different SLOs, before encryption. The agent selects a route based on the network status information received from the controller and the SLO of the data packet to be transmitted, and processes data packets according to their priorities to achieve a particular QoS. Packet priority may be set based at least on the packet's delay sensitivity level. In an SD-WAN/MPLS hybrid network, the agent may transport only high-priority packets via MPLS. The agent may also act as a router and forward packets to a new destination. When an agent is used as a router, the sender adds an additional outer packet header for the router; the agent processes and strips the outer header and forwards the packet to the destination or to the next agent acting as a router. The agent makes routing decisions locally to reduce latency (i.e., without the latency of waiting for the controller to provide routing decisions). If an agent is down or disconnected, the other agents report this to the controller, which reconfigures the affected agents.
As described above, QoS quantifies the performance of a communication route or other service. QoS may be used to prioritize certain types of traffic over other types to ensure that high-priority traffic meets its SLO. Rate limiting is used to control the traffic rate received and transmitted by a communication device. Using rate limiting in the low latency SD-WAN architecture can ensure that delay-sensitive traffic is transferred in a way that meets its SLO.
In some rate limiting approaches, performance may be limited on multi-core platforms due to locking, which may increase end-to-end latency in the SD-WAN. Locking means that a packet queue is locked while one core writes to it; any other core that has a packet to write to the queue must wait until the first core completes its write operation and releases the queue. The low latency SD-WAN architecture described herein proposes a lock-free rate limiting approach. Lock-free rate limiting uses virtual queues to isolate simultaneous accesses by different central processing unit (CPU) cores to the same queue. The method proposes two parameters to synchronize QoS constraints between the cores and the virtual queues: a demand rate determined by the cores, and a supply rate determined by a scheduler. The rate limiting efficiency of the lock-free rate limiting method is 50% higher than that of the lock-based rate limiting method. Lock-free rate limiting can be extended to any number of cores in a multi-core system to support multiple virtual queues, each associated with one of a set of QoS classes supported by the system.
The virtual queues associated with QoS classes are used to isolate simultaneous accesses by different CPU cores to the same queue. A particular QoS has one virtual queue per core in a multi-core system. The two parameters, demand rate and supply rate, are associated with each virtual queue to synchronize QoS constraints among the cores. The supply rate is a parameter, or set of parameters, that determines the actual packet transmission rate of the virtual queue; given the supply rate parameters, the virtual queue is expected to dequeue at a particular average rate. The demand rate is a parameter, or set of parameters, that the core determines based on its use of the virtual queue. Each CPU core needs write permission for the demand rate parameter and read permission for the supply rate parameter of its virtual queue. The scheduler is responsible for collecting the demand rates and updating the supply rates: it periodically obtains the demand rate of each virtual queue and recalculates the supply rate parameters accordingly.
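Putting the pieces together, a minimal single-process sketch of the per-core virtual queues and the scheduler pass might look like this. It is a toy model under stated assumptions, not the patented implementation: in a real system each core runs concurrently and the demand estimate would be more refined than backlog size.

```python
from collections import deque

class VirtualQueue:
    """One (core, QoS) virtual queue. Only the owning core enqueues and
    updates demand_rate; only the scheduler updates supply_rate. Because
    each field has a single writer, no lock is needed."""

    def __init__(self):
        self.packets = deque()
        self.demand_rate = 0.0  # written by the owning core
        self.supply_rate = 0.0  # written by the scheduler

    def enqueue(self, packet):
        self.packets.append(packet)
        # Toy demand estimate: current backlog size stands in for a rate.
        self.demand_rate = float(len(self.packets))

def scheduler_pass(virtual_queues, rate_limit):
    """Collect demand rates and recalculate supply rates for one QoS."""
    total_demand = sum(q.demand_rate for q in virtual_queues)
    for q in virtual_queues:
        q.supply_rate = (q.demand_rate * rate_limit / total_demand
                         if total_demand else 0.0)
```

With two cores holding backlogs of one and three packets and a rate limit of 100, the scheduler pass grants supply rates of 25 and 75 respectively.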
The lock-free rate limiting may be used in a variety of multi-core devices or systems including, but not limited to, switches, routers, firewalls, intrusion detection systems, operator networks, virtual networks, and other cloud-based environments.
Fig. 1 is a schematic diagram of an embodiment of a low-latency SD-WAN architecture 100. Architecture 100 includes location A 110, location B 120, location C 130, and location D 140. The embodiment shown in fig. 1 includes four locations; other embodiments may include any number of locations. The locations may be different geographic locations where sharing of data or other electronic communications is desired, may be at the same geographic location but on separate communication networks, or may be a combination of the two. The locations communicate via one or more communication channels, which may include various forms of electronic communication between locations. Channel 112 is used for communication between location A 110 and location B 120. Channel 122 is used for communication between location B 120 and location C 130. Channel 116 is used for communication between location A 110 and location D 140. Channel 114 is used for communication between location A 110 and location C 130. Channel 142 is used for communication between location D 140 and location C 130. In other embodiments, more or fewer communication channels may be used between locations.
The SD-WAN architecture 100 further includes a controller 150 for controlling communications in the SD-WAN. In some embodiments, more than one controller may be used to control communications in the SD-WAN. The controller 150 may communicate with the agents 115, 125, 135, and 145 at each of the locations 110, 120, 130, 140. Agents 115, 125, 135 and 145 are used to communicate with controller 150 to control communications at their respective locations. The controller 150 communicates with agents 115, 125, 135 and 145 through secure communication channels 152, 154, 156 and 158. Agents 115, 125, 135 and 145 may transmit various statistics about the communication channels connected to their respective locations. For example, the available bandwidth and delay of the communication channel may be transmitted to the controller 150. The controller 150 may then make a communication decision based on the received statistics.
Fig. 2 is a schematic diagram of an embodiment of a controller 210 and agent 220 for a low latency SD-WAN. For example, controller 210 may function as controller 150, and agent 220 may function as agents 115, 125, 135, and 145. Controller 210 includes a tunnel establishment engine 212 in communication with a tunnel establishment engine 225 at agent 220, a configuration engine 214, and a global tunnel state monitor 216 in communication with a tunnel state monitor 226 at agent 220. Agent 220 includes a service-level objective (SLO) setting engine 221, a routing module 222, an encryption/decryption engine 223, a rate limiting engine 224, a tunnel establishment engine 225, and a tunnel state monitor 226.
Agent 220 is used to process packets transmitted over the SD-WAN. Data packets may be received for transmission from a data source in communication with agent 220. The SLO setting engine 221 determines the SLO of the packet. The SLO may include various expected values or ranges of values for transmission characteristics of the transmitted data packets. For example, the SLO may include delay, error rate, throughput, bandwidth, path availability, and/or other transmission characteristics. The SLO may be determined based on the type of data packet to be transmitted. Some applications may require shorter delays than others; for example, real-time video conferencing may require shorter delays than large file transfers. Thus, the application that created the packet may be considered in determining the SLO. Other factors may also be considered, such as the sender of the data packet, the receiver of the data packet, or other characteristics.
After determining the SLO for the packet, the routing module 222 routes the packet through the SD-WAN. A route that satisfies the required SLO may be selected. Other factors may be considered while satisfying the required SLO. For example, the cost of transmission may be considered: if two paths both meet the SLO, the lower-priced path may be selected for transmission. Other factors may also be considered in selecting a route. After considering the available data and the required SLO of the data packet, the routing module 222 identifies a tunnel to route the data packet to the destination.
The tunnel establishment engine 225 receives instructions from the tunnel establishment engine 212 at the controller 210 for establishing a tunnel through the SD-WAN. As a non-exhaustive list of examples, tunnels may be created to meet certain SLO standards, provide direct routing between locations, or implement redundancy through SD-WANs. If a tunnel fails, the tunnel establishment engine 212 may create or identify other tunnels to replace the failed tunnel. Tunnel state monitor 226 receives from global tunnel state monitor 216 a state update of the tunnel available for routing the data packet. The routing module 222 may identify tunnels for routing data packets based on the status updates. Tunnel state monitor 226 also provides global tunnel state monitor 216 with a state update for the tunnel in communication with agent 220. These updates may be provided to other agents in the SD-WAN for use in routing. One or more tunnels may be used to route packets from a source to a destination on the SD-WAN.
Once the route is selected for the data packet, the encryption/decryption engine 223 may encrypt the data packet to tunnel the data packet. Similarly, the received data packets may be decrypted by encryption/decryption engine 223. Finally, the rate limiting engine 224 controls the rate at which packets are transmitted over the various tunnels to meet SLO requirements and/or other transmission requirements of the SD-WAN. The rate limiting engine 224 may also control the transmission rate of the data packets to prevent tunnel overload in the SD-WAN.
Fig. 3 is a schematic diagram of an embodiment of a traffic flow control technique 300. Traffic flow control 300 may be implemented in an apparatus, such as controller 150 and/or agents 115, 125, 135, and 145, for controlling packet traffic from the apparatus. Token techniques may be used to control packet traffic. A traffic type or transport interface may have an associated bucket 320, which is used to store tokens 310. Bucket 320 and tokens 310 may be implemented in software or hardware as up- or down-counting registers. The bucket 320 may be assigned a number of tokens 310 for transmitting the data packet 340. Tokens 310 may be assigned by a controller (not shown) or some other device to control the data rate in a network implementing traffic flow control 300. The number of tokens 310 may be periodically refreshed and/or adjusted based on factors such as the associated transmission interface, packet processing capabilities, or other characteristics that may affect packet transmission. Periodically refreshing tokens 310 allows transmission at a desired transmission rate in each time period. For example, adding 50 tokens per second to a bucket may allow 50 bytes per second to be transmitted. The transmit queue 330 stores data packets 340 awaiting transmission. A data packet 340 may be divided into smaller sizes for transmission by the transmitting device, e.g., the data packet may be divided into one-byte segments. Each byte 360 must use a token 310 to be transmitted. When there are no more tokens 310 in the bucket 320, transmission stops until additional tokens 310 are received. If the transmit queue 330 overflows, packets 340 are discarded and no longer transmitted even if additional tokens 310 are available. Discarded packets 340 may be selected individually or in combination according to various criteria, such as the earliest packet in the queue, the latest packet in the queue, the traffic type of the packet, etc.
When matched with token 310, byte 360 is allowed to dequeue 350 to the transport interface. Although only a single bucket 320 and transmit queue 330 are shown, the system may include a plurality of different-sized transmit queues 330 for handling a variety of different types of traffic, where each type of traffic may be associated with a single transmit queue 330 and bucket 320. A transmit queue 330 with a higher priority may receive more tokens 310 than a transmit queue 330 with a lower priority. If a transmit queue 330 does not use its assigned tokens 310, those tokens 310 may be reassigned to different transmit queues 330. Traffic flow control 300 may be managed by a controller, e.g., controller 210, or by an agent under the controller's direction.
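As an illustrative sketch of the token-bucket behavior described above (the class name, fields, and use of Python are assumptions for illustration, not the patent's implementation), refilling at 50 tokens per second permits roughly 50 bytes per second of transmission:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter sketch: tokens accrue over time and each
    transmitted byte consumes one token (all names are illustrative)."""

    def __init__(self, refill_rate, capacity):
        self.refill_rate = refill_rate   # tokens added per second
        self.capacity = capacity         # maximum tokens the bucket holds
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def refill(self, now=None):
        # Add tokens in proportion to elapsed time, capped at capacity.
        now = time.monotonic() if now is None else now
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now

    def try_send(self, nbytes, now=None):
        # Transmit only if enough tokens remain; otherwise transmission
        # stops until additional tokens are refilled.
        self.refill(now)
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False
```

A real implementation would also bound the transmit queue and apply a drop policy on overflow, as the description notes.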
Fig. 4 is a schematic diagram of an embodiment of a single-core rate limiting engine 400. The single-core rate limiting engine 400 controls the transmission rate of packets using a single core 420 of a CPU. The data packets may be received by the network interface 410. The packet receiving module 422 of the single core 420 may communicate with the network interface 410 to receive packets. The packet classification module 424 may identify the QoS of a packet and write the packet to the corresponding QoS queue 430. Each QoS queue 430 corresponds to a single QoS. A QoS may be associated with an SLO. The single-core rate limiting engine 400 is limited to processing the transmission of a single packet at a time, so the processing power of the single core 420 limits the throughput of the single-core rate limiting engine 400. Packets are pushed from QoS queue 430 to packet transfer module 440. The data packets may be pushed based on traffic flow control, such as traffic flow control 300. The data packets are transmitted through the network interface 450. The network interface 450 may be the same as or different from the network interface 410.
Fig. 5 is a schematic diagram of an embodiment of a multi-core rate limiting engine 500. The multi-core rate limiting engine 500 uses four cores 520 of a CPU. Each core 520 includes a packet receiving module 522 and a packet classification module 524. The data packets may be received through the network interface 510, by a packet receiving module 522 in communication with the network interface 510. The packets may be assigned to cores 520 in a sequential manner, e.g., a first packet assigned to core 0, a second packet assigned to core 1, etc. Alternatively, the data packets may be assigned to cores 520 based on the current workload of the cores 520, e.g., the core 520 with the lowest workload may be assigned the data packets that need to be processed. The packet classification module 524 may identify the QoS of a packet and write the packet to the corresponding QoS queue 530. Each QoS queue 530 corresponds to a single QoS. When a packet is written to a QoS queue 530, that QoS queue 530 is locked, and only one core 520 is allowed to write to it. If multiple cores 520 are to write packets to the same QoS queue 530, bottlenecks may be encountered because the other cores 520 must wait until the core 520 currently performing the write operation releases the lock on that particular QoS queue 530. Packets are pushed from QoS queue 530 to packet transfer module 540. The data packets may be pushed based on traffic flow control, such as traffic flow control 300. The data packets are transmitted through the network interface 550. The network interface 550 may be the same as or different from the network interface 510.
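The serialization caused by such a per-queue lock can be sketched as follows; the threads stand in for cores, and all names are illustrative assumptions rather than the patent's implementation:

```python
import threading
from collections import deque

# One shared queue per QoS class, guarded by a lock: a core must hold the
# lock to write, so concurrent writers for the same class serialize.
qos_queue = deque()
qos_lock = threading.Lock()

def locked_enqueue(packet):
    with qos_lock:  # other writers block here until the lock is released
        qos_queue.append(packet)

# Four "cores" writing packets of the same QoS class contend for one lock.
writers = [threading.Thread(target=locked_enqueue, args=(f"pkt{i}",))
           for i in range(4)]
for w in writers:
    w.start()
for w in writers:
    w.join()
```

Every write succeeds, but each waits its turn on the lock; this waiting is the bottleneck the lock-free design below removes.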
The lock-free rate limiting engine may overcome the bottleneck in a multi-core system that occurs when queues are locked for writing. The lock-free rate limiting engine may include a plurality of virtual queues stored in a memory of a device implementing the engine. In the approaches described above, each time a core writes to a conventional queue (referred to here as a legacy queue), the queue is locked. Thus, if several cores all need to write data to the legacy queue, the cores write to it sequentially, in a serial fashion. In the lock-free rate limiting engine, each virtual queue is instead associated with a single core. Associating a virtual queue with only one core allows that core to write to the virtual queue without competing with other cores. A plurality of such virtual queues, each associated with a different core, may together store the data that would otherwise have been written to a single legacy queue. Thus, multiple cores may write data to the virtual queues in parallel, whereas the same data would have been written to the legacy queue serially. In some approaches, each legacy queue is associated with a particular type of data, e.g., a QoS class, an application, a particular format, etc. In the lock-free rate limiting engine, multiple virtual queues are associated with the same type of data. Thus, whereas in a legacy queue system multiple cores holding application-specific data would serially write that data to the legacy queue associated with the particular application, in the lock-free rate limiting engine each of the cores has its own virtual queue associated with that application, and the cores may write the application-specific data to their virtual queues in parallel.
Thus, the delay associated with locking a legacy queue is avoided and the overall throughput of the lockless rate limiting engine is increased relative to a legacy queue system. Each core may have a virtual queue associated with one or more types of data. Cores may be associated with different numbers of virtual queues.
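A minimal sketch of this per-core virtual queue layout (the dictionary keying and helper names are assumptions for illustration): each (core, QoS class) pair gets its own queue, so a core never contends with another core when writing, while the per-class backlog can still be computed across cores:

```python
from collections import deque

NUM_CORES = 4
NUM_QOS_CLASSES = 2

# One virtual queue per (core, QoS class). A core writes only to its own
# queues, so no lock is needed; the queues sharing a QoS class together
# play the role of the single legacy queue for that class.
virtual_queues = {(core, qos): deque()
                  for core in range(NUM_CORES)
                  for qos in range(NUM_QOS_CLASSES)}

def enqueue(core, qos, packet):
    # Only `core` ever writes to this queue, so there is no contention.
    virtual_queues[(core, qos)].append(packet)

def qos_class_backlog(qos):
    # Total packets waiting for one QoS class across all cores.
    return sum(len(virtual_queues[(core, qos)]) for core in range(NUM_CORES))
```

Because each queue has exactly one writer, appends need no lock even when all cores enqueue packets of the same class at once.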
The transmission rates of different types of data may be different. A type of data may have an associated committed rate, which is the target transmission rate for that type of data. Multiple virtual queues may be associated with the same type of data; thus, virtual queues associated with the same data type share the committed rate for that data type. The controller may be operable to control the transmission rate of each virtual queue to ensure that the combined traffic from the virtual queues does not exceed the committed rate for that data type. Each core provides a demand rate for the data type to the controller. The demand rate for each type of data processed by the core may be provided to the controller. If multiple cores all have the same type of data queued for transmission, the controller reads the demand rate from each core and determines a supply rate for each core. The supply rate is the rate at which a particular core may dequeue packets of a particular data type from the corresponding virtual queue. Each data type may be associated with a QoS. Thus, each core has a demand rate for each virtual queue associated with the core and a supply rate assigned to each virtual queue associated with the core.
The supply rate and the demand rate may be provided as tokens for data packet transmission, as data amounts, e.g., bits or bytes, or as some other unit for measuring the data flow into and out of the virtual queue. The supply rate may be updated periodically by the controller. For example, a core may be allocated a certain supply rate, e.g., 500 bits per millisecond, in a given period. The core may then transfer 500 bits of data from the corresponding virtual queue in one millisecond. The controller may read the demand rate and determine a new supply rate for the virtual queue; the core then transfers data from the virtual queue according to the new supply rate allocated to it. Each virtual queue has a defined memory size. When the virtual queue is full, the core may not be able to write additional data to the virtual queue. When the virtual queue is full, data in the virtual queue may be discarded, or new data may be discarded and not added to the virtual queue. Whether a packet is discarded may be determined based on a number of criteria, such as the priority of the data, the source of the data, the destination of the data, the size of the data, etc.
Fig. 6 is a schematic diagram of an embodiment of a multi-core lockless rate limiting engine 600. The multi-core lockless rate limiting engine 600 processes packets using four cores 620 of a CPU. A fifth core 630 is used to schedule packet transmissions. Each core 620 includes a packet receiving module 622 and a packet classification module 624. The data packets may be received through the network interface 610, by a packet receiving module 622 in communication with the network interface 610. The packets may be assigned to cores 620 in a sequential manner, e.g., a first packet assigned to core 0, a second packet assigned to core 1, etc. Alternatively, the data packets may be assigned to cores 620 based on the current workload of the cores 620, e.g., the core 620 with the lowest workload may be assigned the data packets that need to be processed. Other methods may also be used to assign data packets to cores 620.
Each core 620 is associated with several virtual queues 640. Virtual queues 640 may occupy memory or some other data storage area of a network node that includes the multi-core lockless rate limiting engine 600. A complete queue includes one virtual queue 640 associated with each core 620. For example, the complete queue associated with QoS class 0 includes the virtual queues 640 identified as QoS classes 00, 10, 20, and 30. Each core 620 writes to several dedicated virtual queues 640 associated with different QoS values. Thus, the multi-core lockless rate limiting engine 600 is advantageous over the multi-core rate limiting engine 500 in that it does not need to lock queues, so multiple cores 620 can write to the queues substantially simultaneously. Writing substantially simultaneously includes writing to two virtual queues of the same QoS class without locking the queues; writes that occur within the time that would otherwise be required to lock and unlock a physical queue for writing may be considered substantially simultaneous.
A different core 630 is used to schedule transmissions from the virtual queues 640. Core 630 includes scheduler 632. Scheduler 632 monitors demand rates 642 and determines a supply rate 644 for each core's virtual queues 640 based on the demand rates 642. Scheduler 632 may synchronize QoS constraints between cores 620 using the demand rate 642 and supply rate 644 associated with each virtual queue. The supply rate 644 determines the packet transfer rate of the one or more virtual queues 640. Based on the supply rate 644 parameter, a virtual queue 640 is expected to dequeue at a particular average rate. Scheduler 632 assigns a supply rate 644 to a virtual queue 640, and the virtual queue 640 dequeues some number of bytes or packets based on the assigned supply rate 644. Each core 620 needs write permission for its demand rate 642 parameters and read permission for the supply rate 644 parameters of its virtual queues 640. The scheduler 632 periodically obtains the demand rate 642 for each virtual queue 640 and recalculates the supply rate 644 parameter accordingly. If a demand rate 642 is high, the scheduler 632 may allocate additional supply rate 644. Other factors may be considered in allocating the supply rate 644, such as the QoS of the virtual queue 640, the priority of traffic in the virtual queue 640, and the like.
Packets are pushed from virtual queue 640 to packet transfer module 650. The data packets are transmitted through the network interface 660. The network interface 660 may be the same as or different from the network interface 610. The multi-core lockless rate limiting engine 600 may be implemented with any number of cores and any number of QoS classes/virtual queues. In some embodiments, scheduler 632 may be executed by one of cores 620. The multi-core lockless rate limiting engine 600 may be implemented in an SD-WAN, MPLS system, or other system having multi-core processors and multiple QoS classes for transmission.
Table 1 shows Algorithm 1, which is used by core 620 to process packets in virtual queue 640.
TABLE 1
[Algorithm 1 listing, rendered as an image in the original document]
Inputs to Algorithm 1 include: the supply rate sr(i, j) received from scheduler 632 for a particular core i and a data type j on that core i, a data packet pkt with length len, and the current time t. Data type j is associated with one of the virtual queues of core i, and may also be associated with a QoS class. The time difference t_diff is determined by subtracting the last processing time t_last from the current time t. num_tokens is the number of available tokens (e.g., tokens corresponding to supply rate 644) for data type j, and is determined by combining the previous num_tokens with the product of t_diff and sr(i, j). If num_tokens is smaller than len, the current packet is processed by the droppacket(pkt) function. The droppacket(pkt) function may discard the packet, queue the packet for future transmission, segment the packet, or otherwise process the packet when the supply tokens of the virtual queue are exhausted during a particular transmission period. Otherwise, the current packet is transferred from the virtual queue by the sendpacket(pkt) function, e.g., the packet is dequeued. When the data packet is transmitted, num_tokens is adjusted by subtracting len. Thus, when a sufficient number of tokens is available, data packets are transmitted from the virtual queue, and upon transmission the number of tokens is decreased according to the length of the data packet. The supply rate 644 assigned to a virtual queue 640 by the scheduler 632 determines the tokens that the virtual queue 640 can use to transmit data packets from the virtual queue 640. The supply rate 644 is calculated based on the demand rate 642 using Algorithm 2 in Table 2.
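Based on the description above, the per-packet token check of Algorithm 1 might be sketched as follows (the actual listing is the image in Table 1; function and variable names here are paraphrased assumptions):

```python
def process_packet(state, sr_ij, pkt_len, t):
    """Algorithm 1 sketch: decide whether a packet of length pkt_len can
    dequeue from core i's virtual queue for data type j at time t.
    `state` holds num_tokens and t_last; sr_ij is the supply rate in
    tokens per time unit."""
    # Accrue tokens for the elapsed interval at the supply rate.
    t_diff = t - state["t_last"]
    state["t_last"] = t
    state["num_tokens"] += t_diff * sr_ij
    if state["num_tokens"] < pkt_len:
        # droppacket(pkt): drop, re-queue, or segment the packet.
        return "drop"
    # sendpacket(pkt): dequeue and transmit, consuming len tokens.
    state["num_tokens"] -= pkt_len
    return "send"
```

At a supply rate of 100 tokens per time unit, a 50-byte packet arriving after one time unit is sent, while a second 60-byte packet arriving at the same instant is dropped because only 50 tokens remain.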
TABLE 2
[Algorithm 2 listing, rendered as an image in the original document]
Inputs to Algorithm 2 include: the number of CPU cores m, the number of QoS classes n, the demand rate dr(i, j) for a specific core and virtual queue, and the committed rate limit cr(j) for each QoS class. The committed rate cr(j) of a QoS class may be received from the controller 210 in the SD-WAN. The committed rate cr(j) is the transmission rate associated with the QoS class, i.e., a target transmission rate to be met for that QoS class. The committed rate cr(j) may be allocated by a controller internal to the multi-core lockless rate limiting engine 600, such as the scheduler 632, or by a controller external to the multi-core lockless rate limiting engine 600, such as an internet service provider (ISP). The committed rate cr(j) may be allocated based on the QoS class used to transmit the packet. Each QoS class is associated with a virtual queue on each core. The demand rate dr(i, j) is determined by each core based on the number of data packets associated with a particular QoS. The core provides the demand rate dr(i, j) to the scheduler 632 for calculating the supply rate sr. For each QoS class j, the demand rates dr of the cores are summed to determine the total demand rate sum for QoS class j. The supply rate sr for each core and virtual queue combination is then determined as follows: the demand rate of the core and virtual queue combination is multiplied by the committed rate cr(j) of the QoS associated with the virtual queue, and the product is divided by the total demand rate sum. As used herein, a core and virtual queue combination refers to one virtual queue on one of the cores. Using this algorithm, the committed rate of a particular QoS can be maintained across multiple virtual queues, and the virtual queues associated with a single QoS may be written by their respective cores without locking an entire per-QoS queue.
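The proportional-share computation described for Algorithm 2 can be sketched as follows (the actual listing is the image in Table 2; this is a paraphrase under the stated definitions of dr(i, j), cr(j), and sr(i, j)):

```python
def compute_supply_rates(dr, cr):
    """Algorithm 2 sketch: split each QoS class's committed rate cr[j]
    among cores in proportion to their demand rates dr[i][j].
    dr is an m x n matrix (m cores, n QoS classes); cr has length n.
    Returns the m x n matrix of supply rates sr[i][j]."""
    m, n = len(dr), len(cr)
    sr = [[0.0] * n for _ in range(m)]
    for j in range(n):
        total = sum(dr[i][j] for i in range(m))  # total demand for class j
        if total == 0:
            continue  # no demand for this class in this period
        for i in range(m):
            # sr(i, j) = dr(i, j) * cr(j) / sum of demand rates for class j
            sr[i][j] = dr[i][j] * cr[j] / total
    return sr
```

Each core's share of cr(j) scales with its reported demand, so the supply rates for class j sum to at most the committed rate regardless of how many cores hold traffic of that class.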
Fig. 7 is a flow chart of an embodiment of a lock-free rate limiting method 700. The method 700 begins at step 710, when the network device creates a first virtual queue associated with a first core of a CPU and a first QoS class of traffic handled by the network device. The method 700 continues at step 720, when the network device creates a second virtual queue associated with a second core of the CPU and the first QoS class of traffic. In step 730, the first core writes a data packet to the first virtual queue. At substantially the same time, in step 740, the second core writes a data packet to the second virtual queue. Thus, multiple cores may simultaneously write packets associated with the same QoS class to virtual queues of that QoS class. Previously, a QoS class had only one queue, which would be locked by the first core to write data, and the second core had to wait until the first core completed the write operation. Thus, lock-free rate limiting achieves higher packet throughput, thereby supporting a higher maximum limiting rate.
Fig. 8 is a schematic diagram of an electronic device 800 provided in accordance with an embodiment of the invention. The electronic device 800 is suitable for implementing the disclosed embodiments described herein. The electronic device 800 includes: an ingress port 810 and a receiver unit (Rx) 820 for receiving data; a processor, logic unit, or central processing unit (CPU) 830 for processing data; a transmitter unit (Tx) 840 and an egress port 850 for transmitting data; and a memory 860 for storing data. The electronic device 800 may also include optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled to the ingress port 810, the receiver unit 820, the transmitter unit 840, and the egress port 850 for the egress or ingress of optical or electrical signals.
Processor 830 is implemented in hardware and software. Processor 830 may be implemented as one or more CPU chips, one or more cores (e.g., as a multi-core processor), one or more field-programmable gate arrays (FPGAs), one or more application-specific integrated circuits (ASICs), and/or one or more digital signal processors (DSPs). Processor 830 is in communication with the ingress port 810, receiver unit 820, transmitter unit 840, egress port 850, and memory 860. Memory 860 includes one or more disks, tape drives, and solid-state drives, and may be used as an overflow data storage device to store programs when such programs are selected for execution, as well as to store instructions and data that are read during program execution. The memory 860 may be volatile and/or nonvolatile, and may be read-only memory (ROM), random-access memory (RAM), ternary content-addressable memory (TCAM), and/or static random-access memory (SRAM).
Fig. 9 is a schematic diagram of a lock-free rate limiting device 900, such as a network node in an SD-WAN. The lock-free rate limiting device 900 includes a memory device 920, such as memory 860, and a processing device 910, such as processor 830.
While several embodiments have been provided in the present invention, it should be understood that the disclosed systems and methods may be embodied in many other specific forms without departing from the spirit or scope of the invention. The present examples are to be considered as illustrative and not restrictive, and the intention is not to be limited to the details given herein. For example, various elements or components may be combined or integrated in another system, or some features may be omitted or not implemented.
In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present invention. Other items shown or described as coupled or directly coupled or communicating with each other may be indirectly coupled or communicating through some interface, device, or intermediate component, whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and could be made without departing from the spirit and scope disclosed herein.

Claims (39)

1. A network node, comprising: a memory; and a processor coupled to the memory, wherein the processor includes a first core and a second core, and the processor is configured to execute instructions from the memory that, when executed by the processor, cause the network node to:
creating a first virtual queue associated with the first core and a first quality of service (QoS);
creating a second virtual queue associated with the second core and the first QoS;
writing a first data packet associated with the first QoS to the first virtual queue through the first core at a first time;
writing a second data packet associated with the first QoS to the second virtual queue through the second core substantially at the first time.
2. The network node of claim 1, wherein the instructions further cause the network node to determine supply rates of the first virtual queue and the second virtual queue based on the first QoS and one or more of a number of data packets associated with the first QoS or an amount of data associated with the first QoS.
3. The network node of claim 2, wherein the instructions further cause the network node to transmit data associated with the first QoS according to the supply rates.
4. The network node of claim 1, wherein the instructions further cause the network node to:
the first core determining a first demand rate of the first virtual queue;
the second core determining a second demand rate of the second virtual queue.
5. The network node of claim 4, wherein the first demand rate is based on a first number of packets for the first QoS of the first core and the second demand rate is based on a second number of packets for the first QoS of the second core.
6. The network node of claim 4 or 5, wherein the processor further comprises a scheduler, the instructions further causing the network node to:
the scheduler determining a total demand rate comprising the first demand rate and the second demand rate;
the scheduler determining a product of the first demand rate and a rate limit allocated to the first QoS;
the scheduler determining a supply rate of the first virtual queue as a quotient of the product and the total demand rate;
the supply rate is transmitted to the first core.
7. The network node of claim 6, wherein the first core has write permission for the first demand rate and read permission for the supply rate.
8. The network node of claim 6 or 7, wherein the instructions further cause the network node to:
the first core determining a third number of tokens according to the supply rate;
when the third number is greater than a threshold, the first core sends the first data packet to a destination.
9. The network node of any of claims 1 to 8, wherein the instructions further cause the network node to write the first data packet without locking the first virtual queue.
10. A method in a network node comprising a processor, the method comprising:
creating a first virtual queue associated with a first core of the processor and a first quality of service (QoS);
creating a second virtual queue associated with a second core of the processor and the first QoS;
writing a first data packet associated with the first QoS to the first virtual queue through the first core at a first time;
writing a second data packet associated with the first QoS to the second virtual queue through the second core substantially at the first time.
11. The method according to claim 10, wherein the method further comprises: determining supply rates of the first virtual queue and the second virtual queue based on the first QoS and one or more of a number of data packets associated with the first QoS or an amount of data associated with the first QoS.
12. The method of claim 11, further comprising transmitting data associated with the first QoS according to the supply rates.
13. The method according to claim 10, wherein the method further comprises:
the first core determining a first demand rate of the first virtual queue;
the second core determining a second demand rate of the second virtual queue.
14. The method of claim 13, wherein the first demand rate is based on a first number of packets for the first QoS of the first core and the second demand rate is based on a second number of packets for the first QoS of the second core.
15. The method according to claim 13 or 14, characterized in that the method further comprises:
a scheduler determining a total demand rate comprising the first demand rate and the second demand rate;
the scheduler determining a product of the first demand rate and a rate limit allocated to the first QoS;
the scheduler determining a supply rate of the first virtual queue as a quotient of the product and the total demand rate;
the supply rate is transmitted to the first core.
16. The method of claim 15, wherein the first core has write permission for the first demand rate and read permission for the supply rate.
17. The method according to claim 15 or 16, characterized in that the method further comprises:
the first core determining a third number of tokens according to the supply rate;
when the third number is greater than a threshold, the first core sends the first data packet to a destination.
18. The method of any of claims 10 to 17, wherein the writing the first data packet comprises: the first data packet is written without locking the first virtual queue.
19. A computer program product comprising instructions embodied on a computer readable medium, which when executed by a network node comprising a processor, cause the network node to:
creating a first virtual queue associated with a first core of the processor and a first quality of service (QoS);
creating a second virtual queue associated with a second core of the processor and the first QoS;
writing a first data packet associated with the first QoS to the first virtual queue through the first core at a first time;
writing a second data packet associated with the first QoS to the second virtual queue through the second core substantially at the first time.
20. The computer program product of claim 19, wherein the instructions further cause the network node to determine supply rates of the first virtual queue and the second virtual queue based on the first QoS and one or more of a number of data packets associated with the first QoS or an amount of data associated with the first QoS.
21. The computer program product of claim 20, wherein the instructions further cause the network node to transmit data associated with the first QoS according to the supply rate.
22. The computer program product of claim 19, wherein the instructions further cause the network node to:
the first core determining a first demand rate of the first virtual queue;
the second core determining a second demand rate of the second virtual queue.
23. The computer program product of claim 22, wherein the first demand rate is based on a first number of packets for the first QoS of the first core and the second demand rate is based on a second number of packets for the first QoS of the second core.
24. The computer program product of claim 22 or 23, wherein the instructions further cause the network node to:
a scheduler determining a total demand rate comprising the first demand rate and the second demand rate;
the scheduler determining a product of the first demand rate and a rate limit allocated to the first QoS;
the scheduler determining a supply rate of the first virtual queue as a quotient of the product and the total demand rate;
the supply rate is transmitted to the first core.
25. The computer program product of claim 24, wherein the first demand rate comprises a write permission of the first core and the supply rate comprises a read permission of the first core.
26. The computer program product of claim 24 or 25, wherein the instructions further cause the network node to:
the first core determining a third number of tokens according to the supply rate;
when the third number is greater than a threshold, the first core sends the first data packet to a destination.
27. The computer program product of any of claims 19 to 26, wherein the instructions further cause the network node to write the first data packet without locking the first virtual queue.
28. A software-defined wide area network (SD-WAN), comprising:
a first network node configured to: receive network status reports and encryption information from a plurality of network nodes; generate a first tunnel through the SD-WAN according to the network status reports; and transmit a configuration file to the plurality of network nodes, wherein the configuration file comprises information of the first tunnel, the encryption information of the plurality of network nodes, and network state information of the plurality of network nodes; and
a second network node of the plurality of network nodes, the second network node being configured to: set a service level obligation (SLO) for an application; set a priority for the SLO; transmit data packets of the application using the first tunnel when the first tunnel satisfies the SLO of the application; and identify a second tunnel for the data packets when the first tunnel does not satisfy the SLO of the application.
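The edge-node behavior in claim 28 (prefer the first tunnel while it meets the application's SLO, otherwise fall back) can be sketched as a simple selection function. The SLO fields here mirror the status-report metrics of claim 30 (bandwidth, delay, drop rate), but the field names and thresholds are illustrative assumptions, not claim language.

```python
from dataclasses import dataclass

@dataclass
class Slo:
    max_delay_ms: float
    max_drop_rate: float
    min_bandwidth_mbps: float
    priority: int = 0  # claim 28: a priority is set for the SLO

@dataclass
class Tunnel:
    name: str
    delay_ms: float
    drop_rate: float
    bandwidth_mbps: float

def meets_slo(tunnel, slo):
    return (tunnel.delay_ms <= slo.max_delay_ms
            and tunnel.drop_rate <= slo.max_drop_rate
            and tunnel.bandwidth_mbps >= slo.min_bandwidth_mbps)

def select_tunnel(primary, fallback, slo):
    """Use the primary (e.g. Internet-based) tunnel while it meets the
    SLO; otherwise fall back (e.g. to MPLS, as in claim 29)."""
    return primary if meets_slo(primary, slo) else fallback
```

In this reading, the Internet tunnel carries the application while measurements stay within the SLO, and traffic shifts to the second (e.g. MPLS) tunnel only when a measured metric drifts out of bounds.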
29. The SD-WAN of claim 28, wherein the first tunnel comprises an internet-based tunnel and the second tunnel comprises a multiprotocol label switching (MPLS) tunnel.
30. The SD-WAN of claim 28 wherein each of the network status reports includes bandwidth available on one of the plurality of network nodes, delay of one of the plurality of network nodes, and packet drop rate of one of the plurality of network nodes.
31. The SD-WAN according to any one of claims 28 to 30, wherein the second network node is further configured to:
identifying a third network node in a down state;
reporting the down state of the third network node to the first network node.
32. The SD-WAN of claim 31, wherein the first network node is further configured to transmit an updated configuration file to the plurality of network nodes in response to the down state of the third network node.
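The failure-handling flow of claims 31 and 32 — an edge node reports a peer as down, and the controller responds by pushing an updated configuration to the remaining nodes — can be sketched as a small event handler. The class and attribute names below are hypothetical; the claims do not specify how the controller tracks membership or versions its configuration.

```python
class Controller:
    """Minimal sketch of the claim 31/32 flow: a node reports a peer
    as down, and the controller pushes an updated configuration (e.g.
    with recomputed tunnels) to every remaining node."""

    def __init__(self, nodes):
        self.nodes = set(nodes)
        self.config_version = 1
        self.pushed = {}  # node -> last config version pushed

    def report_down(self, reporter, down_node):
        # Claim 31: the second node reports the third node's down state.
        if down_node in self.nodes:
            self.nodes.discard(down_node)
            # Claim 32: respond by transmitting an updated config file.
            self.config_version += 1
            for node in self.nodes:
                self.pushed[node] = self.config_version
```

A duplicate report of the same down node is a no-op here, so redundant reports from several edge nodes would not trigger repeated configuration pushes.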
33. The SD-WAN according to any one of claims 28 to 32, wherein the second network node is further configured to:
creating a first virtual queue associated with a first quality of service (QoS) and a first core of a processor of the second network node;
creating a second virtual queue associated with a second core of the processor and the first QoS;
writing a first data packet associated with the first QoS to the first virtual queue through the first core at a first time;
a second data packet associated with the first QoS is written to the second virtual queue through the second core substantially at the first time.
34. A method in a software-defined wide area network (SD-WAN), the method comprising:
the first network node receives network status reports and encryption information from a plurality of network nodes;
the first network node generates a first tunnel through the SD-WAN according to the network status reports;
the first network node transmits a configuration file to the plurality of network nodes, wherein the configuration file comprises information of the first tunnel, the encryption information of the plurality of network nodes, and network state information of the plurality of network nodes;
a second network node of the plurality of network nodes sets a service level obligation (SLO) for an application;
the second network node sets a priority for the SLO;
when the first tunnel satisfies the SLO of the application, the second network node transmits data packets of the application using the first tunnel;
when the first tunnel does not satisfy the SLO of the application, the second network node identifies a second tunnel for the data packets.
35. The method of claim 34, wherein the first tunnel comprises an internet-based tunnel and the second tunnel comprises a multiprotocol label switching (MPLS) tunnel.
36. The method of claim 34, wherein each of the network status reports comprises bandwidth available on one of the plurality of network nodes, delay of one of the plurality of network nodes, and packet drop rate of one of the plurality of network nodes.
37. The method according to any one of claims 34 to 36, further comprising:
the second network node identifies a third network node in a down state;
the second network node reports the down state of the third network node to the first network node.
38. The method of claim 37, wherein the method further comprises: the first network node transmitting an updated configuration file to the plurality of network nodes in response to the down state of the third network node.
39. The method according to any one of claims 34 to 38, further comprising:
the second network node creating a first virtual queue associated with a first quality of service (QoS) and a first core of a processor of the second network node;
the second network node creating a second virtual queue associated with a second core of the processor and the first QoS;
the second network node writing a first data packet associated with the first QoS to the first virtual queue through the first core at a first time;
the second network node writes a second data packet associated with the first QoS to the second virtual queue through the second core substantially at the first time.
CN202080106997.2A 2020-12-22 2020-12-22 Low-delay software-defined wide area network architecture Pending CN116438787A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2020/066589 WO2022139808A1 (en) 2020-12-22 2020-12-22 Low-latency software defined wide area network architecture

Publications (1)

Publication Number Publication Date
CN116438787A true CN116438787A (en) 2023-07-14

Family

ID=74181379

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080106997.2A Pending CN116438787A (en) 2020-12-22 2020-12-22 Low-delay software-defined wide area network architecture

Country Status (2)

Country Link
CN (1) CN116438787A (en)
WO (1) WO2022139808A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10484311B2 (en) * 2015-03-31 2019-11-19 Cavium, Llc Method and apparatus for using multiple linked memory lists
US9871610B2 (en) * 2015-10-30 2018-01-16 Citrix Systems, Inc. Method for packet scheduling using multiple packet schedulers
US10659372B2 (en) * 2017-01-25 2020-05-19 Futurewei Technologies, Inc. Multi-core lock-free rate limiting apparatus and method

Also Published As

Publication number Publication date
WO2022139808A1 (en) 2022-06-30

Similar Documents

Publication Publication Date Title
US11784920B2 (en) Algorithms for use of load information from neighboring nodes in adaptive routing
US20200169513A1 (en) Fabric control protocol for data center networks with packet spraying over multiple alternate data paths
CN104954251B (en) High performance, scalable and drop-free data center switching fabric
US6671256B1 (en) Data channel reservation in optical burst-switched networks
US20170118108A1 (en) Real Time Priority Selection Engine for Improved Burst Tolerance
US11595315B2 (en) Quality of service in virtual service networks
US20080291919A1 (en) Traffic Distribution and Bandwidth Management for Link Aggregation
US7436775B2 (en) Software configurable cluster-based router using stock personal computers as cluster nodes
WO2013184121A1 (en) Multi-tenant network provisioning
EP3588880B1 (en) Method, device, and computer program for predicting packet lifetime in a computing device
JP2002359634A (en) Method and device for designing communication path and program
US8787379B2 (en) Destination-based virtual channel assignment in on-chip ring networks
CN116438787A (en) Low-delay software-defined wide area network architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination