WO2022139808A1 - Low-latency software defined wide area network architecture - Google Patents
Low-latency software defined wide area network architecture
- Publication number: WO2022139808A1
- Application number: PCT/US2020/066589
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- core
- network node
- qos
- virtual queue
- rate
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- H04L47/6215—Individual queue per QOS, rate or priority (Traffic control in data switching networks; Queue scheduling characterised by scheduling criteria)
- H04L47/527—Quantum based scheduling, e.g. credit or deficit based scheduling or token bank (Queue scheduling by attributing bandwidth to queues)
- H04L47/2441—Traffic characterised by specific attributes, e.g. priority or QoS, relying on flow classification, e.g. using integrated services [IntServ] (Flow control; Congestion control)
- H04L49/3045—Virtual queuing (Packet switching elements; Peripheral units, e.g. input or output ports)
Definitions
- This disclosure is related to the technical field of software defined wide area networks, in particular low latency software defined wide area network architecture.
- WAN: wide area network
- MPLS: multi-protocol label switching
- Some data may be transmitted via the Internet.
- Data transmitted via the Internet does not typically follow a defined route.
- Transmitting data via the Internet is typically cheaper than transmitting data via an MPLS circuit.
- However, transmitting data via the Internet is typically slower and less reliable than transmitting data via an MPLS circuit.
- SD-WAN: software defined WAN
- A first aspect relates to a network node comprising a memory; and a processor coupled to the memory, the processor comprising a first core and a second core, the processor configured to receive instructions from the memory which, when executed by the processor, cause the network node to create a first virtual queue associated with the first core and a first quality of service (QoS); create a second virtual queue associated with the second core and the first QoS; write, at a first time, a first packet associated with the first QoS to the first virtual queue by the first core; and write, at substantially the first time, a second packet associated with the first QoS to the second virtual queue by the second core.
- QoS: quality of service
- The instructions further cause the network node to determine supply rates for the first virtual queue and the second virtual queue based on the first QoS and one or more of a number of packets associated with the first QoS or a volume of data associated with the first QoS.
- The instructions further cause the network node to transmit data associated with the first QoS according to the supply rates.
- The instructions further cause the network node to determine, by the first core, a first demand rate of the first virtual queue; and determine, by the second core, a second demand rate of the second virtual queue.
- The first demand rate is based upon a first quantity of packets of the first QoS at the first core.
- The second demand rate is based upon a second quantity of packets of the first QoS at the second core.
- The processor further comprises a scheduler, and the instructions further cause the network node to determine, by the scheduler, a total demand rate comprising the first demand rate and the second demand rate; determine, by the scheduler, a product of the first demand rate and a rate limit assigned to the first QoS; determine, by the scheduler, a supply rate for the first virtual queue as a quotient of the product and the total demand rate; and transmit the supply rate to the first core.
- The first demand rate comprises a write permission for the first core.
- The supply rate comprises a read permission for the first core.
- The instructions further cause the network node to determine, by the first core, a third quantity of tokens based upon the supply rate; and send, by the first core, the first packet to a destination when the third quantity is above a threshold.
- A second aspect relates to a method in a network node comprising a processor, the method comprising creating a first virtual queue associated with a first core of the processor and a first quality of service (QoS); creating a second virtual queue associated with a second core of the processor and the first QoS; writing, at a first time, a first packet associated with the first QoS to the first virtual queue by the first core; and writing, at substantially the first time, a second packet associated with the first QoS to the second virtual queue by the second core.
- The method improves throughput of a queue: by writing to virtual queues of the same QoS at substantially the same time, delay associated with locking a queue is avoided.
- The method further comprises determining supply rates for the first virtual queue and the second virtual queue based on the first QoS and one or more of a number of packets associated with the first QoS or a volume of data associated with the first QoS.
- The method further comprises transmitting data associated with the first QoS according to the supply rates.
- The method further comprises determining, by the first core, a first demand rate of the first virtual queue; and determining, by the second core, a second demand rate of the second virtual queue.
- The first demand rate is based upon a first quantity of packets of the first QoS at the first core.
- The second demand rate is based upon a second quantity of packets of the first QoS at the second core.
- The method further comprises determining, by a scheduler, a total demand rate comprising the first demand rate and the second demand rate; determining, by the scheduler, a product of the first demand rate and a rate limit assigned to the first QoS; determining, by the scheduler, a supply rate for the first virtual queue as a quotient of the product and the total demand rate; and transmitting the supply rate to the first core.
- The first demand rate comprises a write permission for the first core.
- The supply rate comprises a read permission for the first core.
- The method further comprises determining, by the first core, a third quantity of tokens based upon the supply rate; and sending, by the first core, the first packet to a destination when the third quantity is above a threshold.
- A third aspect relates to a computer program product comprising instructions embodied on a computer readable medium which, when executed by a network node comprising a processor, cause the network node to create a first virtual queue associated with a first core of the processor and a first quality of service (QoS); create a second virtual queue associated with a second core of the processor and the first QoS; write, at a first time, a first packet associated with the first QoS to the first virtual queue by the first core; and write, at substantially the first time, a second packet associated with the first QoS to the second virtual queue by the second core.
- The computer program product includes computer instructions that reduce latency in a queue: by writing to virtual queues of the same QoS at substantially the same time, delay associated with locking a queue is avoided.
- The instructions further cause the network node to determine supply rates for the first virtual queue and the second virtual queue based on the first QoS and one or more of a number of packets associated with the first QoS or a volume of data associated with the first QoS.
- The instructions further cause the network node to transmit data associated with the first QoS according to the supply rates.
- The instructions further cause the network node to determine, by the first core, a first demand rate of the first virtual queue; and determine, by the second core, a second demand rate of the second virtual queue.
- The first demand rate is based upon a first quantity of packets of the first QoS at the first core.
- The second demand rate is based upon a second quantity of packets of the first QoS at the second core.
- The instructions further cause the network node to determine, by a scheduler, a total demand rate comprising the first demand rate and the second demand rate; determine, by the scheduler, a product of the first demand rate and a rate limit assigned to the first QoS; determine, by the scheduler, a supply rate for the first virtual queue as a quotient of the product and the total demand rate; and transmit the supply rate to the first core.
- The first demand rate comprises a write permission for the first core.
- The supply rate comprises a read permission for the first core.
- The instructions further cause the network node to determine, by the first core, a third quantity of tokens based upon the supply rate; and send, by the first core, the first packet to a destination when the third quantity is above a threshold.
- The instructions further cause the network node to write the first packet without locking the first virtual queue.
- A fourth aspect relates to a software-defined wide area network (SD-WAN) comprising a first network node configured to receive, from a plurality of network nodes, network status reports and encryption information; generate a first tunnel through the SD-WAN based on the network status reports; and transmit, to the plurality of network nodes, configuration files comprising information of the first tunnel, encryption information of the plurality of network nodes, and network status information of the plurality of nodes; and a second network node of the plurality of network nodes, the second network node configured to set a service level obligation (SLO) for an application; set a priority for the SLO; transmit a packet of the application using the first tunnel when the SLO of the application is met by the tunnel; and identify a second tunnel for the packet when the SLO of the application is not met by the tunnel.
- The first tunnel comprises an Internet-based tunnel.
- The second tunnel comprises a multiprotocol label switching (MPLS) based tunnel.
- Each of the network status reports comprises bandwidth available at one of the plurality of network nodes, latency of the one of the plurality of network nodes, and packet drop rate of the one of the plurality of network nodes.
- The second network node is further configured to identify a third network node in a down state; and report the down state of the third network node to the first network node.
- The first network node is further configured to transmit updated configuration files to the plurality of network nodes in response to the down state of the third network node.
- The second network node is further configured to create a first virtual queue associated with a first quality of service (QoS) and a first core of a processor of the second network node; create a second virtual queue associated with a second core of the processor and the first QoS; write, at a first time, a first packet associated with the first QoS to the first virtual queue by the first core; and write, at substantially the first time, a second packet associated with the first QoS to the second virtual queue by the second core.
- A sixth aspect relates to a method in a software-defined wide area network (SD-WAN), the method comprising receiving, by a first network node from a plurality of network nodes, network status reports and encryption information; generating, by the first network node, a first tunnel through the SD-WAN based on the network status reports; transmitting, by the first network node to the plurality of network nodes, configuration files comprising information of the first tunnel, encryption information of the plurality of network nodes, and network status information of the plurality of nodes; setting, by a second network node of the plurality of network nodes, a service level obligation (SLO) for an application; setting, by the second network node, a priority for the SLO; transmitting, by the second network node, a packet of the application using the first tunnel when the SLO of the application is met by the tunnel; and identifying, by the second network node, a second tunnel for the packet when the SLO of the application is not met by the tunnel.
- SLO: service level obligation
- Each of the network status reports comprises bandwidth available at one of the plurality of network nodes, latency of the one of the plurality of network nodes, and packet drop rate of the one of the plurality of network nodes.
- The method further comprises identifying, by the second network node, a third network node in a down state; and reporting, by the second network node, the down state of the third network node to the first network node.
- The method further comprises transmitting, by the first network node, updated configuration files to the plurality of network nodes in response to the down state of the third network node.
- The method further comprises creating, by the second network node, a first virtual queue associated with a first quality of service (QoS) and a first core of a processor of the second network node; creating, by the second network node, a second virtual queue associated with a second core of the processor and the first QoS; writing, by the second network node at a first time, a first packet associated with the first QoS to the first virtual queue by the first core; and writing, by the second network node at substantially the first time, a second packet associated with the first QoS to the second virtual queue by the second core.
- FIG. 1 is a diagram of an embodiment of a low-latency SD-WAN architecture.
- FIG. 2 is a diagram of an embodiment of a controller and agent for a low-latency SD-WAN.
- FIG. 3 is a diagram of an embodiment of a traffic flow control technique.
- FIG. 4 is a diagram of an embodiment of a single-core rate limiting engine.
- FIG. 5 is a diagram of an embodiment of a multi-core rate limiting engine.
- FIG. 6 is a diagram of an embodiment of a multi-core lock-free rate limiting engine.
- FIG. 7 is a diagram of an embodiment of a method for lock-free rate limiting.
- FIG. 8 is a schematic diagram of an electronic device according to an embodiment of the disclosure.
- FIG. 9 is a diagram of a means for lock-free rate limiting.
- An SD-WAN provides certain benefits relative to traditional hardware based WANs.
- An SD-WAN may be configured to provide network redundancy using software-based routes: when one route fails, another route may be determined and used for data transmission.
- In contrast, a failure in a hardware-based WAN results in communication loss requiring hardware reconfiguration or repair.
- An SD-WAN may be used to offload traffic from more expensive MPLS circuits to less expensive Internet-based circuits. Further, higher priority traffic may be prioritized for MPLS circuits while lower priority traffic may be transmitted using Internet-based circuits. Traffic priority may be determined based upon the type of data carried by the traffic.
- An SD-WAN provides more secure solutions, leveraging various encryption standards to secure data transmissions.
- However, SD-WANs may have unpredictable performance, especially for latency-sensitive applications.
- The low-latency SD-WAN architecture described herein provides a high-performance software-defined client-server architecture for routing packets to encrypted endpoints via an overlay network. Further, a lock-free rate-limiting approach is described to provide latency-sensitive packets with higher bandwidth and shorter latency.
- A controller and agents may be deployed in the network. Agents may reside at network nodes of the SD-WAN, and one or more controllers may be deployed to control the agents and SD-WAN behavior at various locations in the network. Network nodes may include any device on the network that is capable of creating, receiving, or transmitting information through the SD-WAN. Each agent will collect network information, such as latency and bandwidth between network equipment, and send the network information to the controller.
- The controller will generate configuration files based on the network information and distribute the configuration files to the agents.
- The agent can also be used as an endpoint or a router. If an agent is used as a router, the agent will decrypt the outer layer encrypted header of a packet and forward it to the next agent in the route.
- The controller has an overall view of all agents in the network and gathers network status, such as bandwidth, latency, and other communication characteristics, from the agents. While the example provides for a single controller, multiple controllers may be used in a network, for example for redundancy.
- The controller coordinates creation of encrypted tunnels for communication between the controller and agents.
- A tunnel is a communication channel between network nodes. Tunnels may be encrypted.
- A tunnel is created by a controller by identifying ingress and egress points of the tunnel and, in some cases, intermediate network nodes used to transmit packets through a network. Packets traversing the tunnel may be encapsulated with identifiers to allow transmission through the network using the tunnel. The identifiers may identify to network nodes a tunnel to transmit the packet over.
- The controller will generate configuration files for all agents based on network status and distribute the configuration files to the agents. Each configuration file includes peer agents' encryption information, such as public keys, and peer agents' network status, such as latency, bandwidth, and other communication characteristics.
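- For illustration, a configuration file of this kind might carry fields like those in the following sketch. The field names and structure are assumptions for the example; the disclosure does not specify a file format.

```python
# Hypothetical configuration file distributed by the controller to one agent.
# All field names are illustrative; the disclosure does not define a schema.
example_config = {
    "agent_id": "agent-B",
    "tunnels": [
        {"tunnel_id": 1, "ingress": "agent-B", "egress": "agent-C"},
    ],
    "peers": {
        "agent-C": {
            "public_key": "<peer public key>",  # peer encryption information
            "latency_ms": 12.5,                 # peer network status
            "bandwidth_mbps": 400.0,
            "packet_drop_rate": 0.001,
        },
    },
}
```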
- The agent can select from multiple routes for transmitting packets to meet service-level objectives (SLOs).
- An SLO may be achieved by using routes that provide a quality of service (QoS) meeting the needs of the SLO.
- Agents may set SLOs for different applications before encryption and set different priorities for different SLOs.
- The agent selects a route based on the network status information received from the controller and the SLO of the to-be-transmitted packet.
- Agents process packets based on their priorities to achieve a particular QoS. Packet priorities may be set based at least on their latency sensitivity levels. In a hybrid SD-WAN/MPLS network, the agent may transmit only high priority packets via MPLS.
- An agent can also be used as a router and forward packets to a new destination.
- When an agent is used as a router, the sender will add another outer packet header for this router, and the agent will process and strip off the outer header and forward the packet to the destination or to the next agent, which also works as a router. Routing decisions are made locally by the agent to reduce latency (i.e., there is no delay waiting for the controller to provide routing decisions). If an agent goes down or otherwise loses connectivity, other agents will report to the controller and the controller will reconfigure affected agents.
- QoS quantifies the performance of a communication route or other service.
- QoS may be used to prioritize certain types of traffic over others in order to ensure higher priority traffic is communicated to meet an SLO.
- Rate limiting is used to control the rate of traffic into and out of a communication device. Rate limiting is used in the low-latency SD-WAN architecture to ensure latency-sensitive traffic is transmitted to meet SLOs.
- Lock-free rate limiting uses virtual queues to isolate simultaneous access to the same queue by different central processing unit (CPU) cores. Two parameters are proposed to synchronize the QoS constraints among multiple cores and multiple virtual queues: a demand rate determined by the core and a supply rate determined by a scheduler.
- The lock-free rate limiting may achieve a 50% higher limiting rate than locking rate limiting approaches.
- The lock-free rate limiting is scalable over any number of cores in a multi-core system, supporting multiple virtual queues each associated with one of a group of QoS classes supported by the multi-core system.
- Virtual queues associated with a QoS class are used to isolate simultaneous access to the same queue by different CPU cores.
- A particular QoS has a virtual queue associated with each of multiple cores in a multi-core system.
- Two parameters, demand rate and supply rate, are associated with each virtual queue to synchronize the QoS constraints among multiple cores.
- The supply rate is a parameter or set of parameters that determines the actual packet transmission rate of a virtual queue. Under given supply rate parameters, the virtual queue is expected to dequeue at a specific average rate.
- The demand rate is a parameter or set of parameters determined by a core based upon usage of a virtual queue.
- Each CPU core needs write permission to the demand rate parameters and read permission to the supply rate parameters of its own virtual queues.
- A scheduler takes charge of demand rate collection and supply rate updates: it periodically fetches the demand rate of each virtual queue and recalculates the supply rate parameters correspondingly.
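- This arrangement can be sketched minimally as follows; the class and field names are illustrative assumptions, not taken from the disclosure. Each virtual queue belongs to exactly one (core, QoS class) pair: the owning core writes the demand rate, and only the scheduler writes the supply rate.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class VirtualQueue:
    core_id: int        # the only core allowed to enqueue, so no lock is needed
    qos_class: int      # QoS class served by this virtual queue
    packets: deque = field(default_factory=deque)
    demand_rate: float = 0.0  # written by the owning core, read by the scheduler
    supply_rate: float = 0.0  # written by the scheduler, read by the owning core

# One virtual queue per (core, QoS class) pair; the virtual queues of one QoS
# class across all cores together act as a single logical queue for that class.
NUM_CORES, NUM_QOS_CLASSES = 4, 3
queues = {
    (core, qos): VirtualQueue(core_id=core, qos_class=qos)
    for core in range(NUM_CORES)
    for qos in range(NUM_QOS_CLASSES)
}
```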
- The lock-free rate limiting can be used in a variety of multi-core apparatuses or systems including, but not limited to, switches, routers, firewalls, intrusion detection systems, carrier networks, virtualized networks, and other cloud-based environments.
- FIG. 1 is a diagram of an embodiment of a low-latency SD-WAN architecture 100.
- The architecture 100 includes location A 110, location B 120, location C 130, and location D 140.
- While the embodiment of FIG. 1 includes four locations, other embodiments may include any number of locations.
- The locations may be geographically separate locations that have a need to share data or other electronic communications; the locations may be co-located in a same geographic location but on separate communication networks; or a combination of geographically separate and co-located locations may be used.
- The locations communicate via one or more communication channels.
- The communication channels may include various forms of electronic communication between the locations.
- Channel 112 is used for communication between location A 110 and location B 120.
- Channel 122 is used for communication between location B 120 and location C 130.
- Channel 116 is used for communication between location A 110 and location D 140.
- Channel 114 is used for communication between location A 110 and location C 130.
- Channel 142 is used for communication between location D 140 and location C 130. In other embodiments, more or fewer communication channels may be used for communication between locations.
- The SD-WAN architecture 100 also includes a controller 150 to control communications in the SD-WAN.
- Controller 150 may be in communication with agents 115, 125, 135, and 145 at each of the locations 110, 120, 130, and 140.
- The agents 115, 125, 135, and 145 are configured to communicate with the controller 150 to control communications at their respective locations.
- The controller 150 communicates with agents 115, 125, 135, and 145 via secured communication channels 152, 154, 156, and 158.
- The agents 115, 125, 135, and 145 may communicate various statistics related to the communication channels connected to their respective locations. For example, available bandwidth and latency of the communication channel may be transmitted to the controller 150.
- FIG. 2 is a diagram of an embodiment of a controller 210 and agent 220 for an SD-WAN.
- Controller 210 may function as controller 150, and agent 220 may function as agents 115, 125, 135, and 145.
- Controller 210 includes tunnel setup engine 212 in communication with tunnel setup engine 225 at agent 220, a configuration engine 214, and a global tunnel status monitor 216 in communication with a tunnel status monitor 226 at agent 220.
- Agent 220 includes a service-level objective (SLO) setting engine 221, a route selection module 222, an encryption/decryption engine 223, a rate limiting engine 224, the tunnel setup engine 225, and the tunnel status monitor 226.
- Agent 220 is configured to process packets for transmission over an SD-WAN.
- A packet may be received for transmission from a data source in communication with the agent 220.
- The SLO setting engine 221 determines an SLO for the packet.
- An SLO may include various desired values or ranges of values for transmission characteristics used for transmitting the packet.
- An SLO may include latency, error rate, throughput, bandwidth, path availability, and/or other transmission characteristics.
- The SLO may be determined based upon the type of packet to be transmitted. Certain applications may require lower latency than others; for example, real-time video conferencing may require lower latency than a large file transfer. Thus, the application that created the packet may be taken into consideration when determining an SLO. Other factors may be considered, for example, the sender of the packet, the receiver of the packet, or other characteristics.
- The route selection module 222 selects a route for the packet through the SD-WAN.
- The route may be selected to meet the desired SLO. Additional factors may be considered in combination with meeting the desired SLO. For example, the cost of the transmission may be taken into consideration: if two paths can meet the SLO, the cheaper alternative may be selected for transmission. Other factors may be considered during route selection.
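- As a sketch of this selection logic (the route model is an assumption for the example; the disclosure does not define one), routes meeting the SLO are kept and the cheapest survivor is chosen:

```python
from dataclasses import dataclass

@dataclass
class Route:
    name: str
    latency_ms: float
    bandwidth_mbps: float
    cost: float  # e.g., relative cost of an MPLS versus an Internet path

def select_route(routes, max_latency_ms, min_bandwidth_mbps):
    """Return the cheapest route meeting the SLO, or None if none qualifies."""
    feasible = [r for r in routes
                if r.latency_ms <= max_latency_ms
                and r.bandwidth_mbps >= min_bandwidth_mbps]
    return min(feasible, key=lambda r: r.cost, default=None)

routes = [Route("mpls-1", 8.0, 100.0, 10.0), Route("inet-1", 25.0, 400.0, 1.0)]
print(select_route(routes, 30.0, 200.0))  # inet-1: meets the SLO and is cheaper
print(select_route(routes, 10.0, 50.0))   # mpls-1: only route meeting 10 ms
```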
- The route selection module 222 identifies a tunnel to route the packet through to the destination.
- Tunnel setup engine 225 receives instructions from tunnel setup engine 212 at the controller 210 for setting up tunnels through the SD-WAN.
- The tunnels may be created to meet certain SLO criteria, to provide direct routes between locations, or for redundancy through the SD-WAN. If a tunnel fails, the tunnel setup engine 212 may create or identify other tunnels to replace the failed tunnel.
- Tunnel status monitor 226 receives status updates from the global tunnel status monitor 216 of tunnels available for routing packets. These status updates may be used by the route selection module 222 in identifying a tunnel for routing the packet.
- Tunnel status monitor 226 also provides status updates to the global tunnel status monitor 216 of tunnels in communication with agent 220. These updates may be provided to other agents in the SD-WAN for use in route selection.
- One or more tunnels may be used to route a packet from source to destination in the SD-WAN.
- The encryption/decryption engine 223 may encrypt the packet for transmission through a tunnel. Similarly, received packets may be decrypted by the encryption/decryption engine 223. Finally, the rate limiting engine 224 controls the rate at which packets are transmitted over various tunnels to meet SLO requirements and/or other transmission requirements of the SD-WAN. The rate limiting engine 224 may also control packet transmission rate to prevent overloading of tunnels in the SD-WAN.
- FIG. 3 is a diagram of an embodiment of a traffic flow control 300.
- Traffic flow control 300 may be implemented in apparatuses, e.g. controller 150 and/or agents 115, 125, 135, and 145, for controlling the flow of packet traffic from the apparatus. Packet traffic may be controlled using a token technique.
- A traffic type or transmission interface may have an associated bucket 320 for storing tokens 310.
- The bucket 320 and tokens 310 may be implemented in software or hardware as a count-up or count-down register.
- The bucket 320 may be assigned a quantity of tokens 310 used for transmitting packets 340.
- Tokens 310 may be assigned by a controller (not pictured) or some other device to control data rates in a network where the traffic flow control 300 is implemented.
- The quantity of tokens 310 may be periodically refreshed and/or adjusted based upon factors associated with an associated transmission interface, packet processing capabilities, or other characteristics that may affect packet transmission. Periodically refreshed tokens 310 may result in a desired transmission rate per time period. For example, fifty tokens per second added to the bucket may allow for fifty bytes per second to be transmitted.
- Transmission queue 330 stores packets 340 that are awaiting transmission. The packets 340 may be broken up into smaller sizes for transmission by a transmitting apparatus, e.g., packets may be broken up into one-byte segments. Each byte 360 must use a token 310 in order to be transmitted. When no more tokens 310 are in bucket 320, transmission is stopped until additional tokens 310 are received.
- In some cases, packets 340 may be dropped and not transmitted even when additional tokens 310 become available. Packets 340 that are dropped may be selected based on various criteria, individually or combined, for example, oldest packet in the queue, newest packet in the queue, traffic type of packet, etc. The bytes 360 are allowed to dequeue 350 to the transmission interface when paired with a token 310. While only a single bucket 320 and transmission queue 330 are shown, a system may include multiple transmission queues 330 of varying sizes for processing of various different types of traffic, where each type of traffic may be associated with a single transmission queue 330 and bucket 320. Higher priority transmission queues 330 may receive more tokens 310 than lower priority transmission queues 330.
- Management of the traffic flow control 300 may be performed by a controller, e.g., controller 210 or an agent under control of a controller.
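- The token pairing described above can be modeled with a short sketch (a simplified illustration, assuming one-byte segments that each consume one token, as in the example above):

```python
from collections import deque

class TokenBucket:
    """Simplified model of bucket 320 feeding transmission queue 330."""

    def __init__(self, refill_tokens: int, capacity: int):
        self.tokens = 0
        self.refill_tokens = refill_tokens  # tokens added each refresh period
        self.capacity = capacity            # maximum tokens the bucket holds
        self.queue = deque()                # byte segments awaiting transmission

    def refresh(self):
        # Periodic refill, e.g., fifty tokens per period allows fifty bytes.
        self.tokens = min(self.capacity, self.tokens + self.refill_tokens)

    def enqueue(self, packet: bytes):
        self.queue.extend(packet)           # packets broken into byte segments

    def dequeue(self) -> bytes:
        sent = bytearray()
        while self.queue and self.tokens > 0:
            sent.append(self.queue.popleft())  # each byte pairs with one token
            self.tokens -= 1
        return bytes(sent)                  # stops when tokens are exhausted
```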
- FIG. 4 is a diagram of an embodiment of a single-core rate limiting engine 400.
- The single-core rate limiting engine 400 controls the transmission rate of packets using a single core 420 of a CPU.
- A packet may be received by a network interface 410.
- The packet reception module 422 of the single core 420 may communicate with the network interface 410 to receive the packet.
- Packet classification module 424 may identify a QoS of the packet and write the packet to a corresponding QoS queue 430.
- Each QoS queue 430 corresponds to a single QoS.
- A QoS may be associated with an SLO.
- The single-core rate limiting engine 400 is limited to processing a single packet at a time for transmission, and thus the processing capabilities of the single core 420 limit the throughput of the single-core rate limiting engine 400.
- Packets are pushed from the QoS queues 430 to packet transmission module 440.
- The packets may be pushed based on a traffic flow control, for example traffic flow control 300.
- The packets are transmitted using network interface 450.
- Network interface 450 may be the same as network interface 410 or different.
- FIG. 5 is a diagram of an embodiment of a multi-core rate limiting engine 500.
- The multi-core rate limiting engine 500 uses four cores 520 of a CPU.
- Each core 520 includes a packet reception module 522 and a packet classification module 524.
- A packet may be received by a network interface 510.
- The packet may be received by a packet reception module 522 in communication with the network interface 510.
- Packets may be assigned to the cores 520 in a sequential manner, e.g., first packet is assigned to core 0, second packet assigned to core 1, and so forth.
- Packets may be assigned to the cores 520 based on a current workload of the cores 520, e.g., a core 520 with the lowest workload may be assigned a packet for processing.
- Packet classification module 524 may identify a QoS of the packet and write the packet to a corresponding QoS queue 530.
- Each QoS queue 530 corresponds to a single QoS. When a packet is written to a QoS queue 530, the QoS queue 530 is locked and only one core 520 is allowed to write to the QoS queue 530.
- A bottleneck may occur if multiple cores 520 have packets to be written to the same QoS queue 530, because the other cores 520 have to wait until the currently writing core 520 releases the lock on the particular QoS queue 530.
- Packets are pushed from the QoS queues 530 to packet transmission module 540.
- The packets may be pushed based on a traffic flow control, for example traffic flow control 300.
- The packets are transmitted using network interface 550.
- Network interface 550 may be the same as network interface 510 or different.
- A lock-free rate limiting engine may be used to overcome the bottleneck that occurs in a multi-core system when queues are locked for writing.
- The lock-free rate limiting engine may include a plurality of virtual queues stored in a memory of the device where the lock-free rate limiting engine is implemented.
- In a locking design, a queue (which may be referred to in this example as a traditional queue) will be locked each time a core writes to the traditional queue.
- Each core writes to the traditional queue sequentially, in a serial fashion.
- One or several virtual queues of the lock-free rate limiting engine may be associated with a single core, and each virtual queue is associated with exactly one core.
- Associating a virtual queue with only one core permits the core to write to the virtual queue without contending with other cores.
- Several virtual queues, each associated with different cores, may function to store data that would have been written to a single traditional queue. Thus, more than one core may write data in parallel to the virtual queues that would have been written serially using a traditional queue.
- Each traditional queue is associated with a particular type of data, e.g., a QoS class, an application, a specific format, etc.
- Several virtual queues are associated with the same type of data in the lock-free rate limiting engine. By contrast, in a traditional queue system, if several cores have data for a particular application, the cores would write the data serially to the traditional queue associated with the particular application.
- Each core may have a virtual queue for one or more types of data.
- The cores may be associated with different numbers of virtual queues.
- Transmission rates may vary for different types of data.
- A type of data may have an associated committed rate.
- The committed rate is a transmission rate for the type of data.
- Several virtual queues may be associated with the same type of data. Thus, the virtual queues associated with the same type of data share the committed rate for the type of data.
- A controller may be configured to control the transmission rate from each of the virtual queues to ensure the combined traffic from the virtual queues does not exceed the committed rate for the type of data.
- Each core may provide a demand rate for the type of data to the controller.
- A demand rate for each type of data handled by the core may be provided to the controller. If several cores are all queuing the same type of data for transmission, the controller will read the demand rate from each of the cores and determine a supply rate for each core.
- The supply rate is the rate at which a particular core may dequeue data packets of a particular data type from the corresponding virtual queue.
- Each data type may be associated with a QoS.
- Thus, each core will have a demand rate for each of the virtual queues associated with the core and an assigned supply rate for each of the virtual queues associated with the core.
- Supply rates and demand rates may be provided as tokens for transmission of packets, as data volumes, e.g., bits or bytes, or as some other unit to measure the flow of data to and from the virtual queues.
- Supply rates may be periodically updated by the controller. For example, a core is assigned a certain supply rate for a given period, e.g., 500 bits during a one millisecond period. The core may transmit 500 bits of data from the corresponding virtual queue during one millisecond. The controller may read demand rates and determine a new supply rate for the virtual queue. The new supply rate is assigned to the virtual queue, and the core transmits data from the virtual queue accordingly.
- Each virtual queue has a defined storage size.
- When a virtual queue is full, the core may not be able to write additional data to the virtual queue.
- In that case, data in the virtual queue may be dropped, or new data may be dropped and not added to the virtual queue. Whether a packet is dropped may be determined based on a number of criteria, for example, priority of the data, source of the data, destination of the data, size of the data, etc.
- FIG. 6 is a diagram of an embodiment of a multi-core lock-free rate limiting engine 600.
- The multi-core lock-free rate limiting engine 600 uses four cores 620 of a CPU for processing packets.
- A fifth core 630 is used for scheduling packet transmissions.
- Cores 620 each include a packet reception module 622 and a packet classification module 624.
- A packet may be received by a network interface 610.
- The packet may be received by a packet reception module 622 in communication with the network interface 610.
- Packets may be assigned to the cores 620 in a sequential manner, e.g., first packet is assigned to core 0, second packet assigned to core 1, and so forth.
- Packets may be assigned to the cores 620 based on a current workload of the cores 620, e.g., a core 620 with the lowest workload may be assigned a packet for processing. Other methods may be used for assigning packets to the cores 620.
- Each core 620 is associated with several virtual queues 640.
- Virtual queues 640 may occupy a memory or some other data storage area of a network node comprising the multi-core lock-free rate limiting engine 600.
- A complete queue comprises one virtual queue 640 associated with each core 620.
- For example, a complete queue associated with QoS class 0 includes the virtual queues 640 identified as QoS class 00, 10, 20, and 30.
- Each core 620 writes to several dedicated virtual queues 640 associated with different QoS values.
- The multi-core lock-free rate limiting engine 600 provides an advantage over the multi-core rate limiting engine 500 in that a queue does not need to be locked for multiple cores 620 to write to it substantially simultaneously.
- Writing substantially simultaneously includes writing to two virtual queues of the same QoS class without locking a queue. Thus, writes separated by less than the period of time it would take to lock and unlock a physical queue for writing may be considered substantially simultaneous.
- A different core 630 is used to schedule transmissions from the virtual queues 640.
- Core 630 includes a scheduler 632.
- Scheduler 632 monitors a demand rate 642 and determines a supply rate 644, based on the demand rate 642, of each core's virtual queues 640.
- Scheduler 632 may use the demand rate 642 and supply rate 644 associated with each QoS class virtual queue to synchronize the QoS constraints among cores 620.
- Supply rate 644 determines the packet transmission rate of one or more virtual queues 640. Based on supply rate 644 parameters, the virtual queue 640 is expected to dequeue at a specific average rate.
- Scheduler 632 assigns a supply rate 644 to a virtual queue 640, and the virtual queue 640 dequeues a number of bytes or packets based on the assigned supply rate 644.
- Each core 620 has write permission to the demand rate 642 parameters and read permission to the supply rate 644 parameters of its own virtual queues 640.
- Scheduler 632 periodically fetches the demand rate 642 of each virtual queue 640 and recalculates the supply rate 644 parameters correspondingly. If the demand rate 642 is high, the scheduler 632 may assign additional supply rate 644. Other factors may be considered in assigning supply rate 644, e.g., QoS of the virtual queues 640, priority of traffic in the virtual queue 640, etc.
- Packets are pushed from the virtual queues 640 to packet transmission module 650.
- The packets are transmitted using network interface 660.
- Network interface 660 may be the same as network interface 610 or different.
- The multi-core lock-free rate limiting engine 600 may be implemented with any number of cores and any number of QoS classes/virtual queues.
- The scheduler 632 may be executed by one of the cores 620.
- The multi-core lock-free rate limiting engine 600 may be implemented in an SD-WAN, an MPLS system, or other systems with multi-core processors and multiple QoS classes for transmission.
- Table 1 shows algorithm 1, which is used for processing packets in a virtual queue 640 by a core 620.
- Inputs to algorithm 1 include a supply rate sr(i,j) received from scheduler 632 for a particular core i and data type j on that core, a packet pkt with length len, and a current time t.
- Data type j is associated with one of the virtual queues of core i.
- Data type j may also be associated with a QoS class.
- A time difference t_diff is determined by subtracting the time of last processing t_last from the current time t: t_diff = t − t_last.
- num_tokens is the number of available tokens (i.e., tokens accrued according to supply rate 644) for the data type j.
- num_tokens is updated by adding the product of t_diff and sr(i,j) to the previous num_tokens: num_tokens = num_tokens + t_diff × sr(i,j).
- If num_tokens is less than the packet length len, the current packet is processed by the droppacket(pkt) function.
- The droppacket(pkt) function may include dropping the packet, queueing the packet for later transmission, fragmenting the packet, or other functions for handling a packet when the supply tokens for the virtual queue are exhausted in a particular transmission period. Otherwise, the current packet is transmitted, e.g., dequeued from the virtual queue, by the sendpacket(pkt) function.
- num_tokens is then adjusted by subtracting len: num_tokens = num_tokens − len.
- A packet is transmitted from a virtual queue when a sufficient number of tokens is available.
- The number of tokens is adjusted downward based on the length of the packet.
- The supply rate 644 assigned by scheduler 632 to a virtual queue 640 is used to determine the number of tokens the virtual queue 640 may use for transmitting packets from the virtual queue 640.
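- The per-packet logic of algorithm 1 can be sketched in Python as follows; this is a reconstruction from the description above, not the patent's Table 1 verbatim, and the drop handler shown implements only the simplest of the listed options:

```python
import time

class QueueState:
    """Token state for one virtual queue (core i, data type j)."""
    def __init__(self):
        self.num_tokens = 0.0
        self.t_last = time.monotonic()

def droppacket(pkt):
    pass  # could instead queue for later transmission or fragment the packet

def sendpacket(pkt):
    pass  # hand the packet to the transmission module in a real system

def process_packet(state: QueueState, sr_ij: float, pkt: bytes, t: float) -> bool:
    """Algorithm 1 sketch: rate-limit one packet against supply rate sr(i,j)."""
    t_diff = t - state.t_last               # time since last processing
    state.t_last = t
    state.num_tokens += t_diff * sr_ij      # accrue tokens at the supply rate
    length = len(pkt)
    if state.num_tokens < length:           # tokens exhausted for this period
        droppacket(pkt)
        return False
    sendpacket(pkt)                         # dequeue the packet for transmission
    state.num_tokens -= length              # pay one token per byte sent
    return True
```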
- Supply rate 644 is calculated based on demand rate 642 using algorithm 2 in Table 2.
- Inputs to algorithm 2 include a number of CPU cores m, a number of QoS classes n, a demand rate dr(i,j) for a particular core and virtual queue, and a committed rate limit cr(j) for a QoS class.
- The committed rate cr(j) for a QoS class may be received from a controller 210 in an SD-WAN.
- Committed rate cr(j) is a transmission rate associated with the QoS class.
- The committed rate cr(j) is a target transmission rate used to comply with the transmission rate associated with a QoS class.
- Committed rate cr(j) may be assigned by a controller internal to the multi-core lock-free rate limiting engine 600, e.g., scheduler 632, or external to the multi-core lock-free rate limiting engine 600, e.g., an internet service provider (ISP).
- Committed rate cr(j) may be assigned based on the QoS classes used in transmitting packets. Each QoS class is associated with one virtual queue at each core.
- The demand rate dr(i,j) is determined by each core based upon the number of packets associated with the particular QoS. The core provides the demand rate dr(i,j) to the scheduler 632 for computing the supply rate sr(i,j).
- The demand rate dr of each core is summed to determine a total demand rate sum for the QoS class j.
- The supply rate sr(i,j) for each core and virtual queue combination is determined by multiplying the demand rate for that combination by the committed rate cr(j) of the QoS associated with the virtual queue and then dividing the product by the total demand rate sum: sr(i,j) = dr(i,j) × cr(j) / sum.
- A core and virtual queue combination is one virtual queue at one of the cores.
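- A scheduler might implement this calculation as in the sketch below; it is a reconstruction from the description above, not the patent's Table 2 verbatim:

```python
def compute_supply_rates(dr, cr):
    """Algorithm 2 sketch. dr[i][j]: demand rate of core i for QoS class j;
    cr[j]: committed rate limit of QoS class j. Returns sr[i][j]."""
    m = len(dr)   # number of CPU cores
    n = len(cr)   # number of QoS classes
    sr = [[0.0] * n for _ in range(m)]
    for j in range(n):
        total = sum(dr[i][j] for i in range(m))  # total demand for class j
        if total == 0:
            continue                             # no demand: rates stay at zero
        for i in range(m):
            # Each core receives a share of the committed rate proportional
            # to its share of the total demand for the QoS class.
            sr[i][j] = dr[i][j] * cr[j] / total
    return sr

# Example: two cores, one QoS class with a committed rate of 300 units.
# Core 0 demands 100 and core 1 demands 200, so they are granted 100 and 200.
print(compute_supply_rates([[100.0], [200.0]], [300.0]))  # [[100.0], [200.0]]
```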
- FIG. 7 is a flow diagram of an embodiment of a method 700 for lock-free rate limiting.
- Method 700 begins at block 710 when a network device creates a first virtual queue associated with a first core of a CPU and a first QoS class of traffic handled by the network device.
- The method 700 continues at block 720 when the network device creates a second virtual queue associated with a second core of the CPU and the first QoS class of traffic.
- The first core writes a packet to the first virtual queue.
- The second core writes a packet to the second virtual queue.
- Thus, multiple cores may write packets associated with the same QoS class to virtual queues of that QoS at the same time.
- The lock-free rate limiting achieves a higher packet throughput, resulting in support of a higher maximum limiting rate.
- FIG. 8 is a schematic diagram of an electronic device 800 according to an embodiment of the disclosure.
- The electronic device 800 is suitable for implementing the disclosed embodiments as described herein.
- The electronic device 800 comprises ingress ports 810 and receiver units (Rx) 820 for receiving data; a processor, logic unit, or central processing unit (CPU) 830 to process the data; transmitter units (Tx) 840 and egress ports 850 for transmitting the data; and a memory 860 for storing the data.
- The electronic device 800 may also comprise optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled to the ingress ports 810, the receiver units 820, the transmitter units 840, and the egress ports 850 for egress or ingress of optical or electrical signals.
- The processor 830 is implemented by hardware and software.
- The processor 830 may be implemented as one or more CPU chips, cores (e.g., as a multi-core processor), field-programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), and digital signal processors (DSPs).
- The processor 830 is in communication with the ingress ports 810, receiver units 820, transmitter units 840, egress ports 850, and memory 860.
- The memory 860 comprises one or more disks, tape drives, and solid-state drives and may be used as an over-flow data storage device, to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution.
- The memory 860 may be volatile and/or non-volatile and may be read-only memory (ROM), random-access memory (RAM), ternary content-addressable memory (TCAM), and/or static random-access memory (SRAM).
- FIG. 9 is a diagram of a means for lock-free rate limiting 900, for example a network node in an SD-WAN.
- The means for lock-free rate limiting 900 includes memory means 920, for example memory 860; and processing means 910, for example processor 830.
Abstract
A network node includes a memory; and a processor coupled to the memory. The processor includes a first core and a second core, the processor configured to receive instructions from the memory which, when executed by the processor, cause the network node to create a first virtual queue associated with the first core and a first quality of service (QoS); create a second virtual queue associated with the second core and the first QoS; write, at a first time, a first packet associated with the first QoS to the first virtual queue by the first core; and write, at substantially the first time, a second packet associated with the first QoS to the second virtual queue by the second core.
Description
Low Latency Software Defined Wide Area Network Architecture
TECHNICAL FIELD
[0001] This disclosure is related to the technical field of software defined wide area networks, in particular low latency software defined wide area network architecture.
BACKGROUND
[0002] Businesses and other entities often communicate between geographically separate locations via a telecommunications network. Such a network may be a wide area network (WAN) that allows computers at the geographically separate locations to communicate. Some WANs operate over leased telecommunication circuits dedicated to the WAN. Data transmitted over these leased circuits can use multi-protocol label switching (MPLS). MPLS provides a defined route through the telecommunication circuits. Some data may be transmitted via the Internet. Data transmitted via the Internet does not typically follow a defined route. Transmitting data via the Internet is typically cheaper than transmitting data via an MPLS circuit. However, transmitting data via the Internet is typically slower and less reliable than transmitting data via an MPLS circuit. A software defined WAN (SD-WAN) provides a cheaper alternative to traditional WANs. SD-WANs typically rely on the Internet and may therefore offer less predictable performance than a leased telecommunication circuit using MPLS.
SUMMARY
[0003] A first aspect relates to a network node comprising a memory; and a processor coupled to the memory, the processor comprising a first core and a second core, the processor configured to receive instructions from the memory which, when executed by the processor, cause the network node to create a first virtual queue associated with the first core and a first quality of service (QoS); create a second virtual queue associated with the second core and the first QoS; write, at a first time, a first packet associated with the first QoS to the first virtual queue by the first core; and write, at substantially the first time, a second packet associated with the first QoS to the second virtual queue by the second core.
[0004] By writing to virtual queues of the same QoS at substantially the same time, delay associated with locking a queue for writing is reduced.
[0005] In a first implementation form of the network node according to the first aspect as such, the instructions further cause the network node to determine supply rates for the first virtual queue and the second virtual queue based on the first QoS and one or more of a number of packets associated with the first QoS or a volume of data associated with the first QoS.
[0006] In a second implementation form of the network node according to the first aspect as such, the instructions further cause the network node to transmit data associated with the first QoS according to the supply rates.
[0007] In a third implementation form of the network node according to the first aspect as such, the instructions further cause the network node to determine, by the first core, a first demand rate of the first virtual queue; and determine, by the second core, a second demand rate of the second virtual queue.
[0008] In a fourth implementation form of the network node according to the first aspect as such, the first demand rate is based upon a first quantity of packets of the first QoS at the first core, and wherein the second demand rate is based upon a second quantity of packets of the first QoS at the second core.
[0009] In a fifth implementation form of the network node according to the first aspect as such, the processor further comprises a scheduler, and wherein the instructions further cause the network node to determine, by the scheduler, a total demand rate comprising the first demand rate and the second demand rate; determine, by the scheduler, a product of the first demand rate and a rate limit assigned to the first QoS; determine, by the scheduler, a supply rate for the first virtual queue as a quotient of the product and the total demand rate; and transmit the supply rate to the first core.
[0010] In a sixth implementation form of the network node according to the first aspect as such, the first demand rate comprises a write permission for the first core, and wherein the supply rate comprises a read permission for the first core.
[0011] In a seventh implementation form of the network node according to the first aspect as such, the instructions further cause the network node to determine, by the first core, a third quantity of tokens based upon the supply rate; and send, by the first core, the first packet to a destination when the third quantity is above a threshold.
[0012] In an eighth implementation form of the network node according to the first aspect as such, the instructions further cause the network node to write the first packet without locking the first virtual queue.
[0013] A second aspect relates to a method in a network node comprising a processor, the method comprising creating a first virtual queue associated with a first core of the processor and a first quality of service (QoS); creating a second virtual queue associated with a second core of the processor and the first QoS; writing, at a first time, a first packet associated with the first QoS to the first virtual queue by the first core; and writing, at substantially the first time, a second packet associated with the first QoS to the second virtual queue by the second core.
[0014] The method improves throughput of a queue: by writing to virtual queues of the same QoS at substantially the same time, delay associated with locking a queue is avoided.
[0015] In a first implementation form of the method according to the second aspect as such, the method further comprises determining supply rates for the first virtual queue and the second virtual queue based on the first QoS and one or more of a number of packets associated with the first QoS or a volume of data associated with the first QoS.
[0016] In a second implementation form of the method according to the second aspect as such, the method further comprises transmitting data associated with the first QoS according to the supply rates.
[0017] In a third implementation form of the method according to the second aspect as such, the method further comprises determining, by the first core, a first demand rate of the first virtual queue; and determining, by the second core, a second demand rate of the second virtual queue.
[0018] In a fourth implementation form of the method according to the second aspect as such, the first demand rate is based upon a first quantity of packets of the first QoS at the first core, and the second demand rate is based upon a second quantity of packets of the first QoS at the second core.
[0019] In a fifth implementation form of the method according to the second aspect as such, the method further comprises determining, by a scheduler, a total demand rate comprising the first demand rate and the second demand rate; determining, by the scheduler, a product of the first demand rate and a rate limit assigned to the first QoS; determining, by the scheduler, a supply rate for the first virtual queue as a quotient of the product and the total demand rate; and transmitting the supply rate to the first core.
In a sixth implementation form of the method according to the second aspect as such, the first demand rate comprises a write permission for the first core, and wherein the supply rate comprises a read permission for the first core.
[0020] In a seventh implementation form of the method according to the second aspect as such, the method further comprises determining, by the first core, a third quantity of tokens based upon the supply rate; and sending, by the first core, the first packet to a destination when the third quantity is above a threshold.
[0021] In an eighth implementation form of the method according to the second aspect as such, the writing the first packet comprises writing the first packet without locking the first virtual queue.
[0022] A third aspect relates to a computer program product comprising instructions embodied on a computer readable medium which, when executed by a network node comprising a processor, cause the network node to create a first virtual queue associated with a first core of the processor and a first quality of service (QoS); create a second virtual queue associated with a second core of the processor and the first QoS; write, at a first time, a first packet associated with the first QoS to the first virtual queue by the first core; and write, at substantially the first time, a second packet associated with the first QoS to the second virtual queue by the second core.
[0023] The computer program product includes computer instructions to reduce latency in a queue; by writing to virtual queues of the same QoS at substantially the same time, the delay associated with locking a queue is avoided.
[0024] In a first implementation form of the computer program product according to the third aspect as such, the instructions further cause the network node to determine supply rates for the first virtual queue and the second virtual queue based on the first QoS and one or more of a number of packets associated with the first QoS or a volume of data associated with the first QoS.
[0025] In a second implementation form of the computer program product according to the third aspect as such, the instructions further cause the network node to transmit data associated with the first QoS according to the supply rates.
[0026] In a third implementation form of the computer program product according to the third aspect as such, the instructions further cause the network node to determine, by the first core, a first demand rate of the first virtual queue; and determine, by the second core, a second demand rate of the second virtual queue.
[0027] In a fourth implementation form of the computer program product according to the third aspect as such, the first demand rate is based upon a first quantity of packets of the first QoS at the first core, and wherein the second demand rate is based upon a second quantity of packets of the first QoS at the second core.
[0028] In a fifth implementation form of the computer program product according to the third aspect as such, the instructions further cause the network node to determine, by a scheduler, a total demand rate comprising the first demand rate and the second demand rate; determine, by the scheduler, a product of the first demand rate and a rate limit assigned to the first QoS; determine, by the scheduler, a supply rate for the first virtual queue as a quotient of the product and the total demand rate; and transmit the supply rate to the first core.
[0029] In a sixth implementation form of the computer program product according to the third aspect as such, the first demand rate comprises a write permission for the first core, and wherein the supply rate comprises a read permission for the first core.
[0030] In a seventh implementation form of the computer program product according to the third aspect as such, the instructions further cause the network node to determine, by the first core, a third quantity of tokens based upon the supply rate; and send, by the first core, the first packet to a destination when the third quantity is above a threshold.
[0031] In an eighth implementation form of the computer program product according to the third aspect as such, the instructions further cause the network node to write the first packet without locking the first virtual queue.
[0032] A fourth aspect relates to a software-defined wide area network (SD-WAN) comprising a first network node configured to receive, from a plurality of network nodes, network status reports and encryption information; generate a first tunnel through the SD-WAN based on the network status reports; and transmit, to the plurality of network nodes, configuration files comprising information of the first tunnel, encryption information of the plurality of network nodes, and network status information of the plurality of nodes; and a second network node of the plurality of network nodes, the second network node configured to set a service level obligation (SLO) for an application; set a priority for the SLO; transmit a packet of the application using the first tunnel when the SLO of the application is met by the tunnel; and identify a second tunnel for the packet when the SLO of the application is not met by the tunnel.
[0033] In a first implementation form of the SD-WAN according to the fourth aspect as such, the first tunnel comprises an internet-based tunnel, and wherein the second tunnel comprises a multi-protocol label switching (MPLS) tunnel.
[0034] In a second implementation form of the SD-WAN according to the fourth aspect as such, each of the network status reports comprises bandwidth available at one of the plurality of network nodes, latency of the one of the plurality of network nodes, and packet drop rate of the one of the plurality of network nodes.
[0035] In a third implementation form of the SD-WAN according to the fourth aspect as such, the second network node is further configured to identify a third network node in a down state; and report the down state of the third network node to the first network node.
[0036] In a fourth implementation form of the SD-WAN according to the fourth aspect as such, the first network node is further configured to transmit updated configuration files to the plurality of network nodes in response to the down state of the third network node.
[0037] In a fifth implementation form of the SD-WAN according to the fourth aspect as such, the second network node is further configured to create a first virtual queue associated with a first quality of service (QoS) and a first core of a processor of the second network node; create a second virtual queue associated with a second core of the processor and the first QoS; write, at a first time, a first packet associated with the first QoS to the first virtual queue by the first core; and write, at substantially the first time, a second packet associated with the first QoS to the second virtual queue by the second core.
[0038] A sixth aspect relates to a method in a software-defined wide area network (SD-WAN), the method comprising receiving, by a first network node from a plurality of network nodes, network status reports and encryption information; generating, by the first network node, a first tunnel through the SD-WAN based on the network status reports; transmitting, by the first network node to the plurality of network nodes, configuration files comprising information of the first tunnel, encryption information of the plurality of network nodes, and network status information of the plurality of nodes; setting, by a second network node of the plurality of network nodes, a service level obligation (SLO) for an application; setting, by the second network node, a priority for the SLO; transmitting, by the second network node, a packet of the application using the first tunnel when the SLO of the application is met by the tunnel; and identifying, by the second network node, a second tunnel for the packet when the SLO of the application is not met by the tunnel.
[0039] In a first implementation form of the method according to the sixth aspect as such, the first tunnel comprises an internet-based tunnel, and wherein the second tunnel comprises a multi-protocol label switching (MPLS) tunnel.
[0040] In a second implementation form of the method according to the sixth aspect as such, each of the network status reports comprises bandwidth available at one of the plurality of network nodes, latency of the one of the plurality of network nodes, and packet drop rate of the one of the plurality of network nodes.
[0041] In a third implementation form of the method according to the sixth aspect as such, the method further comprises identifying, by the second network node, a third network node in a down state; and reporting, by the second network node, the down state of the third network node to the first network node.
[0042] In a fourth implementation form of the method according to the sixth aspect as such, the method further comprises transmitting, by the first network node, updated configuration files to the plurality of network nodes in response to the down state of the third network node.
[0043] In a fifth implementation form of the method according to the sixth aspect as such, the method further comprises creating, by the second network node, a first virtual queue associated with a first quality of service (QoS) and a first core of a processor of the second network node; creating, by the second network node, a second virtual queue associated with a second core of the processor and the first QoS; writing, by the second network node at a first time, a first packet associated with the first QoS to the first virtual queue by the first core; and writing, by the second network node at substantially the first time, a second packet associated with the first QoS to the second virtual queue by the second core.
BRIEF DESCRIPTION OF THE DRAWINGS
[0044] For a more complete understanding of this disclosure, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.
[0045] FIG. 1 is a diagram of an embodiment of a low-latency SD-WAN architecture.
[0046] FIG. 2 is a diagram of an embodiment of a controller and agent for a low-latency SD-WAN.
[0047] FIG. 3 is a diagram of an embodiment of a traffic flow control technique.
[0048] FIG. 4 is a diagram of an embodiment of a single-core rate limiting engine.
[0049] FIG. 5 is a diagram of an embodiment of a multi-core rate limiting engine.
[0050] FIG. 6 is a diagram of an embodiment of a multi-core lock free rate limiting engine.
[0051] FIG. 7 is a diagram of an embodiment of a method for lock-free rate limiting.
[0052] FIG. 8 is a schematic diagram of an electronic device according to an embodiment of the disclosure.
[0053] FIG. 9 is a diagram of a means for lock-free rate limiting.
DETAILED DESCRIPTION
[0054] It should be understood at the outset that, although an illustrative implementation of one or more embodiments is provided below, the disclosed systems and/or methods may be implemented using any number of techniques, whether currently known or in existence. The disclosure should in no way be limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.
[0055] Described herein is a low-latency SD-WAN architecture implementing a lock-free rate limiting mechanism. An SD-WAN provides certain benefits relative to traditional hardware-based WANs. An SD-WAN may be configured to provide network redundancy using software-based routes: when one route fails, another route may be determined and used for data transmission. A failure in a hardware-based WAN results in communication loss requiring hardware reconfiguration or repair. An SD-WAN may be used to offload traffic from more expensive MPLS circuits to less expensive internet-based circuits. Further, higher priority traffic may be prioritized for MPLS circuits while lower priority traffic may be transmitted using internet-based circuits. Traffic priority may be determined based upon the type of data carried by the traffic. The type of data may be based on several factors, including the source of the data, the destination of the data, an application generating or receiving the data, etc. Finally, an SD-WAN provides more secure solutions, leveraging various encryption standards to secure data transmissions. However, SD-WANs may have unpredictable performance, especially for latency sensitive applications.
[0056] The low-latency SD-WAN architecture described herein provides a high-performance software-defined client-server architecture for routing packets to encrypted endpoints via an overlay network. Further, a lock-free rate-limiting approach is described to provide latency sensitive packets
with higher bandwidth and shorter latency. To enable the low-latency SD-WAN, a controller and agents may be deployed in the network. Agents may reside at network nodes of the SD-WAN, and one or more controllers may be deployed to control the agents and SD-WAN behavior at various locations in the network. Network nodes may include any device on the network that is capable of creating, receiving, or transmitting information through the SD-WAN. Each agent will collect network information, such as latency and bandwidth between network equipment, and send the network information to the controller. The controller will generate configuration files based on the network information and distribute the configuration files to the agents. The agent can also be used as an endpoint or a router. If an agent is used as a router, the agent will decrypt the outer-layer encrypted header of a packet and forward it to the next agent in the route.
[0057] The controller has an overall view of all agents in the network and gathers network status, such as bandwidth, latency, and other communication characteristics, from the agents. While the example provides for a single controller, multiple controllers may be used in a network, for example for redundancy. The controller coordinates creation of encrypted tunnels for communication between the controller and agents. A tunnel is a communication channel between network nodes. Tunnels may be encrypted. A tunnel is created by a controller by identifying ingress and egress points of the tunnel and, in some cases, intermediate network nodes used to transmit packets through a network. Packets traversing the tunnel may be encapsulated with identifiers to allow transmission through the network using the tunnel. The identifiers may identify to network nodes a tunnel to transmit the packet over. The controller will generate configuration files for all agents based on network status and distribute the configuration files to the agents. Each configuration file includes peer agents' encryption information, such as a public key, and peer agents' network status, such as latency, bandwidth, and other communication characteristics.
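As an illustration only, since the disclosure does not specify a configuration file format, a configuration file of the kind described might carry fields such as the following; all names and values here are hypothetical:

```python
# Hypothetical contents of an agent configuration file, shown as a Python
# dict. The disclosure names the fields (peer encryption information and
# peer network status) but not a concrete format.
agent_config = {
    "agent_id": "agent-b",
    "peers": [
        {
            "agent_id": "agent-a",
            "public_key": "base64-encoded-public-key...",  # encryption info
            "latency_ms": 12.5,                            # network status
            "bandwidth_mbps": 850.0,
        },
    ],
    "tunnels": [
        {"tunnel_id": 1, "ingress": "agent-b", "egress": "agent-a"},
    ],
}
```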
[0058] The agent can select from multiple routes for transmitting packets to meet service-level objectives (SLOs). An SLO may be achieved by using routes that provide a quality of service (QoS) meeting the needs of the SLO. Agents may set SLOs for different applications before encryption, and set different priorities for different SLOs. The agent selects a route based on the network status information received from the controller and the SLO of the to-be-transmitted packet. Agents process packets based on their priorities to achieve a particular QoS. Packet priorities may be set based at least on their latency sensitivity levels. In a hybrid SD-WAN/MPLS network, the agent may transmit only high priority packets via MPLS. An agent can also be used as a router and forward packets to a new destination. When an agent is used as a router, the sender will add another outer packet header for this router, and the agent will process and strip off the outer header and forward the packet to the destination or to the next agent, which also works as a router. Routing decisions are made locally by the agent to reduce latency (i.e., there is no delay waiting for the controller to provide routing decisions). If an agent goes down or otherwise loses connectivity, other agents will report to the controller and the controller will reconfigure affected agents.
[0059] As described above, QoS includes quantifying the performance of a communication route or other service. QoS may be used to prioritize certain types of traffic over others in order to ensure higher priority traffic is communicated to meet an SLO. Rate limiting is used to control the rate of traffic in and out of a communication device. Rate limiting is used in the low-latency SD-WAN architecture to ensure latency sensitive traffic is transmitted to meet SLOs.
[0060] In some rate limiting approaches, performance is restricted on a multi-core platform because of locking, which increases end-to-end latency in an SD-WAN. Locking includes locking a packet queue when a core writes to the queue. Any other core that has packets to write to the queue must wait until the first core finishes its write operation and releases the queue. The low-latency SD-WAN architecture described herein proposes a lock-free rate limiting approach. The lock-free rate limiting uses virtual queues to isolate simultaneous access to the same queue by different central processing unit (CPU) cores. Two parameters are proposed to synchronize the QoS constraints among multiple cores and multiple virtual queues; the parameters are a demand rate determined by the core and a supply rate determined by a scheduler. The lock-free rate limiting may achieve a 50% higher limiting rate than locking rate limiting approaches. The lock-free rate limiting is scalable over any number of cores in a multi-core system to support multiple virtual queues, each associated with one of a group of QoS classes supported by the multi-core system.
[0061] Virtual queues associated with a QoS class are used to isolate simultaneous access to the same queue by different CPU cores. A particular QoS has a virtual queue associated with each of multiple cores in a multi-core system. Two parameters, demand rate and supply rate, are associated with each virtual queue to synchronize the QoS constraints among multiple cores. Supply rate is a parameter or set of parameters that determines the actual packet transmission rate of a virtual queue. Under given supply rate parameters, the virtual queue is expected to dequeue at a specific average rate. Demand rate is a parameter or set of parameters determined by a core based upon usage of a virtual queue. Each CPU core needs write permission to the demand rate parameters and read permission to the supply rate parameters of its own virtual queues. A scheduler, which periodically fetches the demand rate of each virtual queue and recalculates the supply rate parameters correspondingly, takes charge of demand rate collection and supply rate updates.
[0062] The lock-free rate limiting can be used in a variety of multi-core apparatuses or systems including, but not limited to, switches, routers, firewalls, intrusion detection systems, carrier networks, virtualized networks, and other cloud-based environments.
[0063] FIG. 1 is a diagram of an embodiment of a low-latency SD-WAN architecture 100. The architecture 100 includes location A 110, location B 120, location C 130, and location D 140. The embodiment of FIG. 1 includes four locations; other embodiments may include any number of locations. The locations may be geographically separate locations that have a need to share data or other electronic communications, the locations may be co-located in a same geographic location but on separate communication networks, or a combination of geographically separate and co-located locations may be used. The locations communicate via one or more communication channels. The communication channels may include various forms of electronic communication between the locations. Channel 112 is used for communication between location A 110 and location B 120. Channel 122 is used for communication between location B 120 and location C 130. Channel 116 is used for communication between location A 110 and location D 140. Channel 114 is used for communication between location A 110 and location C 130. Channel 142 is used for communication between location D 140 and location C 130. In other embodiments, more or fewer communication channels may be used for communication between locations.
[0064] The SD-WAN architecture 100 also includes a controller 150 to control communications in the SD-WAN. In some embodiments, more than one controller may be used to control communications in the SD-WAN. Controller 150 may be in communication with agents 115, 125, 135, and 145 at each of the locations 110, 120, 130, 140. The agents 115, 125, 135, and 145 are configured to communicate with the controller 150 to control communications at their respective location. The controller 150 communicates with agents 115, 125, 135, and 145 via secured communication channels 152, 154, 156, and 158. The agents 115, 125, 135, and 145 may communicate various statistics related to the communications channels connected to their respective location. For example, available bandwidth and latency of the communications channel may be transmitted to the controller 150. The controller 150 may then make communications decisions based upon the received statistics.
[0065] FIG. 2 is a diagram of an embodiment of a controller 210 and agent 220 for an SD-WAN. For example, controller 210 may function as controller 150, and agent 220 may function as agents 115, 125, 135, and 145. Controller 210 includes tunnel setup engine 212 in communication with tunnel setup engine 225 at agent 220, a configuration engine 214, and a global tunnel status monitor 216 in communication with a tunnel status monitor 226 at agent 220. Agent 220 includes a service-level objective (SLO) setting engine 221, a route selection module 222, an encryption/decryption engine 223, a rate limiting engine 224, the tunnel setup engine 225, and the tunnel status monitor 226.
[0066] Agent 220 is configured to process packets for transmission over an SD-WAN. A packet may be received for transmission from a data source in communication with the agent 220. The SLO setting engine 221 determines an SLO for the packet. An SLO may include various desired values or ranges of values for transmission characteristics used for transmitting the packet. For example, an SLO may include latency, error rate, throughput, bandwidth, path availability, and/or other transmission characteristics. The SLO may be determined based upon the type of packet to be transmitted. Certain applications may require less latency than others; for example, video conferencing in real time may require less latency than a large file transfer. Thus, the application that created the packet may be taken into consideration when determining an SLO. Other factors may be considered, for example, the sender of the packet, the receiver of the packet, or other characteristics.
[0067] After the SLO is determined for the packet, the route selection module 222 selects a route for the packet through the SD-WAN. The route may be selected to meet the desired SLO. Additional factors may be considered in combination with meeting the desired SLO. For example, the cost of the transmission may be taken into consideration. If two paths can meet the SLO, the cheaper alternative may be selected for transmission. Other factors may be considered during route selection. After considering available data and the desired SLO of the packet, the route selection module 222 identifies a tunnel to route the packet through to the destination.
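A hypothetical Python sketch of this selection step follows; the field names and the SLO representation are assumptions made for illustration, not the module's actual interface:

```python
# Pick the cheapest tunnel whose reported status meets the packet's SLO.
def select_tunnel(tunnels, slo):
    """tunnels: list of dicts with 'latency_ms', 'bandwidth_mbps', 'cost';
    slo: dict with 'max_latency_ms' and 'min_bandwidth_mbps'."""
    candidates = [
        t for t in tunnels
        if t["latency_ms"] <= slo["max_latency_ms"]
        and t["bandwidth_mbps"] >= slo["min_bandwidth_mbps"]
    ]
    if not candidates:
        return None  # no tunnel meets the SLO; caller may fall back, e.g. to MPLS
    return min(candidates, key=lambda t: t["cost"])  # cheapest qualifying tunnel
```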
[0068] Tunnel setup engine 225 receives instructions from tunnel setup engine 212 at the controller 210 for setting up tunnels through the SD-WAN. As a non-exhaustive list of examples, the tunnels may be created to meet certain SLO criteria, to provide direct routes between locations, or for redundancy through the SD-WAN. If a tunnel fails, the tunnel setup engine 212 may create or identify other tunnels to replace the failed tunnel. Tunnel status monitor 226 receives status
updates from the global tunnel status monitor 216 of tunnels available for routing packets. These status updates may be used by the route selection module 222 in identifying a tunnel for routing the packet. Tunnel status monitor 226 also provides status updates to the global tunnel status monitor 216 of tunnels in communication with agent 220. These updates may be provided to other agents in the SD-WAN for use in route selection. One or more tunnels may be used to route a packet from source to destination in the SD-WAN.
[0069] Once a route for the packet has been selected, the encryption/decryption engine 223 may encrypt the packet for transmission through a tunnel. Similarly, received packets may be decrypted by the encryption/decryption engine 223. Finally, the rate limiting engine 224 controls the rate at which packets are transmitted over various tunnels to meet SLO requirements and/or other transmission requirements of the SD-WAN. The rate limiting engine 224 may also control packet transmission rate to prevent overloading of tunnels in the SD-WAN.
[0070] FIG. 3 is a diagram of an embodiment of a traffic flow control 300. Traffic flow control 300 may be implemented in apparatuses, e.g., controller 150 and/or agents 115, 125, 135, and 145, for controlling the flow of packet traffic from the apparatus. Packet traffic may be controlled using a token technique. A traffic type or transmission interface may have an associated bucket 320 for storing tokens 310. The bucket 320 and tokens 310 may be implemented in software or hardware as a count-up or count-down register. The bucket 320 may be assigned a quantity of tokens 310 used for transmitting packets 340. Tokens 310 may be assigned by a controller (not pictured) or some other device to control data rates in a network where the traffic flow control 300 is implemented. The quantity of tokens 310 may be periodically refreshed and/or adjusted based upon factors associated with an associated transmission interface, packet processing capabilities, or other characteristics that may affect packet transmission. Periodically refreshed tokens 310 may result in a desired transmission rate per time period. For example, fifty tokens per second added to the bucket may allow for fifty bytes per second to be transmitted. Transmission queue 330 stores packets 340 that are awaiting transmission. The packets 340 may be broken up into smaller sizes for transmission by a transmitting apparatus, e.g., packets may be broken up into one-byte segments. Each byte 360 must use a token 310 in order to be transmitted. When no more tokens 310 are in bucket 320, transmission is stopped until additional tokens 310 are received. If transmission queue 330 overflows, packets 340 may be dropped and not transmitted even when additional tokens 310 become available. Packets 340 that are dropped may be selected based on various criteria, individually or
combined, for example, oldest packet in the queue, newest packet in the queue, traffic type of packet, etc. The bytes 360 are allowed to dequeue 350 to the transmission interface when paired with a token 310. While only a single bucket 320 and transmission queue 330 are shown, a system may include multiple transmission queues 330 of varying sizes for processing of various different types of traffic, where each type of traffic may be associated with a single transmission queue 330 and bucket 320. Higher priority transmission queues 330 may receive more tokens 310 than lower priority transmission queues 330. If a transmission queue 330 is not using assigned tokens 310, those tokens 310 may be reassigned to a different transmission queue 330. Management of the traffic flow control 300 may be performed by a controller, e.g., controller 210 or an agent under control of a controller.
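A minimal token-bucket sketch of traffic flow control 300 follows; the class name, refill period, and the packet-length bookkeeping are assumptions made for the example:

```python
from collections import deque

class TokenBucket:
    """One bucket 320 plus transmission queue 330 for a traffic type."""

    def __init__(self, refill_tokens, capacity, queue_limit):
        self.tokens = 0
        self.refill_tokens = refill_tokens  # tokens added per period
        self.capacity = capacity            # maximum tokens the bucket holds
        self.queue = deque()                # packet lengths awaiting transmission
        self.queue_limit = queue_limit      # queue size before overflow

    def refill(self):
        # called once per period, e.g. once per second
        self.tokens = min(self.capacity, self.tokens + self.refill_tokens)

    def enqueue(self, packet_len):
        if len(self.queue) >= self.queue_limit:
            return False                    # queue overflow: packet dropped
        self.queue.append(packet_len)
        return True

    def dequeue(self):
        # each byte pairs with one token before it may be transmitted
        sent = 0
        while self.queue and self.tokens >= self.queue[0]:
            self.tokens -= self.queue[0]
            sent += self.queue.popleft()
        return sent                         # bytes released for transmission
```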
[0071] FIG. 4 is a diagram of an embodiment of a single-core rate limiting engine 400. The single-core rate limiting engine 400 controls the transmission rate of packets using a single core 420 of a CPU. A packet may be received by a network interface 410. The packet reception module 422 of the single core 420 may communicate with the network interface 410 to receive the packet. Packet classification module 424 may identify a QoS of the packet and write the packet to a corresponding QoS queue 430. Each QoS queue 430 corresponds to a single QoS. A QoS may be associated with an SLO. The single-core rate limiting engine 400 is limited to processing a single packet at a time for transmission, and thus the processing capabilities of the single core 420 limit the throughput of the single-core rate limiting engine 400. Packets are pushed from the QoS queues 430 to packet transmission module 440. The packets may be pushed based on a traffic flow control, for example traffic flow control 300. The packets are transmitted using network interface 450. Network interface 450 may be the same as network interface 410 or different.
[0072] FIG. 5 is a diagram of an embodiment of a multi-core rate limiting engine 500. The multi-core rate limiting engine 500 uses four cores 520 of a CPU. Each core 520 includes a packet reception module 522 and a packet classification module 524. A packet may be received by a network interface 510. The packet may be received by a packet reception module 522 in communication with the network interface 510. Packets may be assigned to the cores 520 in a sequential manner, e.g., first packet is assigned to core 0, second packet assigned to core 1, and so forth. Packets may be assigned to the cores 520 based on a current workload of the cores 520, e.g., a core 520 with the lowest workload may be assigned a packet for processing. Packet classification module 524 may identify a QoS of the packet and write the packet to a corresponding QoS queue
530. Each QoS queue 530 corresponds to a single QoS. When a packet is written to a QoS queue 530, the QoS queue 530 is locked and only one core 520 is allowed to write to the QoS queue 530. A bottleneck may occur if multiple cores 520 have packets to be written to the same QoS queue 530 because other cores 520 have to wait until a currently writing core 520 releases the lock on the particular QoS queue 530. Packets are pushed from the QoS queues 530 to packet transmission module 540. The packets may be pushed based on a traffic flow control, for example traffic flow control 300. The packets are transmitted using network interface 550. Network interface 550 may be the same as network interface 510 or different.
[0073] A lock-free rate limiting engine may be used to overcome the bottleneck that occurs in a multi-core system when queues are locked for writing. The lock-free rate limiting engine may include a plurality of virtual queues stored in a memory of the device where the lock-free rate limiting engine is implemented. In some approaches as described above, a queue (which may be referred to in this example as a traditional queue) will be locked each time a core writes to the traditional queue. Thus, if several cores have data to write to the traditional queue, each core sequentially writes to the traditional queue in a serial fashion. One or several virtual queues of the lock-free rate limiting engine may be associated with a single core. Each virtual queue may have a relationship with a single core. Associating a virtual queue with only one core permits the core to write to the virtual queue without contending with other cores. Several virtual queues, each associated with different cores, may function to store data that would have been written to a single traditional queue. Thus, more than one core may write data in parallel to the virtual queues that would have been written serially using a traditional queue. In some approaches, each traditional queue is associated with a particular type of data, e.g., a QoS class, an application, a specific format, etc. Several virtual queues are associated with the same type of data in the lock-free rate limiting engine. Thus, in a traditional queue system, if several cores have data for a particular application, the cores would write the data serially to the traditional queue associated with the particular application. In the lock-free rate limiting engine, several cores will each have a virtual queue associated with the particular application. Thus, the cores may write data for the particular application to virtual queues in parallel. Consequently, the delay associated with locking a traditional queue may be avoided, and the overall throughput of the lock-free rate limiting engine is increased relative to a traditional queue system. Each core may have a virtual queue for one or more types of data. The cores may be associated with different numbers of virtual queues.
[0074] Transmission rates may vary for different types of data. A type of data may have an associated committed rate. The committed rate is a transmission rate for the type of data. Several virtual queues may be associated with the same type of data. Thus, the virtual queues associated with the same type of data share the committed rate for the type of data. A controller may be configured to control the transmission rate from each of the virtual queues to ensure the combined traffic from the virtual queues does not exceed the committed rate for the type of data. Each core may provide a demand rate for the type of data to the controller. A demand rate for each type of data handled by the core may be provided to the controller. If several cores are all queuing the same type of data for transmission, the controller will read the demand rate from each of the cores and determine a supply rate for each core. The supply rate is the rate at which a particular core may dequeue data packets of a particular data type from the corresponding virtual queue. Each data type may be associated with a QoS. Thus, each core will have a demand rate for each of the virtual queues associated with the core and an assigned supply rate for each of the virtual queues associated with the core.
[0075] Supply rates and demand rates may be provided as tokens for transmission of packets, as data volumes, e.g., bits or bytes, or as some other unit to measure the flow of data to and from the virtual queues. Supply rates may be periodically updated by the controller. For example, a core is assigned a certain supply rate for a given period, e.g., 500 bits during a one-millisecond period. The core may transmit 500 bits of data from the corresponding virtual queue during one millisecond. The controller may then read demand rates and determine a new supply rate for the virtual queue. The new supply rate is assigned to the virtual queue, and the core transmits data from the virtual queue accordingly. Each virtual queue has a defined storage size. When the virtual queue is full, the core may not be able to write additional data to the virtual queue; data in the virtual queue may be dropped, or new data may be dropped and not added to the virtual queue. Whether a packet is dropped or not may be determined based on a number of criteria, for example, priority of the data, source of the data, destination of the data, size of the data, etc.
[0076] FIG. 6 is a diagram of an embodiment of a multi-core lock free rate limiting engine 600. The multi-core lock free rate limiting engine 600 uses four cores 620 of a CPU for processing packets. A fifth core 630 is used for scheduling packet transmissions. Cores 620 each include a packet reception module 622 and a packet classification module 624. A packet may be received by a network interface 610. The packet may be received by a packet reception module 622 in
communication with the network interface 610. Packets may be assigned to the cores 620 in a sequential manner, e.g., first packet is assigned to core 0, second packet assigned to core 1, and so forth. Packets may be assigned to the cores 620 based on a current workload of the cores 620, e.g., a core 620 with the lowest workload may be assigned a packet for processing. Other methods may be used for assigning packets to the cores 620.
[0077] Each core 620 is associated with several virtual queues 640. Virtual queues 640 may occupy a memory or some other data storage area of a network node comprising the multi-core lock free rate limiting engine 600. A complete queue comprises one virtual queue 640 associated with each core 620. For example, a complete queue associated with QoS Class 0 includes virtual queues 640 identified as QoS class 00, 10, 20, and 30. Each core 620 writes to several dedicated virtual queues 640 associated with different QoS values. Thus, multi-core lock free rate limiting engine 600 provides an advantage over multi-core rate limiting engine 500 in that a queue does not need to be locked for multiple cores 620 to substantially simultaneously write to the queue. Writing substantially simultaneously includes writing to two virtual queues of the same QoS class without locking a queue. Thus, writes occurring within the period of time it would take to lock and unlock a physical queue may be considered substantially simultaneous.
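A minimal sketch of this layout, with assumed data structures rather than the engine's actual implementation, gives each (core, QoS class) pair its own single-writer queue:

```python
from collections import deque

NUM_CORES = 4        # cores 620
NUM_QOS_CLASSES = 4  # one complete queue per QoS class

# virtual_queues[i][j] is written only by core i, for QoS class j,
# so an enqueue never contends with another core and needs no lock.
virtual_queues = [
    [deque() for _ in range(NUM_QOS_CLASSES)] for _ in range(NUM_CORES)
]

def enqueue(core_id, qos_class, pkt):
    # lock-free: no other core ever writes virtual_queues[core_id][qos_class]
    virtual_queues[core_id][qos_class].append(pkt)
```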
[0078] A different core 630 is used to schedule transmissions from the virtual queues 640. Core 630 includes a scheduler 632. Scheduler 632 monitors a demand rate 642 of each core's virtual queues 640 and determines a corresponding supply rate 644 based on the demand rate 642. Scheduler 632 may use the demand rate 642 and supply rate 644 associated with each virtual queue to synchronize the QoS constraints among cores 620. Supply rate 644 determines the packet transmission rate of one or more virtual queues 640. Based on supply rate 644 parameters, the virtual queue 640 is expected to dequeue at a specific average rate. Scheduler 632 assigns a supply rate 644 to a virtual queue 640, and the virtual queue 640 dequeues a number of bytes or packets based on the assigned supply rate 644. Each core 620 has write permission to the demand rate 642 parameters and read permission to the supply rate 644 parameters of its own virtual queues 640. Scheduler 632 periodically fetches the demand rate 642 of each virtual queue 640 and recalculates the supply rate 644 parameters correspondingly. If the demand rate 642 is high, the scheduler 632 may assign a higher supply rate 644. Other factors may be considered in assigning supply rate 644, e.g., QoS of the virtual queues 640, priority of traffic in the virtual queue 640, etc.
[0079] Packets are pushed from the virtual queues 640 to packet transmission module 650. The packets are transmitted using network interface 660. Network interface 660 may be the same as network interface 610 or different. The multi-core lock free rate limiting engine 600 may be implemented with any number of cores and any number of QoS classes/virtual queues. In some embodiments, the scheduler 632 may be executed by one of cores 620. The multi-core lock free rate limiting engine 600 may be implemented in an SD-WAN, an MPLS system, or other systems with multi-core processors and multiple QoS classes for transmission.
Table 1 (Algorithm 1)
[0081] Inputs to algorithm 1 include a supply rate sr(i,j) received from scheduler 632 for a particular core i and data type j on said core i, a packet pkt with length len, and a current time t. Data type j is associated with one of the virtual queues of core i. Data type j may also be associated with a QoS class. Time difference tdiff is determined by subtracting a time of last processing tlast from time t. numtokens is the number of available tokens (e.g., tokens based on the supply rate 644) for the data type j. numtokens is determined by summing a previous numtokens with the product of tdiff and sr(i,j). If numtokens is less than len, then the current packet is processed by the droppacket(pkt) function. The droppacket(pkt) function may include dropping the packet, queueing the packet for later transmission, fragmenting the packet, or other functions for handling a packet when the supply tokens for the virtual queue are exhausted in a particular transmission period. Otherwise, the current packet is transmitted, e.g., dequeued, from the virtual queue by the sendpacket(pkt) function. When the packet is transmitted, numtokens is adjusted by subtracting len. Thus, a packet is transmitted from a virtual queue when a sufficient number of tokens are available. When a packet is transmitted, the number of tokens is adjusted downward based on the length of the packet. The supply rate 644 assigned by scheduler 632 to a virtual queue 640 is used to determine the number of tokens the virtual queue 640 may use for transmitting packets from the virtual queue 640. Supply rate 644 is calculated based on demand rate 642 using algorithm 2 in Table 2.
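Before turning to algorithm 2, the following Python sketch reconstructs algorithm 1 from the description above, since the Table 1 listing is not reproduced in this text; the class name, clock source, and the drop/send stubs are assumptions rather than the patent's literal pseudocode:

```python
import time

class VirtualQueueLimiter:
    """Token-based rate limiting for one virtual queue (core i, data type j)."""

    def __init__(self, supply_rate):
        self.sr = supply_rate           # supply rate sr(i,j), e.g. bytes/second
        self.numtokens = 0.0            # available tokens for data type j
        self.tlast = time.monotonic()   # time of last processing

    def process(self, pkt, length):
        t = time.monotonic()
        tdiff = t - self.tlast          # time since last processing
        self.tlast = t
        # replenish tokens in proportion to the supply rate
        self.numtokens += tdiff * self.sr
        if self.numtokens < length:
            self.droppacket(pkt)        # tokens exhausted this period
        else:
            self.sendpacket(pkt)        # dequeue and transmit the packet
            self.numtokens -= length    # charge the packet's length

    def droppacket(self, pkt):
        pass  # drop, requeue, or fragment, per the description above

    def sendpacket(self, pkt):
        pass  # hand the packet to the packet transmission module
```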
Table 2 (Algorithm 2)
[0082] Inputs to algorithm 2 include a number of CPU cores m, a number of QoS classes n, a demand rate for a particular core and virtual queue dr(i,j), and a committed rate limit for a QoS class cr(j). The committed rate cr(j) for a QoS class may be received from a controller 210 in an SD-WAN. Committed rate cr(j) is a target transmission rate used to comply with the transmission rate associated with a QoS class. Committed rate cr(j) may be assigned by a controller internal to the multi-core lock free rate limiting engine 600, e.g., scheduler 632, or external to the multi-core lock free rate limiting engine 600, e.g., an internet service provider (ISP). Committed rate cr(j) may be assigned based on the QoS classes used in transmitting packets. Each QoS class is associated with one virtual queue at each core. The demand rate dr(i,j) is determined by each core based upon the number of packets associated with the particular QoS. The core provides the demand rate dr(i,j) to the scheduler 632 for computing the supply rate sr. For each QoS class j, the demand rate dr of each core is summed to determine a total demand rate sum for the QoS class j. The supply rate sr for each core and virtual queue combination is determined by multiplying the demand rate for that core and virtual queue combination by the committed rate cr(j) for the QoS associated with the virtual queue and then dividing the product by the total demand rate sum. As used herein, a core and virtual queue combination is one virtual queue at one of the cores. Using the above algorithms, a committed rate for a particular QoS of traffic is maintained across a plurality of virtual queues. The virtual queues associated with a single QoS may be written to by their corresponding core without a need to lock the entire queue for the QoS.
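A corresponding sketch of algorithm 2 follows; the Table 2 listing is likewise not reproduced here, so the function name and data layout are assumptions. It computes sr(i,j) = dr(i,j) * cr(j) / sum, where sum totals the demand rates for QoS class j across cores:

```python
def compute_supply_rates(m, n, dr, cr):
    """m cores, n QoS classes; dr[i][j] are demand rates, cr[j] committed rates.

    Returns sr[i][j], the supply rate for core i's virtual queue of class j.
    """
    sr = [[0.0] * n for _ in range(m)]
    for j in range(n):
        total = sum(dr[i][j] for i in range(m))  # total demand for class j
        if total == 0:
            continue  # no core is demanding class j this period
        for i in range(m):
            # proportional share of the committed rate cr(j)
            sr[i][j] = dr[i][j] * cr[j] / total
    return sr

# Example: two cores, one QoS class with a 100 Mb/s committed rate.
# Demand rates 30 and 10 yield supply rates 75 and 25 Mb/s.
rates = compute_supply_rates(2, 1, [[30.0], [10.0]], [100.0])
```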
[0083] FIG. 7 is a flow diagram of an embodiment of a method 700 for lock-free rate limiting. Method 700 begins at block 710 when a network device creates a first virtual queue associated with a first core of a CPU and a first QoS class of traffic handled by the network device. The method 700 continues at block 720 when the network device creates a second virtual queue associated with a second core of the CPU and the first QoS class of traffic. At block 730, the first core writes a packet to the first virtual queue. Substantially simultaneously, at block 740, the second core writes a packet to the second virtual queue. Thus, multiple cores may write packets associated with the same QoS class to virtual queues of the QoS at the same time. Previously, only one queue existed for the QoS; the queue would be locked for writing by the first core, and the second core would have to wait until the first core completed the write operation. Thus, the lock-free rate limiting achieves a higher packet throughput, resulting in a higher maximum supported limiting rate.
[0084] FIG. 8 is a schematic diagram of an electronic device 800 according to an embodiment of the disclosure. The electronic device 800 is suitable for implementing the disclosed embodiments as described herein. The electronic device 800 comprises ingress ports 810 and receiver units (Rx) 820 for receiving data; a processor, logic unit, or central processing unit (CPU) 830 to process the data; transmitter units (Tx) 840 and egress ports 850 for transmitting the data; and a memory 860 for storing the data. The electronic device 800 may also comprise optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled to the ingress ports 810, the receiver units 820, the transmitter units 840, and the egress ports 850 for egress or ingress of optical or electrical signals.
[0085] The processor 830 is implemented by hardware and software. The processor 830 may be implemented as one or more CPU chips, cores (e.g., as a multi-core processor), field-programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), and digital signal processors (DSPs). The processor 830 is in communication with the ingress ports 810,
receiver units 820, transmitter units 840, egress ports 850, and memory 860. The memory 860 comprises one or more disks, tape drives, and solid-state drives and may be used as an over-flow data storage device, to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution. The memory 860 may be volatile and/or non-volatile and may be read-only memory (ROM), random-access memory (RAM), ternary content-addressable memory (TCAM), and/or static random-access memory (SRAM).
[0086] FIG. 9 is a diagram of a means for lock-free rate limiting 900, for example a network node in an SD-WAN. The means for lock-free rate limiting 900 includes memory means 920, for example memory 860; and processing means 910, for example processor 830.
[0087] While several embodiments have been provided in the present disclosure, it should be understood that the disclosed systems and methods might be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the intention is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system or certain features may be omitted, or not implemented.
[0088] In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present disclosure. Other items shown or discussed as coupled or directly coupled or communicating with each other may be indirectly coupled or communicating through some interface, device, or intermediate component whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and could be made without departing from the spirit and scope disclosed herein.
Claims
1. A network node comprising: a memory; and a processor coupled to the memory, the processor comprising a first core and a second core, the processor configured to receive instructions from the memory which, when executed by the processor, cause the network node to: create a first virtual queue associated with the first core and a first quality of service (QoS); create a second virtual queue associated with the second core and the first QoS; write, at a first time, a first packet associated with the first QoS to the first virtual queue by the first core; and write, at substantially the first time, a second packet associated with the first QoS to the second virtual queue by the second core.
2. The network node of claim 1, wherein the instructions further cause the network node to determine supply rates for the first virtual queue and the second virtual queue based on the first QoS and one or more of a number of packets associated with the first QoS or a volume of data associated with the first QoS.
3. The network node of claim 2, wherein the instructions further cause the network node to transmit data associated with the first QoS according to the supply rates.
4. The network node of claim 1, wherein the instructions further cause the network node to: determine, by the first core, a first demand rate of the first virtual queue; and determine, by the second core, a second demand rate of the second virtual queue.
5. The network node of claim 4, wherein the first demand rate is based upon a first quantity of packets of the first QoS at the first core, and wherein the second demand rate is based upon a second quantity of packets of the first QoS at the second core.
6. The network node of claim 4 or 5, wherein the processor further comprises a scheduler, and wherein the instructions further cause the network node to: determine, by the scheduler, a total demand rate comprising the first demand rate and the second demand rate; determine, by the scheduler, a product of the first demand rate and a rate limit assigned to the first QoS; determine, by the scheduler, a supply rate for the first virtual queue as a quotient of the product and the total demand rate; and transmit the supply rate to the first core.
7. The network node of claim 6, wherein the first demand rate comprises a write permission for the first core, and wherein the supply rate comprises a read permission for the first core.
8. The network node of claim 6 or 7, wherein the instructions further cause the network node to: determine, by the first core, a third quantity of tokens based upon the supply rate; and send, by the first core, the first packet to a destination when the third quantity is above a threshold.
9. The network node of any of claims 1 through 8, wherein the instructions further cause the network node to write the first packet without locking the first virtual queue.
10. A method in a network node comprising a processor, the method comprising: creating a first virtual queue associated with a first core of the processor and a first quality of service (QoS); creating a second virtual queue associated with a second core of the processor and the first QoS; writing, at a first time, a first packet associated with the first QoS to the first virtual queue by the first core; and
writing, at substantially the first time, a second packet associated with the first QoS to the second virtual queue by the second core.
11. The method of claim 10, further comprising determining supply rates for the first virtual queue and the second virtual queue based on the first QoS and one or more of a number of packets associated with the first QoS or a volume of data associated with the first QoS.
12. The method of claim 11, further comprising transmitting data associated with the first QoS according to the supply rates.
13. The method of claim 10, further comprising: determining, by the first core, a first demand rate of the first virtual queue; and determining, by the second core, a second demand rate of the second virtual queue.
14. The method of claim 13, wherein the first demand rate is based upon a first quantity of packets of the first QoS at the first core, and wherein the second demand rate is based upon a second quantity of packets of the first QoS at the second core.
15. The method of claim 13 or 14 further comprising: determining, by a scheduler, a total demand rate comprising the first demand rate and the second demand rate; determining, by the scheduler, a product of the first demand rate and a rate limit assigned to the first QoS; determining, by the scheduler, a supply rate for the first virtual queue as a quotient of the product and the total demand rate; and transmitting the supply rate to the first core.
16. The method of claim 15, wherein the first demand rate comprises a write permission for the first core, and wherein the supply rate comprises a read permission for the first core.
17. The method of claim 15 or 16, further comprising:
determining, by the first core, a third quantity of tokens based upon the supply rate; and sending, by the first core, the first packet to a destination when the third quantity is above a threshold.
18. The method of any of claims 10 through 17, wherein writing the first packet comprises writing the first packet without locking the first virtual queue.
19. A computer program product comprising instructions embodied on a computer readable medium which, when executed by a network node comprising a processor, cause the network node to: create a first virtual queue associated with a first core of the processor and a first quality of service (QoS); create a second virtual queue associated with a second core of the processor and the first QoS; write, at a first time, a first packet associated with the first QoS to the first virtual queue by the first core; and write, at substantially the first time, a second packet associated with the first QoS to the second virtual queue by the second core.
20. The computer program product of claim 19, wherein the instructions further cause the network node to determine supply rates for the first virtual queue and the second virtual queue based on the first QoS and one or more of a number of packets associated with the first QoS or a volume of data associated with the first QoS.
21. The computer program product of claim 20, wherein the instructions further cause the network node to transmit data associated with the first QoS according to the supply rates.
22. The computer program product of claim 19, wherein the instructions further cause the network node to: determine, by the first core, a first demand rate of the first virtual queue; and
determine, by the second core, a second demand rate of the second virtual queue.
23. The computer program product of claim 22, wherein the first demand rate is based upon a first quantity of packets of the first QoS at the first core, and wherein the second demand rate is based upon a second quantity of packets of the first QoS at the second core.
24. The computer program product of claim 22 or 23, wherein the instructions further cause the network node to: determine, by a scheduler, a total demand rate comprising the first demand rate and the second demand rate; determine, by the scheduler, a product of the first demand rate and a rate limit assigned to the first QoS; determine, by the scheduler, a supply rate for the first virtual queue as a quotient of the product and the total demand rate; and transmit the supply rate to the first core.
25. The computer program product of claim 24, wherein the first demand rate comprises a write permission for the first core, and wherein the supply rate comprises a read permission for the first core.
26. The computer program product of claim 24 or 25, wherein the instructions further cause the network node to: determine, by the first core, a third quantity of tokens based upon the supply rate; and send, by the first core, the first packet to a destination when the third quantity is above a threshold.
27. The computer program product of any of claims 19 through 26, wherein the instructions further cause the network node to write the first packet without locking the first virtual queue.
28. A software-defined wide area network (SD-WAN) comprising: a first network node configured to:
receive, from a plurality of network nodes, network status reports and encryption information; generate a first tunnel through the SD-WAN based on the network status reports; and transmit, to the plurality of network nodes, configuration files comprising information of the first tunnel, encryption information of the plurality of network nodes, and network status information of the plurality of nodes; and a second network node of the plurality of network nodes, the second network node configured to: set a service level obligation (SLO) for an application; set a priority for the SLO; transmit a packet of the application using the first tunnel when the SLO of the application is met by the tunnel; and identify a second tunnel for the packet when the SLO of the application is not met by the tunnel.
29. The SD-WAN of claim 28, wherein the first tunnel comprises an internet-based tunnel, and wherein the second tunnel comprises a multi-protocol label switching (MPLS) tunnel.
30. The SD-WAN of claim 28, wherein each of the network status reports comprises bandwidth available at one of the plurality of network nodes, latency of the one of the plurality of network nodes, and packet drop rate of the one of the plurality of network nodes.
31. The SD-WAN of any of claims 28 through 30, wherein the second network node is further configured to: identify a third network node in a down state; and report the down state of the third network node to the first network node.
32. The SD-WAN of claim 31, wherein the first network node is further configured to transmit updated configuration files to the plurality of network nodes in response to the down state of the third network node.
33. The SD-WAN of any of claims 28 through 32, wherein the second network node is further configured to: create a first virtual queue associated with a first quality of service (QoS) and a first core of a processor of the second network node; create a second virtual queue associated with a second core of the processor and the first QoS; write, at a first time, a first packet associated with the first QoS to the first virtual queue by the first core; and write, at substantially the first time, a second packet associated with the first QoS to the second virtual queue by the second core.
34. A method in a software-defined wide area network (SD-WAN), the method comprising: receiving, by a first network node from a plurality of network nodes, network status reports and encryption information; generating, by the first network node, a first tunnel through the SD-WAN based on the network status reports; transmitting, by the first network node to the plurality of network nodes, configuration files comprising information of the first tunnel, encryption information of the plurality of network nodes, and network status information of the plurality of nodes; setting, by a second network node of the plurality of network nodes, a service level obligation (SLO) for an application; setting, by the second network node, a priority for the SLO; transmitting, by the second network node, a packet of the application using the first tunnel when the SLO of the application is met by the tunnel; and identifying, by the second network node, a second tunnel for the packet when the SLO of the application is not met by the tunnel.
35. The method of claim 34, wherein the first tunnel comprises an internet-based tunnel, and wherein the second tunnel comprises a multi-protocol label switching (MPLS) tunnel.
36. The method of claim 34, wherein each of the network status reports comprises bandwidth available at one of the plurality of network nodes, latency of the one of the plurality of network nodes, and packet drop rate of the one of the plurality of network nodes.
37. The method of any of claims 34 through 36, further comprising: identifying, by the second network node, a third network node in a down state; and reporting, by the second network node, the down state of the third network node to the first network node.
38. The method of claim 37, further comprising transmitting, by the first network node, updated configuration files to the plurality of network nodes in response to the down state of the third network node.
39. The method of any of claims 34 through 38, further comprising:
creating, by the second network node, a first virtual queue associated with a first quality of service (QoS) and a first core of a processor of the second network node;
creating, by the second network node, a second virtual queue associated with a second core of the processor and the first QoS;
writing, by the second network node at a first time, a first packet associated with the first QoS to the first virtual queue by the first core; and
writing, by the second network node at substantially the first time, a second packet associated with the first QoS to the second virtual queue by the second core.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2020/066589 WO2022139808A1 (en) | 2020-12-22 | 2020-12-22 | Low-latency software defined wide area network architecture |
| CN202080106997.2A CN116438787A (en) | 2020-12-22 | 2020-12-22 | Low-delay software-defined wide area network architecture |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2020/066589 WO2022139808A1 (en) | 2020-12-22 | 2020-12-22 | Low-latency software defined wide area network architecture |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022139808A1 (en) | 2022-06-30 |
Family
ID=74181379
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2020/066589 Ceased WO2022139808A1 (en) | 2020-12-22 | 2020-12-22 | Low-latency software defined wide area network architecture |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN116438787A (en) |
| WO (1) | WO2022139808A1 (en) |
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP3076621A1 (en) * | 2015-03-31 | 2016-10-05 | Cavium, Inc. | Method and apparatus for using multiple linked memory lists |
| US20180123715A1 (en) * | 2015-10-30 | 2018-05-03 | Citrix Systems, Inc. | Method for packet scheduling using multiple packet schedulers |
| US20180212889A1 (en) * | 2017-01-25 | 2018-07-26 | Futurewei Technologies, Inc. | Multi-core lock-free rate limiting apparatus and method |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20210320881A1 (en) * | 2021-06-25 | 2021-10-14 | Intel Corporation | NIC priority queue steering and processor unit frequency tuning based on packet flow analytics |
Also Published As
| Publication number | Publication date |
|---|---|
| CN116438787A (en) | 2023-07-14 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12423249B2 (en) | | Dragonfly routing with incomplete group connectivity |
| US12231353B2 (en) | | Fabric control protocol for data center networks with packet spraying over multiple alternate data paths |
| CN111201757B (en) | | Virtual fabric of network access nodes dynamically configured on the underlying network |
| CN111164938A (en) | | Resilient network communication using selective multipath packet stream injection |
| US11595315B2 (en) | | Quality of service in virtual service networks |
| US8660001B2 (en) | | Method and apparatus for providing per-subscriber-aware-flow QoS |
| WO2022139808A1 (en) | | Low-latency software defined wide area network architecture |
| US12443545B2 (en) | | Methods for distributing software-determined global load information |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 20839523; Country of ref document: EP; Kind code of ref document: A1 |