WO2016101099A1 - Techniques for power management associated with processing received packets at a network device - Google Patents

Techniques for power management associated with processing received packets at a network device

Info

Publication number
WO2016101099A1
Authority
WO
WIPO (PCT)
Prior art keywords
power management
management module
polling
processing element
dpdk
Prior art date
Application number
PCT/CN2014/094515
Other languages
English (en)
French (fr)
Other versions
WO2016101099A9 (en)
Inventor
Danny Y. ZHOU
Mark D. GRAY
John J. BROWNE
Original Assignee
Intel Corporation
Application filed by Intel Corporation
Priority to PCT/CN2014/094515 priority Critical patent/WO2016101099A1/en
Priority to JP2017530249A priority patent/JP6545802B2/ja
Priority to EP14908670.4A priority patent/EP3238403A4/en
Priority to CN201480083625.7A priority patent/CN107005531A/zh
Priority to KR1020177013532A priority patent/KR102284467B1/ko
Publication of WO2016101099A1 publication Critical patent/WO2016101099A1/en
Publication of WO2016101099A9 publication Critical patent/WO2016101099A9/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W52/00 Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/02 Power saving arrangements
    • H04W52/0209 Power saving arrangements in terminal devices
    • H04W52/0212 Power saving arrangements in terminal devices managed by the network, e.g. network or access point is master and terminal is slave
    • H04W52/0222 Power saving arrangements in terminal devices managed by the network, e.g. network or access point is master and terminal is slave in packet switched networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3206 Monitoring of events, devices or parameters that trigger a change in power modality
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting
    • H04L41/0823 Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    • H04L41/0833 Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability for reduction of network energy consumption
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Definitions

  • Examples described herein are generally related to network packet processing.
  • DPDK data plane development kit
  • VPN virtual private network
  • NAT network address translation
  • DPI deep packet inspection
  • firewall, VPN, NAT, DPI or load balancer applications may be implemented on a standard x86 server platform that may include, but is not limited to, processors coupled to Intel 10 gigabit/40 gigabit (10G/40G) network interface cards (NICs).
  • NICs network interface cards
  • FIG. 1 illustrates an example system.
  • FIG. 2 illustrates an example first logic flow
  • FIG. 3 illustrates an example first process
  • FIG. 4 illustrates an example second process.
  • FIG. 5 illustrates an example third process.
  • FIG. 6 illustrates an example block diagram for an apparatus.
  • FIG. 7 illustrates an example second logic flow.
  • FIG. 8 illustrates an example third logic flow.
  • FIG. 9 illustrates an example of a storage medium.
  • FIG. 10 illustrates an example computing platform.
  • implementation of DPDK for high performance packet processing applications may include deployment of poll mode drivers (PMDs) to configure and/or operate NW I/O devices such as Intel 1G/10G/40G NICs.
  • PMDs poll mode drivers
  • deployment of the PMD may enable DPDK-based network applications to access the NW I/O device resources (e.g., receive (Rx) and transmit (Tx) queues) at a highest or maximum processor/core frequency without triggering interrupts, in order to receive, process and transmit packets in a high performance and low latency fashion.
  • Rx receive
  • Tx Transmit
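To make the poll-mode model concrete, here is a minimal sketch (not text from the patent) of the kind of busy-poll receive loop a DPDK polling thread typically runs. The port/queue identifiers and the process_packet() helper are illustrative assumptions; rte_eth_rx_burst() is DPDK's standard burst-receive call.

```c
#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* Hypothetical per-packet handler supplied by the application. */
static void process_packet(struct rte_mbuf *pkt) { rte_pktmbuf_free(pkt); }

/* Busy-poll receive loop: the Rx queue is polled continuously, with no
 * interrupts, so the core stays fully loaded even when no traffic arrives. */
static void dpdk_polling_thread(uint16_t port_id, uint16_t queue_id)
{
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        uint16_t nb_rx = rte_eth_rx_burst(port_id, queue_id, bufs, BURST_SIZE);

        for (uint16_t i = 0; i < nb_rx; i++)
            process_packet(bufs[i]);
        /* When nb_rx is 0 the loop simply spins; this busy waiting is the
         * power problem addressed by the techniques described below. */
    }
}
```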
  • Tidal effect may result from significant changes in network traffic patterns occurring in different geographic areas over periods of time. So during a period of processing low or no network traffic, a thread of a DPDK-based application ("DPDK polling thread") running on x86 processing cores may still be busy waiting (e.g., continuously polling for but not processing network packets). The waiting for network packets may waste a considerable amount of power as well as computational resources.
  • OS power management subsystems or modules may provide a number of technologies (primarily P-states, C-states and control in the OS kernel space) to adjust processor power during periods of either peak or low network traffic.
  • the OS power management modules may determine the appropriate processor voltage and/or frequency operating point (P-state) and processor idle state (C-state), respectively, for a processor or cores to enter, by sampling CPU usage periodically.
  • the OS power management module's inability to determine the operating state of DPDK polling threads may make it problematic to determine appropriate power states for processors or cores supporting these DPDK polling threads to enter in order to save power in datacenter networks impacted by the "tidal effect". It is with respect to these challenges that the examples described herein are needed.
  • techniques for power management associated with processing received packets at a NW I/O device may include monitoring received and available packet descriptors maintained at a receive queue for the NW I/O device over multiple polling iterations for a DPDK polling thread. These first examples may also include determining a level of fullness for the receive queue based on available packet descriptors maintained at the receive queue following each polling iteration and incrementing a trend count based on the level of fullness for the receive queue. These first examples may also include sending a performance indication to an OS power management module based on whether the trend count exceeds a trend count threshold. The performance indication may be capable of causing the OS power management module to increase a performance state of a processing element executing the DPDK polling thread.
  • techniques for power management associated with processing received packets at the NW device may also include monitoring received and available packet descriptors maintained at the receive queue for the NW I/O device over multiple polling iterations for the DPDK polling thread. These second examples may also include determining a number of packets received at the receive queue for each polling iteration based on available packet descriptors maintained at the receive queue following each polling iteration and incrementing an idle count by a count of 1 if a number of packets received is 0 for a consecutive number of polling iterations that exceeds a consecutive threshold. These second examples may also include causing the DPDK polling thread to sleep for one of a first or a second time period based on whether the idle count exceeds or is less than a first idle count threshold. The second time period may be longer than the first time period.
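The first and second example techniques above can be combined in a single per-iteration bookkeeping routine. The sketch below is only an illustration of that structure: the threshold values, the fullness levels and the hint/sleep helpers are assumptions, not interfaces defined by the patent.

```c
#include <unistd.h>   /* usleep() */

/* Illustrative placeholders for hints sent to the OS power management
 * module; the real delivery mechanism is platform specific. */
static void send_cstate_hint(void) { /* long-sleep indication */ }
static void send_pstate_hint(void) { /* raise-performance indication */ }

enum fullness { NEAR_EMPTY, HALF_FULL, NEAR_FULL };

struct pm_state {
    unsigned int zero_pkt_iters;  /* consecutive iterations with 0 packets */
    unsigned int idle_count;      /* incremented once the streak is long   */
    unsigned int trend_count;     /* rising-traffic trend                  */
};

/* Assumed example values; the text leaves the exact numbers open. */
#define CONSEC_THRESHOLD   5
#define IDLE_THRESHOLD_1   8
#define TREND_THRESHOLD  100

static void after_poll_iteration(struct pm_state *s, unsigned int pkts_rx,
                                 enum fullness level)
{
    if (pkts_rx == 0) {
        if (++s->zero_pkt_iters > CONSEC_THRESHOLD)
            s->idle_count++;
        if (s->idle_count == 0)
            return;                 /* idle streak not long enough yet      */
        if (s->idle_count < IDLE_THRESHOLD_1) {
            usleep(5);              /* first (short) sleep period, a few µs */
        } else {
            send_cstate_hint();     /* long-sleep indication to the OS      */
            usleep(50);             /* second (longer) period, dozens of µs */
        }
        return;
    }

    /* Traffic resumed: reset idle tracking and update the trend counter. */
    s->zero_pkt_iters = 0;
    s->idle_count = 0;

    if (level == NEAR_EMPTY)
        s->trend_count += 1;        /* small increment                      */
    else if (level == HALF_FULL)
        s->trend_count += 100;      /* large increment                      */

    if (s->trend_count > TREND_THRESHOLD)
        send_pstate_hint();         /* ask the OS to raise the P-state      */
}
```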
  • FIG. 1 illustrates an example system 100.
  • system 100 includes a NW I/O device 110 arranged to receive or transmit packets included in NW traffic 105.
  • a pre-flow load balancer 120 may be capable of distributing packets to receive (Rx) queues 130-1 to 130-n, where “n” is any positive whole integer > 1.
  • Rx queues 130-1 to 130-n may be arranged to at least temporarily maintain received packets for processing by respective DPDK polling threads 140-1 to 140-n.
  • Each DPDK polling thread 140- 1 to 140-n may be capable of executing at least portions of a network packet processing application including, but not limited to firewall, VPN, NAT, DPI or load balancer applications.
  • Each DPDK polling thread 140-1 to 140-n may include respective poll mode drivers 142-1 to 142-n to enable access of Rx queues 130-1 to 130-n in a polling mode of operation, without interrupts, to receive, process or transmit packets received in NW traffic 105.
  • system 100 may also include DPDK power management components 150 arranged to operate in user space 101 and an OS power management module 160 arranged to operate in kernel space 102.
  • DPDK power management components 150 may include logic and/or features capable of monitoring received and available packet descriptors maintained at Rx queues 130-1 to 130-n for processing by DPDK polling threads 140-1 to 140-n. The logic and/or features at DPDK power management components 150 may also monitor sleeping or idle times for DPDK polling threads 140-1 to 140-n.
  • state algorithms 154 of DPDK power management components 150 may utilize monitored information to cause DPDK polling threads 140-1 to 140-n to be placed in temporary sleep states, send an interrupt turn-on hint or message to NW I/O device 110 (via one-shot Rx interrupt on hint 180) and cause a change from a polling mode to an interrupt mode of operation, or to enable library 152 to send C-state (sleep) or P-state (performance) hints or indications.
  • the C-state or P-state hints may be sent to processor element (PE) idle 162 or PE frequency 164 at OS power management module 160.
  • OS power management module 160 may then cause changes to either C-state(s) or P-state(s) of one or more PEs 172-1 to 172-n included in one or more processor(s) 170 based on received C-state or P-state hints.
  • C-states or P-states may be changed or controlled according to one or more processor power management standards or specifications such as the Advanced Configuration and Power Interface Specification (ACPI), Revision 5.1, published in July 2014 by the Unified Extensible Firmware Interface Forum ("the ACPI specification").
  • ACPI Advanced Configuration and Power Interface Specification
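The patent does not prescribe how a user-space component delivers a P-state hint to the OS power management module. One plausible Linux mechanism, sketched below purely as an assumption, is writing a target frequency through the standard cpufreq sysfs files for the core running the polling thread (this presumes the 'userspace' scaling governor and sufficient privileges); DPDK's librte_power library wraps a comparable per-core frequency-scaling interface.

```c
#include <stdio.h>

/* Sketch: request a new operating frequency (in kHz) for one CPU core via
 * the Linux cpufreq sysfs interface. Assumes the 'userspace' governor is
 * active; this is not a mechanism mandated by the patent. */
static int set_core_khz(unsigned int cpu, unsigned long khz)
{
    char path[128];
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_setspeed", cpu);

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;                 /* no permission or cpufreq not present */
    fprintf(f, "%lu\n", khz);
    fclose(f);
    return 0;
}
```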
  • processor (s) 170 may include a multicore processor and PEs 172-1 to 172-n may each be a core of the multicore processor and/or may be a virtual machine (VM) supported by one or more cores of the multicore processor. Also, for these examples, DPDK polling threads 140-1 to 140-n may be capable of being executed by PEs 172-1 to 172-n.
  • Processor(s) 170 may include various types of commercially available x86 processors including, but not limited to, Intel Core 2, Core i3, Core i5, Core i7, Xeon or Xeon Phi processors.
  • a new API (NAPI) may be an interrupt mitigation technique for use in kernel space 102 (e.g., for a Linux kernel) to support packet processing by DPDK polling threads 140-1 to 140-n.
  • NAPI is typically designed to improve performance of high-speed networking by introducing a polling mode capability for an OS and NIC or NW I/O drivers.
  • NAPI allows for frequent mode switches which may be well accepted for network terminal endpoints (e.g., server or personal computer) as these terminal endpoints may not have strict low latency and jitter requirements.
  • network terminal endpoints (e.g., server or personal computer)
  • network devices (e.g., switch, router and L4-L7 network appliances)
  • frequent mode switching, servicing interrupts and waking up a suspended, paused or sleeping thread may be costly in terms of processor resource and latency hits.
  • These processor resource and latency hits may be unacceptable for network devices in such deployments as in a datacenter network.
  • poll mode drivers 142-1 to 142-n of DPDK polling threads 140-1 to 140-n may be designed or arranged to work in a polling mode by default.
  • DPDK power management components 150 may cause DPDK polling threads 140-1 to 140-n to sleep for one or more periods of time and also may send hints to OS power management module 160. Based on the hints, OS power management module 160 may cause PEs 172-1 to 172-n to scale down processor frequencies or transition to idle or sleep states to save power.
  • poll mode drivers 142-1 to 142-n may switch to an interrupt mode from the default polling mode.
  • DPDK polling threads 140-1 to 140-n mainly work in a polling mode environment to better satisfy performance requirements when processing packets and thus may avoid frequent mode switches.
  • FIG. 2 illustrates an example first logic flow.
  • the example first logic flow includes logic flow 200.
  • logic flow 200 may be for power management associated with processing received packets for a NW device (e.g., located at a datacenter).
  • NW device (e.g., located at a datacenter)
  • at least some components of system 100 shown in FIG. 1 may be capable of implementing portions of logic flow 200.
  • the example logic flow 200 is not limited to using components of system 100 shown or described in FIG. 1.
  • DPDK power management components may be initialized at 202.
  • a DPDK polling thread (e.g., DPDK polling thread 140-1)
  • a number of used and free packet descriptors maintained at the Rx queue may be obtained by DPDK power management components.
  • the number of iterations for which 0 packets were received may be compared to consecutive number threshold 208 (e.g., 5 iterations). In some examples, the logic flow moves back to 204 if the consecutive number threshold has not been exceeded.
  • an ‘idle’ count may be incremented by 1 at 210.
  • a determination of whether the ‘idle’ count is below or less than THRESHOLD_1 is made at 212.
  • the DPDK polling thread may be placed in sleep for a short period (e.g., several µs) by the DPDK power management components.
  • the DPDK polling thread may be placed in sleep for a relatively long period (e.g., dozens of µs).
  • the short sleep period may be a several-µs sleep period to reduce power consumption compared to a continuous spin loop.
  • the relatively long sleep period may be used once a lower traffic trend has been identified.
  • the DPDK power management components may cause the DPDK polling thread to sleep for the longer period and may also send a long sleep indication or C-state hint to an OS power management module.
  • the OS power management module may then cause a PE supporting the DPDK polling thread to enter an ACPI C1 power state. This C1 power state may still enable the PE to quickly return to a C0 power state if new NW traffic arrives. Also, if new NW traffic does arrive, the ‘idle’ count is reset to a count of 0.
  • THRESHOLD_1 may be several times lower compared to THRESHOLD_2 to cause the DPDK polling thread to sleep after relatively short periods of 0 packets at the Rx queue. However, if the ‘idle’ count exceeds THRESHOLD_2, a one-shot Rx interrupt may be enabled 220. Once an incoming packet is received by the NW I/O device, the one-shot Rx interrupt is triggered at 222, which may cause the paused DPDK polling thread to switch back to polling mode from interrupt mode. The idle count exceeding THRESHOLD_2 may indicate a slow or no network traffic period.
  • pausing of the DPDK polling thread may enable the OS power management module to cause the PE executing the DPDK polling thread to be placed in deeper C-states of lower power usage than a C1 state.
  • the DPDK polling thread may now operate in an interrupt mode that switches back to a polling mode responsive to receiving network traffic again.
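A sketch of the switch into interrupt mode and back is shown below. It is modeled loosely on DPDK's l3fwd-power sample rather than on code from the patent, and assumes the NIC/driver supports Rx interrupts; the ethdev Rx-interrupt and epoll calls are existing DPDK APIs, but when and how they are invoked here is an illustrative assumption.

```c
#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_interrupts.h>

/* Pause a polling thread until the one-shot Rx interrupt fires, then
 * return so the caller can resume the normal poll-mode loop. */
static void pause_until_rx_interrupt(uint16_t port_id, uint16_t queue_id)
{
    struct rte_epoll_event event;

    /* Route the queue's Rx interrupt to this thread's epoll instance. */
    rte_eth_dev_rx_intr_ctl_q(port_id, queue_id, RTE_EPOLL_PER_THREAD,
                              RTE_INTR_EVENT_ADD, NULL);

    /* Arm the interrupt, then block until a packet arrives. */
    rte_eth_dev_rx_intr_enable(port_id, queue_id);
    rte_epoll_wait(RTE_EPOLL_PER_THREAD, &event, 1, -1 /* no timeout */);

    /* Woken up: disarm the interrupt before going back to poll mode. */
    rte_eth_dev_rx_intr_disable(port_id, queue_id);
}
```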
  • fullness levels of the Rx queue may be determined if packets were received in the Rx queue. For example, if the Rx queue is near empty (e.g., 25% to 50% full) 224, a ‘trend’ count may be incremented by a small number 226. If the Rx queue is half full (e.g., 51% to 75%) 234, the ‘trend’ count may be incremented by a large number 236. Also, if the ‘trend’ count exceeds a trend count threshold at 228, the DPDK power management components may send a performance indication or P-state hint to the OS power management module. The performance indication may cause the OS power management module to increase operating frequencies 230 or raise an ACPI P-state for the PE executing the DPDK polling thread. The ‘trend’ count may be set such that gradual increases in network traffic can be detected and P-states for PEs may be gradually increased to balance power saving with performance while processing received packets 232.
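Below is a small sketch of how the fullness level might be derived from descriptor counts, using the 25/50/75% boundaries given as examples above. The used-descriptor count could be obtained in DPDK with something like rte_eth_rx_queue_count(); the helper name and the classification itself are assumptions for illustration.

```c
enum rx_fullness { BELOW_NEAR_EMPTY, NEAR_EMPTY, HALF_FULL, NEAR_FULL };

/* Classify Rx queue occupancy from the number of used descriptors and the
 * ring size; the percentage bands follow the examples in the text. */
static enum rx_fullness classify_fullness(unsigned int used_desc,
                                          unsigned int ring_size)
{
    unsigned int pct;

    if (ring_size == 0)
        return BELOW_NEAR_EMPTY;
    pct = used_desc * 100u / ring_size;

    if (pct > 75)
        return NEAR_FULL;        /* greater than 75% full */
    if (pct > 50)
        return HALF_FULL;        /* 51% to 75% full       */
    if (pct >= 25)
        return NEAR_EMPTY;       /* 25% to 50% full       */
    return BELOW_NEAR_EMPTY;
}
```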
  • a number of consecutive polling iterations for which the Rx queue was near full may be compared to a consecutive threshold (e.g., 5 polling iterations) at 240.
  • the DPDK power management components may send a high performance indication or high P-state hint to the OS power management module if the consecutive threshold is exceeded.
  • the high performance indication may cause the OS power management module to scale up operating frequencies to highest frequency 242 or raise an ACPI P-state for the PE executing the DPDK polling thread to a highest performance P-state.
  • Scaling up to the highest performance P-state may allow the OS power management module to quickly react to high network traffic loads and thus switch the PE executing the DPDK polling thread from a balance of power saving and performance to a purely performance-based operating state while processing received packets 232.
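The near-full fast path just described can be sketched as a simple streak counter; the streak threshold and the hint function below are illustrative assumptions.

```c
#define NEAR_FULL_STREAK_THRESHOLD 5   /* e.g., 5 consecutive iterations */

/* Hypothetical hint asking the OS to jump to the highest P-state. */
static void send_high_perf_hint(void) { /* platform specific */ }

/* Call once per polling iteration with whether the Rx queue was near full
 * (> 75%); a long enough streak triggers the maximum-performance hint. */
static void track_near_full(int queue_near_full)
{
    static unsigned int streak;

    if (!queue_near_full) {
        streak = 0;
        return;
    }
    if (++streak > NEAR_FULL_STREAK_THRESHOLD)
        send_high_perf_hint();   /* scale to the highest operating frequency */
}
```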
  • the DPDK power management components periodically monitor an amount of time the DPDK polling thread is sleeping. For these examples, the DPDK power management components may initialize a periodical timer at 244 that may start a timer interval during which sleep monitoring for the DPDK polling thread may occur. Following expiration of the timer at 246, the DPDK power management components may determine whether the DPDK polling thread was placed in sleep for greater than 25% of the polling iterations. As mentioned previously, the DPDK polling thread may have been placed in sleep for various times based on whether or not an ‘idle’ count was less than THRESHOLD_1 or THRESHOLD_2.
  • the DPDK power management components may send a performance indication or P-state hint to the OS power management module at 252.
  • the performance indication may result in OS power management module causing the PE executing the DPDK polling thread to have a lowered or decreased operating frequency.
  • the ACPI P-state for the PE may be lowered to a lower performance P-state.
  • the DPDK power management components may determine whether or not an average packet-per-iteration for packets received during the time interval is less than a threshold of expected average packet-per-iteration at 250. If below the threshold, the DPDK power management components may send a performance indication or P-state hint to the OS power management module at 252. Having the PE lower its P-state responsive to greater than 25% sleep or fewer packets received than expected may be a way to ramp down performance states to save power based on monitored information that indicates network traffic, or at least processing of packets from network traffic, is decreasing.
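The periodic ramp-down check can be sketched as follows; the interval statistics, thresholds and hint function are illustrative assumptions around the ~100 ms monitoring timer and 25% sleep criterion described above.

```c
/* Per-interval counters gathered while the monitoring timer runs. */
struct interval_stats {
    unsigned long iterations;        /* polling iterations this interval */
    unsigned long slept_iterations;  /* iterations that ended in a sleep */
    unsigned long packets;           /* packets received this interval   */
};

#define SLEEP_PCT_THRESHOLD     25   /* >25% sleeping => lower the P-state */
#define EXPECTED_PKTS_PER_ITER   4   /* assumed expected average           */

/* Hypothetical hint asking the OS to lower the core's P-state. */
static void send_lower_pstate_hint(void) { /* platform specific */ }

static void on_timer_expired(const struct interval_stats *st)
{
    if (st->iterations == 0)
        return;

    unsigned long sleep_pct = st->slept_iterations * 100 / st->iterations;
    unsigned long avg_pkts  = st->packets / st->iterations;

    if (sleep_pct > SLEEP_PCT_THRESHOLD || avg_pkts < EXPECTED_PKTS_PER_ITER)
        send_lower_pstate_hint();    /* OS may reduce frequency or voltage */
}
```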
  • FIG. 3 illustrates an example first process 300.
  • the first process includes process 300.
  • process 300 may demonstrate how different elements of system 100 may implement portions of logic flow 200 described above for FIG. 2. In particular those portions related to sleep, idle or interrupt actions taken by DPDK power management components (PMCs) .
  • PMCs DPDK power management components
  • at least some components of system 100 shown in FIG. 1 may be related to process 300.
  • the example process 300 is not limited to implementations using components of system 100 shown or described in FIG. 1.
  • packets may be received by NW I/O device 110 and placed at least temporarily at Rx queues 130.
  • DPDK polling threads 140 may be capable of polling Rx queues 130 and process packets as they are received at Rx queues 130.
  • DPDK power management components 150 may include logic and/or features to monitor received and available packet descriptors maintained at Rx queues 130 over multiple polling iterations for DPDK polling threads 140.
  • DPDK power management components 150 may include logic and/or features to determine if 0 packets have been received for a number of consecutive polling iterations that exceeds a consecutive threshold (e.g., 5 polling iterations) .
  • DPDK power management components 150 may include logic and/or features to increment an idle count by a count of 1 if the consecutive threshold is exceeded.
  • DPDK power management components 150 may include logic and/or features to place DPDK polling threads 140 in sleep for a short period (e.g., a couple µs) if the incremented idle count is found to be less than a first idle count threshold (e.g., THRESHOLD_1). The process may then come to an end.
  • a short period (e.g., a couple µs)
  • DPDK power management components 150 may include logic and/or features to place DPDK polling threads 140 in sleep for a relatively long sleep period (e.g., several dozen µs) if the incremented idle count is found to be greater than the first idle count threshold but less than a second idle count threshold (e.g., THRESHOLD_2).
  • a relatively long sleep period (e.g., several dozen µs)
  • DPDK power management components 150 may include logic and/or features to send a long sleep indication to OS power management module (PMM) 160 to indicate that DPDK polling threads 140 were placed in sleep for the long sleep period.
  • PMM OS power management module
  • the OS power management module 160 may cause PE(s) 172 to be placed in a sleep mode for up to the long sleep period.
  • the sleep mode may include an ACPI C1 power state. The process may then come to an end for this first alternative.
  • DPDK power management components 150 may include logic and/or features to send an interrupt turn-on hint or message to NW I/O device 110 based on the idle count exceeding the second idle count threshold (e.g., THRESHOLD_2).
  • the second idle count threshold (e.g., THRESHOLD_2)
  • poll mode driver 142 may switch from working with NW I/O device 110 in a poll mode to an interrupt mode. This mode switch may also cause DPDK polling threads 140 to pause.
  • network traffic 105 may now include packets for processing by DPDK threads 140.
  • poll mode driver 142 may switch back to working with NW I/O device 110 in the poll mode and the process may then come to an end for this second alternative.
  • FIG. 4 illustrates an example second process 400.
  • the second process includes process 400.
  • process 400 may demonstrate how different elements of system 100 may implement portions of logic flow 200 described above for FIG. 2. In particular those portions related to monitoring sleep percentages for DPDK polling threads or average-packet-per-iteration to determine whether to lower a performance state for PE(s) 172.
  • at least some components of system 100 shown in FIG. 1 may be related to process 400.
  • the example process 400 is not limited to implementations using components of system 100 shown or described in FIG. 1.
  • DPDK power management components 150 may include logic and/or features to initialize or initiate a timer having a timer interval.
  • the timer interval may be approximately 100 milliseconds (ms) .
  • DPDK power management components 150 may include logic and/or features to monitor how often DPDK polling threads 140 have been placed in sleep.
  • DPDK power management components 150 may include logic and/or features to determine whether DPDK polling threads 140 were sleeping a percentage of a time above a percentage threshold during the timer interval. In some examples, DPDK power management components 150 may send a performance indication to OS power management module 160 if the sleeping percentage for DPDK polling threads 140 was above or exceeded the percentage threshold (e.g., > 25%) .
  • OS power management module 160 may cause PE (s) 172 to lower their performance state or P-state. The process may then come to an end.
  • DPDK power management components 150 may include logic and/or features to determine whether an average packet-per-iteration is greater or less than a packet average threshold. In some examples, DPDK power management components 150 may send a performance indication to OS power management module 160 if the average packet-per-iteration was below the packet average threshold.
  • OS power management module 160 may cause PE (s) 172 to lower their performance state or P-state. The process may then come to an end for this first alternative.
  • FIG. 5 illustrates an example third process 500.
  • the third process includes process 500.
  • process 500 may demonstrate how different elements of system 100 may implement portions of logic flow 200 described above for FIG. 2. In particular, those portions related to monitoring Rx queue fullness to determine if or when to raise performance for PE (s) .
  • at least some components of system 100 shown in FIG. 1 may be related to process 500.
  • the example process 500 is not limited to implementations using components of system 100 shown or described in FIG. 1.
  • packets may be received by NW I/O device 110 and placed at least temporarily at Rx queues 130.
  • DPDK polling threads 140 may be capable of polling Rx queues 130 and process packets as they are received at Rx queues 130.
  • DPDK power management components 150 may include logic and/or features to monitor received and available packet descriptors maintained at Rx queues 130 over multiple polling iterations for DPDK polling threads 140.
  • DPDK power management components 150 may include logic and/or features to determine a fullness of Rx queues 130 based on available packet descriptors maintained at Rx queues 130 following each polling iteration.
  • DPDK power management components 150 may include logic and/or features to either increment a trend count by a single count or a large count based on the determined level of fullness of Rx queues 130 as described above for logic flow 200 shown in FIG. 2.
  • DPDK power management components 150 may include logic and/or features to send a performance indication to OS power management module 160 based on the incremented trend count exceeding a trend count threshold.
  • OS power management module 160 may increase a performance state or P-State of PE(s) 172.
  • the performance state for PE(s) 172 may be raised a step up or a single P-State which may cause the operating frequency of PE(s) 172 to be increased when executing DPDK polling threads 140. The process may then come to an end.
  • DPDK power management components 150 may include logic and/or features to send a high performance indication to OS power management module 160 responsive to a consecutive number of polling iterations for which Rx queues 130 were determined to be near full (e.g., > 75%) exceeding a consecutive threshold.
  • OS power management module 160 may increase a performance state or P-State of PE(s) 172 to a highest or maximum P-State. In some examples, raising PE(s) 172 to the highest or maximum P-State results in PE(s) 172 being raised to their highest or maximum operating frequency when executing DPDK polling threads 140. The process may then come to an end for this first alternative.
  • FIG. 6 illustrates an example block diagram for apparatus 600.
  • Although apparatus 600 shown in FIG. 6 has a limited number of elements in a certain topology, it may be appreciated that the apparatus 600 may include more or fewer elements in alternate topologies as desired for a given implementation.
  • apparatus 600 may be supported by circuitry 620 maintained at a computing platform or network device that may be deployed in a datacenter.
  • Circuitry 620 may be arranged to execute one or more software or firmware implemented modules or components 622-a.
  • “a” and “b” and “c” and similar designators as used herein are intended to be variables representing any positive integer.
  • a complete set of software or firmware for components 622-a may include components 622-1, 622-2, 622-3, 622-4, 622-5 or 622-6.
  • the examples presented are not limited in this context and the different variables used throughout may represent the same or different integer values.
  • these “components” may be software/firmware stored in computer-readable media, and although the components are shown in FIG. 6 as discrete boxes, this does not limit these components to storage in distinct computer-readable media components (e.g., a separate memory, etc.).
  • circuitry 620 may include a processor, processor circuit or processor circuitry. Circuitry 620 may be any of various commercially available processors, including without limitation Intel Core 2, Core i3, Core i5, Core i7, Xeon or Xeon Phi processors. According to some examples circuitry 620 may also include an application specific integrated circuit (ASIC) and at least some components 622-a may be implemented as hardware elements of the ASIC.
  • ASIC application specific integrated circuit
  • apparatus 600 may include aqueue component622-1.
  • Queue component 622-1 may be executed by circuitry 620 to monitor received and available packet descriptors maintained at a receive queue for a NW I/O device over multiple polling iterations for a DPDK polling thread.
  • the received and available packet descriptors may be included in Rx'd & avail pkt descriptors 610 and may be maintained by queue component 622-1 in Rx queue information 623-a (e.g., in a data structure such as a lookup table (LUT)).
  • LUT lookup table
  • Queue component 622-1 may be capable of determining a level of fullness for the receive queue based on available packet descriptors maintained at the receive queue following each polling iteration. The level of fullness may also be maintained in Rx queue information 623-a.
  • queue component 622-1 may also determine a number of packets received at the receive queue for each polling iteration based on the information included in Rx queue information 623-a.
  • apparatus 600 may also include an increment component 622-2.
  • Increment component 622-2 may be executed by circuitry 620 to increment a trend count based on the level of fullness for the receive queue following each polling iteration. For these examples, the trend count may be maintained by increment component 622-2 in trend count 624-b (e.g., in a LUT). Increment component 622-2 may also increment an idle count by at least a count of 1 if queue component 622-1 has determined that 0 packets were received for a consecutive number of polling iterations that exceeds a consecutive threshold. The idle count may be maintained by increment component 622-2 in idle count 625-c (e.g., in a LUT).
  • apparatus 600 may also include a performance component 622-3.
  • Performance component 622-3 may be executed by circuitry 620 to send a performance indication 640 to an OS power management module based on whether the trend count exceeds a trend count threshold.
  • the performance indication 640 may be capable of causing the OS power management module to increase a performance state of a processing element executing the DPDK polling thread.
  • the trend count threshold may be maintained by performance component 622-3 in trend count threshold 626-d (e.g., in a LUT).
  • performance component 622-3 may send a high performance indication 645 to the OS power management module responsive to a consecutive number of polling iterations for which the receive queue is determined to be near full by queue component 622-1 exceeding a consecutive threshold.
  • the high performance indication 645 may be capable of causing the OS power management module to increase the performance state of the processing element to a highest performance state by causing an operating frequency of the processing element to be increased to a highest operating frequency.
  • the consecutive threshold may be maintained by performance component 622-3 in consecutive threshold 627-e (e.g., in a LUT) .
  • apparatus 600 may also include a sleep component 622-4.
  • Sleep component 622-4 may be executed by circuitry 620 to cause the DPDK polling thread to sleep for one of a first or a second time period based on whether the idle count incremented by increment component 622-2 and maintained in idle count 625-c exceeds or is less than a first idle count threshold maintained by sleep component 622-4 in idle count thresholds 630-h (e.g., in a LUT).
  • the second time period may be longer than the first time period.
  • Sleep 615 may include indications or commands to cause the DPDK polling thread to sleep.
  • sleep component 622-4 may cause DPDK polling thread to sleep for the second time period based on the idle count exceeding the first idle count threshold and send a long sleep indication 650 to the OS power management module.
  • Long sleep indication 650 may be capable of causing the OS power management module to cause the processing element executing the DPDK polling thread to be placed in a sleep mode for up to the second time period.
  • apparatus 600 may also include an interrupt control component 622-5.
  • Interrupt control component 622-5 may be executed by circuitry 620 to send a message to the NW I/O device to enable a one-shot Rx interrupt based on whether the idle count incremented by increment component 622-2 exceeds a second idle count threshold maintained by sleep component 622-4 with idle count thresholds 630-h.
  • the message may be included in one-shot Rx interrupt 635.
  • apparatus 600 may also include a timer component 622-6.
  • Timer component 622-6 may be executed by circuitry 620 to initiate a timer having a timer interval maintained by timer component 622-6 in timer interval 632-j (e.g., in a LUT) .
  • sleep component 622-4 may determine a percentage of iterations for which the DPDK polling thread was sleeping during the timer interval responsive to the timer initiated by timer component 622-6 expiring. The percentage may be maintained by sleep component 622-4 with sleep information 631-i.
  • performance component 622-3 may send performance indication 640 to the OS power management module based on whether the percentage is greater than a percentage threshold. Performance indication 640 may be capable of causing the OS power management module to lower a performance state of the processing element that includes decreasing an operating frequency or decreasing an operating voltage of the processing element. The percentage threshold used for this determination may be maintained with percentage threshold 628-f (e.g., in a LUT) by performance component 622-3.
  • queue component 622-1 may determine an average packet-per-iteration for packets received at the receive queue during the time interval responsive to the timer initiated by timer component 622-6 expiring.
  • the average packet-per-iteration determined by queue component 622-1 may be maintained with Rx queue information 623-a.
  • performance component 622-3 may send a performance indication 640 to the OS power management module based on whether the average packet-per-iteration is greater than a packet average threshold.
  • Performance indication 640 may be capable of causing the OS power management module to lower a performance state of the processing element that includes decreasing an operating frequency of the processing element.
  • the packet average threshold may be maintained by performance component 622-3 with average threshold 629-g (e.g., in a LUT) .
  • the coordination may involve the uni-directional or bi-directional exchange of information.
  • the components may communicate information in the form of signals communicated over the communications media.
  • the information can be implemented as signals allocated to various signal lines. In such allocations, each message is a signal.
  • Further embodiments, however, may alternatively employ data messages. Such data messages may be sent across various connections. Example connections include parallel interfaces, serial interfaces, and bus interfaces.
  • a logic flow may be implemented in software, firmware, and/or hardware.
  • a logic flow may be implemented by computer executable instructions stored on at least one non-transitory computer readable medium or machine readable medium, such as an optical, magnetic or semiconductor storage. The embodiments are not limited in this context.
  • FIG. 7 illustrates an example second logic flow.
  • the example second logic flow includes logic flow 700.
  • Logic flow 700 may be representative of some or all of the operations executed by one or more logic, features, or devices described herein, such as apparatus 600. More particularly, logic flow 700 may be implemented by at least queue component 622-1, increment component 622-2 or performance component 622-3.
  • logic flow 700 at block 702 may monitor received and available packet descriptors maintained at a receive queue for a NW I/O device over multiple polling iterations for a DPDK polling thread.
  • queue component 622-1 may monitor the receive queue.
  • logic flow 700 at block 704 may determine a level of fullness for the receive queue based on available packet descriptors maintained at the receive queue following each polling iteration. For these examples, queue component 622-1 may determine the level of fullness.
  • logic flow 700 at block 706 may increment a trend count based on the level of fullness for the receive queue.
  • increment component 622-2 may increment the trend count.
  • logic flow 700 at block 708 may send a performance indication to an OS power management module based on whether the trend count exceeds a trend count threshold, the performance indication capable of causing the OS power management module to increase a performance state of a processing element executing the DPDK polling thread.
  • performance component 622-3 may send the performance indication.
  • FIG. 8 illustrates an example third logic flow.
  • the example third logic flow includes logic flow 800.
  • Logic flow 800 may be representative of some or all of the operations executed by one or more logic, features, or devices described herein, such as apparatus 600. More particularly, logic flow 800 may be implemented by at least queue component 622-1, increment component 622-2 or sleep component 622-4.
  • logic flow 800 at block 802 may monitor received and available packet descriptors maintained at a receive queue for a NW I/O device over multiple polling iterations for a DPDK polling thread.
  • queue component 622-1 may monitor the receive queue.
  • logic flow 800 at block 804 may determine a number of packets received at the receive queue for each polling iteration based on available packet descriptors maintained at the receive queue following each polling iteration. For these examples, queue component 622-1 may determine the number of packets received at the receive queue.
  • logic flow 800 at block 806 may increment an idle count by a count of 1 if a number of packets received is 0 for a consecutive number of polling iterations that exceeds a consecutive threshold.
  • increment component 622-2 may increment the idle count.
  • logic flow 800 at block 808 may cause the DPDK polling thread to sleep for one of a first or a second time period based on whether the idle count exceeds or is less than a first idle count threshold, the second time period longer than the first time period.
  • sleep component 622-4 may cause the DPDK polling thread to sleep.
  • FIG. 9 illustrates an example storage medium 900.
  • the first storage medium includes a storage medium 900.
  • the storage medium 900 may comprise an article of manufacture.
  • storage medium 900 may include any non-transitory computer readable medium or machine readable medium, such as an optical, magnetic or semiconductor storage.
  • Storage medium 900 may store various types of computer executable instructions, such as instructions to implement logic flow 700 or logic flow 800.
  • Examples of a computer readable or machine readable storage medium may include any tangible media capable of storing electronic data, including volatile memory or non-volatile memory, removable or non-removable memory, erasable or non-erasable memory, writeable or re-writeable memory, and so forth.
  • Examples of computer executable instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, object-oriented code, visual code, and the like. The examples are not limited in this context.
  • FIG. 10 illustrates an example computing platform 1000.
  • computing platform 1000 may include a processing component 1040, other platform components 1050 or a communications interface 1060.
  • computing platform 1000 may host power and/or performance management elements (e.g., DPDK power management components and an OS power management module) providing power/performance management functionality for a system having DPDK polling threads such as system 100 of FIG. 1.
  • Computing platform 1000 may either be a single physical network device or may be a composed logical network device that includes combinations of disaggregate physical components or elements composed from a shared pool of configurable computing resources deployed in a datacenter.
  • processing component 1040 may execute processing operations or logic for apparatus 600 and/or storage medium 900.
  • Processing component 1040 may include various hardware elements, software elements, or a combination of both. Examples of hardware elements may include devices, logic devices, components, processors, microprocessors, circuits, processor circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth) , integrated circuits, application specific integrated circuits (ASIC) , programmable logic devices (PLD) , digital signal processors (DSP) , field programmable gate array (FPGA) , memory units, logic gates, registers, semiconductor device, chips, microchips, chip sets, and so forth.
  • ASIC application specific integrated circuits
  • PLD programmable logic devices
  • DSP digital signal processors
  • FPGA field programmable gate array
  • Examples of software elements may include software components, programs, applications, computer programs, application programs, device drivers, system programs, software development programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces (API) , instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an example is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds and other design or performance constraints, as desired for a given example.
  • platform components 1050 may include common computing elements, such as one or more processors, multi-core processors, co-processors, memory units, chipsets, controllers, peripherals, interfaces, oscillators, timing devices, video cards, audio cards, multimedia input/output (I/O) components (e.g., digital displays) , power supplies, and so forth.
  • I/O multimedia input/output
  • Examples of memory units may include without limitation various types of computer readable and machine readable storage media in the form of one or more higher speed memory units, such as read-only memory (ROM) , random-access memory (RAM) , dynamic RAM (DRAM) , Double-Data-Rate DRAM (DDRAM) , synchronous DRAM (SDRAM) , static RAM (SRAM) , programmable ROM (PROM) , erasable programmable ROM (EPROM) , electrically erasable programmable ROM (EEPROM) , flash memory, polymer memory such as ferroelectric polymer memory, ovonic memory, phase change or ferroelectric memory, silicon-oxide-nitride-oxide-silicon (SONOS) memory, magnetic or optical cards, an array of devices such as Redundant Array of Independent Disks (RAID) drives, solid state memory devices (e.g., USB memory) , solid state drives (SSD) and any other type of storage media suitable for storing information.
  • ROM read-only memory
  • communications interface 1060 may include logic and/or features to support a communication interface.
  • communications interface 1060 may include one or more communication interfaces that operate according to various communication protocols or standards to communicate over direct or network communication links.
  • Direct communications may occur via use of communication protocols or standards described in one or more industry standards (including progenies and variants) such as those associated with the PCIe specification.
  • Network communications may occur via use of communication protocols or standards such as those described in one or more Ethernet standards promulgated by IEEE.
  • one such Ethernet standard may include IEEE 802.3.
  • Network communication may also occur according to one or more OpenFlow specifications such as the OpenFlow Hardware Abstraction API Specification.
  • Network communications may also occur according to the Infiniband Architecture specification or the TCP/IP protocol.
  • computing platform 1000 may be implemented in a single network device or a logical network device made up of composed disaggregate physical components or elements from a shared pool of configurable computing resources. Accordingly, functions and/or specific configurations of computing platform 1000 described herein may be included or omitted in various embodiments of computing platform 1000, as suitably desired for a physical or logical network device.
  • computing platform 1000 may be implemented using any combination of discrete circuitry, application specific integrated circuits (ASICs), logic gates and/or single chip architectures. Further, the features of computing platform 1000 may be implemented using microcontrollers, programmable logic arrays and/or microprocessors or any combination of the foregoing where suitably appropriate. It is noted that hardware, firmware and/or software elements may be collectively or individually referred to herein as “logic” or “circuit”.
  • the exemplary computing platform 1000 shown in the block diagram of FIG. 10 may represent one functionally descriptive example of many potential implementations. Accordingly, division, omission or inclusion of block functions depicted in the accompanying figures does not infer that the hardware components, circuits, software and/or elements for implementing these functions would necessarily be divided, omitted, or included in embodiments.
  • IP cores may be stored on a tangible, machine readable medium and supplied to various customers or manufacturing facilities to load into the fabrication machines that actually make the logic or processor.
  • hardware elements may include devices, components, processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth) , integrated circuits, application specific integrated circuits (ASIC) , programmable logic devices (PLD) , digital signal processors (DSP) , field programmable gate array (FPGA) , memory units, logic gates, registers, semiconductor device, chips, microchips, chip sets, and so forth.
  • ASIC application specific integrated circuits
  • PLD programmable logic devices
  • DSP digital signal processors
  • FPGA field programmable gate array
  • software elements may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces (API) , instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an example is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds and other design or performance constraints, as desired for a given implementation.
  • a computer-readable medium may include a non-transitory storage medium to store logic.
  • thenon-transitory storage medium may include one or more types of computer-readable storage media capable of storing electronic data, including volatile memory or non-volatile memory, removable or non-removable memory, erasable or non-erasable memory, writeable or re-writeable memory, and so forth.
  • the logic may include various software elements, such as software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, API, instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof.
  • a computer-readable medium may include a non-transitory storage medium to store or maintaininstructions that when executed by a machine, computing device or system, cause the machine, computing device or system to perform methods and/or operations in accordance with the described examples.
  • the instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, and the like.
  • the instructions may be implemented according to a predefined computer language, manner or syntax, for instructing a machine, computing device or system to perform a certain function.
  • the instructions may be implemented using any suitable high-level, low-level, object-oriented, visual, compiled and/or interpreted programming language.
  • Some examples may be described using the expressions “coupled” and “connected” along with their derivatives. These terms are not necessarily intended as synonyms for each other. For example, descriptions using the terms “connected” and/or “coupled” may indicate that two or more elements are in direct physical or electrical contact with each other. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
  • An example apparatus may include circuitry at a host computing platform and a queue component for execution by the circuitry to monitor received and available packet descriptors maintained at a receive queue for a NW I/O device over multiple polling iterations for a DPDK polling thread.
  • the queue component may determine a level of fullness for the receive queue based on available packet descriptors maintained at the receive queue following each polling iteration.
  • the apparatus may also include an increment component for execution by the circuitry that may increment a trend count based on the level of fullness for the receive queue following each polling iteration.
  • the apparatus may also include a performance component for execution by the circuitry to send a performance indication to an OS power management module based on whether the trend count exceeds a trend count threshold.
  • the performance indication may be capable of causing the OS power management module to increase a performance state of a processing element executing the DPDK polling thread.
  • Example 2 The apparatus of example 1, the OS power management module to increase the performance state of the processing element may comprise the OS power management module causing an operating frequency of the processing element to be increased.
  • Example 3 The apparatus of example 1, the level of fullness determined by the queue component may include one of near empty, half full or near full.
  • Example 4 The apparatus of example 3, the increment component may increment the trend count by a small number if the level of fullness is determined by the queue component to be near empty or by a large number if the level of fullness is determined by the queue component to be half full, the large number approximately 100 times that of the small number.
  • Example 5 The apparatus of example 3 may also include the performance component to send a high performance indication to the OS power management module responsive to a consecutive number of polling iterations for which the receive queue is determined to be near full by the queue component exceeding a consecutive threshold.
  • the high performance indication may be capable of causing the OS power management module to increase the performance state of the processing element to a highest performance state by causing an operating frequency of the processing element to be increased to a highest operating frequency.
  • Example 5-1 The apparatus of example 3, near empty may be 25% to 50% full, half full may be 51% to 75% full and near full may be greater than 75% full.
  • the OS power management module may be capable of increasing the performance state of the processing element according to one or more industry standards including ACPI Specification, Revision 5.0.
  • the OS power management module may be capable of increasing the performance state of the processing element to enter a Pn-1 performance state, where “n” represents a lowest performance state resulting in a lowest operating frequency of the processing element.
  • Example 7 The apparatus of example 1 may also include the queue component to determine a number of packets received at the receive queue for each polling iteration based on available packet descriptors maintained at the receive queue following each polling iteration.
  • the increment component may increment an idle count by a count of 1 if a number of packets received is 0 for a consecutive number of polling iterations that exceeds a consecutive threshold.
  • the apparatus may also include a sleep component for execution by the circuitry to cause the DPDK polling thread to sleep for one of a first or a second time period based on whether the idle count exceeds or is less than a first idle count threshold. For these examples, the second time period may be longer than the first time period.
  • Example 8 The apparatus of example 7 may also include an interrupt control component for execution by the circuitry to send an interrupt turn on message to the NW I/O device that may enable a one-shot Rx interrupt based on whether the idle count exceeds a second idle count threshold.
  • Example 9 The apparatus of example 7, the sleep component may cause the DPDK polling thread to sleep for the second time period based on the idle count exceeding the first idle count threshold and may send a long sleep indication to the OS power management module (an illustrative code sketch of this idle-count sleep behavior appears after this list of examples).
  • the long sleep indication may be capable of causing the OS power management module to cause the processing element executing the DPDK polling thread to be placed in a sleep mode for up to the second time period.
  • Example 10 The apparatus of example 9 may also include a timer component for execution by the circuitry to initiate a timer having a timer interval.
  • the sleep component may determine a percentage of iterations for which the DPDK polling thread was sleeping during the timer interval responsive to the timer expiring.
  • the performance component may send a performance indication to the OS power management module based on whether the percentage is greater than a percentage threshold (the timer-driven scale-down checks of this and the following example are sketched in code after this list of examples).
  • the performance indication may be capable of causing the OS power management module to lower a performance state of the processing element that includes decreasing an operating frequency or decreasing an operating voltage of the processing element.
  • Example 11 The apparatus of example 9 may also include a timer component for execution by the circuitry to initiate a timer having a timer interval.
  • the queue component may determine an average packet-per-iteration for packets received at the receive queue during the timer interval responsive to the timer expiring.
  • the performance component may send a performance indication to the OS power management module based on whether the average packet-per-iteration is less than a packet average threshold.
  • the performance indication may be capable of causing the OS power management module to lower a performance state of the processing element that includes decreasing an operating frequency of the processing element.
  • the OS power management module may be capable of placing the processing element in sleep mode according to one or more industry standards including ACPI Specification, Revision 5.0.
  • the OS power management module may cause the processing element to enter at least a C1 processor power state to place the processing element in the sleep mode responsive to the sleep indication.
  • Example 13 The apparatus of example 7, the consecutive threshold may be 5 iterations and each polling iteration may occur approximately every 1 microsecond.
  • Example 14 The apparatus of example 13, the sleep indication may indicate the idle count is less than the first idle count threshold, causing the DPDK polling thread to sleep for the first time period.
  • the first time period may last approximately 5 polling iterations.
  • Example 15 The apparatus of example 13, the sleep indication may indicate the idle count exceeds the first idle count threshold, causing the DPDK polling thread to sleep for the second time period.
  • the second time period may last approximately 50 polling iterations.
  • Example 16 The apparatus of example 1, the processing element may include one of a core of a multicore processor or a virtual machine supported by one or more cores of the multicore processor.
  • the DPDK polling thread may be capable of executing a network packet processing application to process packets received by the NW I/O device.
  • the network packet processing application may include one of a firewall application, a VPN application, an NAT application, a DPI application or a load balancer application.
  • Example 18 The apparatus of example 1 may also include a digital display coupled to the circuitry to present a user interface view.
  • Example 19 An example method may include monitoring received and available packet descriptors maintained at a receive queue for a NW I/O device over multiple polling iterations for a DPDK polling thread. The method may also include determining a level of fullness for the receive queue based on available packet descriptors maintained at the receive queue following each polling iteration. The method may also include incrementing a trend count based on the level of fullness for the receive queue. The method may also include sending a performance indication to an OS power management module based on whether the trend count exceeds a trend count threshold. The performance indication may be capable of causing the OS power management module to increase a performance state of a processing element executing the DPDK polling thread.
  • Example 20 The method of example 19, the OS power management module to increase the performance state of the processing element includes the OS power management module capable of causing an operating frequency of the processing element to be increased.
  • Example 21 The method of example 19, the level of fullness may include one of near empty, half full or near full.
  • Example 22 The method of example 21 may include incrementing the trend count by a small number if the level of fullness is near empty or by a large number if the level of fullness is half full.
  • the large number may be approximately 100 times that of the small number.
  • Example 23 The method of example 21 may also include sending a high performance indication to the OS power management module responsive to a consecutive number of polling iterations, for which the receive queue is determined to be near full, exceeding a consecutive threshold.
  • the high performance indication may be capable of causing the OS power management module to increase the performance state of the processing element to a highest performance state including increasing an operating frequency of the processing element to a highest operating frequency.
  • Example 24 The method of example 21, near empty may be 25% to 50% full, half full may be 51% to 75% full and near full may be greater than 75% full.
  • Example 25 The method of example 19, the OS power management module may be capable of increasing the performance state of the processing element according to one or more industry standards including ACPI Specification, Revision 5.0. For these examples, the OS power management module may be capable of increasing the performance state of the processing element to enter a Pn-1 performance state, where “n” represents a lowest performance state resulting in a lowest operating frequency of the processing element.
  • Example 26 The method of example 17, the processing element may include one of a core of a multicore processor or a virtual machine supported by one or more cores of the multicore processor.
  • the DPDK polling thread may be capable of executing a network packet processing application to process packets received by the NW I/O device.
  • the network packet processing application may include one of a firewall application, a VPN application, an NAT application, a DPI application or a load balancer application.
  • Example 28 An example at least one machine readable medium may include a plurality of instructions that in response to being executed by a system may cause the system to carry out a method according to any one of examples 19 to 27.
  • Example 29 An apparatus may include means for performing the methods of any one of examples 19 to 27.
  • Example 30 An example at least one machine readable medium may include a plurality of instructions that in response to being executed on a system at a host computing platform may cause the system to monitor received and available packet descriptors maintained at a receive queue for a NW I/O device over multiple polling iterations for a DPDK polling thread.
  • the instructions may also cause the system to determine a level of fullness for the receive queue based on available packet descriptors maintained at the receive queue following each polling iteration.
  • the instructions may also cause the system to increment a trend count based on the level of fullness for the receive queue and send a performance indication to an OS power management module based on whether the trend count exceeds a trend count threshold.
  • the performance indication may be capable of causing the OS power management module to increase a performance state of a processing element executing the DPDK polling thread.
  • Example 31 The at least one machine readable medium of example 30, the OS power management module to increase the performance state of the processing element may include the OS power management module being capable of causing an operating frequency of the processing element to be increased.
  • Example 32 The at least one machine readable medium of example 30, the level of fullness may include one of near empty, half full or near full.
  • Example 33 The at least one machine readable medium of example 32, the instructions may cause the system to increment the trend count by a small number if the level of fullness is near empty or by a large number if the level of fullness is half full.
  • the large number may be approximately 100 times that of the small number.
  • Example 34 The at least one machine readable medium of example 32, the instructions may further cause the system to send a high performance indication to the OS power management module responsive to a consecutive number of polling iterations, for which the receive queue is determined to be near full, exceeding a consecutive threshold.
  • the high performance indication may be capable of causing the OS power management module to increase the performance state of the processing element to a highest performance state including increasing an operating frequency of the processing element to a highest operating frequency.
  • Example 35 The at least one machine readable medium of example 32, near empty may be 25% to 50% full, half full may be 51% to 75% full and near full may be greater than 75% full.
  • Example 36 The at least one machine readable medium of example 30, the OS power management module may be capable of increasing the performance state of the processing element according to one or more industry standards including ACPI Specification, Revision 5.0.
  • the OS power management module may be capable of increasing the performance state of the processing element to enter a Pn-1 performance state, where “n” represents a lowest performance state resulting in a lowest operating frequency of the processing element.
  • Example 37 The at least one machine readable medium of example 30, the processing element may include one of a core of a multicore processor or a virtual machine supported by one or more cores of the multicore processor.
  • Example 38 The at least one machine readable medium of example 30, the DPDK polling thread may be capable of executing a network packet processing application to process packets received by the NW I/O device.
  • the network packet processing application may include one of a firewall application, a VPN application, an NAT application, a DPI application or a load balancer application.
  • Example 39 An example method may include monitoring received and available packet descriptors maintained at a receive queue for a NW I/O device over multiple polling iterations for a DPDK polling thread. The method may also include determining a number of packets received at the receive queue for each polling iteration based on available packet descriptors maintained at the receive queue following each polling iteration. The method may also include incrementing an idle count by a count of 1 if a number of packets received is 0 for a consecutive number of polling iterations that exceeds a consecutive threshold. The method may also include causing the DPDK polling thread to sleep for one of a first or a second time period based on whether the idle count exceeds or is less than a first idle count threshold. The second time period may be longer than the first time period.
  • Example 40 The method of example 39 may also include sending an interrupt turn on hint to the NW I/O device and causing the DPDK polling thread to pause based on the idle count exceeding a second idle count threshold.
  • Example 41 The method of example 39 may also include causing the DPDK polling thread to sleep for the second time period based on the idle count exceeding the first idle count threshold.
  • the method may also include sending a long sleep indication to an OS power management module.
  • the long sleep indication may be capable of causing the OS power management module to cause a processing element executing the DPDK polling thread to be placed in a sleep mode for up to the second time period.
  • Example 42 The method of example 41 may also include initiating a timer having a timer interval. The method may also include determining a percentage of iterations for which the DPDK polling thread was sleeping during the timer interval responsive to the timer expiring. The method may also include sending a performance indication to the OS power management module based on whether the percentage is greater than a percentage threshold. The performance indication may be capable of causing the OS power management module to lower a performance state of the processing element that includes decreasing an operating frequency or decreasing an operating voltage of the processing element.
  • Example 43 The method of example 41 may also include initiating a timer having a timer interval. The method may also include determining an average packet-per-iteration for packets received at the receive queue during the timer interval responsive to the timer expiring. The method may also include sending a performance indication to the OS power management module based on whether the average packet-per-iteration is less than a packet average threshold. The performance indication may cause the OS power management module to lower a performance state of the processing element that includes decreasing an operating frequency of the processing element.
  • Example 44 The method of example 41, the OS power management module may be capable of placing the processing element in sleep mode according to one or more industry standards including ACPI Specification, Revision 5.0.
  • the OS power management module may cause the processing element to enter at least a C1 processor power state to place the processing element in the sleep mode responsive to the sleep indication.
  • Example 45 The method of example 41, the processing element may be one of a core of a multicore processor or a virtual machine supported by one or more cores of the multicore processor.
  • Example 46 The method of example 39, the consecutive threshold may be 5 iterations and each polling iteration may occur approximately every 1 microsecond.
  • Example 47 The method of example 46, the sleep indication may indicate the idle count is less than the first idle count threshold, causing the DPDK polling thread to sleep for the first time period.
  • the first time period may last approximately 5 polling iterations.
  • Example 48 The method of example 46, the sleep indication may indicate that the idle count exceeds the first idle count threshold, causing the DPDK polling thread to sleep for the second time period.
  • the second time period may last approximately 50 polling iterations.
  • the DPDK polling thread may be capable of executing a network packet processing application to process packets received by the NW I/O device.
  • the network packet processing application may include one of a firewall application, a VPN application, an NAT application, a DPI application or a load balancer application.
  • Example 50 An example at least one machine readable medium may include a plurality of instructions that in response to being executed by a system at a host computing platform may cause the system to carry out a method according to any one of examples 39 to 49.
  • Example 51 An example apparatus may include means for performing the methods of any one of examples 39 to 49.
  • Example 52 An example at least one machine readable medium may include a plurality of instructions that in response to being executed on a system at a host computing platform may cause the system to monitor received and available packet descriptors maintained at a receive queue for a NW I/O device over multiple polling iterations for a DPDK polling thread.
  • the instructions may also cause the system to determine a number of packets received at the receive queue for each polling iteration based on available packet descriptors maintained at the receive queue following each polling iteration.
  • the instructions may also cause the system to increment an idle count by a count of 1 if a number of packets received is 0 for a consecutive number of polling iterations that exceeds a consecutive threshold.
  • the instructions may also cause the system to cause the DPDK polling thread to sleep for one of a first or a second time period based on whether the idle count exceeds or is less than a first idle count threshold. The second time period may be longer than the first time period.
  • Example 53 The at least one machine readable medium of example 52, the instructions may further cause the system to send an interrupt turn on hint to the NW I/O device and cause the DPDK polling thread to pause based on the idle count exceeding a second idle count threshold.
  • Example 54 The at least one machine readable medium of example 52, the instructions may cause the DPDK polling thread to sleep for the second time period based on the idle count exceeding the first idle count threshold.
  • the instructions may further cause the system to send a long sleep indication to an OS power management module.
  • the long sleep indication may be capable of causing the OS power management module to cause a processing element executing the DPDK polling thread to be placed in a sleep mode for up to the second time period.
  • Example 55 The at least one machine readable medium of example 54, the instructions may further cause the system to initiate a timer having a timer interval and determine a percentage of iterations for which the DPDK polling thread was sleeping during the timer interval responsive to the timer expiring.
  • the instructions may also cause the system to send a performance indication to the OS power management module based on whether the percentage is greater than a percentage threshold.
  • the performance indication may be capable of causing the OS power management module to lower a performance state of the processing element that includes decreasing an operating frequency or decreasing an operating voltage of the processing element.
  • Example 56 The at least one machine readable medium of example 54, the instructions may further cause the system to initiate a timer having a timer interval and determine an average packet-per-iteration for packets received at the receive queue during the timer interval responsive to the timer expiring.
  • the instructions may also cause the system to send a performance indication to the OS power management module based on whether the average packet-per-iteration is less than a packet average threshold.
  • the performance indication may cause the OS power management module to lower a performance state of the processing element that includes decreasing an operating frequency of the processing element.
  • Example 57 The at least one machine readable medium of example 54, the OS power management module may be capable of placing the processing element in sleep mode according to one or more industry standards including ACPI Specification, Revision 5.0.
  • the OS power management module may cause the processing element to enter at least a C1 processor power state to place the processing element in the sleep mode responsive to the sleep indication.
  • Example 58 The at least one machine readable medium of example 54, the processing element may include one of a core of a multicore processor or a virtual machine supported by one or more cores of the multicore processor.
  • Example 59 The at least one machine readable medium of example 52, the consecutive threshold may be 5 iterations and each polling iteration may occur approximately every 1 microsecond.
  • Example 60 The at least one machine readable medium of example 59, the sleep indication may indicate the idle count is less than the first idle count threshold, causing the DPDK polling thread to sleep for the first time period.
  • the first time period may last approximately 5 polling iterations.
  • Example 61 The at least one machine readable medium of example 59, the sleep indication may indicate the idle count exceeds the first idle count threshold, causing the DPDK polling thread to sleep for the second time period.
  • the second time period may last approximately 50 polling iterations.
  • Example 62 The at least one machine readable medium of example 52, the DPDK polling thread may be capable of executing a network packet processing application to process packets received by the NW I/O device.
  • the network packet processing application may include one of a firewall application, a VPN application, an NAT application, a DPI application or a load balancer application.
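
The following code sketches restate the main heuristics of the examples above for readability. They are illustrative only: every helper name, structure and numeric threshold not quoted from the examples is an assumption introduced here, not an API or parameter defined by this publication. This first sketch, in C, covers the queue-fullness trend count: classify the receive ring after each polling iteration, grow a trend count faster when the ring is half full (51% to 75%) than when it is near empty (25% to 50%), and request a higher performance state from the OS power management module when the trend count, or a run of near-full polls, crosses a threshold.

```c
/* Sketch of the queue-fullness "trend count" heuristic. The ring size,
 * thresholds and the rx_descs_in_use()/request_p_state_*() hooks are
 * hypothetical stand-ins for the queue, increment and performance
 * components described in the examples. */
#include <stdint.h>

#define RING_SIZE            512u    /* Rx descriptor ring size (assumed)   */
#define SMALL_STEP           1u      /* near-empty increment                */
#define LARGE_STEP           100u    /* half-full increment (~100x small)   */
#define TREND_THRESHOLD      1000u   /* trend count that triggers a hint    */
#define NEAR_FULL_CONSEC_MAX 5u      /* consecutive near-full iterations    */

extern uint32_t rx_descs_in_use(void);   /* used Rx descriptors after poll  */
extern void request_p_state_up(void);    /* OS PM: e.g. step Pn -> Pn-1     */
extern void request_p_state_max(void);   /* OS PM: highest P-state / freq   */

static uint64_t trend_count;
static uint32_t near_full_consec;

/* Call once per polling iteration, after packets have been dequeued. */
void update_trend_after_poll(void)
{
    uint32_t pct = (rx_descs_in_use() * 100u) / RING_SIZE;

    if (pct > 75u) {                        /* near full */
        if (++near_full_consec > NEAR_FULL_CONSEC_MAX)
            request_p_state_max();          /* high performance indication  */
        return;
    }
    near_full_consec = 0;

    if (pct > 50u)                          /* half full: 51% to 75% */
        trend_count += LARGE_STEP;
    else if (pct >= 25u)                    /* near empty: 25% to 50% */
        trend_count += SMALL_STEP;

    if (trend_count > TREND_THRESHOLD) {    /* sustained backlog growth */
        request_p_state_up();               /* performance indication */
        trend_count = 0;
    }
}
```

In a DPDK application the number of used receive descriptors can be read with rte_eth_rx_queue_count(); the stubbed hook above simply stands in for that query.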
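
The second sketch covers the idle-count sleep ladder: once a run of empty polls exceeds the consecutive threshold, the thread takes short sleeps, escalates to longer sleeps accompanied by a long-sleep indication to the OS power management module, and eventually enables a one-shot Rx interrupt and pauses polling. Only the 5-iteration consecutive threshold and the roughly 5-iteration and 50-iteration sleep lengths come from the examples; the helper functions and the two idle-count thresholds are assumptions.

```c
/* Sketch of the idle-count sleep ladder. poll_rx_burst() and the three
 * indication hooks are hypothetical; each polling iteration is assumed to
 * take roughly 1 microsecond, as stated in the examples. */
#include <stdint.h>
#include <unistd.h>

#define ZERO_PKT_CONSEC_MAX 5u    /* empty polls before idle counting     */
#define IDLE_THRESHOLD_1    10u   /* short sleep vs. long sleep           */
#define IDLE_THRESHOLD_2    100u  /* switch to interrupt mode             */
#define SHORT_SLEEP_US      5u    /* ~5 polling iterations                */
#define LONG_SLEEP_US       50u   /* ~50 polling iterations               */

extern uint16_t poll_rx_burst(void);            /* packets dequeued        */
extern void send_long_sleep_indication(void);   /* OS PM may enter C1      */
extern void enable_one_shot_rx_interrupt(void); /* interrupt turn-on msg   */
extern void pause_polling_thread(void);         /* wait for Rx interrupt   */

void idle_aware_poll_loop(void)
{
    uint32_t zero_pkt_consec = 0;
    uint32_t idle_count = 0;

    for (;;) {
        if (poll_rx_burst() > 0) {          /* traffic present: reset */
            zero_pkt_consec = 0;
            idle_count = 0;
            continue;
        }
        if (++zero_pkt_consec <= ZERO_PKT_CONSEC_MAX)
            continue;                       /* not yet considered idle */

        idle_count++;
        if (idle_count > IDLE_THRESHOLD_2) {
            /* Deep idle: hand off to a one-shot Rx interrupt instead of
             * burning cycles, then resume polling once woken. */
            enable_one_shot_rx_interrupt();
            pause_polling_thread();
            zero_pkt_consec = 0;
            idle_count = 0;
        } else if (idle_count > IDLE_THRESHOLD_1) {
            send_long_sleep_indication();   /* long sleep indication */
            usleep(LONG_SLEEP_US);          /* second, longer period */
        } else {
            usleep(SHORT_SLEEP_US);         /* first, shorter period */
        }
    }
}
```

In a DPDK application the poll would typically be an rte_eth_rx_burst() call, with the sleeps implemented by ordinary usleep()/nanosleep() pauses so the core can actually be idled rather than busy-waited.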
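
The third sketch covers the timer-driven scale-down checks: when an interval timer expires, the fraction of polling iterations spent sleeping and the average packets per iteration are compared against thresholds, and a scale-down indication is sent if either suggests the core is over-provisioned. The bookkeeping structure, the thresholds and request_p_state_down() are assumptions.

```c
/* Sketch of the timer-expiry checks that lower the performance state. */
#include <stdint.h>

#define SLEEP_PCT_THRESHOLD 80u   /* % of iterations spent sleeping        */
#define PKT_AVG_THRESHOLD   2u    /* average packets per polling iteration */

extern void request_p_state_down(void);  /* OS PM: lower frequency/voltage */

struct interval_stats {
    uint64_t iterations;      /* polling iterations during the interval    */
    uint64_t sleeping_iters;  /* iterations spent sleeping                 */
    uint64_t packets;         /* packets received during the interval      */
};

/* Invoked when the interval timer expires. */
void on_timer_expired(struct interval_stats *s)
{
    if (s->iterations == 0)
        return;

    uint64_t sleep_pct = (s->sleeping_iters * 100u) / s->iterations;
    uint64_t pkt_avg   = s->packets / s->iterations;  /* integer average */

    /* Mostly sleeping, or very few packets per poll: the core is
     * over-provisioned, so ask for a lower P-state. */
    if (sleep_pct > SLEEP_PCT_THRESHOLD || pkt_avg < PKT_AVG_THRESHOLD)
        request_p_state_down();

    s->iterations = s->sleeping_iters = s->packets = 0;  /* next interval */
}
```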
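
Finally, the examples describe the OS power management module only through the ACPI P-state and C-state vocabulary (stepping toward Pn-1, entering C1). As one possible, purely illustrative Linux-side realization of the frequency part of that interface, a user-space agent could translate the indications into cpufreq writes for the core running the polling thread. This assumes the cpufreq "userspace" governor is active and the caller may write the sysfs file; it is not a mechanism defined by this publication. DPDK's librte_power library offers comparable per-lcore helpers such as rte_power_freq_up() and rte_power_freq_down().

```c
/* Illustrative only: set the operating frequency (in kHz) of one core via
 * the Linux cpufreq "userspace" governor, using the standard cpufreq
 * sysfs interface. */
#include <stdio.h>

static int set_core_khz(unsigned int core, unsigned long khz)
{
    char path[128];
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_setspeed", core);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;                 /* governor not set or no permission */
    fprintf(f, "%lu\n", khz);      /* lower value ~ deeper P-state */
    fclose(f);
    return 0;
}

int main(void)
{
    /* Example: drop core 2 to 1.2 GHz after a scale-down indication. */
    return set_core_khz(2, 1200000UL) == 0 ? 0 : 1;
}
```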

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Environmental & Geological Engineering (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Power Sources (AREA)
  • Small-Scale Networks (AREA)
PCT/CN2014/094515 2014-12-22 2014-12-22 Techniques for power management associated with processing received packets at a network device WO2016101099A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
PCT/CN2014/094515 WO2016101099A1 (en) 2014-12-22 2014-12-22 Techniques for power management associated with processing received packets at a network device
JP2017530249A JP6545802B2 (ja) 2014-12-22 2014-12-22 Techniques for power management associated with processing received packets at a network device
EP14908670.4A EP3238403A4 (en) 2014-12-22 2014-12-22 Techniques for power management associated with processing received packets at a network device
CN201480083625.7A CN107005531A (zh) 2014-12-22 2014-12-22 Techniques for power management associated with processing received packets at a network device
KR1020177013532A KR102284467B1 (ko) 2014-12-22 2014-12-22 Techniques for power management associated with processing received packets at a network device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/094515 WO2016101099A1 (en) 2014-12-22 2014-12-22 Techniques for power management associated with processing received packets at a network device

Publications (2)

Publication Number Publication Date
WO2016101099A1 true WO2016101099A1 (en) 2016-06-30
WO2016101099A9 WO2016101099A9 (en) 2016-11-10

Family

ID=56148844

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/094515 WO2016101099A1 (en) 2014-12-22 2014-12-22 Techniques for power management associated with processing received packets at a network device

Country Status (5)

Country Link
EP (1) EP3238403A4 (ko)
JP (1) JP6545802B2 (ko)
KR (1) KR102284467B1 (ko)
CN (1) CN107005531A (ko)
WO (1) WO2016101099A1 (ko)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108628684A (zh) * 2017-03-20 2018-10-09 华为技术有限公司 DPDK-based packet processing method and computer device
WO2019004880A1 (en) 2017-06-27 2019-01-03 Telefonaktiebolaget Lm Ericsson (Publ) POWER MANAGEMENT OF AN EVENT BASED PROCESSING DEVICE
WO2020003135A1 (en) * 2018-06-26 2020-01-02 Telefonaktiebolaget Lm Ericsson (Publ) Sliding window based non-busy looping mode in cloud computing
EP3640797A1 (en) * 2018-10-15 2020-04-22 INTEL Corporation Dynamic traffic-aware interface queue switching among processor cores
US10929179B2 (en) 2016-03-24 2021-02-23 Huawei Technologies Co., Ltd. Scheduling method and electronic device
US20220075654A1 (en) * 2019-03-25 2022-03-10 Telefonaktiebolaget Lm Ericsson (Publ) Optimizing runtime framework for efficient hardware utilization and power saving
CN116055230A (zh) * 2023-03-28 2023-05-02 北京博上网络科技有限公司 DPDK sleep time control method and system
WO2023105692A1 (ja) * 2021-12-08 2023-06-15 日本電信電話株式会社 Intra-server data transfer device, intra-server data transfer method, and program

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062269B (zh) * 2017-12-05 2020-12-11 上海交通大学 DPDK-based elastic scaling method and system for computing resources
CN108632110B (zh) * 2018-03-23 2020-06-19 北京网测科技有限公司 Device performance test method, system, computer device and storage medium
US10642341B2 (en) * 2018-03-23 2020-05-05 Juniper Networks, Inc. Selective modification of power states based on conditions
CN110968402A (zh) * 2018-09-28 2020-04-07 深信服科技股份有限公司 CPU operation control method, apparatus, device and storage medium
CN110968403A (zh) * 2018-09-28 2020-04-07 深信服科技股份有限公司 CPU operation control method, apparatus, device and storage medium
CN112817772B (zh) * 2019-11-15 2023-12-29 深信服科技股份有限公司 Data communication method, apparatus, device and storage medium
KR102437625B1 (ko) * 2020-06-19 2022-08-29 재단법인대구경북과학기술원 Electronic device and power management method of the electronic device
JPWO2023002547A1 (ko) * 2021-07-19 2023-01-26
WO2023144958A1 (ja) * 2022-01-27 2023-08-03 日本電信電話株式会社 Intra-server delay control device, intra-server delay control method, and program
WO2023199519A1 (ja) * 2022-04-15 2023-10-19 日本電信電話株式会社 Intra-server delay control device, intra-server delay control method, and program
CN115756143B (zh) * 2022-11-30 2024-03-12 深圳市领创星通科技有限公司 Energy-saving method and apparatus for data packet processing, computer device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006117013A1 (en) * 2005-05-04 2006-11-09 Telecom Italia S.P.A. Method and system for processing packet flows, and computer program product therefor
US8243744B2 (en) * 2004-03-01 2012-08-14 Futurewei Technologies, Inc. Priority sorting
WO2013003532A1 (en) * 2011-06-29 2013-01-03 Verisign, Inc. Data plane packet processing tool chain

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7225250B1 (en) * 2000-10-30 2007-05-29 Agilent Technologies, Inc. Method and system for predictive enterprise resource management
US7711966B2 (en) * 2004-08-31 2010-05-04 Qualcomm Incorporated Dynamic clock frequency adjustment based on processor load
US7840682B2 (en) * 2005-06-03 2010-11-23 QNX Software Systems, GmbH & Co. KG Distributed kernel operating system
US8984309B2 (en) * 2008-11-21 2015-03-17 Intel Corporation Reducing network latency during low power operation
US8639862B2 (en) * 2009-07-21 2014-01-28 Applied Micro Circuits Corporation System-on-chip queue status power management
CN102082698A (zh) * 2009-11-26 2011-06-01 上海大学 Network data processing system with a high-performance kernel based on improved zero-copy technology
JP2012168757A (ja) * 2011-02-15 2012-09-06 Nec Casio Mobile Communications Ltd Performance control system and application processor
US9323319B2 (en) * 2011-06-29 2016-04-26 Nec Corporation Multiprocessor system and method of saving energy therein
JP2014121036A (ja) * 2012-12-19 2014-06-30 Alaxala Networks Corp Relay device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8243744B2 (en) * 2004-03-01 2012-08-14 Futurewei Technologies, Inc. Priority sorting
WO2006117013A1 (en) * 2005-05-04 2006-11-09 Telecom Italia S.P.A. Method and system for processing packet flows, and computer program product therefor
WO2013003532A1 (en) * 2011-06-29 2013-01-03 Verisign, Inc. Data plane packet processing tool chain

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3238403A4 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10929179B2 (en) 2016-03-24 2021-02-23 Huawei Technologies Co., Ltd. Scheduling method and electronic device
CN108628684A (zh) * 2017-03-20 2018-10-09 华为技术有限公司 DPDK-based packet processing method and computer device
CN108628684B (zh) * 2017-03-20 2021-01-05 华为技术有限公司 DPDK-based packet processing method and computer device
WO2019004880A1 (en) 2017-06-27 2019-01-03 Telefonaktiebolaget Lm Ericsson (Publ) POWER MANAGEMENT OF AN EVENT BASED PROCESSING DEVICE
US11243603B2 (en) 2017-06-27 2022-02-08 Telefonaktiebolaget Lm Ericsson (Publ) Power management of an event-based processing system
WO2020003135A1 (en) * 2018-06-26 2020-01-02 Telefonaktiebolaget Lm Ericsson (Publ) Sliding window based non-busy looping mode in cloud computing
EP3640797A1 (en) * 2018-10-15 2020-04-22 INTEL Corporation Dynamic traffic-aware interface queue switching among processor cores
US20220075654A1 (en) * 2019-03-25 2022-03-10 Telefonaktiebolaget Lm Ericsson (Publ) Optimizing runtime framework for efficient hardware utilization and power saving
WO2023105692A1 (ja) * 2021-12-08 2023-06-15 日本電信電話株式会社 Intra-server data transfer device, intra-server data transfer method, and program
CN116055230A (zh) * 2023-03-28 2023-05-02 北京博上网络科技有限公司 DPDK sleep time control method and system
CN116055230B (zh) * 2023-03-28 2023-06-09 北京博上网络科技有限公司 DPDK sleep time control method and system

Also Published As

Publication number Publication date
EP3238403A1 (en) 2017-11-01
CN107005531A (zh) 2017-08-01
WO2016101099A9 (en) 2016-11-10
JP6545802B2 (ja) 2019-07-17
EP3238403A4 (en) 2019-01-02
JP2018507457A (ja) 2018-03-15
KR20170097615A (ko) 2017-08-28
KR102284467B1 (ko) 2021-08-03

Similar Documents

Publication Publication Date Title
WO2016101099A1 (en) Techniques for power management associated with processing received packets at a network device
US10355959B2 (en) Techniques associated with server transaction latency information
US20160179582A1 (en) Techniques to dynamically allocate resources for local service chains of configurable computing resources
US9292068B2 (en) Controlling a turbo mode frequency of a processor
TWI526843B (zh) Adaptive interrupt coalescing technique for energy-efficient mobile platforms
US10929179B2 (en) Scheduling method and electronic device
US8560749B2 (en) Techniques for managing power consumption state of a processor involving use of latency tolerance report value
US11734204B2 (en) Adaptive processor resource utilization
US20090077277A1 (en) Methods and apparatus for decreasing power consumption and bus activity
WO2021078144A1 (zh) Method and device for energy consumption management
CN113075982A (zh) Server smart network interface card heat dissipation method, apparatus, system and medium
WO2014143674A1 (en) Controlling power supply unit power consumption during idle state
US11388074B2 (en) Technologies for performance monitoring and management with empty polling
EP3640797A1 (en) Dynamic traffic-aware interface queue switching among processor cores
EP4163795A1 (en) Techniques for core-specific metrics collection
CN113360344B (zh) Server monitoring method, apparatus, device and computer-readable storage medium
CN112817746B (zh) CPU power adjustment method, apparatus, device and readable storage medium
Trifonov et al. Data centre energy efficiency optimisation in high-speed packet I/O frameworks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 14908670; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 20177013532; Country of ref document: KR; Kind code of ref document: A)
REEP Request for entry into the european phase (Ref document number: 2014908670; Country of ref document: EP)
ENP Entry into the national phase (Ref document number: 2017530249; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)