KR101444990B1

KR101444990B1 - Method and apparatus for performing energy-efficient network packet processing in a multi processor core system

Info

Publication number: KR101444990B1
Application number: KR1020117031560A
Authority: KR
Inventors: Chih-Fan Hsin; Jr-Shian Tsai; Tsung-Yuan C. Tai
Original assignee: Intel Corporation
Priority date: 2009-06-26
Filing date: 2010-06-25
Publication date: 2014-10-07
Also published as: US20100332869A1; CN105446455B; WO2010151824A2; CN102460342A; CN105446455A; CN102460342B; WO2010151824A3; KR20140014407A; US8239699B2; EP2446340A2; EP2446340A4; US20120278637A1

Abstract

A method and apparatus for managing core affinity for network packet processing is provided. The low-power idle state of each of a plurality of processing units in a system is monitored. Network packet processing is adaptively reassigned to processing units that are in a non-low-power idle state, which increases the low-power idle state residency of the processing units that are in a low-power idle state and reduces energy consumption.

Description

METHOD AND APPARATUS FOR PERFORMING ENERGY-EFFICIENT NETWORK PACKET PROCESSING IN A MULTI PROCESSOR CORE SYSTEM

This disclosure relates to energy efficiency in a multi-processor core system and, more particularly, to energy-efficient network packet processing.

Typically, a computer system with a plurality of processor cores handles a high workload by distributing the workload among all of the processor cores. As the workload decreases, however, each of the processor cores may be underutilized.

To reduce the power consumed by the processor cores when the workload is low, the operating system may adjust the number of processor cores in use based on the system utilization level. Unused processor cores are placed in a low-power idle state ("parked") and can remain in that state for long consecutive intervals. The operating system continues to distribute the workload among the processor cores that are not in the low-power idle state.

Features of embodiments of the present invention will become apparent as the following detailed description proceeds, with reference to the drawings, in which like reference numerals denote like parts.
FIG. 1 is a block diagram of a system including an embodiment of a network interface controller that supports receive side scaling.
FIG. 2 is a block diagram illustrating an embodiment of the network interface controller and memory shown in FIG. 1.
FIG. 3 is a flow diagram of an embodiment of a method for dynamically adjusting core affinity settings in accordance with the principles of the present invention.
FIG. 4 is a block diagram illustrating another embodiment of the network interface controller and memory shown in FIG. 1.
Although the following detailed description proceeds with reference to illustrative embodiments of the invention, many alternatives, modifications and variations thereof will be apparent to those skilled in the art. Accordingly, the invention is intended to be viewed broadly and to be defined only as set forth in the appended claims.

A computer system may include a network interface controller (adapter, card) that receives network packets from a network and forwards the received network packets to one of the plurality of processor cores for processing. The workload distributed among the processor cores may include the processing of network packets received by the network interface controller.

For example, in a computer system, the processing of network packets may be distributed among the processor cores such that the same traffic flow (for example, network packets having the same source address and destination address) is processed by the same processor core. When the workload is low, the operating system may use only a subset of the processor cores and place the other processor cores in a low-power idle state. However, if a received network packet belonging to a particular traffic flow is assigned to a processor core that is in a low-power idle state (because of the core affinity setting for the traffic flow), that processor core is woken from the low-power idle state. As a result, processor cores in a low-power idle state do not get the opportunity to remain in that state for long.

An embodiment of the present invention dynamically adjusts the core affinity settings for network packet processing based on which of the processor cores the operating system has placed in a low-power idle state.

An embodiment of the present invention will be described for a computer system having a network interface controller that supports Receive Side Scaling (RSS) as used by the Microsoft® Windows® operating system (OS). However, the invention is not limited to RSS. In other embodiments, the network adapter may support scalable input/output (I/O) as used by a Linux operating system that includes a scheduler with a power-saving mode, or by any other operating system that includes a power-saving mode.

FIG. 1 is a block diagram of a system 100 that includes an embodiment of a network interface controller 108 that supports receive side scaling. The system 100 includes a processor 101, a memory controller hub (MCH) 102 and an input/output (I/O) controller hub (ICH) 104. The MCH 102 includes a memory controller 106 that controls communication between the processor 101 and memory 110. The processor 101 and the MCH 102 communicate over a system bus 116.

The processor 101 may be a multi-core processor such as an Intel® Pentium D, an Intel® Xeon® processor, an Intel® Core® Duo processor, an Intel® Core™ i7 processor or any other type of processor. In the embodiment shown, the system includes two multi-core processors 101, each having at least two processor cores ("cores") 122. In one embodiment, each multi-core processor includes four cores 122.

The memory 110 may be Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), Synchronized Dynamic Random Access Memory (SDRAM), Double Data Rate 2 (DDR2) RAM, Rambus Dynamic Random Access Memory (RDRAM) or any other type of memory.

The ICH 104 may be coupled to the MCH 102 using a high-speed chip-to-chip interconnect 114 such as Direct Media Interface (DMI). DMI supports 2 gigabit/second concurrent transfer rates via two unidirectional lanes.

The ICH 104 may include a storage input/output (I/O) controller for controlling communication with at least one storage device 112 coupled to the ICH 104. The storage device may be, for example, a disk drive, a digital video disk (DVD) drive, a compact disk (CD) drive, a Redundant Array of Independent Disks (RAID), a tape drive or another storage device. The ICH 104 may communicate with the storage device 112 over a storage protocol interconnect 118 using a serial storage protocol such as Serial Attached Small Computer System Interface (SAS) or Serial Advanced Technology Attachment (SATA).

In another embodiment, the network interface controller 108 in the system 100 may be included in an ICH 104 that does not include a storage I/O controller 120, or may be included on a separate network interface card that is inserted into a system card slot.

FIG. 2 is a block diagram illustrating an embodiment of the network interface controller 108 and memory 110 shown in FIG. 1. The network interface controller 108 includes a hash function unit 220, an indirect table 230 and a plurality of hardware receive queues 202. The memory 110 includes an operating system kernel 280, a filter driver 210 and a network device driver (miniport driver) 270. In this embodiment, the hash function unit 220 and the indirect table 230 are included in the network interface controller ("NIC") 108. In other embodiments, some of these components, or some elements of these components, may be located outside the network interface controller 108.

In the embodiment shown, the miniport driver 270 and the filter driver 210 are components of the Microsoft® Windows® Driver Model (WDM). The WDM includes device function drivers, which may be class drivers or miniport drivers. A miniport driver supports a specific type of device, for example, a specific network interface controller 108. The filter driver 210 is an optional driver that adds value to, or modifies the behavior of, the function driver (miniport driver) 270.

The Network Driver Interface Specification (NDIS) is a library of functions, accessed through an application programming interface (API), for network interface controllers 110. NDIS acts as the interface between layer 2 (the link layer) and layer 3 (the network layer) of the seven-layer Open Systems Interconnection (OSI) model. The library functions include object identifiers (OIDs).

The network interface controller 108 may use the hash function in the hash function unit 220 to compute a hash value over a hash type (for example, one or more fields in the headers) within a received network packet. A number of the least significant bits of the hash value may be used to index an entry in the indirect table 230. That entry identifies both the processor core 122 (FIG. 1) that handles processing of the received data packet and the one of the plurality of receive queues 202 that stores it. The network interface controller 108 may then interrupt the identified processor core 122.
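The lookup described in the preceding paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the controller's implementation: the table contents, the power-of-two table size and the names `IndirectEntry` and `lookup` are hypothetical and chosen only for the example.

```python
from typing import List, NamedTuple


class IndirectEntry(NamedTuple):
    """One entry of the indirect table: target core and receive queue."""
    core: int
    queue: int


def lookup(table: List[IndirectEntry], hash_value: int) -> IndirectEntry:
    """Index the table with the least significant bits of the hash value.

    Assumes len(table) is a power of two, so masking keeps the low bits.
    """
    mask = len(table) - 1
    return table[hash_value & mask]


# Hypothetical 8-entry table in which entry i maps to core i and queue i.
table = [IndirectEntry(core=i, queue=i) for i in range(8)]

# The low three bits of 0xDEADBEEF are 0b111, so entry 7 is selected.
entry = lookup(table, 0xDEADBEEF)
print(entry.core, entry.queue)
```

Because only the low bits select the entry, rewriting the entries changes which core handles a flow without touching the hash itself.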

In an embodiment, as each network packet is received by the network interface controller 108, the "flow" associated with the network packet is determined. The flow of a Transmission Control Protocol (TCP) packet may be determined from the values of fields in the Internet Protocol (IP) header and the TCP header included in the packet. For example, the flow identifier may depend on a combination of the IP source address and IP destination address in the IP header and the source port address and destination port address in the TCP header of the received network packet.
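As a rough sketch of how such a flow identifier could be turned into a hash value, the example below hashes the four header fields named above. The description does not specify the hash function, so CRC-32 is used purely as a stand-in for brevity; `FlowKey` and `flow_hash` are hypothetical names.

```python
import zlib
from ipaddress import ip_address
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The four header fields that identify a TCP flow in this sketch."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def flow_hash(key: FlowKey) -> int:
    """Hash the four-tuple to a 32-bit value.

    CRC-32 is an assumption made for illustration; it is not the hash
    used by the network interface controller.
    """
    data = (
        ip_address(key.src_ip).packed
        + ip_address(key.dst_ip).packed
        + key.src_port.to_bytes(2, "big")
        + key.dst_port.to_bytes(2, "big")
    )
    return zlib.crc32(data)


a = FlowKey("192.0.2.1", "198.51.100.7", 49152, 80)
b = FlowKey("192.0.2.1", "198.51.100.7", 49153, 80)
print(hex(flow_hash(a)), hex(flow_hash(b)))
```

Every packet of a flow carries the same four values, so it always produces the same hash and therefore reaches the same queue and core.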

A plurality of hardware receive queues 202 are provided to store received network packets. To ensure in-order packet delivery for a particular flow, each hardware receive queue 202 may be assigned to a different flow and also to one of the plurality of processor cores 122. Each of the hardware receive queues 202 is thus associated with one of the processor cores 122 via the indirect table 230.

In the embodiment shown, the network interface controller 108 has eight receive queues 202. However, the number of receive queues 202 is not limited to eight; other embodiments may have more or fewer. Received network packets are stored in one of the receive queues 202. Packet processing for each receive queue may be affinitized to a particular processor core 122. In an embodiment of the present invention, the core affinity settings for packet processing can be adjusted dynamically based on the number of processor cores 122 that are in a power-saving idle state ("parked").

Table 1 below shows an example of the indirect table 230 with an initial assignment of receive queues 202 to processor cores 122 in a system with eight cores and eight receive queues, so that receive packet processing is distributed across all of the processor cores 122.

FIG. 3 is a flow diagram of an embodiment of a method for dynamically adjusting core affinity settings in accordance with the principles of the present invention.

The distribution of receive queues 202 to processor cores 122 is modified dynamically based on the OS core parking state, which is handled by the OS kernel. For example, if the workload is low, the OS kernel may "park" some of the processor cores 122 by placing them in a low-power state. In an embodiment with eight processor cores, for example, six of the eight may be parked while the remaining two are used. Interrupts from the network interface controller are then sent only to the cores that are not parked, and the parked cores remain in the low-power idle state, which reduces energy consumption.
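The effect of the adjustment can be illustrated with a small check that lists which parked cores a given table would still interrupt. This is a sketch only: the one-queue-per-core starting table is an assumption, and `parked_cores_targeted` is a hypothetical helper.

```python
from typing import Sequence, Set


def parked_cores_targeted(ind_table: Sequence[int], parked: Set[int]) -> Set[int]:
    """Return the parked cores that still appear as interrupt targets."""
    return {core for core in ind_table if core in parked}


parked = {1, 2, 3, 5, 6, 7}                 # six of eight cores are parked
static_table = [0, 1, 2, 3, 4, 5, 6, 7]     # assumed one-queue-per-core mapping
adjusted_table = [0, 4, 0, 4, 0, 4, 0, 4]   # only cores 0 and 4 are used

print(sorted(parked_cores_targeted(static_table, parked)))    # [1, 2, 3, 5, 6, 7]
print(sorted(parked_cores_targeted(adjusted_table, parked)))  # []
```

With the static mapping every parked core can be woken by network traffic; with the adjusted mapping none can.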

For example, in an embodiment, if the Windows® 7 Server core parking feature chooses to use three cores and parks the remaining cores when the workload is low, the network interface controller is modified so that packet processing and network interrupts run on the three selected cores. This increases the low-power idle state residency of the parked cores.

In an embodiment, a 10 Gigabit network interface controller has multiple receive side scaling receive (RSS Rx) queues, each of which is associated with a particular core. The network interface controller obtains the OS core parking configuration through the filter driver 210 and modifies the RSS indirect table 230 based on that configuration.

At block 300, the filter driver 210 periodically requests the OS parking state from the OS kernel 280. For example, in an embodiment, the filter driver 210 requests the OS core parking state using an API command provided by the OS kernel 280 to query the parking state of each of the processor cores 122. In an embodiment, the period is based on the number of network packets received; for example, the filter driver 210 requests the OS parking state after about every 1000 network packets have been received.
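A packet-count-based polling period of this kind might look like the sketch below. The query callback stands in for the OS-provided API, which the description does not name; `ParkingPoller`, `on_packet` and the default interval of 1000 are assumptions made for illustration.

```python
from typing import Callable, List, Optional


class ParkingPoller:
    """Request the parking state once every `packets_per_poll` packets."""

    def __init__(self, query_parking_state: Callable[[], List[bool]],
                 packets_per_poll: int = 1000) -> None:
        self._query = query_parking_state      # hypothetical OS query hook
        self._interval = packets_per_poll
        self._count = 0
        self.polls = 0
        self.last_state: Optional[List[bool]] = None

    def on_packet(self) -> bool:
        """Count one received packet; return True if this packet triggered a poll."""
        self._count += 1
        if self._count < self._interval:
            return False
        self._count = 0
        self.last_state = self._query()
        self.polls += 1
        return True


# Stub query: core 0 is not parked, core 1 is parked.
poller = ParkingPoller(lambda: [False, True], packets_per_poll=1000)
for _ in range(2500):
    poller.on_packet()
print(poller.polls, poller.last_state)
```

Tying the poll to packet arrivals means no polling work is done while the network is quiet.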

In another embodiment, the filter driver 210 accesses the CPU 101 to obtain the current power state of each processor core 202 directly. For example, the Intel® Core™ i7 supports low-power states at the core level for optimal power management. The core power states include C0, C1, C3 and C6. C0 is the normal operating state; C1, C3 and C6 are low-power states with different levels of reduced power consumption. In the C3 low-power state, for example, all clocks are stopped and the processor core maintains all of its architectural state except for the caches.

The current power state of each processor core 202 is stored in one or more registers in the CPU 101 that are accessible to the filter driver 210. To determine whether a particular core is "parked", the filter driver 210 periodically reads the registers that store the power states of the cores and infers whether the core is parked from the distribution of the power states read over a time period. For example, if the registers are read n times within the time period and the core is in a low-power state every time, the core is parked. If the core is in a high-power state all n times, the core is not parked. If the readings differ over the time period, the core is inferred to be parked when the number of low-power readings is greater than the number of high-power readings. Processing continues at block 302.
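The inference rule above reduces to a majority vote over the sampled power states. The sketch below assumes the samples arrive as C-state names; how a tie is handled is not stated in the description, so treating a tie as "not parked" is an assumption, and `is_parked` is a hypothetical name.

```python
from typing import Sequence

# C0 is the normal operating state; C1, C3 and C6 are low-power states.
LOW_POWER_STATES = {"C1", "C3", "C6"}


def is_parked(samples: Sequence[str]) -> bool:
    """Infer parking from the power states read during one time period.

    All low-power readings give True, all C0 readings give False, and a mix
    gives True only when low-power readings outnumber C0 readings.
    A tie (or no samples) is treated as not parked, which is an assumption.
    """
    low = sum(1 for state in samples if state in LOW_POWER_STATES)
    high = len(samples) - low
    return low > high


print(is_parked(["C6", "C6", "C3", "C6"]))   # True: every reading is low power
print(is_parked(["C0", "C0", "C0", "C0"]))   # False: every reading is C0
print(is_parked(["C6", "C0", "C6", "C1"]))   # True: three low-power readings, one C0
```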

At block 302, upon receiving the parking state of each of the processor cores 122, the filter driver 210 determines whether the parking state of any of the processor cores 122 has changed. If not, processing continues at block 300, where the filter driver continues to periodically request the parking state of the processor cores 122.
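The comparison at block 302 can be sketched as a diff between the previously seen parking state and the newly received one. Representing the state as one boolean per core is an assumption, and `changed_cores` is a hypothetical helper.

```python
from typing import List, Sequence


def changed_cores(previous: Sequence[bool], current: Sequence[bool]) -> List[int]:
    """Return the indices of cores whose parked flag differs between two polls."""
    if len(previous) != len(current):
        raise ValueError("both snapshots must cover the same cores")
    return [i for i, (old, new) in enumerate(zip(previous, current)) if old != new]


before = [False] * 8                                        # no core parked
after = [False, True, True, True, False, True, True, True]  # cores 0 and 4 stay active

print(changed_cores(before, after))    # [1, 2, 3, 5, 6, 7]
print(changed_cores(after, after))     # []
```

An empty result corresponds to the "not changed" branch, in which the table is left alone and polling simply continues.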

If the parking state has changed, the filter driver 210 generates new data to be stored in the indirect table 230 of the NIC 108 based on the parking state. The filter driver 210 uses the OID_GEN_RECEIVE_SCALE_PARAMETERS object identifier (OID) to modify the RSS parameters of the NIC based on the parking state of the processor cores. The OID_GEN_RECEIVE_SCALE_PARAMETERS OID includes an NDIS_RECEIVE_SCALE_PARAMETERS structure that specifies the RSS parameters.

In an embodiment, the structure includes a header with a type that identifies the object as containing RSS parameters, flags that indicate whether the indirect table and associated members have changed, and the size of the indirect table. The new data to be stored in the indirect table 230, based on the core parking state, is appended after the other structure members. Processing continues at block 304.
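The "fixed members followed by appended table data" arrangement can be sketched with a simplified stand-in. The byte layout, field widths and constant values below are invented for illustration and are not the layout of NDIS_RECEIVE_SCALE_PARAMETERS; `RssUpdateRequest` is a hypothetical name.

```python
import struct
from dataclasses import dataclass
from typing import List

HEADER_TYPE_RSS = 0x01       # hypothetical type tag meaning "RSS parameters"
FLAG_TABLE_CHANGED = 0x01    # hypothetical flag meaning "indirect table changed"
_FIXED = "<BBH"              # type tag, flags, table size (illustrative widths)


@dataclass
class RssUpdateRequest:
    """Simplified stand-in: fixed members, then the table appended at the end."""
    flags: int
    ind_table: List[int]

    def pack(self) -> bytes:
        fixed = struct.pack(_FIXED, HEADER_TYPE_RSS, self.flags, len(self.ind_table))
        return fixed + bytes(self.ind_table)   # one byte per entry in this sketch

    @classmethod
    def unpack(cls, blob: bytes) -> "RssUpdateRequest":
        header_type, flags, size = struct.unpack_from(_FIXED, blob)
        if header_type != HEADER_TYPE_RSS:
            raise ValueError("not an RSS parameter object")
        offset = struct.calcsize(_FIXED)
        return cls(flags=flags, ind_table=list(blob[offset:offset + size]))


request = RssUpdateRequest(FLAG_TABLE_CHANGED, [0, 4, 0, 4, 0, 4, 0, 4])
blob = request.pack()
print(len(blob), RssUpdateRequest.unpack(blob) == request)
```

Placing the variable-length table last lets the receiver read the fixed members first and then use the stored size to locate the table data.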

At block 304, upon detecting receipt of the OID_GEN_RECEIVE_SCALE_PARAMETERS OID, the miniport driver 270 stores the new data received for the indirect table 230 in the indirect table 230. For example, in an embodiment with eight receive (RSS Rx) queues 202, if the retrieved core parking state indicates that only processor core 0 and processor core 4 are not parked, a round-robin core assignment may be performed and the new data (the contents of the indirect table) stored in the indirect table 230 as shown in Table 2 below. Processor core 0 is assigned to receive queues 0, 2, 4 and 6, and processor core 4 is assigned to receive queues 1, 3, 5 and 7.

For example, the data stored in Table 2 may be expressed as a data structure IndTable, as shown below:

IndTable[0] = 0
IndTable[1] = 4
IndTable[2] = 0
IndTable[3] = 4
IndTable[4] = 0
IndTable[5] = 4
IndTable[6] = 0
IndTable[7] = 4

The IndTable structure is appended to the OID received by the miniport driver 270, causing the miniport driver 270 to update the data stored in the indirect table 230.
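The round-robin assignment that yields the IndTable values listed above can be sketched as follows. The function name `build_ind_table` and the boolean-per-core input are assumptions; the description only states that a round-robin core assignment is performed over the cores that are not parked.

```python
from typing import List, Sequence


def build_ind_table(parked: Sequence[bool], num_queues: int) -> List[int]:
    """Assign receive queues to the non-parked cores in round-robin order."""
    active = [core for core, is_parked in enumerate(parked) if not is_parked]
    if not active:
        raise ValueError("at least one core must not be parked")
    return [active[queue % len(active)] for queue in range(num_queues)]


# Only processor core 0 and processor core 4 are not parked.
parked = [False, True, True, True, False, True, True, True]
ind_table = build_ind_table(parked, num_queues=8)
print(ind_table)   # [0, 4, 0, 4, 0, 4, 0, 4]
```

With no core parked, the same function assigns queue i to core i, so a single routine could cover both the fully distributed case and the reduced case.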

An embodiment has been described for a system that includes a filter driver 210. However, monitoring the core parking state and requesting modification of the indirect table 230 are not limited to the filter driver 210. In other embodiments, these functions may be included in other parts of the network driver stack.

FIG. 4 is a block diagram illustrating another embodiment of the network interface controller 108 and memory 110 shown in FIG. 1.

As shown in FIG. 4, instead of adding the filter driver 210 (FIG. 2) to the network driver stack, the core parking state is monitored by the OS kernel 280. Requests to update the indirect table 230 are sent directly from the OS kernel 280 to the miniport driver 270. As in the embodiment discussed for the filter driver 210 in conjunction with FIG. 2, the OS kernel 280 generates an OID_GEN_RECEIVE_SCALE_PARAMETERS OID that carries the modified contents for the indirect table 230 as an input parameter. This OID is sent directly to the device driver 270, which modifies the indirect table 230 based on the modified contents it receives.

In an embodiment, an RSS alignment function is added to the OS kernel to adjust the RSS indirect table 230.

An embodiment has been described for the Windows® operating system. Monitoring the core parking state and modifying the contents of the indirect table 230 based on the monitored core parking state are not limited to the Windows® operating system; the method can also be applied to other operating systems, for example, the Linux operating system.

Other embodiments of the invention may also include machine-accessible media containing instructions for performing the operations of the invention. Such embodiments may also be referred to as program products. Such machine-accessible media may include, without limitation, computer-readable storage media having instructions (computer-readable program code) stored thereon, such as floppy disks, hard disks, CD-ROMs, ROM and RAM, and other types of arrangements of parts manufactured or formed by a machine or device. Instructions may also be used in a distributed environment and may be stored locally and/or remotely for access by single- or multi-processor machines.

While embodiments of the invention have been particularly shown and described with reference to embodiments thereof, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the scope of the embodiments of the invention as defined by the claims.

Claims

1. A method comprising:
monitoring a low-power idle state of each of a plurality of processing units; and
adaptively reassigning flows assigned to processing units that are in a low-power idle state to other processing units that are in a non-low-power idle state, so as to reduce energy consumption by the plurality of processing units and to cause network interface controller interrupts to be sent only to the processing units that are in the non-low-power idle state.

2. The method of claim 1, wherein adaptively reassigning the flows includes reassigning the flows via an indirect table, the indirect table including a plurality of entries, each entry including an identification of the processing unit to which a flow associated with a queue is assigned.

3. The method of claim 1, wherein monitoring the low-power idle state of the plurality of processing units is performed by an operating system.

4. The method of claim 1, wherein monitoring the low-power idle state of the plurality of processing units is performed by a driver associated with the network interface controller.

5. (Deleted)

6. A computer-readable storage medium storing instructions that, when accessed, cause a computer to perform operations comprising:
monitoring a low-power idle state of each of a plurality of processing units; and
adaptively reassigning flows assigned to processing units that are in a low-power idle state to other processing units that are in a non-low-power idle state, so as to reduce energy consumption by the plurality of processing units and to cause network interface controller interrupts to be sent only to the processing units that are in the non-low-power idle state.

7. The computer-readable storage medium of claim 6, wherein adaptively reassigning the flows includes reassigning the flows via an indirect table, the indirect table including a plurality of entries, each entry including an identification of the processing unit to which a flow associated with a queue is assigned.

8. The computer-readable storage medium of claim 6, wherein monitoring the low-power idle state of the plurality of processing units is performed by an operating system.

9. The computer-readable storage medium of claim 6, wherein monitoring the low-power idle state of the plurality of processing units is performed by a driver associated with a network interface controller.

10. An apparatus comprising:
a plurality of processing units; and
a module to monitor a low-power idle state of each of the plurality of processing units and to adaptively reassign flows assigned to processing units that are in a low-power idle state to other processing units that are in a non-low-power idle state, so as to reduce energy consumption by the plurality of processing units and to cause network interface controller interrupts to be sent only to the processing units that are in the non-low-power idle state.

11. The apparatus of claim 10, further comprising an indirect table including a plurality of entries, each entry including an identification of the processing unit to which a flow associated with a queue is assigned, wherein the module adaptively reassigns the flows via the indirect table.

12. The apparatus of claim 10, wherein the module is included in an operating system.

13. The apparatus of claim 10, wherein the module is a driver associated with a network interface controller.

14. The apparatus of claim 13, wherein the driver is a filter driver.

15. The apparatus of claim 10, further comprising a disk drive to store data packets associated with flows received from a network.

16. (Deleted)