KR20120046232A

KR20120046232A - Altering performance of computational units heterogeneously according to performance sensitivity

Info

Publication number: KR20120046232A
Application number: KR1020127003135A
Authority: KR
Inventors: Sebastien Nussbaum; Alexander Branover; John Kalamatianos
Original assignee: Advanced Micro Devices, Inc.
Priority date: 2009-07-24
Filing date: 2010-07-23
Publication date: 2012-05-09
Also published as: JP2013500520A; WO2011011668A1; JP5564564B2; EP2457139A1; CN102483646B; IN2012DN00933A; CN102483646A; WO2011011670A1; WO2011011673A1

Abstract

One or more computational units of a computer system are selectively altered in performance according to which of the computational units have a higher performance sensitivity than the others.

Description

Altering Performance of Computational Units Heterogeneously According to Performance Sensitivity

The present invention relates to power allocation in computer systems and, more particularly, to allocating power to improve performance.

Processors operate at various performance levels in an effort to match power consumption to workload requirements. The performance levels are typically determined by the voltage/frequency combinations used by the processor. As processors become more highly integrated, with multiple cores and other functions, power and thermal considerations take on considerable importance.

To provide improved performance, one embodiment enables analysis of the workload executing on computational units, such as processing cores and graphics processing units, based on the performance sensitivity of each computational unit to a change in execution capability (resulting, for example, from a frequency change) and on the power headroom available in the system, in order to improve system performance within a fixed power envelope.

Accordingly, in one embodiment, a method of operating a computer system that includes a plurality of computational units is provided. The method includes altering the performance of one or more of the computational units according to the respective performance sensitivities of the computational units. The method may include altering the performance of one or more of the computational units according to which of them have a higher performance sensitivity than the others. In one embodiment, the computational units comprise a group of processing cores, and the method further includes, if the predicted power margin with the performance of the group boosted is less than zero, removing from the group a core having a lower performance sensitivity than the others to form a smaller group, calculating a new predicted power margin, and determining whether the new predicted power margin would be greater than zero if the performance of the smaller group of cores were boosted. If the new predicted power margin is greater than zero, the performance of the smaller group of cores is boosted. The smaller group of cores may be boosted at least by increasing the frequency of the clock signals supplied to those cores.

The performance sensitivity of each computational unit may be determined from first and second performance metrics of that unit, measured at first and second performance levels, respectively.

In another embodiment, an apparatus is provided that includes a plurality of computational units. The apparatus further includes storage that stores a respective performance sensitivity for each of the computational units. A power allocation function, implemented in hardware, firmware, and/or software, boosts the performance of one or more of the computational units according to the performance sensitivities.

The power allocation function is responsive to which of the computational units have a higher performance sensitivity than the others.

The power allocation function may be configured to compare the performance sensitivity of each computational unit with a threshold and to boost those computational units whose performance sensitivity is above the threshold.

The power allocation function may also, in response to the predicted power margin being insufficient to boost all of a group of cores to a boosted performance state, remove one or more computational units from the group and recalculate a new predicted power margin. The units removed are those having a lower performance sensitivity than the other units in the group, and the removal and recalculation are repeated until the new predicted power margin is greater than zero, so that the performance of the remaining computational units can be boosted.
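The removal-and-recalculation loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function name, the per-core boost cost, and the margin model (headroom minus the summed boost cost of the group) are all introduced here for the example.

```python
def select_boost_group(sensitivity, boost_cost_w, headroom_w):
    """Return the core ids to boost, most sensitive first.

    sensitivity:  core id -> stored performance sensitivity
    boost_cost_w: core id -> assumed extra power (W) needed to boost that core
    headroom_w:   assumed power headroom (W) available for boosting
    """
    # Start with every core, ordered from highest to lowest sensitivity.
    group = sorted(sensitivity, key=sensitivity.get, reverse=True)
    while group:
        # Assumed model: predicted margin = headroom - cost of boosting the group.
        margin = headroom_w - sum(boost_cost_w[c] for c in group)
        if margin > 0:
            return group
        group.pop()  # drop the least sensitive core and recalculate
    return []


cores = {0: 5.0, 1: 1.0, 2: 3.0}   # hypothetical sensitivities
cost = {0: 4.0, 1: 4.0, 2: 4.0}    # hypothetical boost costs in watts
print(select_boost_group(cores, cost, headroom_w=9.0))  # [0, 2]
```

With 9 W of headroom, boosting all three cores would leave a margin of -3 W, so core 1 (the least sensitive) is removed and the remaining pair fits with 1 W to spare.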

The present invention may be better understood, and its numerous objects, features, and advantages made apparent to those skilled in the art, by referring to the accompanying drawings.
FIG. 1 illustrates a high-level block diagram of an exemplary system-on-a-chip (SOC) system according to an embodiment of the invention.
FIG. 2 illustrates a high-level flow diagram for profiling performance sensitivity to core frequency changes according to an embodiment of the invention.
FIG. 3 illustrates frequency training at a system block diagram level.
FIG. 4 illustrates additional aspects of frequency training.
FIG. 5 illustrates an exemplary flow diagram of power reallocation according to an embodiment of the invention.
FIG. 6 illustrates an exemplary flow diagram for throttling computational units according to frequency sensitivity.
The use of the same reference symbols in different drawings indicates similar or identical items.

Several methods have been proposed for opportunistically raising the performance level (e.g., the frequency) of CPU cores on multi-core processors when the processor integrated circuit is operating below its thermal design point (TDP). The actual thermal point at which the integrated circuit is operating may be determined by thermal measurement, by measurement of switching activity, or by measurement of current. These approaches allow the operating frequency of the CPU cores to be raised together to improve performance when there is headroom in estimated power, current, or temperature under a given TDP, and to be reduced when operation exceeds those limits. These approaches assume that all active CPU cores are operating at their maximum performance state when their frequency is raised in a coordinated manner.

Another approach provides reallocation of power among CPU cores. A core in P0 (the highest performance state set by the operating system (OS)) can be over-clocked by reallocating the power headroom available from other cores whose performance state is below some threshold (defined as a lower performance state).

The above approaches, which raise power uniformly for all cores or for one or more cores based on the performance states of the cores, allow power to be reallocated from idle computational units, but they treat all active units uniformly when dithering frequency or boosting steady-state frequency. However, some active cores or other computational units may gain little or no performance from a higher core frequency, while others may be running workloads that are more sensitive to an increase in core frequency. Selectively distributing power among the active cores or other computational units based on frequency sensitivity allows overall system throughput to be increased for heterogeneous workloads or for multi-threaded workloads with heterogeneous threads. That requires an effective approach for identifying a workload's sensitivity to changes in core frequency.

FIG. 1 shows, at a high level, an exemplary system on a chip (SOC) 100 that incorporates an embodiment of the invention. The SOC 100 includes multiple CPU processing cores 101, a graphics processing unit (GPU) 103, an I/O bridge 105 (called a South-Bridge in some embodiments), and a North-Bridge 107 (which may be combined with a memory controller in some embodiments). The power allocation controller 109 is the functional element that controls allocation of thermal design point (TDP) power headroom to on-die or on-platform components. The performance analysis control logic 111 analyzes the performance sensitivity of the cores and other computational units, as described further herein. Note that although the power allocation controller 109 and the performance analysis control logic 111 are shown as part of the North-Bridge 107, in other embodiments they may be located elsewhere in the SOC 100.

The thermal design point (TDP) represents the power that can be consumed by the entire SOC and depends on factors such as form factor, available cooling solution, AC adapter/battery, and voltage regulator. SOC performance is optimized within the current TDP, and in one embodiment the power limit corresponding to the TDP is never exceeded. Let the SOC power limit be SOC_TDP_Limit. SOC characterization is typically based on allocating maximum power to each of the on-die components while staying within SOC_TDP_Limit. That is done by setting the highest operating point (in frequency (F) and voltage (V)) so that even the maximum anticipated activity executed at that operating point does not cause power to exceed the allocated envelope. For example, assume that the maximum power of a 4-core SOC is limited to a 40 W TDP envelope. Table 1 itemizes the power budget allocated to each of the on-die components.

Table 1

| On-die component | Allocated power |
| --- | --- |
| Core0 | 8 W |
| Core1 | 8 W |
| Core2 | 8 W |
| Core3 | 8 W |
| GPU | 5 W |
| Memory controller | 2 W |
| I/O bridge | 1 W |
| Total | 40 W |

The 8 W power budget is the limit that defines the nominal highest operating point (F, V) of a core, and the 5 W power budget is the limit for the GPU. However, that allocation is conservative and only a nominal maximum, because it assumes simultaneous utilization of all on-die components. In practice, most applications are either CPU-bounded or GPU-bounded. Even when an application engages both computing engines (e.g., video playback offloading some tasks to the processor cores), it does not utilize all four processor cores. Even CPU-bounded client applications mostly utilize one or two processor cores (one- or two-thread workloads), and only a few have enough parallelism to utilize all four cores for long periods of time.

One embodiment reallocates power from idle or less active components to busy components by allocating more power to the busy ones. For example, in a workload in which two of the four cores are idle and the GPU operates at half power, Table 2 shows a power budget reflecting that state.

Table 2

| On-die component | Allocated power | Remarks |
| --- | --- | --- |
| Core0 | 16.75 W | Core can run at a higher F, V to fill the new power headroom |
| Core1 | 16.75 W | Core can run at a higher F, V to fill the new power headroom |
| Core2 | 0.5 W | Assume an idle core consumes 0.5 W |
| Core3 | 0.5 W | Assume an idle core consumes 0.5 W |
| GPU | 2.5 W | |
| Memory controller | 2 W | |
| I/O bridge | 1 W | |
| Total | 40 W | |

Core0 and Core1 are each allocated 16.75 W to improve overall CPU throughput. The operating point (F, V) of both cores can be raised to fill the new power headroom (16.75 W instead of 8 W). Alternatively, the power budget of only one core may be raised, to 25.5 W, while the other core remains at its 8 W budget. In that case, the core with the increased budget can be boosted to an even higher operating point (F, V) so that the new power headroom (25.5 W) is utilized. In this particular case, the decision whether to boost both cores equally or to give all of the available power headroom to one core depends on which is the better way to improve overall SOC performance.
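The 16.75 W and 25.5 W figures follow directly from the 40 W envelope and the budgets of the idle and partially loaded components. The short Python check below reproduces that arithmetic; the variable and function names are illustrative only and do not come from the source.

```python
SOC_TDP_LIMIT_W = 40.0        # 40 W TDP envelope from the example
NOMINAL_CORE_BUDGET_W = 8.0   # nominal per-core budget from Table 1


def active_core_headroom(tdp_limit_w, other_power_w):
    """Power left for the active cores once every other component is budgeted."""
    return tdp_limit_w - sum(other_power_w.values())


# Budgets of everything except the two active cores (values from Table 2).
others = {
    "core2_idle": 0.5,
    "core3_idle": 0.5,
    "gpu_half_power": 2.5,
    "memory_controller": 2.0,
    "io_bridge": 1.0,
}

headroom = active_core_headroom(SOC_TDP_LIMIT_W, others)
equal_split = headroom / 2                        # both active cores boosted equally
single_boost = headroom - NOMINAL_CORE_BUDGET_W   # one core boosted, the other left at 8 W

print(headroom, equal_split, single_boost)  # 33.5 16.75 25.5
```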

Boost Sensitivity Training and Data Structures

According to one embodiment, one way to decide how to allocate power between Core0 and Core1 in an attempt to achieve improved performance is to find out which of the two cores (if either) can better exploit the increase in execution capability provided, for example, by a frequency increase. A change in execution capability may also be provided by a change in, e.g., the amount of cache available to a core, the number of pipelines operating in the core, and/or the instruction fetch rate. To evaluate which of the cores can better exploit an increase in execution capability, in one embodiment the performance sensitivity of each computational unit to a frequency change and/or other change in execution capability (also referred to herein as boost sensitivity) is determined and stored on a per-unit basis.

Referring to FIG. 2, a high-level flow diagram is illustrated for profiling performance sensitivity to core frequency changes according to an embodiment of the invention. First, at 201, a clock signal at a predefined low frequency is applied to the CPU core being analyzed for a predetermined or programmable interval (e.g., 100 µs to 10 ms). During that interval, the hardware performance analysis control logic (see element 111 in FIG. 1) samples and averages the core's instructions per cycle (IPC), as reported by the core. The performance analysis control logic determines a first instructions per second (IPS) metric, the first performance metric, as IPC × core frequency (the low frequency, or first performance level). That IPS metric may be stored in a temporary register "A". Then, at 205, the performance analysis control logic causes a clock signal at a predefined high frequency to be applied to the core under analysis for the same predetermined or programmable interval. At 207, the performance analysis control logic again samples and averages the core IPC, as reported by the core. It determines a second IPS metric as IPC × core frequency (the high frequency, or second performance level) and stores that second IPS metric, the second performance metric, in a temporary register "B". At 209, the performance analysis control logic determines the numerical difference between A and B and stores the result as the performance sensitivity in a performance, or boost, sensitivity table, together with the number of the core under analysis and the number of the process context running on that core during the analysis. Note that other changes in execution capability may be used instead of, or in combination with, frequency changes to determine boost sensitivity.
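The two-interval measurement of FIG. 2 reduces to a small calculation, sketched here in Python. The function names and sample values are hypothetical, and taking the difference as B minus A (high-frequency IPS minus low-frequency IPS) is an assumption made so that a larger value means a larger gain from a frequency increase.

```python
def mean(samples):
    """Average the IPC samples collected during one training interval."""
    return sum(samples) / len(samples)


def boost_sensitivity(ipc_low_samples, freq_low_hz, ipc_high_samples, freq_high_hz):
    """Difference between the IPS metrics measured at the two frequencies."""
    ips_low = mean(ipc_low_samples) * freq_low_hz      # temporary register "A"
    ips_high = mean(ipc_high_samples) * freq_high_hz   # temporary register "B"
    return ips_high - ips_low                          # assumed sign: B - A


# Hypothetical compute-bound context: IPC is unchanged when the clock doubles.
compute_bound = boost_sensitivity([2.0, 2.0], 1.0e9, [2.0, 2.0], 2.0e9)

# Hypothetical memory-bound context: IPC falls as the clock rises.
memory_bound = boost_sensitivity([1.0, 1.0], 1.0e9, [0.5, 0.75], 2.0e9)

print(compute_bound, memory_bound)  # 2000000000.0 250000000.0
```

In this made-up example the compute-bound context gains eight times as many instructions per second from the same frequency step, so it would rank higher in the boost sensitivity table.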

The context number may be determined from the contents of the CR3 register, or from a hash of the CR3 register so that a shorter number can be stored. The numerical difference represents the boost sensitivity of the core; that is, it represents the sensitivity of the core, while running the specific process context, to a frequency change. The higher the sensitivity, the greater the performance gain obtained from a frequency increase. The same training shown in FIG. 2 may be applied to each processor core and to any other component that can be boosted (over-clocked) above its nominal maximum power value, with the values stored in the boost sensitivity table. The values in the boost sensitivity table may be sorted in descending order, starting with the core or other on-die component having the highest boost sensitivity.

In other embodiments, frequency sensitivity training may be applied to all computational units whose frequency can be varied to implement different performance states, regardless of whether those units can be clocked (over-clocked) above the nominal power level. In that way, the system can shift power budget from cores that are less sensitive to frequency changes to cores (or other computational units) that are more sensitive. Likewise, cores or other computational units can have their frequency reduced to save power without a significant loss of performance for the SOC.

FIG. 3 illustrates frequency training at a system block diagram level. Training of core 301 is representative of the frequency training for each core. The clock generator 303, as controlled by the performance analysis control logic 111, supplies the core 301 with the high-frequency and low-frequency clock signals during the respective intervals. The core 301 supplies instructions-per-cycle values to the performance analysis control logic 111, which controls the method according to FIG. 2. FIG. 4 illustrates that the instructions-per-cycle measurement (IPC1), determined by sampling and averaging over the first interval, is multiplied in multiplier 401 by the frequency supplied during the first interval (FREQ1). Similarly, the instructions-per-cycle measurement determined over the second interval (IPC2) is multiplied in multiplier 403 by the frequency supplied during the second interval (FREQ2). The difference between the utilization metrics produced by multipliers 401 and 403 is determined in summer 405. The result is the boost sensitivity, which is stored in the boost sensitivity table 407. For each measurement, the boost sensitivity table 407 stores the result together with the core number (C#), the process context running on the core, and the time elapsed since the last performance sensitivity measurement. The result is a performance metric, or boost sensitivity, expressed for example in instructions per second (IPS) and computed from average IPC × core frequency. Note that the boost sensitivity table may be stored within the SOC 100 (FIG. 1) or elsewhere in the computer system.
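One possible software view of the boost sensitivity table is sketched below in Python. The class and field names, the 16-bit XOR fold used as the CR3 hash, and the choice to store a timestamp and derive the elapsed time from it are all assumptions made for illustration; the source only states which pieces of information each entry holds.

```python
from dataclasses import dataclass


def context_number(cr3, bits=16):
    """Fold a CR3 value into a shorter context number (assumed XOR-fold hash)."""
    mask = (1 << bits) - 1
    folded = 0
    while cr3:
        folded ^= cr3 & mask
        cr3 >>= bits
    return folded


@dataclass
class BstEntry:
    core: int               # core number (C#)
    context: int            # hashed process context
    sensitivity_ips: float  # boost sensitivity in instructions per second
    measured_at_ms: float   # time of the measurement


class BoostSensitivityTable:
    def __init__(self):
        self._entries = {}

    def record(self, core, cr3, sensitivity_ips, now_ms):
        """Store or overwrite the latest measurement for a core."""
        self._entries[core] = BstEntry(core, context_number(cr3), sensitivity_ips, now_ms)

    def ranked(self):
        """Entries in descending order of boost sensitivity."""
        return sorted(self._entries.values(), key=lambda e: e.sensitivity_ips, reverse=True)

    def age_ms(self, core, now_ms):
        """Time elapsed since the core's last sensitivity measurement."""
        return now_ms - self._entries[core].measured_at_ms


bst = BoostSensitivityTable()
bst.record(core=0, cr3=0x12345678, sensitivity_ips=2.0e9, now_ms=0.0)
bst.record(core=1, cr3=0x00ABC000, sensitivity_ips=2.5e8, now_ms=5.0)
bst.record(core=2, cr3=0x00DEF000, sensitivity_ips=3.0e9, now_ms=6.0)
print([e.core for e in bst.ranked()])  # [2, 0, 1]
```

Keeping one entry per core keeps the sketch short; a table keyed by (core, context) would retain sensitivities across context switches at the cost of more storage.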

The boost sensitivity for each core can be tied to the current processor context, which can be approximated by the x86 CR3 register value and is tracked by the North-Bridge. In one embodiment, the sensitivity is re-evaluated whenever the context changes. In another embodiment, the boost sensitivity for each context expires based on a fixed or programmable timer (e.g., after 1 ms to 100 ms). In still another embodiment, both a timer and context switches are used, and whichever occurs first triggers re-evaluation of the boost sensitivity.
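The three re-evaluation policies (context switch only, timer only, or whichever comes first) can be expressed as a single predicate. The Python sketch below uses hypothetical parameter names; the 50 ms expiry is simply a value inside the 1 ms to 100 ms range mentioned above.

```python
def needs_reevaluation(entry_context, current_context, age_ms,
                       expiry_ms=None, on_context_switch=True):
    """Decide whether a core's boost sensitivity should be measured again.

    expiry_ms=None disables the timer; on_context_switch=False disables the
    context check. With both enabled, whichever condition occurs first wins.
    """
    context_changed = on_context_switch and entry_context != current_context
    expired = expiry_ms is not None and age_ms >= expiry_ms
    return context_changed or expired


# Combined policy: same context, but the 50 ms timer has run out.
print(needs_reevaluation(0x444C, 0x444C, age_ms=60.0, expiry_ms=50.0))  # True
```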

Thus, one embodiment of frequency training has been described. The functions of FIG. 2 may be implemented in hardware (e.g., state machines in the performance analysis control block 111), in firmware (microcode or a microcontroller), or in software (e.g., a driver, BIOS routine, or higher-level software). Software may be responsible for causing the low- and high-frequency clock signals to be applied, receiving the IPC values, averaging them, and performing the other functions described in connection with FIG. 2. The software may be stored in computer-readable electronic, optical, magnetic, or other volatile or non-volatile memory in the computer system of FIG. 1 and executed by one or more of the cores. In still other embodiments, the frequency sensitivity training illustrated in FIG. 2 and described above is implemented partly in hardware and partly in software, according to the needs and capabilities of the particular system. For example, software may be responsible for maintaining the boost sensitivity table, reading the CR3 register to determine the process context, and maintaining software timers for re-determining the boost sensitivity, while hardware, when notified by the software, applies the clocks at the first and second frequencies for the appropriate intervals and determines the average IPC. The software may be responsible for determining the IPS values.

Reallocation of Power Budget

The Boost Sensitivity Table (BST) is maintained as a result of frequency sensitivity training sessions for the components that may potentially be boosted. In other embodiments, a frequency sensitivity table is maintained as a result of frequency sensitivity training for all components whose performance can be adjusted, typically through adjustment of frequency (and voltage, if necessary). In one embodiment, power budget reallocation uses the information in the BST to decide which on-die component(s) are most sensitive to boost and therefore "deserve" to receive a larger share of TDP power margin when reallocation takes place.

A particular processor core can be in one of N performance states. A performance state is characterized by a unique pair of core voltage and frequency values. The highest performance state is typically selected and characterized so that no anticipated activity causes the core power (dynamic + static) to exceed the power budget allocated to the core. In current systems, the core performance state is defined by operating system software and is guided by current core utilization. In other embodiments, the core performance state may be specified by hardware based on the context currently being executed by the core. Table 3 shows the performance states of an exemplary system having four performance states (P0, P1, P2, P3) that the operating system (OS) (or any other higher-level software) may use for each core according to the core's utilization over a time interval. In one exemplary operating system the time interval ranges from 1 ms to 100 ms. Two idle states are used when the OS (or other higher-level software) places the core in a low C-state; a C-state is a core power state. In this particular embodiment, the core may be placed either in an IDLE state (when it is expected to be idle for a short time) or in a deep C-state. The highest operating point (P-boost) is the one at which the core power (CoreBoostPwr) exceeds the nominal maximum power budget allocated to that particular core.

Table 3

| Core performance state | Operating point (F, V) | Power consumption (dynamic and static) | Remarks |
|---|---|---|---|
| P-boost | F-boost/V-boost | CoreBoostPwr | Boost point. Exceeds the power budget of the core |
| P0 | F0/V0 | Core_Pwr0 | Core power budget |
| P1 | F1/V1 | Core_Pwr1 | |
| P2 | F2/V2 | Core_Pwr2 | |
| P3 | F3/V3 | Core_Pwr3 | |
| IDLE | Clock off/low voltage | Core_Idle_Pwr | |
| Deep C-state | Clock off/voltage off | Core_DeepCstate_Pwr | Core is either power gated or a deep voltage is applied |

GPU power state is traditionally controlled by software (the graphics driver). In other embodiments, it may also be controlled by hardware that tracks GPU activity and receives information from other graphics-related engines (Unified Video Decoder (UVD), display, etc.). In one exemplary embodiment, the GPU may be in one of four power states, as shown in Table 4.

Table 4

| Performance state | GPU power consumption (dynamic and static) |
|---|---|
| GPU-boost | GPUBoostPwr |
| GPU_P0 | GPU_Pwr0 |
| GPU_P1 | GPU_Pwr1 |
| GPU_P2 | GPU_Pwr2 |
| GPU_P3 | GPU_Pwr3 |

In one embodiment, only two on-die components, the core processors and the GPU, can be boosted to a higher performance point. The I/O module and the memory controller can contribute to the boost process for the cores or the GPU by having their "unused" power budget reallocated to those components, but they cannot be boosted themselves. In other embodiments, the memory controller may also be boosted by transitioning the Dynamic Random Access Memory (DRAM) and its frequency to a higher operating point.

One embodiment for efficiently allocating power to computational units is based on continuously tracking the available power headroom, or TDP power margin.

SOC_TDP_Margin is calculated by subtracting the sum of the power consumption of all on-die components from SOC_TDP_Limit:

SOC_TDP_Margin = SOC_TDP_Limit - Σ Core(i) Pwr - GPU Pwr - Memory Controller Pwr - I/O Bridge Pwr

Any change in the state of the on-die components triggers an update of the SOC_TDP_Margin value. In one embodiment, the state change that triggers the update is a change in performance or power state, or a change in application/workload activity. In other embodiments, the state change that triggers the update may be a process context change, or either a process context change or a performance state change. In one embodiment, any event that changes the power consumed by a component, such as a change in performance/power state or a change in application/workload activity, can serve as the state-change triggering event.
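The margin calculation above can be sketched in a few lines of Python. This is only an illustration: the function name, argument names, and wattages are assumptions made for the example and are not taken from the patent.

```python
def soc_tdp_margin(soc_tdp_limit, core_pwr, gpu_pwr, mem_ctrl_pwr, io_bridge_pwr):
    """Return SOC_TDP_Margin: the TDP limit minus the power of every on-die component.

    core_pwr is a list holding the current power of each core, Core(i) Pwr.
    """
    return soc_tdp_limit - sum(core_pwr) - gpu_pwr - mem_ctrl_pwr - io_bridge_pwr


# Hypothetical figures in watts: four cores, a GPU, a memory controller, an I/O bridge.
margin = soc_tdp_margin(35.0, [6.0, 6.0, 2.0, 2.0], 10.0, 3.0, 2.0)
print(margin)  # 4.0
```

In a real controller this function would be re-evaluated whenever one of the state-change events described above occurs.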

In general, the power (voltage × current) of a particular computational unit depends on the clock signal frequency, the supply voltage, and the amount of activity in that computational unit. The particular approach used to determine the power of each computational unit may vary according to system capabilities and needs, and may be implemented with hardware and/or software approaches. For example, in one approach, a computational unit calculates and reports an average power value as dynamic power + static power. Dynamic power can be calculated as (average workload activity / maximum activity) × MaxPower, where MaxPower is a fused or configurable value of the maximum dynamic power associated with maximum activity. Static power depends on the voltage at which the computational unit operates, and can be extracted from a table, obtained from power management resources, or determined in hardware. Average workload activity can be calculated as the average number of signal toggles across the computational unit over an interval, or as the average IPC over an interval. Power calculation may also use software methods, in which software (e.g., a driver) is aware of the activity of the application running in the computational unit and determines average power using an approach similar to the one described above.
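As a rough illustration of the hardware-style estimate described above, the following Python sketch computes average power as dynamic power plus static power. The static-power table, its voltage keys, and all numeric values are hypothetical; the text says only that static power depends on voltage and may come from a table.

```python
# Hypothetical static-power table (watts) keyed by supply voltage (volts).
STATIC_POWER_BY_VOLTAGE = {0.9: 0.5, 1.0: 0.8, 1.1: 1.5}


def average_power(avg_activity, max_activity, max_power, voltage):
    """Average power = dynamic power + static power.

    Dynamic power scales MaxPower by the ratio of average to maximum activity;
    static power is looked up from the voltage-indexed table above.
    """
    dynamic = (avg_activity / max_activity) * max_power
    static = STATIC_POWER_BY_VOLTAGE[voltage]
    return dynamic + static


# A unit running at half of its maximum activity, at 1.1 V, with MaxPower of 8 W.
print(average_power(50, 100, 8.0, 1.1))  # 5.5
```

The activity figures could be signal-toggle counts or IPC values; the formula only needs their ratio.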

In one embodiment, only a core in the P0 state and a GPU in the GPU_P0 state can be reallocated power from other on-die components and boosted to a higher performance point. This is based on the observation that a core being in the P0 state, or the GPU being in the GPU_P0 state, is essentially a hint (provided by the OS or by higher-level SW such as a graphics driver) that the currently executing task is computationally bounded. In other embodiments, the cores and/or the GPU may also be boosted when they are in other non-idle states.

FIG. 5 illustrates an exemplary flow diagram of the operation of one embodiment of the power allocation controller 109 (FIG. 1) that allocates power. At 501, the power allocation controller waits for a state change in any on-die component, such as a change in performance state, a change in application/activity, or a change in process context. When a state change occurs, TDP_SOC_Margin is tracked at 503, and at 505 a determination is made whether the margin is greater than zero. If it is not, the flow returns to 501. If the margin is greater than zero, there is headroom to boost one or more cores, and at 507 a check is made whether any CPU core is in the P0 state. In this particular embodiment, only cores in P0 can be boosted. If no cores are in P0, the flow checks the GPU power state at 523. If at least one core is in P0, the power allocation controller checks at 509 whether there is enough room to boost all of the P0 cores by calculating, over all cores in P0:

New TDP_SOC_Margin = TDP_SOC_Margin - Σ(CoreBoostPwr - Core_Pwr)

The new TDP_SOC_Margin is the margin expected if all cores in P0 are boosted, and TDP_SOC_Margin is the current margin value. CoreBoostPwr is the core power when boosted, and Core_Pwr is the current core power in the P0 state. At 511, the power allocation controller checks whether the new margin is greater than zero. If it is, there is enough room to boost all of the P0 cores; at 515 all of those cores are boosted and TDP_SOC_Margin is updated. The flow then returns to 501 to wait for another state change.

If the margin at 511 is not greater than zero, the flow goes to 517 to find some margin if possible. The cores with the highest sensitivity are identified. This can be done, for example, by accessing the boost sensitivity table produced by the boost sensitivity training discussed above. At 519, the cores in the P0 state are ordered, e.g., in decreasing order of boost sensitivity, so that the cores at the bottom of the list are the least sensitive to a frequency increase. At 521, the power allocation controller removes the core with the lowest boost sensitivity from the list, one core at a time, and recalculates the new TDP_SOC_Margin as in 509 for the cores remaining on the list. In other embodiments, all cores with a boost sensitivity below a predetermined or programmable threshold are removed from the list at the same time. The rationale is to avoid wasting power by boosting cores whose performance will not increase. When the new TDP_SOC_Margin is greater than zero, the P0 cores still on the list are transitioned to P-boost and TDP_SOC_Margin is updated.
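The core-selection steps of FIG. 5 (509 through 521) can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function name, the dictionary-based data layout, and the sample values are assumptions. For brevity the sketch sorts the P0 cores up front instead of only after the first margin check fails, which yields the same selection.

```python
def select_cores_to_boost(margin, p0_cores, sensitivity, core_boost_pwr, core_pwr):
    """Pick which P0 cores to boost; return (cores_to_boost, new_margin).

    sensitivity, core_boost_pwr and core_pwr are dicts keyed by core id.
    If no subset fits, return an empty list and the unchanged margin.
    """
    # Step 519: order the P0 cores by decreasing boost sensitivity.
    candidates = sorted(p0_cores, key=lambda c: sensitivity[c], reverse=True)
    while candidates:
        # Step 509: margin expected if every remaining candidate is boosted.
        new_margin = margin - sum(core_boost_pwr[c] - core_pwr[c] for c in candidates)
        if new_margin > 0:  # Step 511: enough room for this group.
            return candidates, new_margin
        candidates.pop()  # Step 521: drop the least sensitive core and retry.
    return [], margin


# Hypothetical example: four P0 cores, each needing 2 W extra to boost, 5 W of margin.
sens = {0: 0.9, 1: 0.2, 2: 0.6, 3: 0.4}
boost = {c: 8.0 for c in sens}
cur = {c: 6.0 for c in sens}
print(select_cores_to_boost(5.0, [0, 1, 2, 3], sens, boost, cur))  # ([0, 2], 1.0)
```

In the example, cores 1 and 3 are dropped in turn because they have the lowest sensitivity, leaving enough margin to boost cores 0 and 2.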

At 523, the power allocation controller checks whether the GPU is in the GPU_P0 state. If it is not, the flow returns to 501 to wait for a state change. If the GPU is in the GPU_P0 state, the power allocation controller determines at 525 whether there is enough headroom to boost the GPU by calculating a new TDP_SOC_Margin: the current TDP_SOC_Margin minus the difference between the boosted power and the current power of the GPU. At 527, the power allocation controller checks whether the new margin is greater than zero; if it is, the controller transitions the GPU to the boosted state, updates TDP_SOC_Margin, and returns to 503 to wait for another state change in any of the components. If there is not enough margin, the flow returns to 503.
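The GPU check at 523 through 527 follows the same pattern as the core check. A minimal Python sketch, assuming a hypothetical function name, a string-valued state, and illustrative wattages:

```python
def try_boost_gpu(margin, gpu_state, gpu_boost_pwr, gpu_pwr):
    """Return (boosted, margin_after) for the GPU boost decision.

    The GPU is boosted only if it is in GPU_P0 and the margin left after
    paying for the boost (boosted power minus current power) stays above zero.
    """
    if gpu_state != "GPU_P0":  # Step 523: only a GPU in GPU_P0 is eligible.
        return False, margin
    new_margin = margin - (gpu_boost_pwr - gpu_pwr)  # Step 525.
    if new_margin > 0:  # Step 527.
        return True, new_margin
    return False, margin


print(try_boost_gpu(3.0, "GPU_P0", 12.0, 10.0))  # (True, 1.0)
print(try_boost_gpu(1.0, "GPU_P0", 12.0, 10.0))  # (False, 1.0)
```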

Thus, one embodiment has been described that allocates power to the computational units in P0 when there is sufficient margin and, when there is not, finds that margin by excluding the computational units that are less sensitive to frequency boost. In other embodiments, frequency boost is provided only to those computational units whose boost sensitivity is high enough, e.g., above a predetermined or programmable threshold, to warrant the extra power. In this way, increased performance can be provided while still attempting to keep power consumption reduced where possible.

The functions of FIG. 5 may be implemented in hardware (e.g., state machines), in firmware (microcode or a microcontroller), in software (e.g., a driver, a BIOS routine, or higher-level software), or in any appropriate combination of hardware and software, to allocate power based on boost sensitivity. Assuming that the boost sensitivity information is available from boost sensitivity training, in one embodiment software may be notified of a state change of any component and implement the approach described in relation to FIG. 5. The software may be stored in computer readable electronic, optical, or magnetic volatile or non-volatile memory in the computer system of FIG. 1 and executed by one or more of the cores. In still other embodiments, the functions of FIG. 5 are implemented partly in hardware and partly in software, according to the needs and capabilities of the particular system.

The availability of boost sensitivity information can be used by the SOC in various ways. Central processing unit (CPU) throttling is one example of such use. Assume that a GPU-bounded application is being executed. That is, the application running on the GPU is limited by the performance of the GPU, e.g., because the current performance state is lower than the one that particular application needs. In that case, the CPU cores can be throttled (their performance limited) by imposing a P-state limit on all CPU cores (e.g., P-state limit = P2 state). This releases power margin that becomes available to the GPU. In one embodiment, GPU-bounded or CPU-bounded applications are identified based on data indicating how busy a particular core or the GPU is.

Alternatively, only the cores with the lowest performance sensitivity to frequency may be throttled to the P-state limit. For example, in a four-core system, the two cores that have the lowest IPS sensitivity to a change in core frequency according to the boost sensitivity table can be throttled by imposing P-state limit = P2, while the state of the other cores remains unchanged. This releases for the GPU a power margin equal to ((Core_Pwr0 - Core_Pwr2) × 2), where Core_Pwr0 is the power consumed by a core in the P0 state and Core_Pwr2 is the power consumed by a core in the P2 state.
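The four-core example can be made concrete with a short Python sketch. The function name, the sensitivity values, and the wattages are assumptions chosen for illustration; only the (Core_Pwr0 - Core_Pwr2) × count relation comes from the text.

```python
def throttle_least_sensitive(sensitivity, count, core_pwr0, core_pwr2):
    """Choose the `count` least sensitive cores and compute the margin they free.

    sensitivity maps core id -> boost sensitivity from the BST.
    Returns (throttled_core_ids, freed_power).
    """
    # Ascending sort puts the least sensitive cores first.
    throttled = sorted(sensitivity, key=sensitivity.get)[:count]
    # Each throttled core drops from P0 power to P2 power.
    freed = (core_pwr0 - core_pwr2) * len(throttled)
    return throttled, freed


# Hypothetical BST entries for a four-core system; P0 draws 6.0 W and P2 draws 3.5 W.
bst = {0: 0.9, 1: 0.2, 2: 0.6, 3: 0.4}
print(throttle_least_sensitive(bst, 2, 6.0, 3.5))  # ([1, 3], 5.0)
```

The freed 5.0 W in this example is the margin that could then be reallocated to the GPU.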

In still other embodiments, when a CPU-bounded (computationally bounded) application, i.e., an application limited by the performance of one or more processing cores, is being executed, the cores that are less sensitive to a frequency increase (or decrease) can be throttled to provide extra margin to the other cores, because applications often run on only a subset of the available cores. A GPU-bounded application is an application limited by the performance of the GPU.

FIG. 6 shows a high-level flow diagram of throttling performance based on boost sensitivity information. At 601, CPU-bounded or GPU-bounded applications are identified. At 603, the stored boost or performance sensitivity information is reviewed, and at 605 a subset of computational units, e.g., processing cores, is identified for throttling, based on the subset of cores being less sensitive in terms of performance to a reduction in performance capability, e.g., a reduction in frequency, voltage, the amount of cache available to a core, the number of pipelines operating in a core, and/or the instruction fetch rate. At 607 the performance of the subset is limited, and at 609 the power headroom made available through throttling is provided to the computational unit(s) executing the CPU-bounded and/or GPU-bounded application. The functions described in FIG. 6 may be implemented in the power allocation controller 109, in higher-level software, or using both hardware and software.

If an application primarily uses the CPU cores, the GPU can be throttled either by forcing on the GPU a GPU P-state limit lower than GPU_Pwr0 or by throttling the GPU's instruction/memory traffic stream. If the throttled GPU power equals GPU_Pwr2, the extra power margin, GPU_Pwr0 - GPU_Pwr2, can be reallocated to boost one or more CPU cores according to the boost sensitivity table values.

When a computationally bounded workload runs on a multi-core processor or on the GPU, memory can also be throttled. One method is to stall every other access to DRAM by a number of cycles, reducing the dynamic part of DRAM I/O and DRAM DIMM power by a factor close to two. Another approach may involve shutting down a number of the available memory channels, which also releases a certain percentage of DRAM I/O and DRAM DIMM power. The reduced DRAM I/O power can be reallocated either to the GPU or to the CPU cores, depending on the utilization of the GPU or CPU cores and on the BST values (as far as the CPU cores are concerned), leading to higher overall SOC performance throughput. The DRAM DIMM may not be part of the SOC, in which case its power budget is not part of the SOC TDP. However, in circumstances where the reduced DRAM DIMM power margin can be reallocated back to the SOC TDP, the extra margin can be used to boost the GPU or some of the CPU cores.

While circuits and physical structures are generally presumed for some embodiments of the invention, it is well recognized that in modern semiconductor design and fabrication, physical structures and circuits may be embodied in a computer readable descriptive form suitable for use in subsequent design, test, or fabrication stages. Structures and functionality presented as discrete components in the exemplary configurations may be implemented as a combined structure or component. The invention is contemplated to include circuits, systems of circuits, related methods, and computer-readable medium encodings of such circuits, systems, and methods, all as described herein and as defined in the appended claims. As used herein, a computer-readable medium includes at least disk, tape, or other magnetic, optical, semiconductor (e.g., flash memory cards, ROM), or electronic medium.

Thus, various embodiments have been described. Note that the description of the invention set forth herein is illustrative and is not intended to limit the scope of the invention as set forth in the following claims. For example, while the computational units may be part of a multi-core processor, in other embodiments the computational units are in separate integrated circuits that may be packaged together or separately. For example, a graphics processing unit (GPU) and a processor may be separate integrated circuits packaged together or separately. Variations and modifications of the embodiments disclosed herein may be made based on the description set forth herein without departing from the scope of the invention as set forth in the following claims.

Claims

A method of operating a computer system comprising a plurality of computational units, the method comprising:
altering performance of one or more of the computational units according to respective performance sensitivities of the computational units.

The method of claim 1,
wherein altering the performance of the one or more computational units comprises boosting the performance of the one or more computational units according to which of the computational units have a higher performance sensitivity than other computational units.

The method of claim 1,
further comprising comparing the performance sensitivity of each of the computational units to a threshold and altering the performance of the computational units according to the comparison.

The method of claim 1,
wherein the one or more computational units whose performance is altered are in the same power state before the performance is altered, and the same power state is a nominal maximum power state.

The method of claim 1,
wherein the computational units comprise a plurality of processing cores in a group, the method further comprising:
removing from the group one or more processing cores having a lower performance sensitivity than other processing cores of the group, until the expected power margin for the processing cores remaining in the group is greater than zero; and
altering the performance by boosting the performance of the cores remaining in the group.

The method of claim 1,
wherein the computational units comprise a plurality of processing cores, the method further comprising:
if the expected power margin when boosting the performance of a group of the processing cores is less than zero,
removing from the group cores having a lower performance sensitivity than others in the group, to form a smaller group;
calculating a new expected power margin and determining whether the new expected power margin is greater than zero if the performance of the smaller group of cores is boosted;
if the new expected power margin is greater than zero for the smaller group of cores, altering the performance by boosting the performance of the smaller group of cores; and
if the new expected power margin is still less than zero for the current group, removing from the smaller group another core having a lower performance sensitivity than the other cores in the smaller group, to form another smaller group.

The method of claim 6,
further comprising determining the new expected power margin according to (current actual power margin - (boosted power - current power)), wherein the boosted power is the power of the smaller group of cores operating at the boosted power level, the current power is the power of the smaller group of cores operating at current power levels, and the current actual power margin is the power margin corresponding to the current power consumption of the computational units.

The method of claim 1,
wherein altering the performance comprises boosting the performance of the one or more computational units.

The method of claim 1,
further comprising accessing a storage to determine the performance sensitivity of each of the computational units, the storage storing performance sensitivities corresponding to respective process contexts executing on each of the processing cores.

An apparatus comprising:
a plurality of computational units;
a storage that stores respective performance sensitivities for the computational units; and
a power allocation function configured to boost performance of one or more of the computational units according to the performance sensitivities.

The apparatus of claim 10,
wherein the power allocation function is responsive to one or more of the computational units having a higher performance sensitivity than others of the computational units.

The apparatus of claim 10,
wherein the power allocation function is configured to compare the performance sensitivity of each of the computational units with a threshold and to boost the computational units having a performance sensitivity higher than the threshold.

The apparatus of claim 10,
wherein the power allocation function is further configured to remove one or more computational units from a group of the computational units in response to an expected power margin being insufficient to boost all of the cores of the group to a boosted performance state, the removal being determined according to the one or more computational units of the group having a lower performance sensitivity than others of the computational units of the group, and wherein the removal and recalculation are repeated until a new expected power margin is greater than zero, such that the performance of those computational units remaining in the group can be boosted.

14. The apparatus according to any one of claims 10 to 13,
wherein the apparatus comprises at least one integrated circuit, the computational units comprise at least one of processing cores, a memory controller, and a graphics processing unit, and the power allocation function is implemented in one or more of hardware, firmware, and software stored on computer readable media.

The apparatus of claim 10,
wherein the performance sensitivity of each of the computational units is determined according to a first performance metric and a second performance metric of that computational unit, determined at a first performance level and a second performance level, respectively.