TW200422942A - Performance level setting of a data processing system - Google Patents

Performance level setting of a data processing system

Info

Publication number
TW200422942A
Authority
TW
Taiwan
Prior art keywords
task
value
execution
performance
mentioned
Prior art date
Application number
TW092131595A
Other languages
Chinese (zh)
Inventor
Krisztian Flautner
Trevor Nigel Mudge
Original Assignee
Advanced Risc Mach Ltd
Univ Michigan
Priority date
Filing date
Publication date
Priority claimed from GB0226395A
Priority claimed from GB0228546A
Application filed by Advanced Risc Mach Ltd, Univ Michigan
Publication of TW200422942A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/324 Power saving characterised by the action undertaken by lowering clock frequency
    • G06F1/3296 Power saving characterised by the action undertaken by lowering the supply or operating voltage
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A target processor performance level is calculated from the utilization history of a processor in the performance of a plurality of processing tasks. The method comprises calculating a task work value indicating processor utilization in performing a given processing task within a predetermined task time interval, and calculating the target processor performance level in dependence upon that task work value.
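As a concrete illustration of the claimed method, the sketch below (plain C) derives a target performance level from a single task's utilization over its task time interval and quantizes it to a discrete set of available levels, since a real processor offers only a fixed set of frequency/voltage settings. It is a minimal sketch, not the patented implementation: the identifiers (task_stats, work_fse, pick_level) and the four example levels are assumptions, and the target is formed as work divided by (work + idle), consistent with the deadline definition given later in the description.

    /* Illustrative sketch only: names, levels and structure are assumptions
     * made for this example, not the patent's implementation. */

    #define NUM_LEVELS 4
    static const double levels[NUM_LEVELS] = { 0.50, 0.67, 0.83, 1.00 }; /* fractions of peak speed */

    struct task_stats {
        double work_fse;   /* full-speed-equivalent work done during the task's
                              time interval (the task work value) */
        double idle_time;  /* idle time observed in the same interval, in seconds */
    };

    /* Target performance as a fraction of peak: the work the task needed,
     * divided by the time that was available for it (work plus idle). */
    static double target_fraction(const struct task_stats *t)
    {
        double deadline = t->work_fse + t->idle_time;
        return (deadline > 0.0) ? (t->work_fse / deadline) : 1.0;
    }

    /* Quantize to the lowest available level that still meets the target,
     * because only a discrete set of performance levels can be selected. */
    static double pick_level(double target)
    {
        for (int i = 0; i < NUM_LEVELS; i++)
            if (levels[i] >= target)
                return levels[i];
        return levels[NUM_LEVELS - 1];
    }

In use, the level for a task would be recomputed when the task is next scheduled, once its previous interval, including any preempting tasks and idle time, has been observed.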

Description

200422942 玖’ 【發 明係 之資 【先 電源 理器 這些 應的 器效 可接 軟體 後間 上述 算法 次作 能源 的利 窗格 級的 ,發明說明: 明所屬之技術領域】 本發明係有關於資料處理系統領域。p # 史特言之,本發 有關於可於處理器中設定給多個不同虛w l a « u疼理器效能層級 料處理系統所設置的效能層級。 前技術】 處理器設計的一重要目標為提供更好的效能並且降低 消耗。某些現今的處理器提供設定其為數種不同的處 效能層級之一種能力,視當時應用程式的要求而定。 處理器利用降低該處理器的時脈頻率之事實,以及對 操作電壓可能以二次方地減少能源消耗。然而,處理 能的降低僅在使用者沒什麼感覺或沒有效能影響時才 受。因此重要的是處理器效能層級的降低不應該造成 錯失其執行期限。在期限前完成一特定處理任務,然 置比起慢點執行該任務在能源上是比較沒有效率的, 慢點執行該任務為確保其更準確地達到符合期限。 已知的效能層級設定技巧包括所謂的以間隔為主的演 ’其主要概念描述於Weiser等人於1994年11月第一 業系統&十與實施座談會論文集中發表的“降低cpu 的排程”。此一已知的以間隔為主的演算法監控處理器 用歷史並藉由計算一固定且簡短(10-50毫秒)之時間 =的閒置時間與忙碌時間之比例而導出一適當效能層 才曰不。典型上計算最近時間間隔中的整體處理器利 3 200422942 用’若其超過一臨界值則增加處理器的效能層級;反之若 該時間間隔大多包含閒置時間,則降低該效能層級。雖然 此已知方法對於規則的工作負載還應付得不錯,但對於非 規則(即非週期性的)工作負載以及互動應用程式便顯得左 支右紬。其他已知的技術使用整體處理器利用的加權平均 做為未來利用的引導,然而已顯示此一替代性技術並未產 生可大大增進處理器利用並降低閒置時間的一時脈速度 (參見Grunwald等人於2000年1〇月第四次作業系統設計 與實施研討會論文集中所發表之“動態時脈排程策略,,。 因此對於廣大範圍的工作負載例如包括非規則與互動 性工作負載,需要一種可更精確地預測一適當的處理器效 能層級之一效能層級設定技術。 本發明提供一種從執行多種處理任務的處理器利用史 中計算一目標處理器效能層級的方法,該方法包含: 計算一任務工作值,其係於一對應的任務時間間隔内 指出在執行一預定處理任務的處理器利用。 依據上述任務工作值計算上述目標處理器效能層級。 本發明認知單獨的處理任務(或處理任務群組)常具有 可識別的利用期於任務層級處,然而當估算一適當執行效 能時,其對於所有累積之被觀察的任務而言可能為模糊 的。藉由聚焦於該任務層級的處理器利用,效能設定策略 可更佳地調和處理任務的多樣性與關於其所需的效能層 級。本發明允許對每個處理任務直接地預測效能層級,而 非藉由整體工作負載所指定的一任意數量而間接地調高或 4 200422942 調低處理器效能層級。 單一過去時間窗格中的任務工作值可被用於預測該處 理任務一適當的未來效能層級。然而較佳的實施例結合同 一處理任務中對應至數個過去執行的任務工作值以預測1 未來效此層級。此具有提供關於特定任務之靜態上 μ 又平確 的效能層級預測的優點。 須知該任務時間間隔對於每個處理任務的每個執行可 設為一固定值。然而,透過獨立對每個處理任務設定任務 時間間隔,可使效能預測系統更加適合於不同的工作負栽 種類。尤其是,鑑於一短時間週期可適用於一互動性處理 任務,而一相對較長的時間週期似乎更適合用於一非互動 性任務。如果選擇一不適合的短任務時間間隔,可造成效 能層級之間的震盪。讓任務時間間隔可獨立地對每個處理 任務設定會增加一穩定效能預測將被選定的可能性。此 外’透過對一特定處理任務的每個執行獨立地設定時間週 期,該效能預測可適合於考慮在執行時構成工作負荷之部 分的其他任務。這些其他共同存在的任務由於任務先佔之 緣故很有可能會影響一特定任務的總執行時間。 任務時間間隔可被彈性地定義,假如該間隔包括於其 範圍内的某處執行上述特定處理任務。然而,較佳的實施 例定義遠任務時間間隔使其起始於特定處理任務的一第— 排程開始並於該特定處理任務之後續執行前結束。此優點 在於容易實施且該任務時間間隔與該特定處理任務執行的 頻率相關。此使得該技巧更適用於非週期性的處理任務。 200422942 將瞭解關於一特定處理任務之多個先前處理的任務工 作值能以許多不同的方式加以結合來預測該任務的一未來 =能層級’例如可計算任務卫作值的—平均或—加權平 :。將任務工作值結合以計算指數式衰減的工作完成值會 疋更佳的,因為此將使最近計算的任務工作值 大的影響。 负更 雖然可計算相關任務群組的執行截止期π,但依據任 :基:計算一任務的執行截止期限是較佳的,目為其將使 理器效能可更精細地被調整。任務工作值以及任務時間 間隔内偵測到的閒置時間較佳地用於計算該任務執行截止 期限:由於該執行截止期限可被正規化以考慮到該任務時 間:隔内的普遍執行效能’ &相較於使用完成該任務之一 先前執行所用的真實時間,此可提供一更精確的估算。 分別記錄該任務工作值的指數平均以及一特定處理任 務的執行截止期限的優點在於效能層級預測的重要性可依 據該任務工作值被測量的間隔長度而t。此可防止與最長 :務時間間隔有關的任務工作值支配預測因而補償廣泛改 變的視窗大小。 將瞭解該任務工作值可僅包括在對應的任務時間窗格 二:定處理任務的處理器利用1而,較佳實施例该 P㈣定處理任耗其他不料處理任務先佔並將該先 相μ::處理盗利用包含於工作完成值中。此優點在於將 -特—ΓΓ以結合並相應地估算適當的效能層、級。預期到 的後續執行中,相同的其他任務也很有可能先 6 200422942 以及效能層級預 佔”亥特疋任務。因而該任務執行截止期限 測應將這些先佔任務考量在内。 雖然任務時間窗格可允許依照執行時間以及特定任 的執行頻率而改變大小,但設定該任務時間窗格的上限界 是較佳的。此優點在於可防止未被先佔的長時間處理在二 不適當的效能層級上繼續執行。當任務時間窗格達到上限 界時便開始重新計算效能層級。 又 、在較佳實施例中,實施效能層級設定方法於作業系統 核心程式軟體中H點在於該軟體可依據依較豐富的執 行時間資訊組合做出選擇,因而產生較佳的準確度。 L發明内容】 ’其係提供一種依據處 用史來計算一目標處理 含以下步驟: 在 預疋任務時間間隔 用:以及 目標處理器效能層級。 ,其係提供一載著一電 產品,以依據一處理器 而計算該處理器之一目 少包含: 計算一任務工作值,該 間隔内執行一特定處理 自本發明的更進一步態樣觀之 理器在執行多個處理任務時的一利 器效能層級的方法,該方法至少包 計算一任務工作值,其係指出 内執行一特定處理任務的處理器利 依據上述任務工作值計算上述 自本發明的更進一步態樣觀之 月b程式用於控制一電腦之電腦程式 於執行多個處理任務的一利用史中 標處理器效能層級,該電腦程式至 可操作的任務工作值計算碼以 任務工作值指出在一預定任務時間 200422942 任務的 值計算 上 以下詳 第 料處理 式功能 組11 2 智慧蜇 包含/ 一事件 叫模組 1 3 6 〇 言! 
供應資 該 分的核 程式為 主機上 取特權 叫的一 處理器利用;以及 搡作的目標處理器效能計算碼以依據上述任務工作 上述目標處理器效能層級。 述與本發明之其他物件、特性以及優點將明顯地從 細的說明實施例描述配合附隨的圖式中得知。 方式】 1圖概要地說明電源管理系統如何可被實施於一資 系統中。該資料處理系統包含一種具有標準核心程 怏模組的核心程式1 00,該模組包括一系統呼叫模 排程程式1 1 4以及一習知電源管理程式1 1 6。一 能源管理程式系統1 20被實施於該核心程式中並且 策略協調程式122、一效能設定控制模組1 24以及 追蹤模組126。一使用者處理層丨30包含一系統呼 132、一任務管理模組134以及特定應用程式資料 〖使用者處理層級130透過一應用程式監控模組140 訊至該核心程式100。 核心程式1 00為提供基本服務給作業系統之其他部 心程式。該核心程式可與外殼程式相對比,該外殼 作業系統的最外面部分而與使用者指令互動。在其 執行該核心程式碼對實體資源像是記憶體有完全存 。該系統的其他部分或一應用程式透過名為系統呼 組程式介面請求該核心程式的服務。使用者處理層 8 200422942 112與132。該排 程程式 其順序200422942 玖 '[Invention of the Department of Information [The power of the power processor can be connected to the software and the above algorithm can be used as a profitable pane of energy, the description of the invention: The technical field to which the invention belongs] The present invention is related to information Processing system area. p # In particular, this issue is related to the performance level that can be set in the processor to multiple different virtual w l a «u processor performance levels. Previous technology] An important goal of processor design is to provide better performance and reduce consumption. Some modern processors offer the ability to set it to several different levels of processing performance, depending on the requirements of the application at the time. The processor takes advantage of the fact that the clock frequency of the processor is reduced, and the operating voltage may be reduced to a quadratic power consumption. However, the reduction in processing energy is only affected if the user does not feel it or has no effect on performance. It is therefore important that a reduction in the level of processor performance should not result in missed execution deadlines. Complete a specific processing task before the deadline, but it is less energy efficient to perform the task more slowly, and to perform the task slower to ensure that it meets the deadline more accurately. Known performance level setting techniques include the so-called interval-based performance. Its main concepts are described in Weiser et al., "Reducing the CPU Rank Cheng. " This known interval-based algorithm monitors the processor's history and derives an appropriate performance layer by calculating a fixed and short (10-50 milliseconds) time = ratio of idle time to busy time. . Typically, the overall processor benefit in the most recent time interval is calculated. If it exceeds a critical value, the processor's performance level is increased; otherwise, if the time interval mostly includes idle time, the performance level is decreased. Although this known method works well for regular workloads, it appears to be side-by-side for non-regular (that is, non-periodic) workloads and interactive applications. Other known techniques use the weighted average of overall processor utilization as a guide for future utilization, however this alternative technique has been shown to not produce a clock speed that can significantly increase processor utilization and reduce idle time (see Grunwald et al. The “Dynamic Clock Scheduling Strategy” published in the Proceedings of the Fourth Operating System Design and Implementation Symposium in October 2000. Therefore, for a wide range of workloads, including irregular and interactive workloads, a It is possible to more accurately predict one of the appropriate processor performance levels. Performance level setting technology. 
The present invention provides a method for calculating a target processor performance level from the history of processor utilization of multiple processing tasks. The method includes: Task work value, which indicates the utilization of a processor that executes a predetermined processing task within a corresponding task time interval. The target processor performance level is calculated according to the task work value. The present invention recognizes a separate processing task (or processing task) Groups) often have identifiable utilization periods at the task level, However, when estimating a proper execution performance, it may be ambiguous for all accumulated observed tasks. By focusing on processor utilization at that task level, performance setting strategies can better tune the diversity of processing tasks With regard to its required performance level, the present invention allows the performance level to be predicted directly for each processing task, rather than indirectly increasing or lowering the processor performance level by an arbitrary number specified by the overall workload 4 200422942 The task work value in a single past time pane can be used to predict an appropriate future performance level for the processing task. However, the preferred embodiment combines the task work values corresponding to several past execution tasks in the same processing task to predict 1 The future effect is at this level. This has the advantage of providing static μ and accurate performance level predictions for specific tasks. Note that the task interval can be set to a fixed value for each execution of each processing task. However, through independent Setting the task interval for each processing task can make the performance prediction system more suitable for different tasks The type of load. In particular, given that a short period of time can be suitable for an interactive processing task, and a relatively long period of time seems more suitable for a non-interactive task. If you choose an unsuitable short task interval, Can cause turbulence between performance levels. Allowing task intervals to be set independently for each processing task increases the likelihood that a stable performance prediction will be selected. In addition, 'set independently for each execution of a specific processing task Time period, the performance prediction may be suitable for considering other tasks that constitute part of the workload during execution. These other co-existing tasks are likely to affect the total execution time of a particular task due to task preemption. Task interval It can be flexibly defined if the interval is included somewhere within its scope to perform the specific processing task described above. However, the preferred embodiment defines a long task time interval so that it starts at the first of a specific processing task—the beginning of a schedule It ends before the subsequent execution of the specific processing task. This has the advantage that it is easy to implement and that the task time interval is related to the frequency of execution of that particular processing task. This makes the technique more suitable for aperiodic processing tasks. 200422942 It will be understood that a number of previously processed task work values for a particular processing task can be combined in many different ways to predict a future for the task = energy level ', for example, an -average or -weighted level that can calculate the task guard value. :. 
Combining task work values to calculate exponentially decayed work completion values is better because it will have a large impact on the recently calculated task work values. Negative change Although the execution deadline π of the relevant task group can be calculated, it is better to calculate the execution deadline of a task based on Ren: Base, so that the performance of the processor can be adjusted more finely. The task work value and the idle time detected during the task time interval are preferably used to calculate the task execution deadline: since the execution deadline can be normalized to take into account the task time: universal execution performance within the interval '& amp This provides a more accurate estimate than using the real time it took to complete one of the previous executions of the task. The advantage of separately recording the exponential average of the task's work value and the execution deadline of a particular processing task is that the importance of performance-level predictions can depend on the length of the interval at which the task's work value is measured. This prevents the task work value associated with the longest service interval from dominating the predictions and thus compensates for the widely changing window size. It will be understood that the task work value may only be included in the corresponding task time pane 2: the processor that determines the processing task uses 1, and in the preferred embodiment, the predetermined processing consumes other unexpected processing tasks first and the first μ :: Handling of misappropriation is included in the work completion value. This has the advantage of combining-special-ΓΓ and estimating the appropriate performance level and level accordingly. In the expected subsequent execution, the same other tasks are also likely to be preempted by 6 200422942 and the "Hyatt" task at the efficiency level. Therefore, the task execution deadline should consider these preempted tasks. Although the task time window The grid can change the size according to the execution time and the specific execution frequency, but it is better to set the upper bound of the task time pane. This advantage is that it can prevent the unused long-term processing from inappropriate performance. Continue execution at the level. When the task time pane reaches the upper bound, the performance level is recalculated. Also, in a preferred embodiment, the performance level setting method is implemented in the operating system core program software. The point H is that the software can A richer set of execution time information makes a choice, which results in better accuracy. L SUMMARY OF THE INVENTION 'It provides a method for calculating a target based on the application history, including the following steps: at the time interval of the preliminary task: and Target processor performance level. It provides an electronic product based on a processor. 
One of the items of the processor includes: calculating a task work value, performing a specific process within the interval, a method of a sharp tool efficiency level when the processor of the present invention further performs multiple processing tasks, the method At least one task work value is calculated, which indicates that a processor that executes a specific processing task can calculate the above-mentioned month b program from a further aspect of the present invention based on the task work value to control a computer program of a computer in A utilization history of executing multiple processing tasks is awarded to the processor performance level. The computer program is operable to calculate the task work value. The task work value is specified at a predetermined task time. 200422942 The task value calculation is detailed below. The group 11 2 smart card contains / an event called module 136 words! The core program that supplies the funds is used by a processor on the host to obtain a privileged call; and the target processor performance calculation code is based on the above The task works at the target processor performance level described above. Other objects, features, and advantages described herein will be apparent. Obviously know from the detailed description of the embodiment and the accompanying drawings. Mode] 1 diagram outlines how the power management system can be implemented in a capital system. The data processing system includes a standard core program. The core program 100 of the module includes a system call module scheduling program 1 1 4 and a conventional power management program 1 1 6. An energy management program system 1 20 is implemented in the core program and strategy coordination Program 122, a performance setting control module 1 24, and a tracking module 126. A user processing layer 丨 30 includes a system call 132, a task management module 134, and application-specific data [user processing level 130 through an application The program monitoring module 140 sends a signal to the core program 100. The core program 100 is the other central program that provides basic services to the operating system. The core program can be compared with the shell program, which interacts with user commands in the outermost part of the operating system. The core code in which it runs has full storage of physical resources like memory. Other parts of the system or an application program request the services of the core program through a system call program interface. User Processing Layer 8 200422942 112 and 132. The scheduler its order

以及該核心程式均具有系統呼叫模組 間中給每個處理使用處理器。該習知電源管理程式ι Μ藉 由依據處理器利用層級而在一省電休眠模式以及一標準清 醒模式之間切換以管理供給電壓。 該智慧型能源管理程式120負責計算並設定處理器效 能目標。該智慧型能源管理程式12〇使中央處理器(cpu) 的操作電壓以及處理器的時脈降低而不會使應用程式軟髅 錯過處理(即任務)的截止期限,而非僅依賴休眠模式達到 省電目的。當該CPU以全速運轉時許多處理任務將會在其 截止期限内完成,而該處理器將會閒置直到下一個任務開 始舉例來說,產生資料的一種任務的一任務載止期限為 該產生的資料被其他任務所需要的時間點0 —互動任務的 截止期限會是使用者的感覺臨界值(50·丨00毫秒)❶以全效 能運轉而後間置相較於慢點完成該任務以致更準確地符合 截止期限而言是較沒有能源效率的。當處理器的頻率下降 時,其電壓也可降低以達到節省能源的目的。對於實施於 互補金氧半導體(CMOS)科技的處理器而言,一特定工作負 載所使用的能源正比於電壓的平方。該策略協調程式管理 數個效能設定演算法,每個演算法適合不同的運轉時間情 況。對於一特定條件之最適合的效能設定演算法於運轉時 間中選擇。效能設定控制模組1 24接收每個效能設定演算 法的結果並藉由按優先順序處理這些結果重複計算一目# 200422942 處理器效能。事件追蹤模組i 26監控位於核心程式U 0和 使用者處理層級130中的事件,並將收集到的資訊傳送給 效能設定控制模組1 24以及策略協調程式1 22。 在使用者處理層級中,監控處理工作係透過··系統呼 叫事件1 3 2、包含任務切換、任務建立與任務離開事件的 處理任務事件134以及特定應用程式資料。智慧型能源管 理程式1 20被實施為一組核心程式模組並將掛勾嵌補於標 準核心程式功能性模組中並且供作控制處理器的速度與電 壓之用。該智慧型能源管理程式1 2〇實施的方法使其相對 地獨立於該核心程式1 〇〇中的其他模組。此優點在於使效 能設定控制機制較不會干擾主作業系統。實施於核心程式 中同時也意味使用者應用程式不需被調整。因此,該智慧 型能源管理程式1 2 0與系統呼叫模組11 2、排程程式114 以及核心程式的傳統能源管理程式丨丨6共同存在,雖然在 這些子系統中其可能需要某些掛勾(h〇ok)。該智慧槊能源 管理程式120用於透過檢查執行任務間的通訊型態自作業 系統核心程式導出任務截止期限與任務分類資訊(例如該 任務是否與一互動應用程式結合)。其同時也用於監控哪個 系統呼叫被每個任務所存取以及資料如何於核心程式中的 通訊架構間流動。 第2圖概要地說明依據本技術之效能設定演算法的三 階層鱧系。須注意到在一特定處理器上的頻率/電壓設定選 項一般為間斷的而非連續的。因此該目標處理器效能層級 必須從固定的預定值組合中選擇。雖然計算一目標處理器 10 200422942 效能層級的已知技術包含使用一單一效能設定演算法, 本計數利用多種演算法,其各自擁有適合不同運轉時間 況的不同特性。對一特定處理情況而言,最適合的演算 於運轉時選擇。該策略協調程式模組122協調該效能設 演算法並藉由連接該標準核心程式110中的掛勾而提供 享的功能給多個效能設定演算法。該多個效能設定演算 的結果被整理並分析以對一目標處理器效能層級決定一 體的估計。將各種演算法組織為一判斷階層體系(或演算 堆疊),其中由較高(較為支配的)階層體系的演算法輸出 效能層級指示器有權優先於由較低(較不為支配的)階層 系之演算法輸出的效能層級指示器。第2圖的示範實施 具有三階層體系。於該階層體系的最高層具有一互動應 程式效能指示器,於該階層體系的中間層具有一特定應 程式效能指示器220,而於該階層體系的最低層具有一 於任務之處理器利用效能指示器230。 該互動應用程式效能指示器210的計算是由基 Flautner等人於200 1年7月在國際行動運算及網路會議 文集中發表之“關於動態電壓調整之自動效能設定,,所描 的一演算法來執行。該互動應用程式效能層級預測演算 意在藉由找出直接影響使用者經驗的執行期間並確保這 事件不會有不當的延遲而完成以提供良好互動效能保證 該演算法使用一相對簡單的技術以自動分離互動事件。 技術依賴監控來自為GUI控制程式之X伺服器的通訊以 追蹤被觸發為一結果之任務執行。 但 情 法 定 共 法 整 法 的 體 例 用 用 基 於 論 述 法 些 〇 此 及 200422942 一互動事件(典型地包含多個任務)的展開由使用者開 始並由- GIH事件表明,例如按下一滑鼠按鈕或鍵盤按 鍵。因此,該GUI控制程式(在此情況中為χ伺服器)發送 一訊息至負責處理此事件的任務。藉由監控適當的系統呼 叫(各種讀取、寫入與選擇的版本),該智慧型能源管理程 式120可自動地偵測一互動事件的展開。當該事件開始 時,該GIH控制程式以及接收該訊息的任務被標示為正^ 於一互動事件中。如果一互動事件的任務與未標示的任務 通訊,則尚未被標示的任務也會被標示。在此處理期間, 該智慧型能源管理程式1 20追蹤已被先佔之標示任務的數 量。當被先佔的任務為零時,代表所有的任務均已完成, 故該事件結束。 第3圖說明在一互動事件期間設定處理器效能層級的 策略。一互動事件的期間已知會隨著數種重要順序而改變 (從萬分之一秒至大約一秒鐘)。然而一開始轉換的延遲或 “略過臨界值’’被設定為5毫秒以過濾掉最短的互動事件因 而減少請求的效能層級轉換的數量。低於1毫秒的互動事 件典型地為按至視窗之重複按鍵或移動滑鼠越過螢幕並重 繪小矩形的結果。設定該略過臨界值為5毫秒是因為可使 簡短事件自效能指示器預測中過濾掉而不會不利地影響最 糟的情況。 如果該互動事件期間超過該略過臨界值,則相關的效 能層級數值被包含於整體互動效能層級預測中。對所有過 去互動事件計算之效能參數的一加權指數式衰減平均提供 12 200422942 次一互動事件的效能參數。需注意依據本技術該互動應用 程式效能設定演算法對於系統中一互動事件的必要效能層 級使用單一整體預測。[此與上述提到之論文中描述的技 術不同,其依照開啟該事件的任務使用每個任務不同的效 能層級預測。] 為了限制一錯誤效能層級預測於使用者經驗的最壞影 響情形,如果該互動事件並未在達到一所謂的“緊急臨界 值”之前完成,則指定最上階層的效能層級預測為最大效能 層級因這是一最上層預測,所以該系統將會強制執行。 在問題互動事件的結尾,該互動演算法計算該事件的正確 效能設定應該為何並將此修正值納入指數式衰減平均而將 影響未來的預測〃執行一額外最佳化因而若在一互動事件 過程中達到緊急臨界值,便重新調整移動平均因而將修正 的效能層級以一較高的權值(使用k=1而非k==3)納入該指 數式衰減平均之中。對所有較該略過臨界值為久的事件來 說,計算該效能預測。 互動事件“截止期限用於對每個定義的互動事件取得 一效能層級指不器。該截止期限為一任務必須被完成以避 免不良地影響效能的最後時間。依據與特定互動事件有關 之人類感覺臨界值而計算互動事件的效能層級指示器。舉 例來說,已知每秒20至30畫面已夠快讓使用者覺得一系 列的影像為一連續的流動因而可設定一互動影像顯示事件 的感覺臨界值為50毫秒。雖然實際使用的感覺臨界值依使 用者以及進行中的任務而定,50毫秒的固定值仍被認為適 13 200422942 合於階層體系的互動演算法。以下的方程式乃是用於計算 短於感覺臨界值之事件的效能需求。 ^ Work ise P , -—~ βη Perception Threshold 其中全速等效工作肠Au是從該互動事件開始時測量。 階層體系中間層的特定程式效能指示器220藉由整理 一種暸解效能層級設定功能之應用程式的資訊輸出而獲 得。過去已採用這些應用程 型能源管理程式120與其特 可提供新的API項目給作業 關效能需求的通訊。 效能指示器230是藉由 得,其中該演算法是根據最 的使用。此演算法為每個獨 據任務基礎調整一任務上以 時間週期的大小。以預期為 執行的所有任務種類,而最 考量互動任務。儘管該互動 證高水準互動效能的效能層 高層,基於預期的演算法不 窗格。由於在適當時可選擇 層艘系的最下層使用較長利 能。如果該利用史窗格過短 個固定值之間快速地震盈。 式以送給(透過系統呼叫)智慧 疋效月b需求有關的特定資訊。 系統以及應用程式以促進此有 實作一基於演算法的預測而獲 近利用史來估計該處理器未來 立任務導出一使用估計並且依 
計算利用史(即利用史窗格)之 基礎的演算法考量處理器正在 上層的互動應用程式演算法僅 應用程式演算法計算一用於保 級指示器且位於階層體系的最 需被限制於保守的短暫利用史 一更強勢的省電策略,在此階 用史窗格的可能性將會提升效 將可能造成效能層級預測於兩 典型上有必要於使用單一統一 14 200422942 、 冼(而非一演算法的階層體系組合)時設置一短暫的利 用史1^格以對所有運轉情況設定效能層級。為了可處理間 2 14的密集處理器互動事件,此統一運算法必須讓利用史 窗格為短暫的。 每個二層堆疊的效能設定演算法使用一種在一特定時 間間隔中處理完成工作的方式。在此實施例中,所使用的 70成工作方法為該時間間隔中執行的全速(處理器)等效工 作。此全速等效工作乃是依據下列公式而估計: worb^YjiPi i=l 其中i為在該特定時間間隔實施之[種不同處理器效 能層級的其中之一 ;ti為效能層級i中消耗的非閒置時間 秒數;而Pi為處理器效能層級i以高峰(全速)處理器效能 層級的分數表示。此方程式於一時間標誌計數器(任務計數 器)即時測量的系統中有效。該完成工作於替代的實施例中 可使用計數率隨著目前處理器頻率而改變的週期計數器而 有不同的計算。此外,該方程式隱含一工作負載的運轉時 間與處理器頻率成反比的假設。該假設提供完成工作一合 理的估計。然而,主要由於在效能調整的過程申匯流排速 度與處理器速度的比值是非線性的,因此該假設並非總是 準確。在替代實施例中可精細地調整該完成工作計算以考 量這些因素。 第4圖概要地說明在處理器上執行一工作負載以及對 15 200422942 -任務A計算利用史窗格。第4圏的橫袖代表時間。任務 A首先於時間S開始執行,隨後開始數個依據任務的資料 結構❶有四種資料結構對應至下列四部分f訊:⑴任務計 數器的現在狀態;(ii)目前時間(即時);(iH)_閒置時間計 數器的目冑狀態;以及(iv)設定一執行位元至邏輯層級], 指出該任務已開始執行,務計數器,計數器以及 閒置時間計數器用於計算與任務八有關之處理器利用並随 後計算任務Λ的效能需求。於pE時任務a尚未執行完 畢但被其他另一任務B先佔。當任務排程程式ιΐ4判定2 他的任務較執行中任務具有較高的優先性時便會出現先 佔。當任務A被先佔時該執行位元維持在邏輯層級‘丨,以指 出該任務仍有任務待完成β於RE時,任務A繼續執行0 其被重新排程並持續執行直到於Tc時完成並隨後自願放 棄處理時間。在完成時任務A可起始一系統呼叫使處理器 處理其他任務。當任務A於TC完成時該執行位元被重設 至邏輯層級‘ 0 ’。 在TC時之後有一間置期間隨後執行一進階任務,之 後又有一閒置期間。於RS時,任務A開始執行第二次。 於RS時,與任務A有關之執行位元的‘〇,狀態指出資訊存 在以開始計算任務A的效能需求,因而處理器目標效能層 級也可依此設定以供即將到來的任務A重新執行。一特定 任務的利用史窗格被定義為該特定任務第一次執行的開始 直到該任務下一次執行的開始之期間並應於相關窗格中包 16 200422942 含至少一該任務的先佔事件(在此情形中為任務B於玟£處 先佔任務A)。因此,在此情形中任務的利用史窗格被定義 為時間S至時間RS的期間。在此窗格中任務a的目標效 能層級因而如以下計算:And the core program has a system call module for each process using a processor. The conventional power management program ι manages the supply voltage by switching between a power-saving sleep mode and a standard awake mode according to the processor utilization level. The smart energy management program 120 is responsible for calculating and setting processor performance targets. The intelligent energy management program 12 reduces the operating voltage of the central processing unit (CPU) and the clock of the processor without causing the application software to miss the deadline for processing (ie, tasks), instead of relying solely on the sleep mode to reach Power saving purpose. When the CPU is running at full speed, many processing tasks will be completed within its deadline, and the processor will be idle until the next task starts. For example, a task deadline for a task that generates data is the one generated The time point at which the data is required by other tasks 0 — The deadline of the interactive task will be the user's critical threshold (50 · 丨 00 milliseconds). It runs at full efficiency and then completes the task more slowly than the slower one to make it more accurate. It is less energy efficient to meet the deadline. When the frequency of the processor drops, its voltage can be reduced to save energy. For processors implemented in complementary metal-oxide-semiconductor (CMOS) technology, the energy used for a particular workload is proportional to the square of the voltage. The strategy coordination program manages several performance setting algorithms, each of which is suitable for different running time situations. The most suitable performance setting algorithm for a specific condition is selected in the running time. The performance setting control module 1 24 receives the results of each performance setting algorithm and repeats the calculation by processing these results in priority order # 200422942 processor performance. 
The event tracking module i 26 monitors events located in the core program U 0 and the user processing level 130 and transmits the collected information to the performance setting control module 1 24 and the policy coordination program 1 22. In the user processing level, the monitoring and processing work is through the system call event 1 2 2. Processing task event 134, including task switching, task creation, and task leaving events, and application-specific data. The smart energy management program 120 is implemented as a set of core program modules and hooks are embedded in the standard core program functional modules and are used to control the speed and voltage of the processor. The method implemented by the smart energy management program 120 makes it relatively independent of other modules in the core program 1000. This has the advantage that the performance setting control mechanism is less likely to interfere with the main operating system. Implementation in the core program also means that the user application does not need to be adjusted. Therefore, the smart energy management program 1 2 0 coexists with the system call module 11 2, the scheduling program 114, and the core program's traditional energy management program 丨 6. Although these subsystems may require some hooks (H〇ok). The smart energy management program 120 is used to derive task deadlines and task classification information (such as whether the task is integrated with an interactive application program) from the core program of the operating system by checking the communication type between execution tasks. It is also used to monitor which system calls are accessed by each task and how data flows between the communication frameworks in the core program. Fig. 2 schematically illustrates the three-tier relationship of the performance setting algorithm according to the present technology. It should be noted that the frequency / voltage setting options on a particular processor are generally intermittent rather than continuous. Therefore, the target processor performance level must be selected from a fixed combination of predetermined values. Although known techniques for calculating the performance level of a target processor 10 200422942 include the use of a single performance setting algorithm, the count uses multiple algorithms, each of which has different characteristics suitable for different operating time conditions. For a particular processing situation, the most suitable calculation is selected during operation. The policy coordination program module 122 coordinates the performance setting algorithm and provides shared functions to multiple performance setting algorithms by connecting the hooks in the standard core program 110. The results of the multiple performance setting calculations are collated and analyzed to determine an overall estimate of a target processor performance level. Organize algorithms into a hierarchy of judgments (or stacks of algorithms), where the algorithm's output performance level indicator from the higher (more dominant) hierarchy has the right to take precedence over the lower (less dominant) hierarchy The performance level indicator of the algorithm output. The model implementation in Figure 2 has a three-tier system. There is an interactive application performance indicator at the highest level of the hierarchy, a specific application performance indicator 220 at the middle level of the hierarchy, and a processor utilization performance at the lowest level of the hierarchy. 
Indicator 230. The calculation of the interactive application performance indicator 210 is based on the calculations described in "Automatic Performance Settings for Dynamic Voltage Adjustment", published in the International Mobile Computing and Web Conference Proceedings by Flatner et al. In July 2001. The interactive application performance level prediction algorithm is intended to provide good interactive performance by identifying the execution period that directly affects the user experience and ensuring that the event is not unduly delayed to ensure that the algorithm uses a relative Simple technology to automatically separate interactive events. Technology relies on monitoring communication from the X server that is a GUI control program to track the execution of tasks that are triggered as a result. However, the method of the common law integration method is based on the discussion method. This and 200422942 The development of an interactive event (typically containing multiple tasks) is initiated by the user and indicated by a -GIH event, such as pressing a mouse button or a keyboard key. Therefore, the GUI control program (in this case, χ server) sends a message to the task responsible for handling this event. By monitoring the appropriate system call (Various versions of reading, writing, and selecting), the smart energy management program 120 can automatically detect the development of an interactive event. When the event starts, the GIH control program and the task to receive the message are marked Is positive in an interactive event. If the task of an interactive event communicates with an unmarked task, the unmarked task will also be marked. During this processing, the smart energy management program 1 20 tracking has been first Occupied indicates the number of tasks. When the preempted task is zero, it means that all tasks have been completed, so the event ends. Figure 3 illustrates the strategy for setting the processor performance level during an interactive event. An interactive event The period is known to change with several important sequences (from tenths of a second to about a second). However, the delay or "skip threshold" at the beginning of the conversion is set to 5 milliseconds to filter out the shortest interactions Events thus reduce the number of requested performance level transitions. Interactive events under 1 millisecond are typically the result of pressing a button repeatedly to the window or moving the mouse across the screen and redrawing a small rectangle. The skip threshold is set to 5 milliseconds because short events can be filtered from the performance indicator forecast without adversely affecting the worst case. If the skip threshold is exceeded during the interaction event, the relevant performance level value is included in the overall interaction performance level prediction. A weighted exponential decay of the performance parameters calculated for all past interaction events provides an average of 12 200422942 performance parameters for one interaction event. It should be noted that according to this technology, the interactive application performance setting algorithm uses a single overall prediction for the necessary performance level of an interactive event in the system. [This is different from the technique described in the paper mentioned above, which uses a different performance level prediction for each task depending on the task that opened the event. 
] In order to limit the worst-case scenario where a wrong performance level is predicted from user experience, if the interaction event is not completed before reaching a so-called "emergency threshold", the performance level prediction at the top level is designated as the maximum performance level factor This is a top-level forecast, so the system will enforce it. At the end of the problem interactive event, the interactive algorithm calculates what the correct performance setting for the event should be and incorporates this correction into the exponential decay average that will affect future predictions. An additional optimization is performed and thus if an interactive event process When the critical threshold is reached in mid-range, the moving average is readjusted so that the modified performance level is incorporated into the exponential decay average with a higher weight (using k = 1 instead of k == 3). The performance prediction is calculated for all events older than the skip threshold. The "interaction event deadline" is used to obtain a performance level indicator for each defined interaction event. The deadline is the last time a task must be completed to avoid adversely affecting performance. It is based on the human perception associated with a particular interaction event A performance level indicator for calculating interactive events based on critical values. For example, it is known that 20 to 30 frames per second is fast enough for users to think that a series of images is a continuous flow, so an interactive image can be set to display the event feeling The critical value is 50 milliseconds. Although the actual sensory critical value depends on the user and the task in progress, a fixed value of 50 milliseconds is still considered to be suitable for the interactive algorithm of the hierarchical system. The following equation is used Calculate performance requirements for events shorter than the sensory threshold. ^ Work ise P,-~~ βη Perception Threshold where the full-speed equivalent work intestine Au is measured from the beginning of the interactive event. Program-specific performance indicators in the middle layer of the hierarchy 220 is obtained by sorting out the information output of an application that understands the performance level setting function. These application-based energy management programs 120 have been adopted to provide new API items to communicate performance requirements. The performance indicator 230 is obtained by using the algorithm based on the most used. This algorithm is for each Each task is adjusted based on the time period of a task. All types of tasks performed with expectations are the most important consideration for interactive tasks. Although the interactive certificate has a high level of effectiveness at the high-level, the algorithm based on expectations does not window Since the lowest layer of the ship system can be selected to use a longer power when appropriate. If the history pane is too short and the value between the fixed and fast earthquakes is profitable, it is given to (through the system call) wisdom to effect the month. bRequires specific information related to the system and applications to facilitate the implementation of an algorithm-based forecast and recent utilization history to estimate the processor's future tasks. Derive a usage estimate and calculate the utilization history (that is, use the history window). 
Grid) based algorithm to consider the processor's interactive application algorithm at the upper level. Only the application algorithm is used to calculate one for security. Level indicator and located in the hierarchical system need to be constrained by a conservative short-term history of use-a stronger power-saving strategy, the possibility of using the history pane at this stage will improve efficiency and may cause performance levels to be predicted on the two typical It is necessary to set a short utilization history 1 ^ grid when using a single unified 14 200422942, 冼 (not a combination of hierarchical systems of an algorithm) to set the performance level for all operating conditions. In order to deal with the intensive processors of 2 to 14 For interactive events, this unified algorithm must make the history pane short. The performance setting algorithm for each two-tier stack uses a way to process work at a specific time interval. In this embodiment, the A 70% working method is the full-speed (processor) equivalent work performed during this time interval. This full-speed equivalent work is estimated based on the following formula: worb ^ YjiPi i = l where i is one of the different processor performance levels implemented at that particular time interval; ti is the non-consumption consumed in performance level i Idle time in seconds; and Pi is the processor performance level i is expressed as a peak (full speed) processor performance level score. This equation is valid in systems where a time stamp counter (task counter) measures in real time. This completion is done in an alternative embodiment and can be calculated differently using a cycle counter whose count rate changes with the current processor frequency. In addition, the equation implies the assumption that a workload's operating time is inversely proportional to the processor frequency. This assumption provides a reasonable estimate of the work done. However, this assumption is not always accurate, mainly because the ratio of the bus speed to the processor speed during the performance adjustment process is non-linear. The completion calculation may be fine-tuned in alternative embodiments to take these factors into account. Figure 4 outlines the execution of a workload on a processor and the use of 15 200422942-Task A calculation history pane. The horizontal sleeve at 4th represents time. Task A starts execution at time S first, and then starts several data structures based on the task. There are four data structures corresponding to the following four parts: f) the current status of the task counter; (ii) the current time (instant); (iH ) _ The current status of the idle time counter; and (iv) set an execution bit to the logical level], indicating that the task has started execution, the service counter, counter and idle time counter are used to calculate the processor utilization related to task eight And then calculate the efficiency requirement of task Λ. At the time of pE, task a has not been completed but is occupied by another task B. Preemption occurs when the task scheduler 4 determines that his task has a higher priority than the running task. When task A is preempted, the execution bit is maintained at the logic level '丨 to indicate that the task still has tasks to be completed β at RE, task A continues to execute 0, it is rescheduled and continues to execute until completion at Tc And then voluntarily gave up processing time. 
Upon completion, task A can initiate a system call for the processor to perform other tasks. The execution bit is reset to logic level '0' when task A is completed at TC. After the TC hours, there is an idle period followed by an advanced task, followed by an idle period. At RS, task A starts to execute a second time. At the time of RS, the status of the execution bit '0' related to task A indicates that information exists to start calculating the performance requirements of task A. Therefore, the target performance level of the processor can be set accordingly for the re-execution of the upcoming task A. The utilization history pane of a specific task is defined as the period from the start of the first execution of the specific task to the start of the next execution of the task and should be included in the relevant pane. 16 200422942 contains at least one preemption event for the task ( In this case, task B preempts task A at 玟 £). Therefore, the utilization history pane of the task in this case is defined as a period from time S to time RS. The target performance level for task a in this pane is thus calculated as follows:

WorkEst_new = (k × WorkEst_old + Work_fse) / (k + 1)

Deadline_new = (k × Deadline_old + (Work_fse + Idle)) / (k + 1)
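A rough sketch of how these two exponentially decaying averages could be maintained is given below (plain C; the structure and function names are invented for illustration). It assumes the weight k = 3 that the description later notes works well, and returns the ratio of the two averages, which the description uses as the per-task performance-level indicator.

    /* Sketch only: invented names, not the patent's implementation. */
    struct task_predictor {
        double work_est;   /* WorkEst: decayed average of task work values (Work_fse)     */
        double deadline;   /* Deadline: decayed average of (Work_fse + Idle) per interval */
    };

    static void update_predictor(struct task_predictor *p,
                                 double work_fse, double idle)
    {
        const double k = 3.0;  /* weight noted later in the text as effective */
        p->work_est = (k * p->work_est + work_fse) / (k + 1.0);
        p->deadline = (k * p->deadline + (work_fse + idle)) / (k + 1.0);
    }

    /* Per-task performance-level indicator: ratio of the two averages,
     * capped at full speed. */
    static double perf_indicator(const struct task_predictor *p)
    {
        if (p->deadline <= 0.0)
            return 1.0;
        double perf = p->work_est / p->deadline;
        return (perf > 1.0) ? 1.0 : perf;
    }

Because a new sample only receives weight 1/(k + 1), the most recent task intervals dominate the prediction while older behaviour decays away.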

其中k為一權值,Idle為第4囷中時間s至時間Rs 之間的間置時間秒數,而任務A的截止期限被定義為 (+ 。在此特別的範例中,對於像是第4圓任務B 之先佔任務執行偵測引導演算法判定每個任務的利用史窗 袼。處理在次一未被先佔之任務A排程之前執行的任務往 往與任務A的執行高度相關。TC處至RS處之間的間置時 間為“遲緩期’,,處理器可以一下降的效能層級於該期間執 行。然而’工作C由於縮小了可用的遲緩期而被納入效能 層級計算的考量。 上述方程式中的职^與各代表一指數式 衰減平均。此指數式衰減平均讓更新的估計比較舊的估計 對平均有更大的影響力。權值k為關於指數式衰減平均的 一參數。已知k = 3可有效地作用且此微小數值指出每個估 計為一好的估計。藉由個別追蹤工作預測程式以及截止期 限預測程式,效能預測的重要性依據利用史窗格的長度而 定。此確保與較大窗格大小有關的效能估計不會主導效能 預測。此演算法的效能層級指示器…epecM.vej 由 17 200422942 兩指 Perfper 立的效 負载的 務所估 基礎於 自適合 上每5 與已知 出一使 格大小 均,但 i計算 史窗格 依 新的非 器很長 的情況 時這樣 用等待 上層臨 則其工 的應用 數式衰減平均之比而得出: sepecrnvu-hw" 心。對每個工作計算獨 能層級數值。依據本技術的策略,在一從屬於工作 時間間隔50亳秒至i 5〇毫秒間重新計算對一特定任 計的工作%。然而由於%是以任務 一任務上被計算,因此每個被執行的任務利用其各 之以任務為基礎的%值,該%4W值事實 到1 〇亳秒便被修正(反映任務切換事件)。此演算法 以間隔為主的演算法不同,前者對每個任務分別導 用估計同時藉由任務基礎調整一任務上的利用史窗 。雖然已知的統一效能設定算法使用指數式衰減平 其在所有執行任務的固定使用窗格(10至50毫秒) 一整體平均,而非在一變動之以任務為基礎的利用 上來計算一以任務為基礎的平均。 據本技術之以預期為基礎的演算法,有必要避免一 互動CPU極限任務在未被先佔之情形下使用處理 段期間《由於僅能於一旦該任務被先佔至少一次 下疋義”亥利用史窗格,故對該任務採用該效能層級 t引質的等待時間。為了避免不想要的效能採 時間&汁算工作估計t未被先佔6¾ i§程中設置-界值特別地,如果一任務持續1 00毫秒未被先佔, 作估计由預設值來重新計算。考慮到確保一更迫切 程式歷史1¾格透過階層體系層21〇提供給互動應用 18 200422942 程式,故選擇1 00毫秒的數值。同時也考慮到可能會被1 00 毫秒窗格臨界值影響的僅有使用者應用程式種類為密集計 算的批次作業例如編譯,其可能執行數秒或甚至數分鐘。 在這樣的情形中一額外的1 00毫秒(〇. i秒)執行時間可能是 重要的明智效能。 第5圖概要地說明第2圖之三階層體系效能策略堆疊 的一種實施。該實施包含一效能指示器策略堆疊510以及 一策略事件處理程式530,兩者均輸出資訊至一目標效能 計算程式540。該目標效能計算程式540用於整理來自四 種效能設定演算法的結果:高層互動演算法、中層以應用 程式為基礎的演算法以及兩種不同的低層演算法。該四種 演算法可以同時被執行。該目標效能計算程式540從該策 略堆疊5 1 0所產生之多種效能指示器(在此例中為四種)導 出一單一整體目標效能層級。該策略堆疊510連同該策略 事件處理程式530以及該目標效能計算程式540提供一彈 性架構給多種效能設定策略,因而該堆疊每層的策略演算 法可依據使用者的要求而被替換或交換。因此該效能策略 堆叠提供一種可納入使用者自訂之效能設定策略的平台以 供實驗。 多種效能設定演算法中的各個均專門對付不同特定種 類之運轉時間事件。然而,由於在第5圖之示範實施例中 有四種輸出不同效能指示器的不同演算法,該軟體必須決 定以四種效能指不器的何者為優先以設定整體目標數值。 19 200422942 此外也必須決定可有效計算之一整體目標效能層級的時 間,假設每個效能設定演算法可獨立地執行並於不同的時 間產生輸出。同時也必須考慮在多個效能設定演算法均以 相同的處理事件作為決定之基礎的情況下如何結合該效能 指示器,否則可能發生假的目標更新。 為了處理這些議題以圖中所示之三階層體系組織該策 略堆疊510演算法,其中較高層級之策略可優先於導自較 低(叫不具支配性的)層的效能層級請求。因此,階層演算 法可優先於階層演算法,而後者可優先於階層〇的兩種演 算法。注意每個階層體系層級本身可包含多個替代性的效 能設定演算法。不同的效能設定演算法並不知道其於階層 體系中的位置並可基於系統中的任何事件決定其效能。當 一特定演算法請求一效能層級,其送出一指令伴隨其想要 的效能層級至策略堆疊 510。該策略堆疊的每個演算法包 含一命令512、516、5 20、5 24且儲存一對應的效能層級指 示器514、518、522與526。用於階層1演算法的忽略指 令5 20向目標效能計算程式440指出在計算整體效能目標 時應忽略相關的效能層級指示器。已被指定給階層〇之兩 演算法的設定指令512與516使目標效能計算輕式5 40不 管任何來自於階層醴系較低層的效能層級請求而設定對應 的效能層級。然而該設定指令 無法優先於來自較高階層體系層級的效能層級請求。 在此實施例中一階層〇演算法已請求將該效能設定至峰值 20 200422942 水平的5 5%,而另一階層〇演算法以要求將 峰值水平的25%。該目標效能計算程式使用 結合此具有相同優先性的請求,在此例t較 數值為該階層〇效能指示器。於階層2,如 指令連同一 8 0 %效能指示器而被指定。該“ 定”指令提供該目標效能計算程式54〇必須 層級為8 0 °/。,假設此大於任何來自較低階層 能指示器。在此例中該階層〇效能指示器為 1效能指示器將被忽略以使整體目標將真正 值效能的80%。 由於每個演算法之最近計算的效能層級 略堆疊5 1 〇儲存於記憶餿中,該目標效能計 於任何時間計算一新的整體目標數值而不需 &疋演算法。當該堆疊上的其中之一演算法 能層級請求,該目標效能計算程式自底層向 能資料結構的内容以計算一更新的整體目標 此於第5圖的範例中,於層級〇設定整體預 層級1仍維持在5 5 %而於層級2改變該整體 雖然每個效能設定演算法可被觸發(由系統 件)以於任何時候存在一組讓所有效能設定 回應之共同事件時計算一新的效能層級。該 程,530將監控這些事件並為其加上旗標, 供策略事件資訊給目標效能計算程式54〇。 該效能設定至 一操作程式以 佳地設定55% 果大於便設定 如果大於便設 設定整體效能 體系層級之效 5 5 %而該階層 地被設定為峰 指示器由該策 算程式540可 調用每個效能 計算一新的效 上評估指令效 效能層級。因 測為55%,於 預測為80%。 中的一處理事 演算法將傾向 策略事件處理 該處理程式提 此特別事件分 21 200422942 類包含重置事件532、任務切換事件534以及效能改變事 件536。該效能改變事件536為一通知,其警告每個效能 設定演算法注意處理器的現在效能層級,即使其通常不會 更改該策略堆疊510上的效能請求。關於此特別分類的策 略事件532、534及536,並不會每次其中一演算法送出一 更新的效能層級指示器便計算整體效能層級。反之,該效 能層級計算被整合,因而對於每個事件通知而言僅於所有 有關的效能設定演算法的所有事件處理程式已被調用之後 才計算一次** 可提供一應用程式介面(API)給裝置驅動程式或裝置 本身,該介面使一個別裝置將任何操作條件上的重大改變 通知該策略堆疊5 1 0及/或個別效能設定演算法。這使得該 效能設定演算法觸發目標效能層級的重新計算因而促使快 速地採取該操作條件之改變。舉例來說,當一密集處理器 CPU極限任務開始時,該裝置可送出一通知至該策略堆憂 510〇此〆通知為選擇性的而該效能設定演算法於接收時可 不需要對其回應。 第6圖概要地說明依據本技術之一工作追蹤計數器 600。該工作追縱计數器600包含:一增量數值暫存器“ο, 其具有〆軟體控制模組620與一硬體控制模組630; 一累 加器模組640,其具有一工作計數值暫存器與一時間計數 值暫存器’一時間基礎暫存器646 ; —即時計時器65〇以 及一控制暫存器660。此示範實施例的工作追蹤計數器可 22 200422942 與已知的時間標誌計數器以及CPU週期計數器不同’本實 施例之計數器增量數值在接近或位於計數值被增量時與處 
理器實際執行的工作成正比。該增量數值暫存器610包含 一完成工作計算器,其估計在每個計數器週期該處理器完 成的工作。該完成工作估計透過該軟體控制模組620及/ 或透過硬體工作模組6 3 0而取得。該軟體控制模組實施一 種將增量數值與現在處理器速度相關聯的簡單完成工作計 算。如果該處理器以高峰效能的7 0 %運轉則該增量數值將 為0.7,而若該處理器以高峰效能的40%運轉則該增量數 值將為0.4。當該效能控制模組620偵測到該處理器於計 數器週期為間置的則設定該增量數值為0 °在替代的工作 追縱計數器實施例中使用一更精密的軟體演算法以計算一 精確的完成工作估計。 第1表列出測量數據,其指出當考量一效能層級於兩 種不同的處理器速度間轉換(在此例中為高速至低速)時關 於一 cpu極限迴圈以及一 MPEG視訊工作負載之一預期運 轉時期間以及一實際運轉時期間之間的百分比差異。該結 果乃是基於在明確處理器效能層級-300、400及5 OOMhz(如 表中最左方攔位所示)之轉換後運轉。第1表的最上列列 出將轉變至最左方欄位所列出之對應處理器速度的起始效 能層級。於CPU極限迴圈,預測與實際測量無法與雜訊相 分辨,然而於MPEG工作負載,在每個l〇〇Mhz處理器頻 率階段大約有6%-7%的不正確損失。在這些工作負载的最 23 200422942 大不準確率看起來是低於20%(19·4%),對於僅有數個固定 效能層級的系統而言認為是可接受的。然而當一系統中可 選擇的最小至最大處理器效能之可用範圍增加而每個效能 層級階段的範圍減少時,似乎將需要一更準確的工作估計 程式。 轉換後 CPU極限迴 圈 MPEG 視訊工作 負載 速度 400Mhz 5 0 OMhz 600Mhz 400Mhz 500Mhz 600Mhz 300Mhz -0.3% -0.4 % -0.3% 7.1 % 1 3.5 % 19.4% 400Mhz -0.1% 0.0 % 6.9 % 1 3.3 % 500Mhz 0.1 % 6.8 % 第1 表 替代示範實施例之更精密的演算法使用更準確的完成 工作估計技術,其包含監控指示特性資料(透過技術器追蹤 像是記憶體存取等重大事件)與估計的及實際的工作負載 降低比率’而非假設完成工作與處理器速度成正比。進一 步的替代實施例使用快取命中率以及記憶體系統效能指示 器以精確元成工作估計。更進一步的替代示範實施例使用 軟體監控執行一程式設計應用所使用之處理時間比率相對 於執行奇景應用程式任務所使用之處理時間的比率。 該硬艘控制模組63〇即使於該處理器在兩固定效能層 級間切換中的轉換期間仍可估計完成工作。每個處理器效 心轉換可有大約2〇微秒的一暫停,於該期間該處理器不會 24 200422942 送出任何指示。此暫停是由於需要時間以將鎖住相位的迴 圈重新同步至新的目標處理器頻率。此外,在改變該處理 器頻率之前,為了新的目標頻率須穩定電壓至一適當數 值。因此有最多一秒的一轉變時間,於該期間可假設該處 理器以舊的目標頻率運轉但卻以新的目標層級消耗能源 (因為已設定電壓至新的目標層級該頻率可透過中間頻 率階段而躍升數個層級以影 頻率動態改變的轉變期間可 量數值暫存器而考量該軟體 範實施例同時使用硬體及軟 成工作,替代的示範實施例 之一以估計完成工作。 累加器模組640定時自 數值並將其加入工作計數值 該工作計數值暫存器在每個 值。該計時器時間記號為即 訊號。為測量一預定時間間 該累加益模組6 4 〇中的工作 預定時間間隔的開始而另一 的差異提供完成工作於預定 該即時計時器也控制暫 增加的速度。此時間計數值 的時間基礎運作但用於測量 響效能層級改變。在此處理器 操作該硬艎控制模組以更新增 未察覺的動態改變。雖然此示 體控制模組6 2 0、6 3 0以計算完 可以僅使用此兩種模組的其中 增量數值暫存器610讀取增量 暫存器中儲存的一累加總和。 計時器時間記號增加工作計數 時计時器650所導出的一時間 隔之中的工作計數值,儲存於 計數值被讀取兩次,一次於該 -人為其結尾。這兩種數值之間 時間間隔中的一指示。 存器644中儲存之時間計數值 暫存器與該工作計^:值以相同 消耗的時間而非完成的工作。 25 200422942 同時具有一時間計數器以及一完成工作計數器形成效能 定演算法。提供時間基礎暫存器646的目的在於多平台 容性以及轉換為秒。其用於指定兩計數器642、644之時 基礎(頻率)因而時間可為準確且一致的,換言之儲存於 時間計數值暫存器的累積數值提供一種耗去時間毫秒的 測。該控制暫存器模,組660包含兩控制暫存器,每個計 器各用其中之一。一計數器可透過適當的控制暫存器而 動、停止或重置。 第7圖概要地說明一種可依據工作負載特性提供數 不同的固定效能層級的設備該設備包含一 CPU 710、一 時計時器720、一電源供應控制模組730以及第6圖中 工作追蹤計數器之增量數值暫存器610。該電源供應控 模組7 3 0判斷該C P U目刖被設定以何種固定的效能層級 行並為即時計時器720選擇一適當的時脈。該電源供應 制模組730將目前處理器頻率上的資訊輸入至增量數值 存器610。因此該增量的值與處理器頻率成正比,其依 提供該處理器完成之可用工作的估計。 該策略堆疊5 1 0的許多效能測定演算法使用該處理 於一特定時間間隔(窗格)的利用史以估計該處理器未來 適當目標速度。任何效能設定策略的主要目標為藉由使 理器頻率與電壓層級降低一適當目標效能層級而最大化 處理器於執行開始至任務截止期限之過程中的忙碌時間 為了實際地預測目標效能層級,該智慧型能源管理 設 相 間 該 預 數 啟 種 即 的 制 執 控 暫 次 器 的 處 該 〇 程 26 200422942 式1 20提供一提取(abstraction)以追蹤該處理迄於一特定 時間間隔中完成的實際工作。此完成工作提取使得效能改 變以及間置時間可被納入考量而不管各平台間可能有所變 動之特定硬體計數器的實施。依據本計數,為了取得一時 間間隔中的一工作測量估計,每個效能設定演算法被分配 一“工作架構’’資料結構。設定每個演算法以於時間間隔開 始時呼叫一“工作開始功能,,並於該特定時間間隔結束時呼 叫一“工作停止功能。在該完成工作測量期間,該工作結 構的内容被自動地更新以指定由該處理器之個別效能層級 所分配的閒置時間比例與使用處理器時間比例。儲存於工 作結構中的資訊隨後用於計算全速度等效工作數值 (’該數值隨後用於目標效能層級預測。此完成工 作提取功能實施於該智慧型能源管理程式1 2 0的軟體中並 提供連至該智慧型能源管理程式1 2〇之一便利介面給效能 層級預測演算法開發者。該完成工作提取也簡化本技術之 效能設定系統的埠連接為不同的硬體架構。 替代的硬體平台間的一重大差異為該平台上測量時間 的方式°特別地,某些架構透過時間標誌計數器提供一低 經常性的週期計數方法,而其他架構僅提供外部可程式化 的時間岔斷給使用者。然而即使提供時間標誌計數器也並 非必然地測量相同事物。舉例來說,第一種硬體平台同時 包含Intel [RTM]以及ARM [RTM]處理器。在這些處理器 中該計數器計算CPU週期因而計數率與該處理器的速度 27 200422942 有關而該計數器於處理器進入休眠模式時停止計數 種硬體平台包含Crusoe [RTM]處理器,其實施一時 計數器一致地計算該處理器於高峰速率的週期並持 該高峰速率的計數,即使該處理器處於休眠模式中 成工作提取幫助本目標效能設計技術實施於此兩種 體平台上。 在此實施例中計算的工作估計並未考 定工作以高峰效能的一半執行並非必然會耗去於處 速運轉時的兩倍時間才會執行完成的事實。此違反 結果的一種原因在於即使該處理器核心程式速度減 憶體系統卻非如此。因此,核心程式與記憶鱧的效 加對記憶體較為有利。 執行模擬以估計本效能設定技術對照一已知技 別地’該已知技術為内建於Transrneta Crusoe CPU, 4長時間運轉(LongRun),電源管理程式。Transmeta CPU將‘LongRun’電源管理程式内建於處理器韌 LongRun與其他已知的電力管理技術不同,其避免 業系統以使電力管理生效的需要。L〇ngRun使用處 歷史使用以導引時脈選擇:若為高度使用便增加處 度’而於低度使用時降低效能。不像其他實施於更 統處理器上’該電力管理策略可相對容易地被 Crusoe處理器上,因為該處理器已具有一隱藏的軟 
行動態的二進位轉譯與最佳化。該模擬的目標在於 。第二 間標諸 續增加 。該完 替代硬 慮一特 理器全 直覺之 慢,記 能比增 術。特 t7的一 Crusoe 體中。 調整作 理器的 理器速 多的傳 實施於 體層執 建立一 28 200422942 種如LongRun的策略實施於軟體階層體系的如此低層如何 有效地執行。本技術與LongRun —同在同一處理器上執行。 於Sony Vaio PCG-C1VN筆記型電腦上執行此模擬, 該筆記型電腦使用 Transmeta Crusoe處理器並於數個固 定效能層級300Mhz至600Mhz之間以lOOMhz之效能層級 階段運轉。此模擬使用一種具有Linux 2.4.4 acl8核心程 式之改進版本的Mandrake 7.2作業系統。用於比較之估計 的工作負載如下:Plaympeg SDL MPEG播放器程式庫、用 於演算 PDF檔案的 Acrobat Reader、用於文字編輯的 Emaes、用於閱讀新聞的NetScape郵件與新聞4.7、用於網 頁澳I覽的Konqueror 1_9.8以及作為一 3D遊戲的Xwelltris 1·〇.〇。用於互動外殼程式命令的效能測試程式為一使用者 於大約 3 0分鐘範圍期間中執行雜項外殼程式操作的記 錄。為了避免該Crusoe處理器之動態轉譯引擎可能產生的 變化性,多數效能測試程式至少執行兩次以讓該動態轉譯 快取記憶體做好準備,而除了最後執行的以外,所有產生 的模擬資料均被忽略。 依據本發明之效能設定演算法的設計將不會阻礙其主 機平台控制計時器的方式。為了此模擬的目的,本技術提 供一次毫秒解析度定時器,而不會改變該 Linux内建之 l〇ms解析度計時器工作的方式。此目標藉由背負一計時器 分配常式(其檢查計時器事件)至該核心程式常執行的部分 如排程程式與系統呼叫上而達成。 29 200422942 由於依據本技術之效能設定演算法被設計為具有至核 心程式的掛勾以使其可岔斷某些系統呼叫以找出互動事件 且其於每個任務切換均被調用,因此可直接地附加一些指 示至這些掛勾以管理計時器分配。每個掛勾藉由實施一時 間標諸&十數器讀取、一種與次一計時器事件時間標誌、之比 較以及於成功時實施一分支至該計時器分配常式而被擴 充。在實際操作中發現此策略建立一具有次毫秒準確度的 計時器。 下面的第2表詳述附屬於該模擬中的計時器統計。最 遭情況的計時器解析度被該排程程式中的1 〇毫秒(看似與 第2表不一致)時間單位所限制。然而,由於依據本計數之 效能設定演算法專注於測量之情形通常發生於接近時間觸 發器之處’因此達成的解析度被視為是足夠的。已證明該 系統的軟镀計時器於該處理器處於休眠模式時停止工作是 有利的,因為此意味著該計時器岔斷並未改變執行中作業 系統與應用程式之休眠特性。所使用的計時器具有高解析 度與低額外負擔。 這些計時器的優點促進一同時具有主動模式與被動模 式之實施的發展。主動模式中控制依據本技術的效能設定 演算法。而被動模式中該内建的LongRun電力管理程式負 責效能,雖然本技術之智慧型能源管理程式作為執行與效 能改變的觀察者》 第2表 30 200422942 存取至一時間標誌計數器 所需 30至40週期 計時器檢査的平均間隔 __〜〇 · 1氅秒 __〜1毫秒 計時器準確度 — 平均計時器檢査舆分配持 續期間(包括可能執行一事件處 理程式) 1 〇〇至15〇週期 監控由LongRun產生的效能改變類似於該計時器分配 常式而達成。依據本技術之智慧型能源管理程式12〇透過 一特定機器暫存器定時地讀取該處理器的效能層級並將結 果與一先前數值相比較。如果兩數值不同,則其變化會兮己 錄至一緩衝器中。依據本技術之智慧型能源管理程式包含 一追蹤機制,其保留一核心程式緩衝器中重大事件的記 錄。此記錄包含源自不同策略的效能層級請求、任務先佔、 任務IDs(識別記號)以及該處理器的效能層級。在執行該模 擬時可比較LongRun以及在相同執行運轉期間中依據本技 術之效能設定演算法:LongRun控制效能設定而依據本計 數之智慧型能源管理程式120可用於輸出在相同工作負載 上其於控制中而可能做的決定。此模擬策略用於客觀地評 估已知的LongRun技術與本技術之間的互動效能測試程式 之不可重複執行之間的差異。 為了評估使用該測量與效能設定技術的額外負擔,依 據本技術之效能設定演算法配有標示器,其持續追蹤於運 轉時間該效能設定演算法碼中消耗的時間。雖然依據本技 術於一 Pentium II上的運轉時間額外負擔為大約0.1 %至 31 200422942 0 · 5%,但於Transmeta Crusoe處理器上的額外負擔為1% 至4%。於虚擬機器中的進一步測量例如‘VMWare’與4使用 者模式linux,(UML)確認依據本技術之效能設定演算法的 額外負擔於虛擬機器中可能遠大於在傳統處理器架構上。 然而此額外負擔可藉由演算法最佳化而降低。 MPEG(動態畫面專家群組)視訊播放對於所有測試的 效能設定演算法提出一難難考驗。雖然該效能設定演算法 典型地將一週期性負載置於該系統上,效能需求仍可依據 MPEG訊框種類而改變。因此,如果一效能設定演算法對 應過去的(高度變化性的)MPEG訊框解碼事件使用一相對 的長時間窗格以預測未來的效能需求,其可能錯過執行密 集運算訊框(較不具代表性的)的截止期限。另一方面,如 果該演算法僅考慮一短暫的間隔,則其將不會收斂於單一 效能數值而會快速地於多個設定間震盪。由於每個效能層 級的改變導致一轉換延遲,於不同的效能數值間快速震盪 是不令人滿意的。關於LongRun模擬結果確認於MPEG效 能測定程式之震盪行為。 本技術藉由依據位於階層體系最上層的互動效能設定 演算法以限制最糟情況的回應而處理此MPEG工作負載的 震盪問題。於階層體系的最下層有更多傳統的間隔基礎預 期演算法將可採取更長期的效能層級需求觀點。 第 8圖為一表格,其詳述關於該‘plaympeg’放影機 (lL本·ίP://www.I〇kipames.com/develor)ment/smpeg.php3)播放 32 200422942 各種MPEG視訊時的模擬測量結果。該放影機的某些内部 變數已被揭露以提供該放影機如何受到執行時動態地改變 該處理器效能層級產生之結果的影響。這些數字顯示於該 表格的MPEG解碼欄中。特別地,該‘提早(Ahead)’測量每 個訊框有多接近截止期限。至截止期限的接近程度以播放 每種視訊之累積秒數表示。為了最大的電力效率,該提早 變數值應儘可能接近零,雖然該處理器最慢的效能層級對 提早數值可被降低的程度設了 一下限。位於該表格最右側 攔位的一‘正好位於時間攔位’指出正好符合其截止期限的 訊框總數量。正好準時的訊框數越多,則該效能設定演算 法就越接近理論最佳值。第8圖表格中執行統計攔的資料 被監控子系統的智慧型能源管理程式1 20所收集。為了收 集關於LongRun的資訊,該智慧型能源管理程式12〇於被 動模式中被用於聚集效能改變的軌跡而不控制該處理器效 能層級。該閒置攔位指出該核心程式之閒置迴圈中(可能處 理内部雜務或僅在盤旋)耗去的時間比例,而該休眠棚位才匕 出該處理器實際處於一低電力休眠模式所耗去的時間比 例。可從第8囷中看出對於每個這些效能測量本技術表現 地較LongRun好上許多。 第9圖為一表格,其列出於每個工作負栽的執行期間 所收集的處理器效能層級統計。將每個效能層級的時間比 例運算為該工作負載之執行期間的總共非閒置時間的一部 份。該表格之‘平均效能,層級攔指出在每個工作負載之執 33 200422942 行期間的平均效能層級(以高峰效能之百分比表示)。因此 於所有情形中,使用本技術之每個工作負載的平均效能層 級較使用LongRun為低,最後一攔指出關於LongRun達成 的平均效能降低。LongRun工作負載與本發明之工作負載 的播放品質是相同的,即相同的訊框速度且沒有遺漏訊框。 結果顯示本技術較已知的LongRun技術可更準確地預 測必要的效能層級。增加的準確度造成在執行效能測試程 式的期間中該處理器之平均效能層級有1 1 %至3 5 %的降 低。由於執行一工作負載之間的工作數量應該保持相同, 該較低的平均效能層級暗示當本技術之智慧型能源管理程 式啟動時可預期有降低的閒置與休眠時間。該模擬結果確 認了此一預期。相似地,當本技術之智慧效能管理程式啟 動時正好符合其截止期限的訊框數量將會增加,而當解碼 
早於其截止期限所累積的時間數量將會降低。 該中間效能層級(由第9圖表格中的每欄以粗體強調) 也顯示重大的降低。依據本發明之效能設定演算法於多數 效能測定程式上選定一低於高峰的單一效能層級給最多數 的執行時間(>88%),而LongRun通常設定該處理器以全速 運轉。此通常規則的例外為‘Danse De Cable,工作負載,其 中依據本技術之效能設定演算法選定最低的兩種效能層級 並於此兩層級間震盪。此震盪行為的原因乃是由於該 Crusoe處理器的特定效能層級。依據本技術之效能設定演 算法將決定選擇僅高於300 Mhz —點點的一效能層級,因 34 200422942 而當效能層級預測在300 Mhz的上下波動時, 層級將被設為最接近的兩效能層級數值。該已 技術與本技術之效能上最值得注意的不同在於 偵測到大量的處理器活動時,其非常快地躍升 而顯得過度謹慎。 關於所有的工作負載,使用LongRun的平 能層級從未低於 80%,而由本技術設定的多 ‘Red’s Nightmare Small’效能測試程式中下降J 本技術之演算法較LongRun更為主動但於服務 所下降時可快速地反應。由於LongRun並未擁 互動效能的資訊,其被迫於一較短的時間訊框 的行動而該模擬結果顯示此致使無效率。 第10圖包含播放兩種名為‘Legendary’(第 ‘Danse de Cable)(第 10B 圊)之不同 MPEG 電 表。每個圖表說明對於LongRun與本技術於四 能層級(300,400,500, 600 MHz)各自耗去時間 然每次執行的播放品質是相同的,但仍可由圖 用依據本技術之演算法時該處理器於高峰效能 的時間大大地長於該LongRun技術指定該效 況。第10A圖中描繪之播放‘Legendary,電影的 據本技術之演算法選定一 500 MHz效能層級。 所繪關於‘Danse de Cable,電影之結果顯示使 術之演算法,該處理器於兩效能層級300 MHz 該目標效能 知 LongRun 當 LongRun 該效能層級 均處理器效 吹能層級於 L 52%。依據 品質顯示有 有任何關於 中採取保守 10A圖)與 影的結果圖 種處理器效 的比例。雖 表中看出使 之下所耗去 能層級的情 結果顯示依 第10B圖中 用依據本技 與 400 MHz 35 200422942 之間切換 定演算法 Mhz 〇 第1 入觀察。 然而依據 一目標效 LongRun 級。第1: 程式但啟 示執行期 術之效能 上可請求 些情形中 實上低於 如今 術。由於 作負載的 服此困難 體地說, 式的控制 120僅被 設定選擇 °藉由對照’關於該兩部電影該LongRun效能設 於大部分的執行時間選擇高峰處理器速度600 1圖提供對兩種不同效能設定策略之品質上的深 LongRun持續快速交替地上下切換該效能層級, 本技術所控制的系統之處理器效能層級保持接近 心層級。第11A囷之兩圖表(上排)顯示在啟動 執行一效能測試程式支期間該處理器的效能層 、 c囷(中排與下排)顯示關於相同效能測試 動本發明之演算法的效能層級結果。第11B圓顯 間中的實際效能層級,而第11C圖反映依據本技 p又疋演算法在一可於任何效能層級運轉之處理器 的效能層級(假設有相同的最大效能)。注意於某 ’依據本技術之演算法所計算的需求效能層級事 該處理器上可達成的最低效能層級。 考慮模擬結果用於比較互動工作負載上的兩種技 難以建立重複執行的互動效能測試程式,互動工 估計較多媒體效能測試程式要難上許多。為了克 ’實驗上的測量與一簡易模擬技術相結合。更具 該互動效能測試程式於本機LongRun電力管理程 下運轉’而依據本技術之智慧型能源管理程式 動模式中執行,因而其僅記錄其可能做出的效能 而不會實際改變該處理器的效能層級。 36 200422942 第12圖顯示在一模擬運轉過程中收集的效能資料以 供評估互動工作負載。第12A圖為關於LongRun技術之相 對於時間(以粆為單位)的效能層級百分比圖表且在此情形 中所緣之結果相當於該處理器於該測量中的實際效能層 級。第12B圖為一數值化的效能層級圖表,而第12C圖為 本技術之效能設定演算法於控制該處理器可設定的一原始 效能層級時間函數圖表。注意若本技術之演算.法事實上於 控制中,其效能設定選擇將與LongRun所做的選擇有一不 同的運轉時間影響。因為此因素於第12B與12C圊之圖表 中的時間軸將被視為近似值。 為了避免該統計之時間偏離問題,依據本技術之模擬 的被動效能層級軌跡於往後被處理以評估使用本技術而非 LongRun所可能導致之增加執行時間的影響。僅關注互動 事件而非整個效能層級軌跡。本技術之互動效能設定演算 法包含找出對使用者具有一直接影響之執行過程的功能。 此技術給予有效的讀數而不論何種演算法負責控制因而用 於關注我們的測量。一旦一互動事件的執行範圍被獨立, LongRun與本技術均計算於該事件過程中的全速等效完成 工作。由於在測量期間中LongRun控制該CPU速度而其 運轉較本技術控制時為快,對應本技術之結果的事件過程 必須被延長。首先,本技術依據下列公式計算剩餘工作:Where k is a weight, Idle is the number of seconds between the time s and the time Rs in the fourth frame, and the deadline of task A is defined as (+. In this particular example, for the 4-round task B The preemptive task execution detection guidance algorithm determines the utilization history of each task. Processing tasks executed before the next unpreempted task A is scheduled is often highly related to the execution of task A. The interval between the TC and the RS is the "latency period", and the processor can execute a reduced level of performance during this period. However, the "work C" is included in the performance level calculation due to the reduction of the available latency period. The positions in the above equation and each represent an exponential decay average. This exponential decay average makes the updated estimate have a greater influence on the average than the old estimate. The weight k is a parameter about the exponential decay average. .K = 3 is known to work effectively and this small value indicates that each estimate is a good one. With individual tracking job predictors and deadline predictors, the importance of performance predictions is based on the use history pane Depending on the length. This ensures that performance estimates related to larger pane sizes do not dominate performance predictions. 
The performance level indicator for this algorithm ... epecM. vej is calculated based on 17 200422942 two-fingered Perfper's payload. It is based on self-adaptation and every known 5 equals the size of the grid, but it is used when the calculation history pane is long according to the new negator. Wait for the upper layer to calculate the average ratio of the attenuation of the applied formula to obtain: sepecrnvu-hw " heart. Calculate the unique level value for each job. In accordance with the strategy of the present technology, the% of work for a particular task is recalculated between a subordinate working time interval of 50 亳 to i 50 milliseconds. However, because% is calculated on a task-to-task basis, each executed task uses its task-based% value, and the% 4W value fact is corrected to 10 seconds (reflecting the task switching event) . This algorithm is different in interval-based algorithms. The former uses estimates for each task and adjusts the usage history window on a task by task basis. Although known uniform performance setting algorithms use exponential decay to flatten their fixed-use panes (10 to 50 milliseconds) across all tasks performed as a whole, rather than calculating a task-based utilization over a variable task-based utilization Based averaging. According to the anticipation-based algorithm of this technology, it is necessary to avoid an interactive CPU extreme task from being used without being preempted during the processing section. "Because it can only be downloaded at least once once the task is preempted." The history pane is used, so the waiting time of the performance level t prime is used for this task. To avoid unwanted performance time & calculation work estimate t is not pre-occupied 6¾ i§ set the limit value in particular If a task lasts for 100 milliseconds and is not preempted, it is recalculated based on the preset value. Considering to ensure a more urgent program history, 1¾ grid is provided to the interactive application 18 200422942 program through the hierarchical system layer 21, so choose 1 The value of 00 milliseconds. It is also considered that only user applications that may be affected by the 100 millisecond pane threshold are batch-intensive batch operations such as compilation, which may take seconds or even minutes. In this case an additional 100 milliseconds (0.  i seconds) execution time can be an important wise performance. Figure 5 outlines an implementation of the three-tier system performance strategy stack of Figure 2. The implementation includes a performance indicator strategy stack 510 and a strategy event handler 530, both of which output information to a target performance calculator 540. The target performance calculation program 540 is used to organize the results from four performance setting algorithms: a high-level interactive algorithm, a middle-level application-based algorithm, and two different low-level algorithms. The four algorithms can be executed simultaneously. The target performance calculation program 540 derives a single overall target performance level from the multiple performance indicators (four in this example) generated by the strategy stack 5 10. The strategy stack 510, together with the strategy event handler 530 and the target performance calculation program 540, provides a flexible framework for multiple performance setting strategies, so the strategy algorithms of each layer of the stack can be replaced or exchanged according to user requirements. 
Therefore, the performance strategy stack provides a platform for user-defined performance setting strategies for experimentation. Each of the various performance setting algorithms is dedicated to dealing with different specific types of uptime events. However, since there are four different algorithms for outputting different performance indicators in the exemplary embodiment of FIG. 5, the software must decide which of the four performance indicators is preferred to set the overall target value. 19 200422942 It is also necessary to determine the time at which one overall target performance level can be efficiently calculated, assuming that each performance setting algorithm can be executed independently and produce output at different times. At the same time, it is also necessary to consider how to combine the performance indicator under the condition that multiple performance setting algorithms all use the same processing event as a basis for determination, otherwise false target updates may occur. In order to deal with these issues, the strategy stacking 510 algorithm is organized in a three-tier system as shown in the figure, where higher-level policies may take precedence over performance-level requests that are derived from lower (not dominant) layers. Therefore, the hierarchical algorithm can take precedence over the hierarchical algorithm, and the latter can take precedence over the two algorithms of the level 0. Note that each hierarchy can itself contain multiple alternative performance setting algorithms. Different performance setting algorithms do not know their position in the hierarchy and can determine their performance based on any event in the system. When a specific algorithm requests a performance level, it sends an instruction to the strategy stack 510 along with its desired performance level. Each algorithm of the strategy stack includes a command 512, 516, 5 20, 5 24 and stores a corresponding performance level indicator 514, 518, 522, and 526. The ignore instruction 5 20 for the layer 1 algorithm indicates to the target performance calculation program 440 that the related performance level indicator should be ignored when calculating the overall performance target. The setting instructions 512 and 516 of the two algorithms that have been assigned to the level 0 make the target performance calculation light 5 40 and set the corresponding performance level regardless of any performance level request from the lower level of the hierarchy. However, this setting command cannot take precedence over performance level requests from higher-level system levels. In this embodiment, one level 0 algorithm has requested to set the performance to 55% of the peak 20 200422942 level, while the other level 0 algorithm requires to set 25% of the peak level. The target performance calculation program uses the request with the same priority in combination with the value of t in this example as the performance indicator of the layer. In level 2, if the command is specified with the same 80% performance indicator. The "set" instruction provides the target performance calculation program 54. The level must be 80 ° /. Assume that this is greater than any performance indicator from a lower class. In this example, the level 0 performance indicator is 1 and the performance indicator will be ignored so that the overall target will actually be 80% of the performance. 
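The behavior of the target performance calculator 540 described above can be made concrete with a short sketch. The following C fragment is illustrative only: the enum values, structure layout and function name are assumptions, while the combination rules (maximum within a level, SET overriding lower levels, SET-IF-GREATER-THAN only raising the target) follow the example in the preceding paragraphs.

```c
/* Illustrative sketch of combining policy-stack requests into one target. */
enum stack_cmd {
    CMD_IGNORE,           /* indicator takes no part in the calculation            */
    CMD_SET,              /* set this level, overriding lower (but not higher) levels */
    CMD_SET_IF_GREATER,   /* set this level only if it exceeds the current target  */
};

struct stack_entry {
    int            level;   /* 0 = lowest, 2 = highest priority level */
    enum stack_cmd cmd;
    double         perf;    /* requested performance, fraction of peak */
};

/* Entries must be ordered from the lowest hierarchy level to the highest. */
double calc_overall_target(const struct stack_entry *e, int n)
{
    double target = 0.0;
    int    target_level = -1;   /* hierarchy level that currently owns the target */

    for (int i = 0; i < n; i++) {
        switch (e[i].cmd) {
        case CMD_IGNORE:
            break;                              /* e.g. the level-1 entry of Figure 5 */
        case CMD_SET:
            if (e[i].level > target_level) {
                /* A SET from a higher level overrides lower-level requests. */
                target = e[i].perf;
                target_level = e[i].level;
            } else if (e[i].perf > target) {
                /* Requests at the same level combine by taking the maximum: */
                /* 55% and 25% at level 0 give 55%.                          */
                target = e[i].perf;
            }
            break;
        case CMD_SET_IF_GREATER:
            if (e[i].perf > target) {           /* 80% at level 2 beats 55% */
                target = e[i].perf;
                target_level = e[i].level;
            }
            break;
        }
    }
    return target;
}
```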
Since the most recently calculated performance level of each algorithm is slightly stacked in the memory 5, the target performance is calculated at any time to calculate a new overall target value without the & algorithm. When one of the algorithms on the stack is able to request a hierarchy, the target performance calculation program calculates an updated overall target from the bottom to the content of the data structure. In the example in Figure 5, the overall pre-level is set at level 0. 1 remains at 55% and changes the whole at level 2 although each performance setting algorithm can be triggered (by system components) to calculate a new performance at any time when there is a set of common events that all performance settings respond to Hierarchy. During this process, 530 will monitor these events and flag them for strategic event information to the target performance calculator 54. The performance is set to an operating program to optimally set 55% if it is greater than the setting. If it is greater than the setting, the overall performance of the system level is set to 55% and the level is set as a peak indicator. The calculation program 540 can call each A new performance calculation evaluates the performance level of the command effectiveness. The estimate is 55% and the forecast is 80%. A processing event in the algorithm will tend to be strategic event processing. This handler will be divided into 21 200422942 categories including reset event 532, task switching event 534, and performance change event 536. The performance change event 536 is a notification that warns each performance setting algorithm to pay attention to the current performance level of the processor, even though it typically does not change the performance request on the policy stack 510. Regarding the strategic events 532, 534, and 536 of this special classification, the overall performance level is not calculated every time one of the algorithms sends an updated performance level indicator. On the contrary, the performance level calculation is integrated, so for each event notification, it is only calculated once after all the event handlers of all relevant performance setting algorithms have been called ** An application programming interface (API) can be provided to The device driver or the device itself. This interface enables a device to notify the strategy stack of 5 10 and / or individual performance setting algorithms of any significant changes in operating conditions. This causes the performance setting algorithm to trigger a recalculation of the target performance level and thus promptly take changes in the operating conditions. For example, when a CPU-intensive processor-bound task starts, the device may send a notification to the policy stack 510. This notification is optional and the performance setting algorithm may not need to respond to it when received. FIG. 6 schematically illustrates a work tracking counter 600 according to one of the techniques. The work tracking counter 600 includes: an incremental value register "ο, which has a software control module 620 and a hardware control module 630; an accumulator module 640, which has a work count value Register and a time count register 'a time base register 646;-an instant timer 65 and a control register 660. The work tracking counter of this exemplary embodiment can be 22 200422942 with a known time The flag counter and the CPU cycle counter are different. 
The increment value of the counter in this embodiment is, at or close to the time at which the count is incremented, proportional to the actual work performed by the processor. The increment value register 610 includes a work completion calculator, which estimates the work completed by the processor in each counter period. The estimate of completed work is obtained through the software control module 620 and/or through the hardware control module 630. The software control module implements a simple calculation in which the amount of work is related to the current processor speed: if the processor is running at 70% of peak performance the increment value will be 0.7, and if the processor is running at 40% of peak performance the increment value will be 0.4. When the software control module 620 detects that the processor was idle during the counter period, the increment value is set to 0. In alternative work tracking counter embodiments, a more sophisticated software algorithm is used to calculate a more precise work estimate.

Table 1 lists measurement data indicating, for a CPU-bound loop and for an MPEG video workload, the percentage difference between the expected run time and the actual run time when a performance-level transition between two different processor speeds (high to low in this example) is taken into account. The results are grouped by the performance level after the transition, 300, 400 and 500 MHz, as shown in the leftmost column of the table. The top row of Table 1 lists the initial performance levels from which a transition is made to the corresponding processor speed listed in the leftmost column. For the CPU-bound loop, the difference between prediction and actual measurement cannot be distinguished from noise. For the MPEG workload, however, there is an error of about 6% to 7% for each 100 MHz step in processor frequency. The largest inaccuracy observed for these workloads is just under 20% (19.4%), which is considered acceptable for a system with only a few fixed performance levels. However, as the available range between the minimum and maximum selectable processor performance in a system increases, and the step between successive performance levels decreases, a more accurate work-estimation procedure is likely to be required.

                      CPU-bound loop                 MPEG video workload
  After transition    400 MHz  500 MHz  600 MHz      400 MHz  500 MHz  600 MHz
  300 MHz             -0.3%    -0.4%    -0.3%        7.1%     13.5%    19.4%
  400 MHz                      -0.1%     0.0%                  6.9%    13.3%
  500 MHz                                0.1%                           6.8%

                                   Table 1

Alternative exemplary embodiments replace the simple calculation with more sophisticated, more accurate work-estimation techniques, which include monitoring indicative characteristics (for example tracking key events such as memory accesses by means of monitoring hardware) and the ratio between the estimated and actual reduction in workload, rather than assuming that the work completed is directly proportional to processor speed. A further alternative embodiment uses a cache hit ratio and a memory system performance indicator to generate an accurate work estimate. A further alternative exemplary embodiment uses software to monitor the ratio of the processing time spent executing one application task relative to the processing time spent executing another application task. The hardware control module 630 can estimate the work completed even during a transition of the processor between two fixed performance levels. The processor can pause for approximately 20 microseconds during such a transition, during which time it issues no instructions.
This pause is due to the time required to resynchronize the phase-locked loop to the new target processor frequency. In addition, before changing the processor frequency, the voltage must be stabilized to a proper value for the new target frequency. Therefore, there is a transition time of at most one second, during which it can be assumed that the processor runs at the old target frequency but consumes energy at the new target level (because the voltage is set to the new target level, the frequency can pass through the intermediate frequency stage While jumping up several levels to measure the value register during the transition period where the shadow frequency changes dynamically, the software example embodiment uses both hardware and software to work, one of the alternative exemplary embodiments to estimate the completion of the work. Group 640 is timed from the value and added to the work count value. The work count value register is at each value. The timer time is marked as a signal. To measure the work in the accumulation module 6 4 〇 for a predetermined time The beginning of a predetermined time interval while another difference provides the completion of work. The instant timer also controls the rate of temporary increase. The time basis of this time count operates but is used to measure changes in response levels. Here the processor operates the hard艎 Control module to update and add undetected dynamic changes. Although this display control module 6 2 0, 6 3 0 can be used only after calculation In the two modules, the incremental value register 610 reads an accumulated sum stored in the incremental register. The timer time mark increases the work count value in the time interval derived by the timer 650 when the work count is increased. The value stored in the count is read twice, once at the end of the person. An indication of the time interval between these two values. The time count value register stored in the memory 644 and the work count ^: The value takes the same time consumed instead of the completed work. 25 200422942 has both a time counter and a completed work counter to form a performance deterministic algorithm. The purpose of providing a time-based register 646 is multi-platform capacity and conversion to seconds. Its It is used to specify the base (frequency) of the two counters 642, 644, so the time can be accurate and consistent, in other words, the accumulated value stored in the time count value register provides a measure that takes time milliseconds. The control register mode Group 660 contains two control registers, each of which uses one of them. A counter can be moved, stopped, or reset through an appropriate control register. Figure 7 To explain a device that can provide a number of different levels of fixed performance depending on the characteristics of the workload. The device includes a CPU 710, a timer 720, a power supply control module 730, and the incremental value of the work tracking counter in Figure 6. Register 610. The power supply control module 730 determines the fixed performance level at which the CPU is set and selects an appropriate clock for the real-time timer 720. The power supply system module 730 will The information on the current processor frequency is input to the incremental value memory 610. Therefore, the value of the increment is proportional to the processor frequency, which provides an estimate of the available work done by the processor. 
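A minimal sketch of the simple software-controlled work accounting described above, with the increment proportional to the current performance level and accumulated alongside elapsed time, is given below. The structure and function names are illustrative assumptions.

```c
/*
 * Sketch of the simple software work-completion estimate: the increment
 * applied on each counter period is the current performance level as a
 * fraction of peak, and zero while the processor is idle.
 */
struct work_counter {
    double work_count;   /* accumulated full-speed-equivalent work */
    double time_count;   /* accumulated elapsed time, same units   */
};

/*
 * cur_perf: current performance level as a fraction of peak (e.g. 0.7)
 * idle:     non-zero if the processor was idle during this counter period
 * period:   length of the counter period
 */
void work_counter_tick(struct work_counter *c, double cur_perf, int idle,
                       double period)
{
    double increment = idle ? 0.0 : cur_perf;  /* 0.7 at 70% of peak, 0 when idle */

    c->work_count += increment * period;
    c->time_count += period;
}
```

Reading both counters at the start and end of an interval, as described above, then gives the work completed and the time consumed over that interval as two differences.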
This strategy stacks many 5 1 0 The performance measurement algorithm uses the utilization history of the process at a specific time interval (pane) to estimate the appropriate target speed of the processor in the future. The main goal of any performance setting strategy is to maximize the busy time of the processor from the start of execution to the deadline of the task by reducing the processor frequency and voltage level to an appropriate target performance level. In order to actually predict the target performance level, the Intelligent energy management sets the pre-launch and start-up control relays between phases. 26 200422942 Formula 1 20 provides an extraction to track the actual work that the process has completed in a specific time interval . This completion work abstraction allows performance changes and lag time to be taken into account regardless of the implementation of specific hardware counters that may vary between platforms. Based on this count, in order to obtain a work measurement estimate for a time interval, each performance setting algorithm is assigned a "work structure" data structure. Each algorithm is set to call a "work start function" at the beginning of the time interval , And call a "work stop function" at the end of the specific time interval. During the completion of the work measurement, the content of the work structure is automatically updated to specify the proportion of idle time allocated by the individual performance level of the processor and Use processor time scale. The information stored in the work structure is then used to calculate the full-speed equivalent work value ('The value is then used to predict the target performance level. This completed work extraction function is implemented in the smart energy management program 1 2 0 software and provides a convenient interface to the smart energy management program 1220 for developers of performance-level prediction algorithms. The completion of the task extraction also simplifies the performance setting system of this technology. The port connection of the system is different hardware. Architecture. A major difference between alternative hardware platforms is the way in which time is measured on that platform. ° Specifically, some architectures provide a low recurrence cycle counting method through time stamp counters, while other architectures only provide externally programmable time breaks to the user. However, even if a time stamp counter is provided, it does not necessarily measure the same For example, the first hardware platform contains both Intel [RTM] and ARM [RTM] processors. In these processors, the counter calculates CPU cycles and the counting rate is related to the speed of the processor 27 200422942. The counter stops counting when the processor enters the sleep mode. The hardware platform includes the Crusoe [RTM] processor, which implements a time when the counter consistently calculates the period of the processor at the peak rate and holds the peak rate count, even if the processor is in The work extraction in the sleep mode helps the target efficiency design technology be implemented on these two platforms. The estimated work calculated in this embodiment does not determine that the work is performed at half the peak performance and is not necessarily consumed at speed. The fact that it takes twice as long to complete. One reason for this violation is that This is not the case with the processor's core program speed reduction memory system. 
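The work structure and its work-start and work-stop functions described above can be sketched as follows. This is an illustrative sketch only: the structure layout, the helper functions and the fixed number of performance levels are assumptions, and only the work_start/work_stop interface itself is taken from the description.

```c
/*
 * Sketch of the completed-work abstraction: a per-algorithm work structure
 * bracketed by work_start()/work_stop(), yielding a full-speed-equivalent
 * (FSE) work value for the measurement interval.
 */
#define MAX_PERF_LEVELS 8

struct iem_work_struct {
    double level_time[MAX_PERF_LEVELS]; /* time spent at each fixed level   */
    double idle_time;                   /* idle time within the interval    */
    double start_stamp;                 /* snapshot taken by work_start()   */
};

/* Platform-specific helpers, assumed to exist for this sketch. */
extern double read_time_counter(void);
extern double level_fraction(int level);                     /* e.g. 0.5 for 300/600 MHz */
extern void   snapshot_level_times(struct iem_work_struct *w);

void work_start(struct iem_work_struct *w)
{
    for (int i = 0; i < MAX_PERF_LEVELS; i++)
        w->level_time[i] = 0.0;
    w->idle_time = 0.0;
    w->start_stamp = read_time_counter();
}

/* Returns the full-speed-equivalent work done since work_start(). */
double work_stop(struct iem_work_struct *w)
{
    double fse = 0.0;

    snapshot_level_times(w);   /* fills in per-level and idle times for the interval */

    for (int i = 0; i < MAX_PERF_LEVELS; i++)
        fse += w->level_time[i] * level_fraction(i);   /* idle time contributes nothing */

    return fse;
}
```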
Therefore, the core program and memory card effect increase is more beneficial to the memory. Run simulation to estimate the performance setting technology compared to a known technology 'the known technology It is built in Transrneta Crusoe CPU, 4 long run (LongRun), power management program. Transmeta CPU has 'LongRun' power management program built into the processor. LongRun is different from other known power management technologies, and it avoids the system. The need for power management to take effect. LOngRun uses historical usage to guide clock selection: if it is highly used, it will increase the degree 'and reduce performance when it is used at a low level. Unlike other implementations on more unified processors 'The power management strategy can be relatively easily implemented on the Crusoe processor, because the processor already has a hidden soft-line dynamic binary translation and optimization. The goal of this simulation is. The second standard has continued to increase. This finish is slower than replacing a full-intuition-thinking processor. Special t7 in a Crusoe body. Adjust the processor speed of the processor. Many implementations are implemented at the system level. 28 200422942 Strategies such as LongRun are implemented at such a low-level layer of the software hierarchy how to execute them effectively. This technique executes on the same processor as LongRun. This simulation was performed on a Sony Vaio PCG-C1VN laptop, which uses a Transmeta Crusoe processor and operates at a fixed performance level of 300Mhz to 600Mhz at a performance level of 100OMhz. This simulation uses one with Linux 2. 4. 4 An improved version of the acl8 core program, Mandrake 7. 2 operating system. The estimated workload for comparison is as follows: Plaympeg SDL MPEG player library, Acrobat Reader for calculating PDF files, Emaes for text editing, NetScape mail and news for reading news 4. 7.Konqueror 1_9 for web browsing. 8 and Xwelltris 1.0 as a 3D game. 〇. The performance test program for interactive shell commands is a record of a user performing miscellaneous shell operations during a period of approximately 30 minutes. In order to avoid the variability that the dynamic rendering engine of the Crusoe processor may generate, most performance test programs are executed at least twice to prepare the dynamic rendering cache memory, and all the generated simulation data except the last one is executed. be ignored. The design of the performance setting algorithm according to the present invention will not hinder the way the host platform controls the timer. For the purpose of this simulation, this technology provides a millisecond resolution timer without changing the way the Linux built-in 10ms resolution timer works. This goal is achieved by carrying a timer assignment routine (which checks for timer events) to parts of the core program that are often executed, such as schedulers and system calls. 29 200422942 Because the performance setting algorithm based on this technology is designed to have hooks to the core program so that it can break certain system calls to find interactive events and it is called at every task switch, it can be Attach some indicators to these hooks to manage timer assignments. Each hook is augmented by implementing a time-scaled & ten-counter read, a time stamp with the next timer event, a comparison, and a branch-to-timer assignment routine on success. It is found in practice that this strategy establishes a timer with sub-millisecond accuracy. 
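The hook-based timer check just described, a time-stamp-counter read, a comparison with the next timer event and a branch to the dispatch routine, can be sketched as follows. The function names, the single-pending-timer simplification and the cycles_t type are assumptions made for illustration.

```c
#include <stdint.h>

typedef uint64_t cycles_t;

/* Platform-specific counter read, assumed to cost a few tens of cycles. */
extern cycles_t read_timestamp_counter(void);

struct soft_timer {
    cycles_t expires;               /* time stamp of the next timer event  */
    void   (*handler)(void *data);  /* event handler to run when it is due */
    void    *data;
};

static struct soft_timer next_timer;  /* single pending timer, for simplicity */

/* Inserted into frequently executed kernel paths such as the scheduler
 * and system calls, giving sub-millisecond timer resolution in practice. */
static inline void timer_check(void)
{
    cycles_t now = read_timestamp_counter();

    if (next_timer.handler && now >= next_timer.expires) {
        void (*h)(void *) = next_timer.handler;
        next_timer.handler = NULL;   /* one-shot: clear before dispatch */
        h(next_timer.data);          /* the handler may re-arm the timer */
    }
}
```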
Table 2 below details timer statistics gathered during the simulation. In the worst case the resolution of the timer is limited by the 10 millisecond time unit of the scheduler (which may appear inconsistent with Table 2). However, since the measurements on which the performance-setting algorithms of the present technique are based usually occur close to timer triggers, the resolution achieved is considered sufficient. It has proved advantageous for the system's software timer to stop running when the processor is in sleep mode, because this means that the timer does not alter the sleep characteristics of the operating system and of the applications being executed. The timers used have high resolution and low overhead, and these properties facilitated the development of an implementation with both an active and a passive mode. In the active mode, control rests with the performance-setting algorithms of the present technique. In the passive mode the built-in LongRun power management program remains responsible for performance setting, while the intelligent energy management program of the present technique acts as an observer of performance levels and performance changes.

  Cost of accessing a time stamp counter                          30 to 40 cycles
  Average interval between timer checks                           ~0.1 ms
  Timer accuracy                                                  ~1 ms
  Average duration of a timer check and dispatch
  (including possible execution of an event handler)              100 to 150 cycles

                                   Table 2

Performance changes produced by LongRun are tracked in a manner similar to the timer-dispatch routine: the intelligent energy management program 120 of the present technique periodically reads the processor's performance level through a machine-specific register and compares the result with the previous value. If the two values differ, the change is recorded in a buffer. The intelligent energy management program of the present technique includes a tracing mechanism that keeps a record of significant events in a kernel buffer. This record contains the performance-level requests from the different policies, task preemptions, task IDs (identification tokens) and the performance level of the processor. When the simulation is run, LongRun can be compared with the performance-setting algorithms of the present technique over the same execution period: LongRun controls the performance setting, while the intelligent energy management program 120 of the present technique outputs the decisions it would have made for the same workload. This simulation strategy is used to evaluate objectively the difference between the known LongRun technique and the present technique on interactive benchmarks whose executions are not repeatable.

To evaluate the additional overhead of using the present measurement and performance-setting technique, the performance-setting algorithms of the present technique were instrumented with markers that track the time spent executing the performance-setting algorithm code at run time. Whereas the run-time overhead of the present technique on a Pentium II is about 0.1% to 0.5%, the overhead on the Transmeta Crusoe processor is 1% to 4%. Further measurements in virtual machines such as VMWare and user-mode Linux (UML) confirm that the overhead of the performance-setting algorithms of the present technique can be considerably greater in virtual machines than on conventional processor architectures. This overhead can, however, be reduced by optimizing the algorithms.
MPEG (Motion Picture Experts Group) video playback poses a difficult test for the performance setting algorithms of all tests. Although the performance setting algorithm typically places a periodic load on the system, the performance requirements may still vary depending on the type of MPEG frame. Therefore, if a performance setting algorithm responds to past (highly volatile) MPEG frame decoding events using a relatively long pane of time to predict future performance requirements, it may miss the execution of intensive computing frames (less representative Deadline). On the other hand, if the algorithm only considers a short interval, it will not converge to a single performance value and will quickly oscillate between multiple settings. Since each performance level change causes a conversion delay, rapid oscillation between different performance values is not satisfactory. The LongRun simulation results are confirmed in the oscillating behavior of the MPEG performance measurement program. This technology addresses the oscillating problem of this MPEG workload by setting algorithms based on interactive performance settings at the top of the hierarchy to limit the worst-case response. There are more traditional interval-based prediction algorithms at the bottom of the hierarchy that can take a longer-term view of performance-level requirements. Figure 8 is a table detailing the 'plaympeg' player (lL 本 · ίP: // www. I〇kipames. com / develor) ment / smpeg. php3) 32 200422942 Analog measurement results when playing various MPEG videos. Certain internal variables of the player have been disclosed to provide how the player is affected by the results of dynamically changing the processor performance level during execution. These numbers are shown in the MPEG decode column of the table. In particular, this 'Ahead' measures how close each frame is to the deadline. The closeness to the deadline is indicated by the cumulative number of seconds of playing each video. For maximum power efficiency, the early variable value should be as close to zero as possible, although the processor's slowest performance level places a lower limit on how early the value can be reduced. A 'just-in-time' block located on the far right of the form indicates the total number of frames that just met its deadline. The larger the number of just-in-time frames, the closer the performance setting algorithm is to the theoretical optimal value. The data of the execution statistics in the table in Figure 8 were collected by the smart energy management program 120 of the monitoring subsystem. In order to collect information about LongRun, the smart energy management program 120 is used in the passive mode to gather the trajectory of performance changes without controlling the processor performance level. The idle stop indicates the percentage of time spent in the idle loop of the core program (which may handle internal chores or only hovering), and the hibernation booth only consumes the processor when it is actually in a low-power sleep mode. Time proportion. It can be seen from Section 8 that for each of these performance measures the technology performs much better than LongRun. Figure 9 is a table listing the processor performance level statistics collected during the execution of each job load. Calculate the time ratio for each performance level as part of the total non-idle time during the execution of that workload. 
The ‘Average Performance, Levels’ column in the table indicates the average performance level (expressed as a percentage of peak performance) during the execution of each workload. Therefore, in all cases, the average performance level of each workload using this technology is lower than using LongRun, and the last block indicates that the average performance achieved with LongRun is reduced. The playback quality of the LongRun workload and the workload of the present invention is the same, i.e. the same frame speed and no missing frames. The results show that the technology can predict the necessary performance level more accurately than the known LongRun technology. The increased accuracy results in a 11% to 35% reduction in the average performance level of the processor during the performance test program. Since the amount of work performed between performing a workload should remain the same, this lower average performance level implies that reduced idle and hibernation times can be expected when the technology's smart energy management program is launched. The simulation results confirmed this expectation. Similarly, the number of frames that exactly meet its deadline will increase when the Smart Performance Manager of this technology starts, and the amount of time accumulated when decoding is earlier than its deadline will decrease. This intermediate performance level (emphasized by each column in the table in Figure 9) also shows significant reductions. According to the performance setting algorithm of the present invention, a single performance level below the peak is selected on most performance measurement programs to give the maximum execution time (> 88%), and LongRun usually sets the processor to run at full speed. The exception to this general rule is ‘Danse De Cable, workload, where the two lowest performance levels are selected based on the technology ’s performance setting algorithm and oscillate between these two levels. The reason for this oscillating behavior is due to the specific performance level of the Crusoe processor. The performance setting algorithm based on this technology will decide to choose a performance level that is only higher than 300 Mhz—a little bit. Because 34 200422942 and when the performance level prediction fluctuates up and down at 300 Mhz, the level will be set to the closest two performances. Level value. The most noteworthy difference between this technology and the performance of this technology is that when a large amount of processor activity is detected, it jumps very quickly and appears overly cautious. Regarding all workloads, the level level of using LongRun has never been lower than 80%, and the multiple 'Red's Nightmare Small' performance test programs set by this technology have dropped. The algorithm of this technology is more active than LongRun but is used by service agencies. Quick response when descending. Because LongRun did not have information on interactive performance, it was forced to act in a short time frame and the simulation results showed that this made it inefficient. Figure 10 contains two different MPEG meters called 'Legendary' (#Danse de Cable) (# 10B 圊). Each chart shows that the time spent for LongRun and the technology at the four-energy level (300, 400, 500, 600 MHz) is the same, but the playback quality is the same each time. The processor's peak performance time is much longer than the LongRun technology specifies for this condition. 
The results for playback of the 'Legendary' movie depicted in Figure 10A show that the algorithm of the present technique selects a 500 MHz performance level. The results drawn in Figure 10B for the 'Danse de Cable' movie show that, with the algorithm of the present technique, the processor switches between the two performance levels 300 MHz and 400 MHz. By contrast, for both movies the LongRun performance-setting algorithm selects the peak processor speed of 600 MHz for most of the execution time.

Figure 11 provides a qualitative insight into the two different performance-setting strategies. Whereas LongRun continually and rapidly switches the performance level up and down, the processor performance level of the system controlled by the present technique stays close to a single target performance level. The two graphs of Figure 11A (top row) show the processor performance level while a benchmark executes with LongRun enabled, while Figures 11B and 11C (middle and bottom rows) show the performance-level results for the same benchmark with the algorithm of the present invention enabled. Figure 11B shows the actual performance levels during execution, and Figure 11C reflects the performance level that the algorithm of the present technique would request on a processor able to run at any performance level (assuming the same maximum performance). Note that in some cases the required performance level calculated by the algorithm of the present technique is in fact lower than the lowest performance level achievable on the processor.

Now consider the simulation results used to compare the two techniques on interactive workloads. Since it is difficult to construct interactive benchmarks that execute repeatably, interactive workloads are much harder to evaluate than multimedia benchmarks. To overcome this difficulty, experimental measurement was combined with a simple simulation technique. More specifically, the interactive benchmark was run under the control of the native LongRun power manager, while the intelligent energy management program 120 of the present technique executed in passive mode, so that it only recorded the performance-setting decisions it would have made without actually changing the processor's performance level.

Figure 12 shows performance data collected during a simulation run for evaluating interactive workloads. Figure 12A is a graph of performance level (as a percentage) against time (in seconds) for the LongRun technique, and in this case the plotted results correspond to the actual performance level of the processor during the measurement. Figure 12B is a quantized performance-level graph, and Figure 12C is a graph, as a function of time, of the raw performance level that the performance-setting algorithm of the present technique would set were it controlling the processor. Note that if the algorithm of the present technique had in fact been in control, its performance-setting choices would have had a different effect on run time than the choices made by LongRun. Because of this factor, the time axes in the graphs of Figures 12B and 12C should be regarded as approximate.
In order to avoid the problem of time deviation of the statistics, the passive performance level trajectory of the simulation based on this technology is processed later to evaluate the impact of increased execution time that may be caused by using this technology instead of LongRun. Focus only on interactive events, not the entire performance-level trajectory. The interactive performance setting algorithm of this technology includes the function of finding an execution process that has a direct impact on the user. This technology gives effective readings regardless of the algorithm responsible for control and is therefore used to focus on our measurements. Once the execution scope of an interactive event is independent, both LongRun and the technology calculate the full speed equivalent to complete the work during the event. Since LongRun controls the CPU speed during the measurement period and its operation is faster than when controlled by this technology, the event process corresponding to the results of this technology must be extended. First, this technique calculates the remaining work based on the following formula:

Work_remaining(present technique) = Work(LongRun) - Work(present technique)

Next, the algorithm calculates the extent to which the length of the interactive event needs to be extended, on the assumption that the algorithm of the present technique would have continued to run at its predicted speed until it reached the urgency threshold, and at full speed thereafter. The statistics are adjusted accordingly. The results obtained with this correction were found to be very close to those observed when similar workloads (the same benchmarks, but with slightly different interactive load) were run with the processor actively controlled by the algorithm of the present technique. When the algorithm of the present technique is actually in control, however, the number of performance-setting changes is reduced and the performance levels are more accurate.

Figure 13 shows the statistics collected using the time-skew correction technique described above. Each of the six graphs in the figure contains two stacked bars, as described below following the illustrative sketch.
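The time-skew correction just described can be sketched as follows, assuming a simple two-phase model (predicted speed until the urgency threshold, full speed thereafter). The structure and function names are illustrative and not taken from the embodiment.

```c
/*
 * Sketch of stretching an interactive event recorded under LongRun so that
 * it reflects how long the present technique would have taken.
 */
struct interactive_event {
    double work_longrun_fse;   /* FSE work done under LongRun control           */
    double work_present_fse;   /* FSE work the present technique would have done */
    double predicted_speed;    /* predicted performance level, 0 < s <= 1        */
    double time_to_urgent;     /* time left before the urgency threshold (s)     */
};

/* Returns the amount by which the event duration must be stretched (seconds). */
double event_extension(const struct interactive_event *ev)
{
    double remaining = ev->work_longrun_fse - ev->work_present_fse;
    double extension;

    if (remaining <= 0.0)
        return 0.0;                      /* nothing outstanding, no stretch */

    /* Phase 1: run at the predicted speed until the urgency threshold. */
    double phase1_work = ev->predicted_speed * ev->time_to_urgent;
    if (remaining <= phase1_work)
        return remaining / ev->predicted_speed;

    extension = ev->time_to_urgent;
    remaining -= phase1_work;

    /* Phase 2: beyond the urgency threshold the processor runs at full speed. */
    extension += remaining;              /* at full speed, 1 FSE unit per second */
    return extension;
}
```

The statistics gathered after applying this correction are what is plotted in the stacked bars of Figure 13.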
On each graph the left-hand bar relates to LongRun and the right-hand bar to the present technique. Each bar is a stack representing the proportion of time spent, during interactive events, at each of the four performance levels supported by the computer. These performance levels run, from the bottom of the stack upwards, from 300 MHz in 100 MHz increments up to 600 MHz. Even at a glance it is apparent that the algorithm of the present technique spends more time at the lower performance levels than LongRun does. In some benchmarks, such as Emacs, there is hardly ever a need to execute quickly, and the machine meets its interactive deadlines while remaining at its lowest possible performance level. At the other end of the spectrum is the Acrobat Reader benchmark, which exhibits a bimodal behavior: the processor runs either at its peak level or at its minimum level. Even in this benchmark many interactive events can be completed on time at the processor's minimum performance level; however, when a page has to be rendered, even the processor's peak performance level is not sufficient to complete the work by a deadline within the user's perception threshold. Consequently, when a sufficiently long interactive event is encountered, the algorithm of the present technique switches the processor performance level to the peak value. By contrast, during execution of the Konqueror benchmark the algorithm of the present technique makes use of all four available performance levels of the processor. This can be compared with the LongRun strategy, which causes the processor to spend the bulk of its time at its peak level.

In summary, the simulation results detailed above with reference to Figures 8 to 13 show how two performance-setting strategies, implemented at different levels of the software hierarchy, behave on multimedia and interactive workloads. The Transmeta LongRun power management program, implemented in the processor's firmware, has been found to take more conservative decisions than the algorithms of the present technique implemented in the kernel of the operating system. On a set of multimedia benchmarks, the algorithms of the present technique achieved average performance levels 11% to 35% lower than those achieved using the known LongRun technique.

Because the performance-setting algorithms of the present technique sit higher in the software stack than LongRun, they can base their decisions on a richer set of run-time information, which translates into higher accuracy.

Although the firmware approach of LongRun has been shown to be less accurate than an algorithm implemented in the kernel, this does not diminish its usefulness: LongRun has the important advantage of requiring no knowledge of the operating system. It will be appreciated that the gap between low-level and high-level implementations can be bridged by providing a baseline performance-setting algorithm, such as LongRun, in firmware and exposing an interface to the operating system so that processor performance-setting choices can be refined. The performance-setting hierarchy of the present technique provides the mechanism to support such a design: the lowest performance-setting policy on the stack could in practice be implemented in the processor's firmware.
Although illustrative embodiments of the invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various changes and modifications can be effected therein by one skilled in the art without departing from the scope and spirit of the invention as defined by the appended claims.

[Brief Description of the Drawings]

Figure 1 schematically illustrates how a power management system according to the present technique may be implemented in a data processing system;
Figure 2 schematically illustrates the three hierarchical levels of performance-setting algorithms according to the present technique;
Figure 3 schematically illustrates the setting of processor performance levels during an interactive event;
Figure 4 schematically illustrates the execution of a workload on the processor and the calculation of a utilization-history window for a task A;
Figure 5 schematically illustrates an implementation of the three-level performance policy stack of Figure 2;
Figure 6 schematically illustrates a work tracking counter according to the present technique;
Figure 7 schematically illustrates an apparatus that can provide a number of different fixed performance levels according to the characteristics of the workload;
Figure 8 is a table detailing simulated measurement results for the 'plaympeg' video player playing a variety of MPEG videos;
Figure 9 is a table listing processor performance-level statistics gathered during the execution of each workload;
Figure 10 comprises two graphs showing the results of playing two different MPEG movies, 'Legendary' (Figure 10A) and 'Danse de Cable' (Figure 10B);
Figures 11A, 11B and 11C schematically illustrate the characteristics of two different performance-setting strategies;
Figures 12A, 12B and 12C schematically illustrate simulation results of testing different performance-setting algorithms on interactive workloads; and
Figure 13 schematically illustrates statistics gathered using a time-skew correction technique.

[Brief Description of the Reference Numerals]

100 kernel; 110 standard kernel functionality; 112, 132 system call modules; 114 scheduler; 116 conventional power management program; 120 intelligent energy management system; 122 policy coordinator; 124 performance-setting control module; 126 event tracing; 130 user process layer; 134 task management module; 136 application-specific data; 140 application monitoring module; 210 interactive application performance indicator; 220 application-specific performance indicator; 230 task-based processor utilization performance indicator; 510 performance indicator policy stack; 512, 516, 520, 524 commands; 514, 518, 522, 526 performance level indicators; 520 ignore instruction; 524 set-if-greater-than instruction; 530 policy event handler; 532 reset event; 534 task switch event; 536 performance change event; 540 target performance calculator; 600 counter; 610 increment value register; 620 software control module; 630 hardware control module; 640 accumulator module; 646 time base register; 650, 720 real-time timers; 660 control register; 710 CPU (central processing unit); 730 power supply control module


Claims (1)

Scope of patent application:

1. A method of calculating a target processor performance level for a processor from a utilization history of the processor in executing a plurality of processing tasks, the method comprising at least the steps of:
calculating a task work value indicative of the processor utilization in executing a particular processing task within a predetermined task time interval; and
calculating said target processor performance level in dependence upon said task work value.

2. The method according to item 1 of the scope of patent application, comprising at least calculating a plurality of task work values corresponding to respective previous executions of said particular processing task, and combining said plurality of task work values to calculate said target processor performance level for a future execution of said particular processing task.

3. The method according to item 2 of the scope of patent application, wherein said predetermined task time interval is set independently for each of said plurality of processing tasks.

4. The method according to item 3 of the scope of patent application, wherein said predetermined task time interval is set independently for each execution of said particular processing task.

5. The method according to item 4 of the scope of patent application, wherein said predetermined task time interval is a period extending from the start of a first scheduling of said particular processing task to the start of a subsequent scheduling of said particular task, said predetermined task time interval being associated with said first scheduling.

6.
The method according to item 2 of the scope of patent application, wherein the previously performed multiple task tasks corresponding to the special 43 200422942 scheduled processing task # are combined to calculate an exponential decay completion value for the specific processing task. . 7. The method according to item 1 of the scope of the patent application, which includes detecting at least a duration of idle time within a predetermined task interval, and the task working value and the duration of the idle time are used to calculate a task execution deadline for the upper task. . 8. The method as described in claim 7 of the patent scope, wherein the deadline is calculated for each of a plurality of each of the above-mentioned specific processing tasks, and the multiple task execution deadlines are calculated as an exponential decay The average task execution deadline. 9. The method as described in item 7 of the scope of the patent application, wherein the fixed processing is calculated based on the above-mentioned exponential decay average work completion value of the processing task & numerical agricultural reduction average task execution deadline. The above-mentioned target processor performance level of the task. 10. The method as described in item 11 of the scope of patent application, further comprising] during the processing period of the above-mentioned specific processing task, detecting at least a line period 'the at least one suspended execution period representative The period during which the specific processing task is switched to a further and non-tasking period before the completion of the above: and the calculation of the above-mentioned task work value for this specific processing task, because at least one period of suspended execution period includes processor utilization. The method described in item 10 of the scope of patent application, at least its value is subtracted and the average value is included in the above and based on the above characteristics The combination of execution before the execution of the scheduled task corresponds to this and the above-mentioned special steps: a suspension of the execution of the first task and the same processing. This sets an upper bound within the predetermined task interval described in 44 200422942 above, Therefore, if the specific processing task continues to be executed without detecting the suspended execution period within a duration period greater than or equal to the upper bound, the target processor performance level of the task is automatically recalculated. 1 2. As requested The scope of the patent is the widest ... Each of the tasks has a y, and if the corresponding task has been executed but has been executed, then the above flag value is displayed. 13: The method described in the second item of the scope, where When the value of any 2Γ is used to calculate the above-mentioned target processor performance of the above-mentioned future execution of the above-mentioned task, the processing performance of each task is described by a corresponding time interval on this U-test. To be normalized.% -Carrying- computer programs for controlling __Dianmao's DengTeng program products, which run from a processor to a variety of One of the tasks is to use the history to calculate = one of the processor's target processor performance levels. The above electric cat program calculates at least the task work value calculation function, which is operable to calculate-task work value. 
the task work value indicating the utilization of a processor executing a specific processing task within a predetermined task time interval; and target processor performance calculation code operable to calculate said target processor performance level in dependence upon said task work value.

15. The computer program product according to claim 14, wherein said task work value calculation code calculates a plurality of task work values corresponding to respective previous executions of said specific processing task, and combines said plurality of task work values to calculate said target processor performance level for a future execution of said specific processing task.

16. The computer program product according to claim 15, wherein said predetermined task time interval is set independently for each of a plurality of processing tasks.

17. The computer program product according to claim 16, wherein said predetermined task time interval is set independently for each execution of said specific processing task.

18. The computer program product according to claim 17, wherein said predetermined task time interval is a period extending from the start of a first scheduling of said specific processing task to the start of a subsequent scheduling of said specific processing task, said predetermined task time interval being associated with said first scheduling.

19. The computer program product according to claim 15, wherein said plurality of task work values corresponding to previous executions of said specific processing task are combined to calculate an exponentially decaying average work completion value for said specific processing task.

20. The computer program product according to claim 14, comprising detection code operable to detect an idle time duration value within said predetermined task time interval and to calculate a task execution deadline for said specific processing task in dependence upon said task work value and said idle time duration value.

21. The computer program product according to claim 20, wherein said task execution deadline is calculated for each of a plurality of previous executions of said specific processing task, and a plurality of task execution deadlines are combined to calculate an exponentially decaying average task execution deadline value.

22. The computer program product according to claim 20, wherein said target processor performance level for said specific processing task is calculated in dependence upon said exponentially decaying average work completion value and upon said exponentially decaying average task execution deadline value corresponding to said specific processing task.

23. The computer program product according to claim 14, further comprising:
suspended execution period detection code operable to detect, during processing of said specific processing task, at least one suspended execution period, said at least one suspended execution period representing a period elapsing upon switching from said specific processing task to a further and different processing task before completion of said first task; and
wherein said task work value calculation code is operable to calculate said task work value for said processing task such that it includes processor utilization during said at least one suspended execution period.

24. The computer program product according to claim 23, wherein an upper bound is set for said predetermined task time interval, such that if said specific processing task continues to execute without said suspended execution period being detected within a period greater than or equal to said upper bound, said target processor performance level for said task is automatically recalculated.

25. The computer program product according to claim 14, wherein a flag value is stored for each task, said flag value indicating that the corresponding task has started but has not yet completed execution.

26. The computer program product according to claim 15, wherein, when said task work values are combined to calculate said target processor performance level for said future execution of said task, each task work value from a respective previous execution of said specific processing task is normalized by a corresponding predetermined task time interval.

27. Apparatus for controlling a computer to calculate a target processor performance level of a processor from a utilization history of the processor in executing a plurality of processing tasks, said apparatus comprising:
task work value calculation logic operable to calculate a task work value indicating the utilization of the processor by a specific processing task within a predetermined task time interval; and
target processor performance calculation logic operable to calculate said target processor performance level in dependence upon said task work value.

28. The apparatus according to claim 27, wherein said task work value calculation logic calculates a plurality of task work values corresponding to respective previous executions of said specific processing task, and combines said plurality of task work values to calculate said target processor performance level for a future execution of said specific processing task.

29. The apparatus according to claim 28, wherein said predetermined task time interval is set independently for each of a plurality of processing tasks.

30. The apparatus according to claim 29, wherein said predetermined task time interval is set independently for each execution of said specific processing task.

31. The apparatus according to claim 30, wherein said predetermined task time interval is a period extending from the start of a first scheduling of said specific processing task to the start of a subsequent scheduling of said specific processing task, said predetermined task time interval being associated with said first scheduling.

32. The apparatus according to claim 28, wherein said plurality of task work values corresponding to previous executions of said specific processing task are combined to calculate an exponentially decaying average work completion value for said specific processing task.

33. The apparatus according to claim 28, comprising detection logic operable to detect an idle time duration value within said predetermined task time interval and to calculate a task execution deadline for said specific processing task in dependence upon said task work value and said idle time duration value.

34. The apparatus according to claim 33, wherein said task execution deadline is calculated for each of a plurality of previous executions of said specific processing task, and a plurality of task execution deadlines are combined to calculate an exponentially decaying average task execution deadline value.

35. The apparatus according to claim 33, wherein said target processor performance level for said specific processing task is calculated in dependence upon said exponentially decaying average work completion value and upon said exponentially decaying average task execution deadline value corresponding to said specific processing task.

36. The apparatus according to claim 28, further comprising:
suspended execution period detection logic operable to detect, during processing of said specific processing task, at least one suspended execution period, said at least one suspended execution period representing a period elapsing upon switching from said specific processing task to a further and different processing task before completion of said first task; and
wherein said task work value calculation logic is operable to calculate said task work value for said processing task such that it includes processor utilization during said at least one suspended execution period.

37. The apparatus according to claim 36, wherein an upper bound is set for said predetermined task time interval, such that if said specific processing task continues to execute without said suspended execution period being detected within a period greater than or equal to said upper bound, said target processor performance level for said task is automatically recalculated.

38. The apparatus according to claim 28, wherein a flag value is stored for each task, said flag value indicating that the corresponding task has started but has not yet completed execution.

39. The apparatus according to claim 28, wherein, when said task work values are combined to calculate said target processor performance level for said future execution of said task, each task work value from a respective previous execution of said specific processing task is normalized by a corresponding predetermined task time interval.
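The claims above describe the mechanism only in prose. The following C program is a minimal illustrative sketch of how the quantities they name could interact: a per-execution task work value normalized by its task time interval, exponentially decaying averages of work and of the execution deadline, a per-task flag, and a target performance level taken from the ratio of the two averages. All identifiers, the decay factor, and the sample numbers are assumptions made here for illustration; they are not taken from the patent text and are not a reference implementation of the claimed method.

/*
 * Illustrative sketch only: decaying averages of per-task work and deadline,
 * and a target performance level derived from their ratio. Names, the decay
 * factor and the sample data are assumptions, not part of the patent.
 */
#include <stdio.h>

#define DECAY 0.7 /* weight given to history (assumed value) */

struct task_history {
    double avg_work;      /* exponentially decaying average work value  */
    double avg_deadline;  /* exponentially decaying average deadline    */
    int    started;       /* flag: task has begun but not yet completed */
    int    seen;          /* any history recorded yet?                  */
};

/* Work value for one execution, normalized by its own task time interval:
 * full-speed-equivalent busy time divided by the interval length. */
static double work_value(double busy_time, double perf_level, double interval)
{
    return (busy_time * perf_level) / interval;
}

/* Fold one completed execution into the decaying averages. */
static void update_history(struct task_history *h, double work, double deadline)
{
    if (!h->seen) {
        h->avg_work = work;
        h->avg_deadline = deadline;
        h->seen = 1;
    } else {
        h->avg_work = DECAY * h->avg_work + (1.0 - DECAY) * work;
        h->avg_deadline = DECAY * h->avg_deadline + (1.0 - DECAY) * deadline;
    }
}

/* Target performance level for the next execution: the fraction of full
 * speed that would just meet the averaged deadline, clamped to [min, 1]. */
static double target_perf(const struct task_history *h, double min_perf)
{
    double p = h->seen ? h->avg_work / h->avg_deadline : 1.0;
    if (p > 1.0) p = 1.0;
    if (p < min_perf) p = min_perf;
    return p;
}

int main(void)
{
    struct task_history h = { 0 };
    /* Three assumed previous executions:
     * busy time (ms), performance level run at, interval (ms), idle time (ms). */
    double runs[3][4] = {
        { 6.0, 1.0, 20.0, 10.0 },
        { 5.0, 1.0, 20.0, 12.0 },
        { 7.0, 1.0, 20.0,  9.0 },
    };

    for (int i = 0; i < 3; i++) {
        double busy = runs[i][0], perf = runs[i][1];
        double interval = runs[i][2], idle = runs[i][3];
        double work = work_value(busy, perf, interval);
        /* Deadline as a fraction of the interval: the time that was actually
         * available (busy plus observed idle) before the next scheduling. */
        double deadline = (busy + idle) / interval;

        h.started = 1;                 /* execution in progress (cf. the per-task flag) */
        update_history(&h, work, deadline);
        h.started = 0;                 /* execution completed */
    }

    printf("target performance level: %.2f of full speed\n",
           target_perf(&h, 0.25));
    return 0;
}

With the assumed history above the sketch prints a target level of roughly 0.38 of full speed: the work fits comfortably inside the observed slack, so the next execution can be run well below maximum frequency while still meeting its deadline.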
TW092131595A 2002-11-12 2003-11-11 Performance level setting of a data processing system TW200422942A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
GB0226395A GB0226395D0 (en) 2002-11-12 2002-11-12 Automatic performance setting
GB0228546A GB0228546D0 (en) 2002-12-06 2002-12-06 Performance level setting of a data processing system
GB0305442A GB2402504A (en) 2002-11-12 2003-03-10 Processor performance calculation

Publications (1)

Publication Number Publication Date
TW200422942A true TW200422942A (en) 2004-11-01

Family

ID=26247125

Family Applications (1)

Application Number Title Priority Date Filing Date
TW092131595A TW200422942A (en) 2002-11-12 2003-11-11 Performance level setting of a data processing system

Country Status (2)

Country Link
GB (1) GB2402504A (en)
TW (1) TW200422942A (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4700969B2 (en) * 2005-01-06 2011-06-15 富士通株式会社 Monitoring information providing apparatus, monitoring information providing method, and monitoring information providing program
WO2007112781A1 (en) * 2006-04-04 2007-10-11 Freescale Semiconductor, Inc. Electronic apparatus and method of conserving energy
US7783906B2 (en) * 2007-02-15 2010-08-24 International Business Machines Corporation Maximum power usage setting for computing device
JP5547718B2 (en) * 2008-05-13 2014-07-16 スイノプスイス インコーポレーテッド Power manager and power management method
JP2009301500A (en) * 2008-06-17 2009-12-24 Nec Electronics Corp Task processing system and task processing method
CN101893927B (en) * 2009-05-22 2012-12-19 中兴通讯股份有限公司 Hand-held device power consumption management method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1182556B1 (en) * 2000-08-21 2009-08-19 Texas Instruments France Task based adaptive profiling and debugging
US7448025B2 (en) * 2000-12-29 2008-11-04 Intel Corporation Qualification of event detection by thread ID and thread privilege level

Also Published As

Publication number Publication date
GB0305442D0 (en) 2003-04-16
GB2402504A (en) 2004-12-08

Similar Documents

Publication Publication Date Title
US7512820B2 (en) Performance level selection in a data processing system by combining a plurality of performance requests
US7194385B2 (en) Performance level setting of a data processing system
US7321942B2 (en) Performance counter for adding variable work increment value that is dependent upon clock frequency
Flautner et al. Vertigo: Automatic performance-setting for linux
Benini et al. Monitoring system activity for OS-directed dynamic power management
US8364997B2 (en) Virtual-CPU based frequency and voltage scaling
US6986068B2 (en) Arithmetic processing system and arithmetic processing control method, task management system and task management method
Li et al. Performance directed energy management for main memory and disks
US8181047B2 (en) Apparatus and method for controlling power management by comparing tick idle time data to power management state resume time data
US8452999B2 (en) Performance estimation for adjusting processor parameter to execute a task taking account of resource available task inactive period
US20020007387A1 (en) Dynamically variable idle time thread scheduling
US20110010713A1 (en) Computer system, virtual machine monitor and scheduling method for virtual machine monitor
US20070074219A1 (en) Dynamically Variable Idle Time Thread Scheduling
GB2445167A (en) Managing performance of a processor
Sahin et al. MAESTRO: Autonomous QoS management for mobile applications under thermal constraints
Lorch et al. Operating system modifications for task-based speed and voltage
TW200422942A (en) Performance level setting of a data processing system
Liu et al. Chameleon: application level power management with performance isolation
GB2395309A (en) Performance level selection in a data processing system
Bi et al. IAMEM: Interaction-Aware Memory Energy Management
Gurun et al. Autodvs: an automatic, general-purpose, dynamic clock scheduling system for hand-held devices
Palopoli et al. Legacy real-time applications in a reservation-based system
GB2395310A (en) Data processing system performance counter
Kim et al. Power-Aware Resource Management Techniques for Low-Power Embedded Systems.
Wiedenhoft et al. Power management in the EPOS system