TW200422942A - Performance level setting of a data processing system - Google Patents

Performance level setting of a data processing system

Info

Publication number
TW200422942A
Authority
TW
Taiwan
Prior art keywords
task
value
execution
performance
mentioned
Prior art date
Application number
TW092131595A
Other languages
Chinese (zh)
Inventor
Krisztian Flautner
Trevor Nigel Mudge
Original Assignee
Advanced Risc Mach Ltd
Univ Michigan
Priority date
Filing date
Publication date
Priority claimed from GB0226395A
Priority claimed from GB0228546A
Application filed by Advanced Risc Mach Ltd, Univ Michigan
Publication of TW200422942A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/324 Power saving characterised by the action undertaken by lowering clock frequency
    • G06F1/3296 Power saving characterised by the action undertaken by lowering the supply or operating voltage
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A target processor performance level is calculated from the utilization history of a processor in the performance of a plurality of processing tasks. The method comprises calculating a task work value indicating processor utilization in performing a given processing task within a predetermined task time interval, and calculating the target processor performance level in dependence upon that task work value.
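As a concrete illustration of the claimed method, the sketch below (plain C) derives a target performance level from a single task's utilization over its task time interval and quantizes it to a discrete set of available levels, since a real processor offers only a fixed set of frequency/voltage settings. It is a minimal sketch, not the patented implementation: the identifiers (task_stats, work_fse, pick_level) and the four example levels are assumptions, and the target is formed as work divided by (work + idle), consistent with the deadline definition given later in the description.

    /* Illustrative sketch only: names, levels and structure are assumptions
     * made for this example, not the patent's implementation. */

    #define NUM_LEVELS 4
    static const double levels[NUM_LEVELS] = { 0.50, 0.67, 0.83, 1.00 }; /* fractions of peak speed */

    struct task_stats {
        double work_fse;   /* full-speed-equivalent work done during the task's
                              time interval (the task work value) */
        double idle_time;  /* idle time observed in the same interval, in seconds */
    };

    /* Target performance as a fraction of peak: the work the task needed,
     * divided by the time that was available for it (work plus idle). */
    static double target_fraction(const struct task_stats *t)
    {
        double deadline = t->work_fse + t->idle_time;
        return (deadline > 0.0) ? (t->work_fse / deadline) : 1.0;
    }

    /* Quantize to the lowest available level that still meets the target,
     * because only a discrete set of performance levels can be selected. */
    static double pick_level(double target)
    {
        for (int i = 0; i < NUM_LEVELS; i++)
            if (levels[i] >= target)
                return levels[i];
        return levels[NUM_LEVELS - 1];
    }

In use, the level for a task would be recomputed when the task is next scheduled, once its previous interval, including any preempting tasks and idle time, has been observed.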

Description

200422942 玖’ 【發 明係 之資 【先 電源 理器 這些 應的 器效 可接 軟體 後間 上述 算法 次作 能源 的利 窗格 級的 ,發明說明: 明所屬之技術領域】 本發明係有關於資料處理系統領域。p # 史特言之,本發 有關於可於處理器中設定給多個不同虛w l a « u疼理器效能層級 料處理系統所設置的效能層級。 前技術】 處理器設計的一重要目標為提供更好的效能並且降低 消耗。某些現今的處理器提供設定其為數種不同的處 效能層級之一種能力,視當時應用程式的要求而定。 處理器利用降低該處理器的時脈頻率之事實,以及對 操作電壓可能以二次方地減少能源消耗。然而,處理 能的降低僅在使用者沒什麼感覺或沒有效能影響時才 受。因此重要的是處理器效能層級的降低不應該造成 錯失其執行期限。在期限前完成一特定處理任務,然 置比起慢點執行該任務在能源上是比較沒有效率的, 慢點執行該任務為確保其更準確地達到符合期限。 已知的效能層級設定技巧包括所謂的以間隔為主的演 ’其主要概念描述於Weiser等人於1994年11月第一 業系統&十與實施座談會論文集中發表的“降低cpu 的排程”。此一已知的以間隔為主的演算法監控處理器 用歷史並藉由計算一固定且簡短(10-50毫秒)之時間 =的閒置時間與忙碌時間之比例而導出一適當效能層 才曰不。典型上計算最近時間間隔中的整體處理器利 3 200422942 用’若其超過一臨界值則增加處理器的效能層級;反之若 該時間間隔大多包含閒置時間,則降低該效能層級。雖然 此已知方法對於規則的工作負載還應付得不錯,但對於非 規則(即非週期性的)工作負載以及互動應用程式便顯得左 支右紬。其他已知的技術使用整體處理器利用的加權平均 做為未來利用的引導,然而已顯示此一替代性技術並未產 生可大大增進處理器利用並降低閒置時間的一時脈速度 (參見Grunwald等人於2000年1〇月第四次作業系統設計 與實施研討會論文集中所發表之“動態時脈排程策略,,。 因此對於廣大範圍的工作負載例如包括非規則與互動 性工作負載,需要一種可更精確地預測一適當的處理器效 能層級之一效能層級設定技術。 本發明提供一種從執行多種處理任務的處理器利用史 中計算一目標處理器效能層級的方法,該方法包含: 計算一任務工作值,其係於一對應的任務時間間隔内 指出在執行一預定處理任務的處理器利用。 依據上述任務工作值計算上述目標處理器效能層級。 本發明認知單獨的處理任務(或處理任務群組)常具有 可識別的利用期於任務層級處,然而當估算一適當執行效 能時,其對於所有累積之被觀察的任務而言可能為模糊 的。藉由聚焦於該任務層級的處理器利用,效能設定策略 可更佳地調和處理任務的多樣性與關於其所需的效能層 級。本發明允許對每個處理任務直接地預測效能層級,而 非藉由整體工作負載所指定的一任意數量而間接地調高或 4 200422942 調低處理器效能層級。 單一過去時間窗格中的任務工作值可被用於預測該處 理任務一適當的未來效能層級。然而較佳的實施例結合同 一處理任務中對應至數個過去執行的任務工作值以預測1 未來效此層級。此具有提供關於特定任務之靜態上 μ 又平確 的效能層級預測的優點。 須知該任務時間間隔對於每個處理任務的每個執行可 設為一固定值。然而,透過獨立對每個處理任務設定任務 時間間隔,可使效能預測系統更加適合於不同的工作負栽 種類。尤其是,鑑於一短時間週期可適用於一互動性處理 任務,而一相對較長的時間週期似乎更適合用於一非互動 性任務。如果選擇一不適合的短任務時間間隔,可造成效 能層級之間的震盪。讓任務時間間隔可獨立地對每個處理 任務設定會增加一穩定效能預測將被選定的可能性。此 外’透過對一特定處理任務的每個執行獨立地設定時間週 期,該效能預測可適合於考慮在執行時構成工作負荷之部 分的其他任務。這些其他共同存在的任務由於任務先佔之 緣故很有可能會影響一特定任務的總執行時間。 任務時間間隔可被彈性地定義,假如該間隔包括於其 範圍内的某處執行上述特定處理任務。然而,較佳的實施 例定義遠任務時間間隔使其起始於特定處理任務的一第— 排程開始並於該特定處理任務之後續執行前結束。此優點 在於容易實施且該任務時間間隔與該特定處理任務執行的 頻率相關。此使得該技巧更適用於非週期性的處理任務。 200422942 將瞭解關於一特定處理任務之多個先前處理的任務工 作值能以許多不同的方式加以結合來預測該任務的一未來 =能層級’例如可計算任務卫作值的—平均或—加權平 :。將任務工作值結合以計算指數式衰減的工作完成值會 疋更佳的,因為此將使最近計算的任務工作值 大的影響。 负更 雖然可計算相關任務群組的執行截止期π,但依據任 :基:計算一任務的執行截止期限是較佳的,目為其將使 理器效能可更精細地被調整。任務工作值以及任務時間 間隔内偵測到的閒置時間較佳地用於計算該任務執行截止 期限:由於該執行截止期限可被正規化以考慮到該任務時 間:隔内的普遍執行效能’ &相較於使用完成該任務之一 先前執行所用的真實時間,此可提供一更精確的估算。 分別記錄該任務工作值的指數平均以及一特定處理任 務的執行截止期限的優點在於效能層級預測的重要性可依 據該任務工作值被測量的間隔長度而t。此可防止與最長 :務時間間隔有關的任務工作值支配預測因而補償廣泛改 變的視窗大小。 將瞭解該任務工作值可僅包括在對應的任務時間窗格 二:定處理任務的處理器利用1而,較佳實施例该 P㈣定處理任耗其他不料處理任務先佔並將該先 相μ::處理盗利用包含於工作完成值中。此優點在於將 -特—ΓΓ以結合並相應地估算適當的效能層、級。預期到 的後續執行中,相同的其他任務也很有可能先 6 200422942 以及效能層級預 佔”亥特疋任務。因而該任務執行截止期限 測應將這些先佔任務考量在内。 雖然任務時間窗格可允許依照執行時間以及特定任 的執行頻率而改變大小,但設定該任務時間窗格的上限界 是較佳的。此優點在於可防止未被先佔的長時間處理在二 不適當的效能層級上繼續執行。當任務時間窗格達到上限 界時便開始重新計算效能層級。 又 、在較佳實施例中,實施效能層級設定方法於作業系統 核心程式軟體中H點在於該軟體可依據依較豐富的執 行時間資訊組合做出選擇,因而產生較佳的準確度。 L發明内容】 ’其係提供一種依據處 用史來計算一目標處理 含以下步驟: 在 預疋任務時間間隔 用:以及 目標處理器效能層級。 ,其係提供一載著一電 產品,以依據一處理器 而計算該處理器之一目 少包含: 計算一任務工作值,該 間隔内執行一特定處理 自本發明的更進一步態樣觀之 理器在執行多個處理任務時的一利 器效能層級的方法,該方法至少包 計算一任務工作值,其係指出 内執行一特定處理任務的處理器利 依據上述任務工作值計算上述 自本發明的更進一步態樣觀之 月b程式用於控制一電腦之電腦程式 於執行多個處理任務的一利用史中 標處理器效能層級,該電腦程式至 可操作的任務工作值計算碼以 任務工作值指出在一預定任務時間 200422942 任務的 值計算 上 以下詳 第 料處理 式功能 組11 2 智慧蜇 包含/ 一事件 叫模組 1 3 6 〇 言! 
供應資 該 分的核 程式為 主機上 取特權 叫的一 處理器利用;以及 搡作的目標處理器效能計算碼以依據上述任務工作 上述目標處理器效能層級。 述與本發明之其他物件、特性以及優點將明顯地從 細的說明實施例描述配合附隨的圖式中得知。 方式】 1圖概要地說明電源管理系統如何可被實施於一資 系統中。該資料處理系統包含一種具有標準核心程 怏模組的核心程式1 00,該模組包括一系統呼叫模 排程程式1 1 4以及一習知電源管理程式1 1 6。一 能源管理程式系統1 20被實施於該核心程式中並且 策略協調程式122、一效能設定控制模組1 24以及 追蹤模組126。一使用者處理層丨30包含一系統呼 132、一任務管理模組134以及特定應用程式資料 〖使用者處理層級130透過一應用程式監控模組140 訊至該核心程式100。 核心程式1 00為提供基本服務給作業系統之其他部 心程式。該核心程式可與外殼程式相對比,該外殼 作業系統的最外面部分而與使用者指令互動。在其 執行該核心程式碼對實體資源像是記憶體有完全存 。該系統的其他部分或一應用程式透過名為系統呼 組程式介面請求該核心程式的服務。使用者處理層 8 200422942 112與132。該排 程程式 其順序200422942 玖 '[Invention of the Department of Information [The power of the power processor can be connected to the software and the above algorithm can be used as a profitable pane of energy, the description of the invention: The technical field to which the invention belongs] The present invention is related to information Processing system area. p # In particular, this issue is related to the performance level that can be set in the processor to multiple different virtual w l a «u processor performance levels. Previous technology] An important goal of processor design is to provide better performance and reduce consumption. Some modern processors offer the ability to set it to several different levels of processing performance, depending on the requirements of the application at the time. The processor takes advantage of the fact that the clock frequency of the processor is reduced, and the operating voltage may be reduced to a quadratic power consumption. However, the reduction in processing energy is only affected if the user does not feel it or has no effect on performance. It is therefore important that a reduction in the level of processor performance should not result in missed execution deadlines. Complete a specific processing task before the deadline, but it is less energy efficient to perform the task more slowly, and to perform the task slower to ensure that it meets the deadline more accurately. Known performance level setting techniques include the so-called interval-based performance. Its main concepts are described in Weiser et al., "Reducing the CPU Rank Cheng. " This known interval-based algorithm monitors the processor's history and derives an appropriate performance layer by calculating a fixed and short (10-50 milliseconds) time = ratio of idle time to busy time. . Typically, the overall processor benefit in the most recent time interval is calculated. If it exceeds a critical value, the processor's performance level is increased; otherwise, if the time interval mostly includes idle time, the performance level is decreased. Although this known method works well for regular workloads, it appears to be side-by-side for non-regular (that is, non-periodic) workloads and interactive applications. Other known techniques use the weighted average of overall processor utilization as a guide for future utilization, however this alternative technique has been shown to not produce a clock speed that can significantly increase processor utilization and reduce idle time (see Grunwald et al. The “Dynamic Clock Scheduling Strategy” published in the Proceedings of the Fourth Operating System Design and Implementation Symposium in October 2000. Therefore, for a wide range of workloads, including irregular and interactive workloads, a It is possible to more accurately predict one of the appropriate processor performance levels. Performance level setting technology. 
The present invention provides a method for calculating a target processor performance level from the history of processor utilization of multiple processing tasks. The method includes: Task work value, which indicates the utilization of a processor that executes a predetermined processing task within a corresponding task time interval. The target processor performance level is calculated according to the task work value. The present invention recognizes a separate processing task (or processing task) Groups) often have identifiable utilization periods at the task level, However, when estimating a proper execution performance, it may be ambiguous for all accumulated observed tasks. By focusing on processor utilization at that task level, performance setting strategies can better tune the diversity of processing tasks With regard to its required performance level, the present invention allows the performance level to be predicted directly for each processing task, rather than indirectly increasing or lowering the processor performance level by an arbitrary number specified by the overall workload 4 200422942 The task work value in a single past time pane can be used to predict an appropriate future performance level for the processing task. However, the preferred embodiment combines the task work values corresponding to several past execution tasks in the same processing task to predict 1 The future effect is at this level. This has the advantage of providing static μ and accurate performance level predictions for specific tasks. Note that the task interval can be set to a fixed value for each execution of each processing task. However, through independent Setting the task interval for each processing task can make the performance prediction system more suitable for different tasks The type of load. In particular, given that a short period of time can be suitable for an interactive processing task, and a relatively long period of time seems more suitable for a non-interactive task. If you choose an unsuitable short task interval, Can cause turbulence between performance levels. Allowing task intervals to be set independently for each processing task increases the likelihood that a stable performance prediction will be selected. In addition, 'set independently for each execution of a specific processing task Time period, the performance prediction may be suitable for considering other tasks that constitute part of the workload during execution. These other co-existing tasks are likely to affect the total execution time of a particular task due to task preemption. Task interval It can be flexibly defined if the interval is included somewhere within its scope to perform the specific processing task described above. However, the preferred embodiment defines a long task time interval so that it starts at the first of a specific processing task—the beginning of a schedule It ends before the subsequent execution of the specific processing task. This has the advantage that it is easy to implement and that the task time interval is related to the frequency of execution of that particular processing task. This makes the technique more suitable for aperiodic processing tasks. 200422942 It will be understood that a number of previously processed task work values for a particular processing task can be combined in many different ways to predict a future for the task = energy level ', for example, an -average or -weighted level that can calculate the task guard value. :. 
Combining task work values to calculate exponentially decayed work completion values is better because it will have a large impact on the recently calculated task work values. Negative change Although the execution deadline π of the relevant task group can be calculated, it is better to calculate the execution deadline of a task based on Ren: Base, so that the performance of the processor can be adjusted more finely. The task work value and the idle time detected during the task time interval are preferably used to calculate the task execution deadline: since the execution deadline can be normalized to take into account the task time: universal execution performance within the interval '& amp This provides a more accurate estimate than using the real time it took to complete one of the previous executions of the task. The advantage of separately recording the exponential average of the task's work value and the execution deadline of a particular processing task is that the importance of performance-level predictions can depend on the length of the interval at which the task's work value is measured. This prevents the task work value associated with the longest service interval from dominating the predictions and thus compensates for the widely changing window size. It will be understood that the task work value may only be included in the corresponding task time pane 2: the processor that determines the processing task uses 1, and in the preferred embodiment, the predetermined processing consumes other unexpected processing tasks first and the first μ :: Handling of misappropriation is included in the work completion value. This has the advantage of combining-special-ΓΓ and estimating the appropriate performance level and level accordingly. In the expected subsequent execution, the same other tasks are also likely to be preempted by 6 200422942 and the "Hyatt" task at the efficiency level. Therefore, the task execution deadline should consider these preempted tasks. Although the task time window The grid can change the size according to the execution time and the specific execution frequency, but it is better to set the upper bound of the task time pane. This advantage is that it can prevent the unused long-term processing from inappropriate performance. Continue execution at the level. When the task time pane reaches the upper bound, the performance level is recalculated. Also, in a preferred embodiment, the performance level setting method is implemented in the operating system core program software. The point H is that the software can A richer set of execution time information makes a choice, which results in better accuracy. L SUMMARY OF THE INVENTION 'It provides a method for calculating a target based on the application history, including the following steps: at the time interval of the preliminary task: and Target processor performance level. It provides an electronic product based on a processor. 
One of the items of the processor includes: calculating a task work value, performing a specific process within the interval, a method of a sharp tool efficiency level when the processor of the present invention further performs multiple processing tasks, the method At least one task work value is calculated, which indicates that a processor that executes a specific processing task can calculate the above-mentioned month b program from a further aspect of the present invention based on the task work value to control a computer program of a computer in A utilization history of executing multiple processing tasks is awarded to the processor performance level. The computer program is operable to calculate the task work value. The task work value is specified at a predetermined task time. 200422942 The task value calculation is detailed below. The group 11 2 smart card contains / an event called module 136 words! The core program that supplies the funds is used by a processor on the host to obtain a privileged call; and the target processor performance calculation code is based on the above The task works at the target processor performance level described above. Other objects, features, and advantages described herein will be apparent. Obviously know from the detailed description of the embodiment and the accompanying drawings. Mode] 1 diagram outlines how the power management system can be implemented in a capital system. The data processing system includes a standard core program. The core program 100 of the module includes a system call module scheduling program 1 1 4 and a conventional power management program 1 1 6. An energy management program system 1 20 is implemented in the core program and strategy coordination Program 122, a performance setting control module 1 24, and a tracking module 126. A user processing layer 丨 30 includes a system call 132, a task management module 134, and application-specific data [user processing level 130 through an application The program monitoring module 140 sends a signal to the core program 100. The core program 100 is the other central program that provides basic services to the operating system. The core program can be compared with the shell program, which interacts with user commands in the outermost part of the operating system. The core code in which it runs has full storage of physical resources like memory. Other parts of the system or an application program request the services of the core program through a system call program interface. User Processing Layer 8 200422942 112 and 132. The scheduler its order

以及該核心程式均具有系統呼叫模組 間中給每個處理使用處理器。該習知電源管理程式ι Μ藉 由依據處理器利用層級而在一省電休眠模式以及一標準清 醒模式之間切換以管理供給電壓。 該智慧型能源管理程式120負責計算並設定處理器效 能目標。該智慧型能源管理程式12〇使中央處理器(cpu) 的操作電壓以及處理器的時脈降低而不會使應用程式軟髅 錯過處理(即任務)的截止期限,而非僅依賴休眠模式達到 省電目的。當該CPU以全速運轉時許多處理任務將會在其 截止期限内完成,而該處理器將會閒置直到下一個任務開 始舉例來說,產生資料的一種任務的一任務載止期限為 該產生的資料被其他任務所需要的時間點0 —互動任務的 截止期限會是使用者的感覺臨界值(50·丨00毫秒)❶以全效 能運轉而後間置相較於慢點完成該任務以致更準確地符合 截止期限而言是較沒有能源效率的。當處理器的頻率下降 時,其電壓也可降低以達到節省能源的目的。對於實施於 互補金氧半導體(CMOS)科技的處理器而言,一特定工作負 載所使用的能源正比於電壓的平方。該策略協調程式管理 數個效能設定演算法,每個演算法適合不同的運轉時間情 況。對於一特定條件之最適合的效能設定演算法於運轉時 間中選擇。效能設定控制模組1 24接收每個效能設定演算 法的結果並藉由按優先順序處理這些結果重複計算一目# 200422942 處理器效能。事件追蹤模組i 26監控位於核心程式U 0和 使用者處理層級130中的事件,並將收集到的資訊傳送給 效能設定控制模組1 24以及策略協調程式1 22。 在使用者處理層級中,監控處理工作係透過··系統呼 叫事件1 3 2、包含任務切換、任務建立與任務離開事件的 處理任務事件134以及特定應用程式資料。智慧型能源管 理程式1 20被實施為一組核心程式模組並將掛勾嵌補於標 準核心程式功能性模組中並且供作控制處理器的速度與電 壓之用。該智慧型能源管理程式1 2〇實施的方法使其相對 地獨立於該核心程式1 〇〇中的其他模組。此優點在於使效 能設定控制機制較不會干擾主作業系統。實施於核心程式 中同時也意味使用者應用程式不需被調整。因此,該智慧 型能源管理程式1 2 0與系統呼叫模組11 2、排程程式114 以及核心程式的傳統能源管理程式丨丨6共同存在,雖然在 這些子系統中其可能需要某些掛勾(h〇ok)。該智慧槊能源 管理程式120用於透過檢查執行任務間的通訊型態自作業 系統核心程式導出任務截止期限與任務分類資訊(例如該 任務是否與一互動應用程式結合)。其同時也用於監控哪個 系統呼叫被每個任務所存取以及資料如何於核心程式中的 通訊架構間流動。 第2圖概要地說明依據本技術之效能設定演算法的三 階層鱧系。須注意到在一特定處理器上的頻率/電壓設定選 項一般為間斷的而非連續的。因此該目標處理器效能層級 必須從固定的預定值組合中選擇。雖然計算一目標處理器 10 200422942 效能層級的已知技術包含使用一單一效能設定演算法, 本計數利用多種演算法,其各自擁有適合不同運轉時間 況的不同特性。對一特定處理情況而言,最適合的演算 於運轉時選擇。該策略協調程式模組122協調該效能設 演算法並藉由連接該標準核心程式110中的掛勾而提供 享的功能給多個效能設定演算法。該多個效能設定演算 的結果被整理並分析以對一目標處理器效能層級決定一 體的估計。將各種演算法組織為一判斷階層體系(或演算 堆疊),其中由較高(較為支配的)階層體系的演算法輸出 效能層級指示器有權優先於由較低(較不為支配的)階層 系之演算法輸出的效能層級指示器。第2圖的示範實施 具有三階層體系。於該階層體系的最高層具有一互動應 程式效能指示器,於該階層體系的中間層具有一特定應 程式效能指示器220,而於該階層體系的最低層具有一 於任務之處理器利用效能指示器230。 該互動應用程式效能指示器210的計算是由基 Flautner等人於200 1年7月在國際行動運算及網路會議 文集中發表之“關於動態電壓調整之自動效能設定,,所描 的一演算法來執行。該互動應用程式效能層級預測演算 意在藉由找出直接影響使用者經驗的執行期間並確保這 事件不會有不當的延遲而完成以提供良好互動效能保證 該演算法使用一相對簡單的技術以自動分離互動事件。 技術依賴監控來自為GUI控制程式之X伺服器的通訊以 追蹤被觸發為一結果之任務執行。 但 情 法 定 共 法 整 法 的 體 例 用 用 基 於 論 述 法 些 〇 此 及 200422942 一互動事件(典型地包含多個任務)的展開由使用者開 始並由- GIH事件表明,例如按下一滑鼠按鈕或鍵盤按 鍵。因此,該GUI控制程式(在此情況中為χ伺服器)發送 一訊息至負責處理此事件的任務。藉由監控適當的系統呼 叫(各種讀取、寫入與選擇的版本),該智慧型能源管理程 式120可自動地偵測一互動事件的展開。當該事件開始 時,該GIH控制程式以及接收該訊息的任務被標示為正^ 於一互動事件中。如果一互動事件的任務與未標示的任務 通訊,則尚未被標示的任務也會被標示。在此處理期間, 該智慧型能源管理程式1 20追蹤已被先佔之標示任務的數 量。當被先佔的任務為零時,代表所有的任務均已完成, 故該事件結束。 第3圖說明在一互動事件期間設定處理器效能層級的 策略。一互動事件的期間已知會隨著數種重要順序而改變 (從萬分之一秒至大約一秒鐘)。然而一開始轉換的延遲或 “略過臨界值’’被設定為5毫秒以過濾掉最短的互動事件因 而減少請求的效能層級轉換的數量。低於1毫秒的互動事 件典型地為按至視窗之重複按鍵或移動滑鼠越過螢幕並重 繪小矩形的結果。設定該略過臨界值為5毫秒是因為可使 簡短事件自效能指示器預測中過濾掉而不會不利地影響最 糟的情況。 如果該互動事件期間超過該略過臨界值,則相關的效 能層級數值被包含於整體互動效能層級預測中。對所有過 去互動事件計算之效能參數的一加權指數式衰減平均提供 12 200422942 次一互動事件的效能參數。需注意依據本技術該互動應用 程式效能設定演算法對於系統中一互動事件的必要效能層 級使用單一整體預測。[此與上述提到之論文中描述的技 術不同,其依照開啟該事件的任務使用每個任務不同的效 能層級預測。] 為了限制一錯誤效能層級預測於使用者經驗的最壞影 響情形,如果該互動事件並未在達到一所謂的“緊急臨界 值”之前完成,則指定最上階層的效能層級預測為最大效能 層級因這是一最上層預測,所以該系統將會強制執行。 在問題互動事件的結尾,該互動演算法計算該事件的正確 效能設定應該為何並將此修正值納入指數式衰減平均而將 影響未來的預測〃執行一額外最佳化因而若在一互動事件 過程中達到緊急臨界值,便重新調整移動平均因而將修正 的效能層級以一較高的權值(使用k=1而非k==3)納入該指 數式衰減平均之中。對所有較該略過臨界值為久的事件來 說,計算該效能預測。 互動事件“截止期限用於對每個定義的互動事件取得 一效能層級指不器。該截止期限為一任務必須被完成以避 免不良地影響效能的最後時間。依據與特定互動事件有關 之人類感覺臨界值而計算互動事件的效能層級指示器。舉 例來說,已知每秒20至30畫面已夠快讓使用者覺得一系 列的影像為一連續的流動因而可設定一互動影像顯示事件 的感覺臨界值為50毫秒。雖然實際使用的感覺臨界值依使 用者以及進行中的任務而定,50毫秒的固定值仍被認為適 13 200422942 合於階層體系的互動演算法。以下的方程式乃是用於計算 短於感覺臨界值之事件的效能需求。 ^ Work ise P , -—~ βη Perception Threshold 其中全速等效工作肠Au是從該互動事件開始時測量。 階層體系中間層的特定程式效能指示器220藉由整理 一種暸解效能層級設定功能之應用程式的資訊輸出而獲 得。過去已採用這些應用程 型能源管理程式120與其特 可提供新的API項目給作業 關效能需求的通訊。 效能指示器230是藉由 得,其中該演算法是根據最 的使用。此演算法為每個獨 據任務基礎調整一任務上以 時間週期的大小。以預期為 執行的所有任務種類,而最 考量互動任務。儘管該互動 證高水準互動效能的效能層 高層,基於預期的演算法不 窗格。由於在適當時可選擇 層艘系的最下層使用較長利 能。如果該利用史窗格過短 個固定值之間快速地震盈。 式以送給(透過系統呼叫)智慧 疋效月b需求有關的特定資訊。 系統以及應用程式以促進此有 實作一基於演算法的預測而獲 近利用史來估計該處理器未來 立任務導出一使用估計並且依 
計算利用史(即利用史窗格)之 基礎的演算法考量處理器正在 上層的互動應用程式演算法僅 應用程式演算法計算一用於保 級指示器且位於階層體系的最 需被限制於保守的短暫利用史 一更強勢的省電策略,在此階 用史窗格的可能性將會提升效 將可能造成效能層級預測於兩 典型上有必要於使用單一統一 14 200422942 、 冼(而非一演算法的階層體系組合)時設置一短暫的利 用史1^格以對所有運轉情況設定效能層級。為了可處理間 2 14的密集處理器互動事件,此統一運算法必須讓利用史 窗格為短暫的。 每個二層堆疊的效能設定演算法使用一種在一特定時 間間隔中處理完成工作的方式。在此實施例中,所使用的 70成工作方法為該時間間隔中執行的全速(處理器)等效工 作。此全速等效工作乃是依據下列公式而估計: worb^YjiPi i=l 其中i為在該特定時間間隔實施之[種不同處理器效 能層級的其中之一 ;ti為效能層級i中消耗的非閒置時間 秒數;而Pi為處理器效能層級i以高峰(全速)處理器效能 層級的分數表示。此方程式於一時間標誌計數器(任務計數 器)即時測量的系統中有效。該完成工作於替代的實施例中 可使用計數率隨著目前處理器頻率而改變的週期計數器而 有不同的計算。此外,該方程式隱含一工作負載的運轉時 間與處理器頻率成反比的假設。該假設提供完成工作一合 理的估計。然而,主要由於在效能調整的過程申匯流排速 度與處理器速度的比值是非線性的,因此該假設並非總是 準確。在替代實施例中可精細地調整該完成工作計算以考 量這些因素。 第4圖概要地說明在處理器上執行一工作負載以及對 15 200422942 -任務A計算利用史窗格。第4圏的橫袖代表時間。任務 A首先於時間S開始執行,隨後開始數個依據任務的資料 結構❶有四種資料結構對應至下列四部分f訊:⑴任務計 數器的現在狀態;(ii)目前時間(即時);(iH)_閒置時間計 數器的目冑狀態;以及(iv)設定一執行位元至邏輯層級], 指出該任務已開始執行,務計數器,計數器以及 閒置時間計數器用於計算與任務八有關之處理器利用並随 後計算任務Λ的效能需求。於pE時任務a尚未執行完 畢但被其他另一任務B先佔。當任務排程程式ιΐ4判定2 他的任務較執行中任務具有較高的優先性時便會出現先 佔。當任務A被先佔時該執行位元維持在邏輯層級‘丨,以指 出該任務仍有任務待完成β於RE時,任務A繼續執行0 其被重新排程並持續執行直到於Tc時完成並隨後自願放 棄處理時間。在完成時任務A可起始一系統呼叫使處理器 處理其他任務。當任務A於TC完成時該執行位元被重設 至邏輯層級‘ 0 ’。 在TC時之後有一間置期間隨後執行一進階任務,之 後又有一閒置期間。於RS時,任務A開始執行第二次。 於RS時,與任務A有關之執行位元的‘〇,狀態指出資訊存 在以開始計算任務A的效能需求,因而處理器目標效能層 級也可依此設定以供即將到來的任務A重新執行。一特定 任務的利用史窗格被定義為該特定任務第一次執行的開始 直到該任務下一次執行的開始之期間並應於相關窗格中包 16 200422942 含至少一該任務的先佔事件(在此情形中為任務B於玟£處 先佔任務A)。因此,在此情形中任務的利用史窗格被定義 為時間S至時間RS的期間。在此窗格中任務a的目標效 能層級因而如以下計算:And the core program has a system call module for each process using a processor. The conventional power management program ι manages the supply voltage by switching between a power-saving sleep mode and a standard awake mode according to the processor utilization level. The smart energy management program 120 is responsible for calculating and setting processor performance targets. The intelligent energy management program 12 reduces the operating voltage of the central processing unit (CPU) and the clock of the processor without causing the application software to miss the deadline for processing (ie, tasks), instead of relying solely on the sleep mode to reach Power saving purpose. When the CPU is running at full speed, many processing tasks will be completed within its deadline, and the processor will be idle until the next task starts. For example, a task deadline for a task that generates data is the one generated The time point at which the data is required by other tasks 0 — The deadline of the interactive task will be the user's critical threshold (50 · 丨 00 milliseconds). It runs at full efficiency and then completes the task more slowly than the slower one to make it more accurate. It is less energy efficient to meet the deadline. When the frequency of the processor drops, its voltage can be reduced to save energy. For processors implemented in complementary metal-oxide-semiconductor (CMOS) technology, the energy used for a particular workload is proportional to the square of the voltage. The strategy coordination program manages several performance setting algorithms, each of which is suitable for different running time situations. The most suitable performance setting algorithm for a specific condition is selected in the running time. The performance setting control module 1 24 receives the results of each performance setting algorithm and repeats the calculation by processing these results in priority order # 200422942 processor performance. 
The event tracking module i 26 monitors events located in the core program U 0 and the user processing level 130 and transmits the collected information to the performance setting control module 1 24 and the policy coordination program 1 22. In the user processing level, the monitoring and processing work is through the system call event 1 2 2. Processing task event 134, including task switching, task creation, and task leaving events, and application-specific data. The smart energy management program 120 is implemented as a set of core program modules and hooks are embedded in the standard core program functional modules and are used to control the speed and voltage of the processor. The method implemented by the smart energy management program 120 makes it relatively independent of other modules in the core program 1000. This has the advantage that the performance setting control mechanism is less likely to interfere with the main operating system. Implementation in the core program also means that the user application does not need to be adjusted. Therefore, the smart energy management program 1 2 0 coexists with the system call module 11 2, the scheduling program 114, and the core program's traditional energy management program 丨 6. Although these subsystems may require some hooks (H〇ok). The smart energy management program 120 is used to derive task deadlines and task classification information (such as whether the task is integrated with an interactive application program) from the core program of the operating system by checking the communication type between execution tasks. It is also used to monitor which system calls are accessed by each task and how data flows between the communication frameworks in the core program. Fig. 2 schematically illustrates the three-tier relationship of the performance setting algorithm according to the present technology. It should be noted that the frequency / voltage setting options on a particular processor are generally intermittent rather than continuous. Therefore, the target processor performance level must be selected from a fixed combination of predetermined values. Although known techniques for calculating the performance level of a target processor 10 200422942 include the use of a single performance setting algorithm, the count uses multiple algorithms, each of which has different characteristics suitable for different operating time conditions. For a particular processing situation, the most suitable calculation is selected during operation. The policy coordination program module 122 coordinates the performance setting algorithm and provides shared functions to multiple performance setting algorithms by connecting the hooks in the standard core program 110. The results of the multiple performance setting calculations are collated and analyzed to determine an overall estimate of a target processor performance level. Organize algorithms into a hierarchy of judgments (or stacks of algorithms), where the algorithm's output performance level indicator from the higher (more dominant) hierarchy has the right to take precedence over the lower (less dominant) hierarchy The performance level indicator of the algorithm output. The model implementation in Figure 2 has a three-tier system. There is an interactive application performance indicator at the highest level of the hierarchy, a specific application performance indicator 220 at the middle level of the hierarchy, and a processor utilization performance at the lowest level of the hierarchy. 
Indicator 230. The calculation of the interactive application performance indicator 210 is based on the calculations described in "Automatic Performance Settings for Dynamic Voltage Adjustment", published in the International Mobile Computing and Web Conference Proceedings by Flatner et al. In July 2001. The interactive application performance level prediction algorithm is intended to provide good interactive performance by identifying the execution period that directly affects the user experience and ensuring that the event is not unduly delayed to ensure that the algorithm uses a relative Simple technology to automatically separate interactive events. Technology relies on monitoring communication from the X server that is a GUI control program to track the execution of tasks that are triggered as a result. However, the method of the common law integration method is based on the discussion method. This and 200422942 The development of an interactive event (typically containing multiple tasks) is initiated by the user and indicated by a -GIH event, such as pressing a mouse button or a keyboard key. Therefore, the GUI control program (in this case, χ server) sends a message to the task responsible for handling this event. By monitoring the appropriate system call (Various versions of reading, writing, and selecting), the smart energy management program 120 can automatically detect the development of an interactive event. When the event starts, the GIH control program and the task to receive the message are marked Is positive in an interactive event. If the task of an interactive event communicates with an unmarked task, the unmarked task will also be marked. During this processing, the smart energy management program 1 20 tracking has been first Occupied indicates the number of tasks. When the preempted task is zero, it means that all tasks have been completed, so the event ends. Figure 3 illustrates the strategy for setting the processor performance level during an interactive event. An interactive event The period is known to change with several important sequences (from tenths of a second to about a second). However, the delay or "skip threshold" at the beginning of the conversion is set to 5 milliseconds to filter out the shortest interactions Events thus reduce the number of requested performance level transitions. Interactive events under 1 millisecond are typically the result of pressing a button repeatedly to the window or moving the mouse across the screen and redrawing a small rectangle. The skip threshold is set to 5 milliseconds because short events can be filtered from the performance indicator forecast without adversely affecting the worst case. If the skip threshold is exceeded during the interaction event, the relevant performance level value is included in the overall interaction performance level prediction. A weighted exponential decay of the performance parameters calculated for all past interaction events provides an average of 12 200422942 performance parameters for one interaction event. It should be noted that according to this technology, the interactive application performance setting algorithm uses a single overall prediction for the necessary performance level of an interactive event in the system. [This is different from the technique described in the paper mentioned above, which uses a different performance level prediction for each task depending on the task that opened the event. 
] In order to limit the worst-case scenario where a wrong performance level is predicted from user experience, if the interaction event is not completed before reaching a so-called "emergency threshold", the performance level prediction at the top level is designated as the maximum performance level factor This is a top-level forecast, so the system will enforce it. At the end of the problem interactive event, the interactive algorithm calculates what the correct performance setting for the event should be and incorporates this correction into the exponential decay average that will affect future predictions. An additional optimization is performed and thus if an interactive event process When the critical threshold is reached in mid-range, the moving average is readjusted so that the modified performance level is incorporated into the exponential decay average with a higher weight (using k = 1 instead of k == 3). The performance prediction is calculated for all events older than the skip threshold. The "interaction event deadline" is used to obtain a performance level indicator for each defined interaction event. The deadline is the last time a task must be completed to avoid adversely affecting performance. It is based on the human perception associated with a particular interaction event A performance level indicator for calculating interactive events based on critical values. For example, it is known that 20 to 30 frames per second is fast enough for users to think that a series of images is a continuous flow, so an interactive image can be set to display the event feeling The critical value is 50 milliseconds. Although the actual sensory critical value depends on the user and the task in progress, a fixed value of 50 milliseconds is still considered to be suitable for the interactive algorithm of the hierarchical system. The following equation is used Calculate performance requirements for events shorter than the sensory threshold. ^ Work ise P,-~~ βη Perception Threshold where the full-speed equivalent work intestine Au is measured from the beginning of the interactive event. Program-specific performance indicators in the middle layer of the hierarchy 220 is obtained by sorting out the information output of an application that understands the performance level setting function. These application-based energy management programs 120 have been adopted to provide new API items to communicate performance requirements. The performance indicator 230 is obtained by using the algorithm based on the most used. This algorithm is for each Each task is adjusted based on the time period of a task. All types of tasks performed with expectations are the most important consideration for interactive tasks. Although the interactive certificate has a high level of effectiveness at the high-level, the algorithm based on expectations does not window Since the lowest layer of the ship system can be selected to use a longer power when appropriate. If the history pane is too short and the value between the fixed and fast earthquakes is profitable, it is given to (through the system call) wisdom to effect the month. bRequires specific information related to the system and applications to facilitate the implementation of an algorithm-based forecast and recent utilization history to estimate the processor's future tasks. Derive a usage estimate and calculate the utilization history (that is, use the history window). 
Grid) based algorithm to consider the processor's interactive application algorithm at the upper level. Only the application algorithm is used to calculate one for security. Level indicator and located in the hierarchical system need to be constrained by a conservative short-term history of use-a stronger power-saving strategy, the possibility of using the history pane at this stage will improve efficiency and may cause performance levels to be predicted on the two typical It is necessary to set a short utilization history 1 ^ grid when using a single unified 14 200422942, 冼 (not a combination of hierarchical systems of an algorithm) to set the performance level for all operating conditions. In order to deal with the intensive processors of 2 to 14 For interactive events, this unified algorithm must make the history pane short. The performance setting algorithm for each two-tier stack uses a way to process work at a specific time interval. In this embodiment, the A 70% working method is the full-speed (processor) equivalent work performed during this time interval. This full-speed equivalent work is estimated based on the following formula: worb ^ YjiPi i = l where i is one of the different processor performance levels implemented at that particular time interval; ti is the non-consumption consumed in performance level i Idle time in seconds; and Pi is the processor performance level i is expressed as a peak (full speed) processor performance level score. This equation is valid in systems where a time stamp counter (task counter) measures in real time. This completion is done in an alternative embodiment and can be calculated differently using a cycle counter whose count rate changes with the current processor frequency. In addition, the equation implies the assumption that a workload's operating time is inversely proportional to the processor frequency. This assumption provides a reasonable estimate of the work done. However, this assumption is not always accurate, mainly because the ratio of the bus speed to the processor speed during the performance adjustment process is non-linear. The completion calculation may be fine-tuned in alternative embodiments to take these factors into account. Figure 4 outlines the execution of a workload on a processor and the use of 15 200422942-Task A calculation history pane. The horizontal sleeve at 4th represents time. Task A starts execution at time S first, and then starts several data structures based on the task. There are four data structures corresponding to the following four parts: f) the current status of the task counter; (ii) the current time (instant); (iH ) _ The current status of the idle time counter; and (iv) set an execution bit to the logical level], indicating that the task has started execution, the service counter, counter and idle time counter are used to calculate the processor utilization related to task eight And then calculate the efficiency requirement of task Λ. At the time of pE, task a has not been completed but is occupied by another task B. Preemption occurs when the task scheduler 4 determines that his task has a higher priority than the running task. When task A is preempted, the execution bit is maintained at the logic level '丨 to indicate that the task still has tasks to be completed β at RE, task A continues to execute 0, it is rescheduled and continues to execute until completion at Tc And then voluntarily gave up processing time. 
Upon completion, task A can initiate a system call for the processor to perform other tasks. The execution bit is reset to logic level '0' when task A is completed at TC. After the TC hours, there is an idle period followed by an advanced task, followed by an idle period. At RS, task A starts to execute a second time. At the time of RS, the status of the execution bit '0' related to task A indicates that information exists to start calculating the performance requirements of task A. Therefore, the target performance level of the processor can be set accordingly for the re-execution of the upcoming task A. The utilization history pane of a specific task is defined as the period from the start of the first execution of the specific task to the start of the next execution of the task and should be included in the relevant pane. 16 200422942 contains at least one preemption event for the task ( In this case, task B preempts task A at 玟 £). Therefore, the utilization history pane of the task in this case is defined as a period from time S to time RS. The target performance level for task a in this pane is thus calculated as follows:

WorkEst_new = (k × WorkEst_old + Work_fse) / (k + 1)

Deadline_new = (k × Deadline_old + (Work_fse + Idle)) / (k + 1)
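A rough sketch of how these two exponentially decaying averages could be maintained is given below (plain C; the structure and function names are invented for illustration). It assumes the weight k = 3 that the description later notes works well, and returns the ratio of the two averages, which the description uses as the per-task performance-level indicator.

    /* Sketch only: invented names, not the patent's implementation. */
    struct task_predictor {
        double work_est;   /* WorkEst: decayed average of task work values (Work_fse)     */
        double deadline;   /* Deadline: decayed average of (Work_fse + Idle) per interval */
    };

    static void update_predictor(struct task_predictor *p,
                                 double work_fse, double idle)
    {
        const double k = 3.0;  /* weight noted later in the text as effective */
        p->work_est = (k * p->work_est + work_fse) / (k + 1.0);
        p->deadline = (k * p->deadline + (work_fse + idle)) / (k + 1.0);
    }

    /* Per-task performance-level indicator: ratio of the two averages,
     * capped at full speed. */
    static double perf_indicator(const struct task_predictor *p)
    {
        if (p->deadline <= 0.0)
            return 1.0;
        double perf = p->work_est / p->deadline;
        return (perf > 1.0) ? 1.0 : perf;
    }

Because a new sample only receives weight 1/(k + 1), the most recent task intervals dominate the prediction while older behaviour decays away.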

其中k為一權值,Idle為第4囷中時間s至時間Rs 之間的間置時間秒數,而任務A的截止期限被定義為 (+ 。在此特別的範例中,對於像是第4圓任務B 之先佔任務執行偵測引導演算法判定每個任務的利用史窗 袼。處理在次一未被先佔之任務A排程之前執行的任務往 往與任務A的執行高度相關。TC處至RS處之間的間置時 間為“遲緩期’,,處理器可以一下降的效能層級於該期間執 行。然而’工作C由於縮小了可用的遲緩期而被納入效能 層級計算的考量。 上述方程式中的职^與各代表一指數式 衰減平均。此指數式衰減平均讓更新的估計比較舊的估計 對平均有更大的影響力。權值k為關於指數式衰減平均的 一參數。已知k = 3可有效地作用且此微小數值指出每個估 計為一好的估計。藉由個別追蹤工作預測程式以及截止期 限預測程式,效能預測的重要性依據利用史窗格的長度而 定。此確保與較大窗格大小有關的效能估計不會主導效能 預測。此演算法的效能層級指示器…epecM.vej 由 17 200422942 兩指 Perfper 立的效 負载的 務所估 基礎於 自適合 上每5 與已知 出一使 格大小 均,但 i計算 史窗格 依 新的非 器很長 的情況 時這樣 用等待 上層臨 則其工 的應用 數式衰減平均之比而得出: sepecrnvu-hw" 心。對每個工作計算獨 能層級數值。依據本技術的策略,在一從屬於工作 時間間隔50亳秒至i 5〇毫秒間重新計算對一特定任 計的工作%。然而由於%是以任務 一任務上被計算,因此每個被執行的任務利用其各 之以任務為基礎的%值,該%4W值事實 到1 〇亳秒便被修正(反映任務切換事件)。此演算法 以間隔為主的演算法不同,前者對每個任務分別導 用估計同時藉由任務基礎調整一任務上的利用史窗 。雖然已知的統一效能設定算法使用指數式衰減平 其在所有執行任務的固定使用窗格(10至50毫秒) 一整體平均,而非在一變動之以任務為基礎的利用 上來計算一以任務為基礎的平均。 據本技術之以預期為基礎的演算法,有必要避免一 互動CPU極限任務在未被先佔之情形下使用處理 段期間《由於僅能於一旦該任務被先佔至少一次 下疋義”亥利用史窗格,故對該任務採用該效能層級 t引質的等待時間。為了避免不想要的效能採 時間&汁算工作估計t未被先佔6¾ i§程中設置-界值特別地,如果一任務持續1 00毫秒未被先佔, 作估计由預設值來重新計算。考慮到確保一更迫切 程式歷史1¾格透過階層體系層21〇提供給互動應用 18 200422942 程式,故選擇1 00毫秒的數值。同時也考慮到可能會被1 00 毫秒窗格臨界值影響的僅有使用者應用程式種類為密集計 算的批次作業例如編譯,其可能執行數秒或甚至數分鐘。 在這樣的情形中一額外的1 00毫秒(〇. i秒)執行時間可能是 重要的明智效能。 第5圖概要地說明第2圖之三階層體系效能策略堆疊 的一種實施。該實施包含一效能指示器策略堆疊510以及 一策略事件處理程式530,兩者均輸出資訊至一目標效能 計算程式540。該目標效能計算程式540用於整理來自四 種效能設定演算法的結果:高層互動演算法、中層以應用 程式為基礎的演算法以及兩種不同的低層演算法。該四種 演算法可以同時被執行。該目標效能計算程式540從該策 略堆疊5 1 0所產生之多種效能指示器(在此例中為四種)導 出一單一整體目標效能層級。該策略堆疊510連同該策略 事件處理程式530以及該目標效能計算程式540提供一彈 性架構給多種效能設定策略,因而該堆疊每層的策略演算 法可依據使用者的要求而被替換或交換。因此該效能策略 堆叠提供一種可納入使用者自訂之效能設定策略的平台以 供實驗。 多種效能設定演算法中的各個均專門對付不同特定種 類之運轉時間事件。然而,由於在第5圖之示範實施例中 有四種輸出不同效能指示器的不同演算法,該軟體必須決 定以四種效能指不器的何者為優先以設定整體目標數值。 19 200422942 此外也必須決定可有效計算之一整體目標效能層級的時 間,假設每個效能設定演算法可獨立地執行並於不同的時 間產生輸出。同時也必須考慮在多個效能設定演算法均以 相同的處理事件作為決定之基礎的情況下如何結合該效能 指示器,否則可能發生假的目標更新。 為了處理這些議題以圖中所示之三階層體系組織該策 略堆疊510演算法,其中較高層級之策略可優先於導自較 低(叫不具支配性的)層的效能層級請求。因此,階層演算 法可優先於階層演算法,而後者可優先於階層〇的兩種演 算法。注意每個階層體系層級本身可包含多個替代性的效 能設定演算法。不同的效能設定演算法並不知道其於階層 體系中的位置並可基於系統中的任何事件決定其效能。當 一特定演算法請求一效能層級,其送出一指令伴隨其想要 的效能層級至策略堆疊 510。該策略堆疊的每個演算法包 含一命令512、516、5 20、5 24且儲存一對應的效能層級指 示器514、518、522與526。用於階層1演算法的忽略指 令5 20向目標效能計算程式440指出在計算整體效能目標 時應忽略相關的效能層級指示器。已被指定給階層〇之兩 演算法的設定指令512與516使目標效能計算輕式5 40不 管任何來自於階層醴系較低層的效能層級請求而設定對應 的效能層級。然而該設定指令 無法優先於來自較高階層體系層級的效能層級請求。 在此實施例中一階層〇演算法已請求將該效能設定至峰值 20 200422942 水平的5 5%,而另一階層〇演算法以要求將 峰值水平的25%。該目標效能計算程式使用 結合此具有相同優先性的請求,在此例t較 數值為該階層〇效能指示器。於階層2,如 指令連同一 8 0 %效能指示器而被指定。該“ 定”指令提供該目標效能計算程式54〇必須 層級為8 0 °/。,假設此大於任何來自較低階層 能指示器。在此例中該階層〇效能指示器為 1效能指示器將被忽略以使整體目標將真正 值效能的80%。 由於每個演算法之最近計算的效能層級 略堆疊5 1 〇儲存於記憶餿中,該目標效能計 於任何時間計算一新的整體目標數值而不需 &疋演算法。當該堆疊上的其中之一演算法 能層級請求,該目標效能計算程式自底層向 能資料結構的内容以計算一更新的整體目標 此於第5圖的範例中,於層級〇設定整體預 層級1仍維持在5 5 %而於層級2改變該整體 雖然每個效能設定演算法可被觸發(由系統 件)以於任何時候存在一組讓所有效能設定 回應之共同事件時計算一新的效能層級。該 程,530將監控這些事件並為其加上旗標, 供策略事件資訊給目標效能計算程式54〇。 該效能設定至 一操作程式以 佳地設定55% 果大於便設定 如果大於便設 設定整體效能 體系層級之效 5 5 %而該階層 地被設定為峰 指示器由該策 算程式540可 調用每個效能 計算一新的效 上評估指令效 效能層級。因 測為55%,於 預測為80%。 中的一處理事 演算法將傾向 策略事件處理 該處理程式提 此特別事件分 21 200422942 類包含重置事件532、任務切換事件534以及效能改變事 件536。該效能改變事件536為一通知,其警告每個效能 設定演算法注意處理器的現在效能層級,即使其通常不會 更改該策略堆疊510上的效能請求。關於此特別分類的策 略事件532、534及536,並不會每次其中一演算法送出一 更新的效能層級指示器便計算整體效能層級。反之,該效 能層級計算被整合,因而對於每個事件通知而言僅於所有 有關的效能設定演算法的所有事件處理程式已被調用之後 才計算一次** 可提供一應用程式介面(API)給裝置驅動程式或裝置 本身,該介面使一個別裝置將任何操作條件上的重大改變 通知該策略堆疊5 1 0及/或個別效能設定演算法。這使得該 效能設定演算法觸發目標效能層級的重新計算因而促使快 速地採取該操作條件之改變。舉例來說,當一密集處理器 CPU極限任務開始時,該裝置可送出一通知至該策略堆憂 510〇此〆通知為選擇性的而該效能設定演算法於接收時可 不需要對其回應。 第6圖概要地說明依據本技術之一工作追蹤計數器 600。該工作追縱计數器600包含:一增量數值暫存器“ο, 其具有〆軟體控制模組620與一硬體控制模組630; 一累 加器模組640,其具有一工作計數值暫存器與一時間計數 值暫存器’一時間基礎暫存器646 ; —即時計時器65〇以 及一控制暫存器660。此示範實施例的工作追蹤計數器可 22 200422942 與已知的時間標誌計數器以及CPU週期計數器不同’本實 施例之計數器增量數值在接近或位於計數值被增量時與處 
理器實際執行的工作成正比。該增量數值暫存器610包含 一完成工作計算器,其估計在每個計數器週期該處理器完 成的工作。該完成工作估計透過該軟體控制模組620及/ 或透過硬體工作模組6 3 0而取得。該軟體控制模組實施一 種將增量數值與現在處理器速度相關聯的簡單完成工作計 算。如果該處理器以高峰效能的7 0 %運轉則該增量數值將 為0.7,而若該處理器以高峰效能的40%運轉則該增量數 值將為0.4。當該效能控制模組620偵測到該處理器於計 數器週期為間置的則設定該增量數值為0 °在替代的工作 追縱計數器實施例中使用一更精密的軟體演算法以計算一 精確的完成工作估計。 第1表列出測量數據,其指出當考量一效能層級於兩 種不同的處理器速度間轉換(在此例中為高速至低速)時關 於一 cpu極限迴圈以及一 MPEG視訊工作負載之一預期運 轉時期間以及一實際運轉時期間之間的百分比差異。該結 果乃是基於在明確處理器效能層級-300、400及5 OOMhz(如 表中最左方攔位所示)之轉換後運轉。第1表的最上列列 出將轉變至最左方欄位所列出之對應處理器速度的起始效 能層級。於CPU極限迴圈,預測與實際測量無法與雜訊相 分辨,然而於MPEG工作負載,在每個l〇〇Mhz處理器頻 率階段大約有6%-7%的不正確損失。在這些工作負载的最 23 200422942 大不準確率看起來是低於20%(19·4%),對於僅有數個固定 效能層級的系統而言認為是可接受的。然而當一系統中可 選擇的最小至最大處理器效能之可用範圍增加而每個效能 層級階段的範圍減少時,似乎將需要一更準確的工作估計 程式。 轉換後 CPU極限迴 圈 MPEG 視訊工作 負載 速度 400Mhz 5 0 OMhz 600Mhz 400Mhz 500Mhz 600Mhz 300Mhz -0.3% -0.4 % -0.3% 7.1 % 1 3.5 % 19.4% 400Mhz -0.1% 0.0 % 6.9 % 1 3.3 % 500Mhz 0.1 % 6.8 % 第1 表 替代示範實施例之更精密的演算法使用更準確的完成 工作估計技術,其包含監控指示特性資料(透過技術器追蹤 像是記憶體存取等重大事件)與估計的及實際的工作負載 降低比率’而非假設完成工作與處理器速度成正比。進一 步的替代實施例使用快取命中率以及記憶體系統效能指示 器以精確元成工作估計。更進一步的替代示範實施例使用 軟體監控執行一程式設計應用所使用之處理時間比率相對 於執行奇景應用程式任務所使用之處理時間的比率。 該硬艘控制模組63〇即使於該處理器在兩固定效能層 級間切換中的轉換期間仍可估計完成工作。每個處理器效 心轉換可有大約2〇微秒的一暫停,於該期間該處理器不會 24 200422942 送出任何指示。此暫停是由於需要時間以將鎖住相位的迴 圈重新同步至新的目標處理器頻率。此外,在改變該處理 器頻率之前,為了新的目標頻率須穩定電壓至一適當數 值。因此有最多一秒的一轉變時間,於該期間可假設該處 理器以舊的目標頻率運轉但卻以新的目標層級消耗能源 (因為已設定電壓至新的目標層級該頻率可透過中間頻 率階段而躍升數個層級以影 頻率動態改變的轉變期間可 量數值暫存器而考量該軟體 範實施例同時使用硬體及軟 成工作,替代的示範實施例 之一以估計完成工作。 累加器模組640定時自 數值並將其加入工作計數值 該工作計數值暫存器在每個 值。該計時器時間記號為即 訊號。為測量一預定時間間 該累加益模組6 4 〇中的工作 預定時間間隔的開始而另一 的差異提供完成工作於預定 該即時計時器也控制暫 增加的速度。此時間計數值 的時間基礎運作但用於測量 響效能層級改變。在此處理器 操作該硬艎控制模組以更新增 未察覺的動態改變。雖然此示 體控制模組6 2 0、6 3 0以計算完 可以僅使用此兩種模組的其中 增量數值暫存器610讀取增量 暫存器中儲存的一累加總和。 計時器時間記號增加工作計數 時计時器650所導出的一時間 隔之中的工作計數值,儲存於 計數值被讀取兩次,一次於該 -人為其結尾。這兩種數值之間 時間間隔中的一指示。 存器644中儲存之時間計數值 暫存器與該工作計^:值以相同 消耗的時間而非完成的工作。 25 200422942 同時具有一時間計數器以及一完成工作計數器形成效能 定演算法。提供時間基礎暫存器646的目的在於多平台 容性以及轉換為秒。其用於指定兩計數器642、644之時 基礎(頻率)因而時間可為準確且一致的,換言之儲存於 時間計數值暫存器的累積數值提供一種耗去時間毫秒的 測。該控制暫存器模,組660包含兩控制暫存器,每個計 器各用其中之一。一計數器可透過適當的控制暫存器而 動、停止或重置。 第7圖概要地說明一種可依據工作負載特性提供數 不同的固定效能層級的設備該設備包含一 CPU 710、一 時計時器720、一電源供應控制模組730以及第6圖中 工作追蹤計數器之增量數值暫存器610。該電源供應控 模組7 3 0判斷該C P U目刖被設定以何種固定的效能層級 行並為即時計時器720選擇一適當的時脈。該電源供應 制模組730將目前處理器頻率上的資訊輸入至增量數值 存器610。因此該增量的值與處理器頻率成正比,其依 提供該處理器完成之可用工作的估計。 該策略堆疊5 1 0的許多效能測定演算法使用該處理 於一特定時間間隔(窗格)的利用史以估計該處理器未來 適當目標速度。任何效能設定策略的主要目標為藉由使 理器頻率與電壓層級降低一適當目標效能層級而最大化 處理器於執行開始至任務截止期限之過程中的忙碌時間 為了實際地預測目標效能層級,該智慧型能源管理 設 相 間 該 預 數 啟 種 即 的 制 執 控 暫 次 器 的 處 該 〇 程 26 200422942 式1 20提供一提取(abstraction)以追蹤該處理迄於一特定 時間間隔中完成的實際工作。此完成工作提取使得效能改 變以及間置時間可被納入考量而不管各平台間可能有所變 動之特定硬體計數器的實施。依據本計數,為了取得一時 間間隔中的一工作測量估計,每個效能設定演算法被分配 一“工作架構’’資料結構。設定每個演算法以於時間間隔開 始時呼叫一“工作開始功能,,並於該特定時間間隔結束時呼 叫一“工作停止功能。在該完成工作測量期間,該工作結 構的内容被自動地更新以指定由該處理器之個別效能層級 所分配的閒置時間比例與使用處理器時間比例。儲存於工 作結構中的資訊隨後用於計算全速度等效工作數值 (’該數值隨後用於目標效能層級預測。此完成工 作提取功能實施於該智慧型能源管理程式1 2 0的軟體中並 提供連至該智慧型能源管理程式1 2〇之一便利介面給效能 層級預測演算法開發者。該完成工作提取也簡化本技術之 效能設定系統的埠連接為不同的硬體架構。 替代的硬體平台間的一重大差異為該平台上測量時間 的方式°特別地,某些架構透過時間標誌計數器提供一低 經常性的週期計數方法,而其他架構僅提供外部可程式化 的時間岔斷給使用者。然而即使提供時間標誌計數器也並 非必然地測量相同事物。舉例來說,第一種硬體平台同時 包含Intel [RTM]以及ARM [RTM]處理器。在這些處理器 中該計數器計算CPU週期因而計數率與該處理器的速度 27 200422942 有關而該計數器於處理器進入休眠模式時停止計數 種硬體平台包含Crusoe [RTM]處理器,其實施一時 計數器一致地計算該處理器於高峰速率的週期並持 該高峰速率的計數,即使該處理器處於休眠模式中 成工作提取幫助本目標效能設計技術實施於此兩種 體平台上。 在此實施例中計算的工作估計並未考 定工作以高峰效能的一半執行並非必然會耗去於處 速運轉時的兩倍時間才會執行完成的事實。此違反 結果的一種原因在於即使該處理器核心程式速度減 憶體系統卻非如此。因此,核心程式與記憶鱧的效 加對記憶體較為有利。 執行模擬以估計本效能設定技術對照一已知技 別地’該已知技術為内建於Transrneta Crusoe CPU, 4長時間運轉(LongRun),電源管理程式。Transmeta CPU將‘LongRun’電源管理程式内建於處理器韌 LongRun與其他已知的電力管理技術不同,其避免 業系統以使電力管理生效的需要。L〇ngRun使用處 歷史使用以導引時脈選擇:若為高度使用便增加處 度’而於低度使用時降低效能。不像其他實施於更 統處理器上’該電力管理策略可相對容易地被 Crusoe處理器上,因為該處理器已具有一隱藏的軟 
行動態的二進位轉譯與最佳化。該模擬的目標在於 。第二 間標諸 續增加 。該完 替代硬 慮一特 理器全 直覺之 慢,記 能比增 術。特 t7的一 Crusoe 體中。 調整作 理器的 理器速 多的傳 實施於 體層執 建立一 28 200422942 種如LongRun的策略實施於軟體階層體系的如此低層如何 有效地執行。本技術與LongRun —同在同一處理器上執行。 於Sony Vaio PCG-C1VN筆記型電腦上執行此模擬, 該筆記型電腦使用 Transmeta Crusoe處理器並於數個固 定效能層級300Mhz至600Mhz之間以lOOMhz之效能層級 階段運轉。此模擬使用一種具有Linux 2.4.4 acl8核心程 式之改進版本的Mandrake 7.2作業系統。用於比較之估計 的工作負載如下:Plaympeg SDL MPEG播放器程式庫、用 於演算 PDF檔案的 Acrobat Reader、用於文字編輯的 Emaes、用於閱讀新聞的NetScape郵件與新聞4.7、用於網 頁澳I覽的Konqueror 1_9.8以及作為一 3D遊戲的Xwelltris 1·〇.〇。用於互動外殼程式命令的效能測試程式為一使用者 於大約 3 0分鐘範圍期間中執行雜項外殼程式操作的記 錄。為了避免該Crusoe處理器之動態轉譯引擎可能產生的 變化性,多數效能測試程式至少執行兩次以讓該動態轉譯 快取記憶體做好準備,而除了最後執行的以外,所有產生 的模擬資料均被忽略。 依據本發明之效能設定演算法的設計將不會阻礙其主 機平台控制計時器的方式。為了此模擬的目的,本技術提 供一次毫秒解析度定時器,而不會改變該 Linux内建之 l〇ms解析度計時器工作的方式。此目標藉由背負一計時器 分配常式(其檢查計時器事件)至該核心程式常執行的部分 如排程程式與系統呼叫上而達成。 29 200422942 由於依據本技術之效能設定演算法被設計為具有至核 心程式的掛勾以使其可岔斷某些系統呼叫以找出互動事件 且其於每個任務切換均被調用,因此可直接地附加一些指 示至這些掛勾以管理計時器分配。每個掛勾藉由實施一時 間標諸&十數器讀取、一種與次一計時器事件時間標誌、之比 較以及於成功時實施一分支至該計時器分配常式而被擴 充。在實際操作中發現此策略建立一具有次毫秒準確度的 計時器。 下面的第2表詳述附屬於該模擬中的計時器統計。最 遭情況的計時器解析度被該排程程式中的1 〇毫秒(看似與 第2表不一致)時間單位所限制。然而,由於依據本計數之 效能設定演算法專注於測量之情形通常發生於接近時間觸 發器之處’因此達成的解析度被視為是足夠的。已證明該 系統的軟镀計時器於該處理器處於休眠模式時停止工作是 有利的,因為此意味著該計時器岔斷並未改變執行中作業 系統與應用程式之休眠特性。所使用的計時器具有高解析 度與低額外負擔。 這些計時器的優點促進一同時具有主動模式與被動模 式之實施的發展。主動模式中控制依據本技術的效能設定 演算法。而被動模式中該内建的LongRun電力管理程式負 責效能,雖然本技術之智慧型能源管理程式作為執行與效 能改變的觀察者》 第2表 30 200422942 存取至一時間標誌計數器 所需 30至40週期 計時器檢査的平均間隔 __〜〇 · 1氅秒 __〜1毫秒 計時器準確度 — 平均計時器檢査舆分配持 續期間(包括可能執行一事件處 理程式) 1 〇〇至15〇週期 監控由LongRun產生的效能改變類似於該計時器分配 常式而達成。依據本技術之智慧型能源管理程式12〇透過 一特定機器暫存器定時地讀取該處理器的效能層級並將結 果與一先前數值相比較。如果兩數值不同,則其變化會兮己 錄至一緩衝器中。依據本技術之智慧型能源管理程式包含 一追蹤機制,其保留一核心程式緩衝器中重大事件的記 錄。此記錄包含源自不同策略的效能層級請求、任務先佔、 任務IDs(識別記號)以及該處理器的效能層級。在執行該模 擬時可比較LongRun以及在相同執行運轉期間中依據本技 術之效能設定演算法:LongRun控制效能設定而依據本計 數之智慧型能源管理程式120可用於輸出在相同工作負載 上其於控制中而可能做的決定。此模擬策略用於客觀地評 估已知的LongRun技術與本技術之間的互動效能測試程式 之不可重複執行之間的差異。 為了評估使用該測量與效能設定技術的額外負擔,依 據本技術之效能設定演算法配有標示器,其持續追蹤於運 轉時間該效能設定演算法碼中消耗的時間。雖然依據本技 術於一 Pentium II上的運轉時間額外負擔為大約0.1 %至 31 200422942 0 · 5%,但於Transmeta Crusoe處理器上的額外負擔為1% 至4%。於虚擬機器中的進一步測量例如‘VMWare’與4使用 者模式linux,(UML)確認依據本技術之效能設定演算法的 額外負擔於虛擬機器中可能遠大於在傳統處理器架構上。 然而此額外負擔可藉由演算法最佳化而降低。 MPEG(動態畫面專家群組)視訊播放對於所有測試的 效能設定演算法提出一難難考驗。雖然該效能設定演算法 典型地將一週期性負載置於該系統上,效能需求仍可依據 MPEG訊框種類而改變。因此,如果一效能設定演算法對 應過去的(高度變化性的)MPEG訊框解碼事件使用一相對 的長時間窗格以預測未來的效能需求,其可能錯過執行密 集運算訊框(較不具代表性的)的截止期限。另一方面,如 果該演算法僅考慮一短暫的間隔,則其將不會收斂於單一 效能數值而會快速地於多個設定間震盪。由於每個效能層 級的改變導致一轉換延遲,於不同的效能數值間快速震盪 是不令人滿意的。關於LongRun模擬結果確認於MPEG效 能測定程式之震盪行為。 本技術藉由依據位於階層體系最上層的互動效能設定 演算法以限制最糟情況的回應而處理此MPEG工作負載的 震盪問題。於階層體系的最下層有更多傳統的間隔基礎預 期演算法將可採取更長期的效能層級需求觀點。 第 8圖為一表格,其詳述關於該‘plaympeg’放影機 (lL本·ίP://www.I〇kipames.com/develor)ment/smpeg.php3)播放 32 200422942 各種MPEG視訊時的模擬測量結果。該放影機的某些内部 變數已被揭露以提供該放影機如何受到執行時動態地改變 該處理器效能層級產生之結果的影響。這些數字顯示於該 表格的MPEG解碼欄中。特別地,該‘提早(Ahead)’測量每 個訊框有多接近截止期限。至截止期限的接近程度以播放 每種視訊之累積秒數表示。為了最大的電力效率,該提早 變數值應儘可能接近零,雖然該處理器最慢的效能層級對 提早數值可被降低的程度設了 一下限。位於該表格最右側 攔位的一‘正好位於時間攔位’指出正好符合其截止期限的 訊框總數量。正好準時的訊框數越多,則該效能設定演算 法就越接近理論最佳值。第8圖表格中執行統計攔的資料 被監控子系統的智慧型能源管理程式1 20所收集。為了收 集關於LongRun的資訊,該智慧型能源管理程式12〇於被 動模式中被用於聚集效能改變的軌跡而不控制該處理器效 能層級。該閒置攔位指出該核心程式之閒置迴圈中(可能處 理内部雜務或僅在盤旋)耗去的時間比例,而該休眠棚位才匕 出該處理器實際處於一低電力休眠模式所耗去的時間比 例。可從第8囷中看出對於每個這些效能測量本技術表現 地較LongRun好上許多。 第9圖為一表格,其列出於每個工作負栽的執行期間 所收集的處理器效能層級統計。將每個效能層級的時間比 例運算為該工作負載之執行期間的總共非閒置時間的一部 份。該表格之‘平均效能,層級攔指出在每個工作負載之執 33 200422942 行期間的平均效能層級(以高峰效能之百分比表示)。因此 於所有情形中,使用本技術之每個工作負載的平均效能層 級較使用LongRun為低,最後一攔指出關於LongRun達成 的平均效能降低。LongRun工作負載與本發明之工作負載 的播放品質是相同的,即相同的訊框速度且沒有遺漏訊框。 結果顯示本技術較已知的LongRun技術可更準確地預 測必要的效能層級。增加的準確度造成在執行效能測試程 式的期間中該處理器之平均效能層級有1 1 %至3 5 %的降 低。由於執行一工作負載之間的工作數量應該保持相同, 該較低的平均效能層級暗示當本技術之智慧型能源管理程 式啟動時可預期有降低的閒置與休眠時間。該模擬結果確 認了此一預期。相似地,當本技術之智慧效能管理程式啟 動時正好符合其截止期限的訊框數量將會增加,而當解碼 
早於其截止期限所累積的時間數量將會降低。 該中間效能層級(由第9圖表格中的每欄以粗體強調) 也顯示重大的降低。依據本發明之效能設定演算法於多數 效能測定程式上選定一低於高峰的單一效能層級給最多數 的執行時間(>88%),而LongRun通常設定該處理器以全速 運轉。此通常規則的例外為‘Danse De Cable,工作負載,其 中依據本技術之效能設定演算法選定最低的兩種效能層級 並於此兩層級間震盪。此震盪行為的原因乃是由於該 Crusoe處理器的特定效能層級。依據本技術之效能設定演 算法將決定選擇僅高於300 Mhz —點點的一效能層級,因 34 200422942 而當效能層級預測在300 Mhz的上下波動時, 層級將被設為最接近的兩效能層級數值。該已 技術與本技術之效能上最值得注意的不同在於 偵測到大量的處理器活動時,其非常快地躍升 而顯得過度謹慎。 關於所有的工作負載,使用LongRun的平 能層級從未低於 80%,而由本技術設定的多 ‘Red’s Nightmare Small’效能測試程式中下降J 本技術之演算法較LongRun更為主動但於服務 所下降時可快速地反應。由於LongRun並未擁 互動效能的資訊,其被迫於一較短的時間訊框 的行動而該模擬結果顯示此致使無效率。 第10圖包含播放兩種名為‘Legendary’(第 ‘Danse de Cable)(第 10B 圊)之不同 MPEG 電 表。每個圖表說明對於LongRun與本技術於四 能層級(300,400,500, 600 MHz)各自耗去時間 然每次執行的播放品質是相同的,但仍可由圖 用依據本技術之演算法時該處理器於高峰效能 的時間大大地長於該LongRun技術指定該效 況。第10A圖中描繪之播放‘Legendary,電影的 據本技術之演算法選定一 500 MHz效能層級。 所繪關於‘Danse de Cable,電影之結果顯示使 術之演算法,該處理器於兩效能層級300 MHz 該目標效能 知 LongRun 當 LongRun 該效能層級 均處理器效 吹能層級於 L 52%。依據 品質顯示有 有任何關於 中採取保守 10A圖)與 影的結果圖 種處理器效 的比例。雖 表中看出使 之下所耗去 能層級的情 結果顯示依 第10B圖中 用依據本技 與 400 MHz 35 200422942 之間切換 定演算法 Mhz 〇 第1 入觀察。 然而依據 一目標效 LongRun 級。第1: 程式但啟 示執行期 術之效能 上可請求 些情形中 實上低於 如今 術。由於 作負載的 服此困難 體地說, 式的控制 120僅被 設定選擇 °藉由對照’關於該兩部電影該LongRun效能設 於大部分的執行時間選擇高峰處理器速度600 1圖提供對兩種不同效能設定策略之品質上的深 LongRun持續快速交替地上下切換該效能層級, 本技術所控制的系統之處理器效能層級保持接近 心層級。第11A囷之兩圖表(上排)顯示在啟動 執行一效能測試程式支期間該處理器的效能層 、 c囷(中排與下排)顯示關於相同效能測試 動本發明之演算法的效能層級結果。第11B圓顯 間中的實際效能層級,而第11C圖反映依據本技 p又疋演算法在一可於任何效能層級運轉之處理器 的效能層級(假設有相同的最大效能)。注意於某 ’依據本技術之演算法所計算的需求效能層級事 該處理器上可達成的最低效能層級。 考慮模擬結果用於比較互動工作負載上的兩種技 難以建立重複執行的互動效能測試程式,互動工 估計較多媒體效能測試程式要難上許多。為了克 ’實驗上的測量與一簡易模擬技術相結合。更具 該互動效能測試程式於本機LongRun電力管理程 下運轉’而依據本技術之智慧型能源管理程式 動模式中執行,因而其僅記錄其可能做出的效能 而不會實際改變該處理器的效能層級。 36 200422942 第12圖顯示在一模擬運轉過程中收集的效能資料以 供評估互動工作負載。第12A圖為關於LongRun技術之相 對於時間(以粆為單位)的效能層級百分比圖表且在此情形 中所緣之結果相當於該處理器於該測量中的實際效能層 級。第12B圖為一數值化的效能層級圖表,而第12C圖為 本技術之效能設定演算法於控制該處理器可設定的一原始 效能層級時間函數圖表。注意若本技術之演算.法事實上於 控制中,其效能設定選擇將與LongRun所做的選擇有一不 同的運轉時間影響。因為此因素於第12B與12C圊之圖表 中的時間軸將被視為近似值。 為了避免該統計之時間偏離問題,依據本技術之模擬 的被動效能層級軌跡於往後被處理以評估使用本技術而非 LongRun所可能導致之增加執行時間的影響。僅關注互動 事件而非整個效能層級軌跡。本技術之互動效能設定演算 法包含找出對使用者具有一直接影響之執行過程的功能。 此技術給予有效的讀數而不論何種演算法負責控制因而用 於關注我們的測量。一旦一互動事件的執行範圍被獨立, LongRun與本技術均計算於該事件過程中的全速等效完成 工作。由於在測量期間中LongRun控制該CPU速度而其 運轉較本技術控制時為快,對應本技術之結果的事件過程 必須被延長。首先,本技術依據下列公式計算剩餘工作:Where k is a weight, Idle is the number of seconds between the time s and the time Rs in the fourth frame, and the deadline of task A is defined as (+. In this particular example, for the 4-round task B The preemptive task execution detection guidance algorithm determines the utilization history of each task. Processing tasks executed before the next unpreempted task A is scheduled is often highly related to the execution of task A. The interval between the TC and the RS is the "latency period", and the processor can execute a reduced level of performance during this period. However, the "work C" is included in the performance level calculation due to the reduction of the available latency period. The positions in the above equation and each represent an exponential decay average. This exponential decay average makes the updated estimate have a greater influence on the average than the old estimate. The weight k is a parameter about the exponential decay average. .K = 3 is known to work effectively and this small value indicates that each estimate is a good one. With individual tracking job predictors and deadline predictors, the importance of performance predictions is based on the use history pane Depending on the length. This ensures that performance estimates related to larger pane sizes do not dominate performance predictions. 
The performance level indicator for this algorithm ... epecM. vej is calculated based on 17 200422942 two-fingered Perfper's payload. It is based on self-adaptation and every known 5 equals the size of the grid, but it is used when the calculation history pane is long according to the new negator. Wait for the upper layer to calculate the average ratio of the attenuation of the applied formula to obtain: sepecrnvu-hw " heart. Calculate the unique level value for each job. In accordance with the strategy of the present technology, the% of work for a particular task is recalculated between a subordinate working time interval of 50 亳 to i 50 milliseconds. However, because% is calculated on a task-to-task basis, each executed task uses its task-based% value, and the% 4W value fact is corrected to 10 seconds (reflecting the task switching event) . This algorithm is different in interval-based algorithms. The former uses estimates for each task and adjusts the usage history window on a task by task basis. Although known uniform performance setting algorithms use exponential decay to flatten their fixed-use panes (10 to 50 milliseconds) across all tasks performed as a whole, rather than calculating a task-based utilization over a variable task-based utilization Based averaging. According to the anticipation-based algorithm of this technology, it is necessary to avoid an interactive CPU extreme task from being used without being preempted during the processing section. "Because it can only be downloaded at least once once the task is preempted." The history pane is used, so the waiting time of the performance level t prime is used for this task. To avoid unwanted performance time & calculation work estimate t is not pre-occupied 6¾ i§ set the limit value in particular If a task lasts for 100 milliseconds and is not preempted, it is recalculated based on the preset value. Considering to ensure a more urgent program history, 1¾ grid is provided to the interactive application 18 200422942 program through the hierarchical system layer 21, so choose 1 The value of 00 milliseconds. It is also considered that only user applications that may be affected by the 100 millisecond pane threshold are batch-intensive batch operations such as compilation, which may take seconds or even minutes. In this case an additional 100 milliseconds (0.  i seconds) execution time can be an important wise performance. Figure 5 outlines an implementation of the three-tier system performance strategy stack of Figure 2. The implementation includes a performance indicator strategy stack 510 and a strategy event handler 530, both of which output information to a target performance calculator 540. The target performance calculation program 540 is used to organize the results from four performance setting algorithms: a high-level interactive algorithm, a middle-level application-based algorithm, and two different low-level algorithms. The four algorithms can be executed simultaneously. The target performance calculation program 540 derives a single overall target performance level from the multiple performance indicators (four in this example) generated by the strategy stack 5 10. The strategy stack 510, together with the strategy event handler 530 and the target performance calculation program 540, provides a flexible framework for multiple performance setting strategies, so the strategy algorithms of each layer of the stack can be replaced or exchanged according to user requirements. 
Therefore, the performance strategy stack provides a platform for user-defined performance setting strategies for experimentation. Each of the various performance setting algorithms is dedicated to dealing with different specific types of uptime events. However, since there are four different algorithms for outputting different performance indicators in the exemplary embodiment of FIG. 5, the software must decide which of the four performance indicators is preferred to set the overall target value. 19 200422942 It is also necessary to determine the time at which one overall target performance level can be efficiently calculated, assuming that each performance setting algorithm can be executed independently and produce output at different times. At the same time, it is also necessary to consider how to combine the performance indicator under the condition that multiple performance setting algorithms all use the same processing event as a basis for determination, otherwise false target updates may occur. In order to deal with these issues, the strategy stacking 510 algorithm is organized in a three-tier system as shown in the figure, where higher-level policies may take precedence over performance-level requests that are derived from lower (not dominant) layers. Therefore, the hierarchical algorithm can take precedence over the hierarchical algorithm, and the latter can take precedence over the two algorithms of the level 0. Note that each hierarchy can itself contain multiple alternative performance setting algorithms. Different performance setting algorithms do not know their position in the hierarchy and can determine their performance based on any event in the system. When a specific algorithm requests a performance level, it sends an instruction to the strategy stack 510 along with its desired performance level. Each algorithm of the strategy stack includes a command 512, 516, 5 20, 5 24 and stores a corresponding performance level indicator 514, 518, 522, and 526. The ignore instruction 5 20 for the layer 1 algorithm indicates to the target performance calculation program 440 that the related performance level indicator should be ignored when calculating the overall performance target. The setting instructions 512 and 516 of the two algorithms that have been assigned to the level 0 make the target performance calculation light 5 40 and set the corresponding performance level regardless of any performance level request from the lower level of the hierarchy. However, this setting command cannot take precedence over performance level requests from higher-level system levels. In this embodiment, one level 0 algorithm has requested to set the performance to 55% of the peak 20 200422942 level, while the other level 0 algorithm requires to set 25% of the peak level. The target performance calculation program uses the request with the same priority in combination with the value of t in this example as the performance indicator of the layer. In level 2, if the command is specified with the same 80% performance indicator. The "set" instruction provides the target performance calculation program 54. The level must be 80 ° /. Assume that this is greater than any performance indicator from a lower class. In this example, the level 0 performance indicator is 1 and the performance indicator will be ignored so that the overall target will actually be 80% of the performance. 
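The behavior of the target performance calculator 540 described above can be made concrete with a short sketch. The following C fragment is illustrative only: the enum values, structure layout and function name are assumptions, while the combination rules (maximum within a level, SET overriding lower levels, SET-IF-GREATER-THAN only raising the target) follow the example in the preceding paragraphs.

```c
/* Illustrative sketch of combining policy-stack requests into one target. */
enum stack_cmd {
    CMD_IGNORE,           /* indicator takes no part in the calculation            */
    CMD_SET,              /* set this level, overriding lower (but not higher) levels */
    CMD_SET_IF_GREATER,   /* set this level only if it exceeds the current target  */
};

struct stack_entry {
    int            level;   /* 0 = lowest, 2 = highest priority level */
    enum stack_cmd cmd;
    double         perf;    /* requested performance, fraction of peak */
};

/* Entries must be ordered from the lowest hierarchy level to the highest. */
double calc_overall_target(const struct stack_entry *e, int n)
{
    double target = 0.0;
    int    target_level = -1;   /* hierarchy level that currently owns the target */

    for (int i = 0; i < n; i++) {
        switch (e[i].cmd) {
        case CMD_IGNORE:
            break;                              /* e.g. the level-1 entry of Figure 5 */
        case CMD_SET:
            if (e[i].level > target_level) {
                /* A SET from a higher level overrides lower-level requests. */
                target = e[i].perf;
                target_level = e[i].level;
            } else if (e[i].perf > target) {
                /* Requests at the same level combine by taking the maximum: */
                /* 55% and 25% at level 0 give 55%.                          */
                target = e[i].perf;
            }
            break;
        case CMD_SET_IF_GREATER:
            if (e[i].perf > target) {           /* 80% at level 2 beats 55% */
                target = e[i].perf;
                target_level = e[i].level;
            }
            break;
        }
    }
    return target;
}
```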
Since the most recently calculated performance level of each algorithm is slightly stacked in the memory 5, the target performance is calculated at any time to calculate a new overall target value without the & algorithm. When one of the algorithms on the stack is able to request a hierarchy, the target performance calculation program calculates an updated overall target from the bottom to the content of the data structure. In the example in Figure 5, the overall pre-level is set at level 0. 1 remains at 55% and changes the whole at level 2 although each performance setting algorithm can be triggered (by system components) to calculate a new performance at any time when there is a set of common events that all performance settings respond to Hierarchy. During this process, 530 will monitor these events and flag them for strategic event information to the target performance calculator 54. The performance is set to an operating program to optimally set 55% if it is greater than the setting. If it is greater than the setting, the overall performance of the system level is set to 55% and the level is set as a peak indicator. The calculation program 540 can call each A new performance calculation evaluates the performance level of the command effectiveness. The estimate is 55% and the forecast is 80%. A processing event in the algorithm will tend to be strategic event processing. This handler will be divided into 21 200422942 categories including reset event 532, task switching event 534, and performance change event 536. The performance change event 536 is a notification that warns each performance setting algorithm to pay attention to the current performance level of the processor, even though it typically does not change the performance request on the policy stack 510. Regarding the strategic events 532, 534, and 536 of this special classification, the overall performance level is not calculated every time one of the algorithms sends an updated performance level indicator. On the contrary, the performance level calculation is integrated, so for each event notification, it is only calculated once after all the event handlers of all relevant performance setting algorithms have been called ** An application programming interface (API) can be provided to The device driver or the device itself. This interface enables a device to notify the strategy stack of 5 10 and / or individual performance setting algorithms of any significant changes in operating conditions. This causes the performance setting algorithm to trigger a recalculation of the target performance level and thus promptly take changes in the operating conditions. For example, when a CPU-intensive processor-bound task starts, the device may send a notification to the policy stack 510. This notification is optional and the performance setting algorithm may not need to respond to it when received. FIG. 6 schematically illustrates a work tracking counter 600 according to one of the techniques. The work tracking counter 600 includes: an incremental value register "ο, which has a software control module 620 and a hardware control module 630; an accumulator module 640, which has a work count value Register and a time count register 'a time base register 646;-an instant timer 65 and a control register 660. The work tracking counter of this exemplary embodiment can be 22 200422942 with a known time The flag counter and the CPU cycle counter are different. 
The increment value of the counter in this embodiment is, at or close to the time at which the count is incremented, proportional to the actual work performed by the processor. The increment value register 610 includes a work completion calculator, which estimates the work completed by the processor in each counter period. The estimate of completed work is obtained through the software control module 620 and/or through the hardware control module 630. The software control module implements a simple calculation in which the amount of work is related to the current processor speed: if the processor is running at 70% of peak performance the increment value will be 0.7, and if the processor is running at 40% of peak performance the increment value will be 0.4. When the software control module 620 detects that the processor was idle during the counter period, the increment value is set to 0. In alternative work tracking counter embodiments, a more sophisticated software algorithm is used to calculate a more precise work estimate.

Table 1 lists measurement data indicating, for a CPU-bound loop and for an MPEG video workload, the percentage difference between the expected run time and the actual run time when a performance-level transition between two different processor speeds (high to low in this example) is taken into account. The results are grouped by the performance level after the transition, 300, 400 and 500 MHz, as shown in the leftmost column of the table. The top row of Table 1 lists the initial performance levels from which a transition is made to the corresponding processor speed listed in the leftmost column. For the CPU-bound loop, the difference between prediction and actual measurement cannot be distinguished from noise. For the MPEG workload, however, there is an error of about 6% to 7% for each 100 MHz step in processor frequency. The largest inaccuracy observed for these workloads is just under 20% (19.4%), which is considered acceptable for a system with only a few fixed performance levels. However, as the available range between the minimum and maximum selectable processor performance in a system increases, and the step between successive performance levels decreases, a more accurate work-estimation procedure is likely to be required.

                      CPU-bound loop                 MPEG video workload
  After transition    400 MHz  500 MHz  600 MHz      400 MHz  500 MHz  600 MHz
  300 MHz             -0.3%    -0.4%    -0.3%        7.1%     13.5%    19.4%
  400 MHz                      -0.1%     0.0%                  6.9%    13.3%
  500 MHz                                0.1%                           6.8%

                                   Table 1

Alternative exemplary embodiments replace the simple calculation with more sophisticated, more accurate work-estimation techniques, which include monitoring indicative characteristics (for example tracking key events such as memory accesses by means of monitoring hardware) and the ratio between the estimated and actual reduction in workload, rather than assuming that the work completed is directly proportional to processor speed. A further alternative embodiment uses a cache hit ratio and a memory system performance indicator to generate an accurate work estimate. A further alternative exemplary embodiment uses software to monitor the ratio of the processing time spent executing one application task relative to the processing time spent executing another application task. The hardware control module 630 can estimate the work completed even during a transition of the processor between two fixed performance levels. The processor can pause for approximately 20 microseconds during such a transition, during which time it issues no instructions.
This pause is due to the time required to resynchronize the phase-locked loop to the new target processor frequency. In addition, before changing the processor frequency, the voltage must be stabilized to a proper value for the new target frequency. Therefore, there is a transition time of at most one second, during which it can be assumed that the processor runs at the old target frequency but consumes energy at the new target level (because the voltage is set to the new target level, the frequency can pass through the intermediate frequency stage While jumping up several levels to measure the value register during the transition period where the shadow frequency changes dynamically, the software example embodiment uses both hardware and software to work, one of the alternative exemplary embodiments to estimate the completion of the work. Group 640 is timed from the value and added to the work count value. The work count value register is at each value. The timer time is marked as a signal. To measure the work in the accumulation module 6 4 〇 for a predetermined time The beginning of a predetermined time interval while another difference provides the completion of work. The instant timer also controls the rate of temporary increase. The time basis of this time count operates but is used to measure changes in response levels. Here the processor operates the hard艎 Control module to update and add undetected dynamic changes. Although this display control module 6 2 0, 6 3 0 can be used only after calculation In the two modules, the incremental value register 610 reads an accumulated sum stored in the incremental register. The timer time mark increases the work count value in the time interval derived by the timer 650 when the work count is increased. The value stored in the count is read twice, once at the end of the person. An indication of the time interval between these two values. The time count value register stored in the memory 644 and the work count ^: The value takes the same time consumed instead of the completed work. 25 200422942 has both a time counter and a completed work counter to form a performance deterministic algorithm. The purpose of providing a time-based register 646 is multi-platform capacity and conversion to seconds. Its It is used to specify the base (frequency) of the two counters 642, 644, so the time can be accurate and consistent, in other words, the accumulated value stored in the time count value register provides a measure that takes time milliseconds. The control register mode Group 660 contains two control registers, each of which uses one of them. A counter can be moved, stopped, or reset through an appropriate control register. Figure 7 To explain a device that can provide a number of different levels of fixed performance depending on the characteristics of the workload. The device includes a CPU 710, a timer 720, a power supply control module 730, and the incremental value of the work tracking counter in Figure 6. Register 610. The power supply control module 730 determines the fixed performance level at which the CPU is set and selects an appropriate clock for the real-time timer 720. The power supply system module 730 will The information on the current processor frequency is input to the incremental value memory 610. Therefore, the value of the increment is proportional to the processor frequency, which provides an estimate of the available work done by the processor. 
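A minimal sketch of the simple software-controlled work accounting described above, with the increment proportional to the current performance level and accumulated alongside elapsed time, is given below. The structure and function names are illustrative assumptions.

```c
/*
 * Sketch of the simple software work-completion estimate: the increment
 * applied on each counter period is the current performance level as a
 * fraction of peak, and zero while the processor is idle.
 */
struct work_counter {
    double work_count;   /* accumulated full-speed-equivalent work */
    double time_count;   /* accumulated elapsed time, same units   */
};

/*
 * cur_perf: current performance level as a fraction of peak (e.g. 0.7)
 * idle:     non-zero if the processor was idle during this counter period
 * period:   length of the counter period
 */
void work_counter_tick(struct work_counter *c, double cur_perf, int idle,
                       double period)
{
    double increment = idle ? 0.0 : cur_perf;  /* 0.7 at 70% of peak, 0 when idle */

    c->work_count += increment * period;
    c->time_count += period;
}
```

Reading both counters at the start and end of an interval, as described above, then gives the work completed and the time consumed over that interval as two differences.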
This strategy stacks many 5 1 0 The performance measurement algorithm uses the utilization history of the process at a specific time interval (pane) to estimate the appropriate target speed of the processor in the future. The main goal of any performance setting strategy is to maximize the busy time of the processor from the start of execution to the deadline of the task by reducing the processor frequency and voltage level to an appropriate target performance level. In order to actually predict the target performance level, the Intelligent energy management sets the pre-launch and start-up control relays between phases. 26 200422942 Formula 1 20 provides an extraction to track the actual work that the process has completed in a specific time interval . This completion work abstraction allows performance changes and lag time to be taken into account regardless of the implementation of specific hardware counters that may vary between platforms. Based on this count, in order to obtain a work measurement estimate for a time interval, each performance setting algorithm is assigned a "work structure" data structure. Each algorithm is set to call a "work start function" at the beginning of the time interval , And call a "work stop function" at the end of the specific time interval. During the completion of the work measurement, the content of the work structure is automatically updated to specify the proportion of idle time allocated by the individual performance level of the processor and Use processor time scale. The information stored in the work structure is then used to calculate the full-speed equivalent work value ('The value is then used to predict the target performance level. This completed work extraction function is implemented in the smart energy management program 1 2 0 software and provides a convenient interface to the smart energy management program 1220 for developers of performance-level prediction algorithms. The completion of the task extraction also simplifies the performance setting system of this technology. The port connection of the system is different hardware. Architecture. A major difference between alternative hardware platforms is the way in which time is measured on that platform. ° Specifically, some architectures provide a low recurrence cycle counting method through time stamp counters, while other architectures only provide externally programmable time breaks to the user. However, even if a time stamp counter is provided, it does not necessarily measure the same For example, the first hardware platform contains both Intel [RTM] and ARM [RTM] processors. In these processors, the counter calculates CPU cycles and the counting rate is related to the speed of the processor 27 200422942. The counter stops counting when the processor enters the sleep mode. The hardware platform includes the Crusoe [RTM] processor, which implements a time when the counter consistently calculates the period of the processor at the peak rate and holds the peak rate count, even if the processor is in The work extraction in the sleep mode helps the target efficiency design technology be implemented on these two platforms. The estimated work calculated in this embodiment does not determine that the work is performed at half the peak performance and is not necessarily consumed at speed. The fact that it takes twice as long to complete. One reason for this violation is that This is not the case with the processor's core program speed reduction memory system. 
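The work structure and its work-start and work-stop functions described above can be sketched as follows. This is an illustrative sketch only: the structure layout, the helper functions and the fixed number of performance levels are assumptions, and only the work_start/work_stop interface itself is taken from the description.

```c
/*
 * Sketch of the completed-work abstraction: a per-algorithm work structure
 * bracketed by work_start()/work_stop(), yielding a full-speed-equivalent
 * (FSE) work value for the measurement interval.
 */
#define MAX_PERF_LEVELS 8

struct iem_work_struct {
    double level_time[MAX_PERF_LEVELS]; /* time spent at each fixed level   */
    double idle_time;                   /* idle time within the interval    */
    double start_stamp;                 /* snapshot taken by work_start()   */
};

/* Platform-specific helpers, assumed to exist for this sketch. */
extern double read_time_counter(void);
extern double level_fraction(int level);                     /* e.g. 0.5 for 300/600 MHz */
extern void   snapshot_level_times(struct iem_work_struct *w);

void work_start(struct iem_work_struct *w)
{
    for (int i = 0; i < MAX_PERF_LEVELS; i++)
        w->level_time[i] = 0.0;
    w->idle_time = 0.0;
    w->start_stamp = read_time_counter();
}

/* Returns the full-speed-equivalent work done since work_start(). */
double work_stop(struct iem_work_struct *w)
{
    double fse = 0.0;

    snapshot_level_times(w);   /* fills in per-level and idle times for the interval */

    for (int i = 0; i < MAX_PERF_LEVELS; i++)
        fse += w->level_time[i] * level_fraction(i);   /* idle time contributes nothing */

    return fse;
}
```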
Therefore, the core program and memory card effect increase is more beneficial to the memory. Run simulation to estimate the performance setting technology compared to a known technology 'the known technology It is built in Transrneta Crusoe CPU, 4 long run (LongRun), power management program. Transmeta CPU has 'LongRun' power management program built into the processor. LongRun is different from other known power management technologies, and it avoids the system. The need for power management to take effect. LOngRun uses historical usage to guide clock selection: if it is highly used, it will increase the degree 'and reduce performance when it is used at a low level. Unlike other implementations on more unified processors 'The power management strategy can be relatively easily implemented on the Crusoe processor, because the processor already has a hidden soft-line dynamic binary translation and optimization. The goal of this simulation is. The second standard has continued to increase. This finish is slower than replacing a full-intuition-thinking processor. Special t7 in a Crusoe body. Adjust the processor speed of the processor. Many implementations are implemented at the system level. 28 200422942 Strategies such as LongRun are implemented at such a low-level layer of the software hierarchy how to execute them effectively. This technique executes on the same processor as LongRun. This simulation was performed on a Sony Vaio PCG-C1VN laptop, which uses a Transmeta Crusoe processor and operates at a fixed performance level of 300Mhz to 600Mhz at a performance level of 100OMhz. This simulation uses one with Linux 2. 4. 4 An improved version of the acl8 core program, Mandrake 7. 2 operating system. The estimated workload for comparison is as follows: Plaympeg SDL MPEG player library, Acrobat Reader for calculating PDF files, Emaes for text editing, NetScape mail and news for reading news 4. 7.Konqueror 1_9 for web browsing. 8 and Xwelltris 1.0 as a 3D game. 〇. The performance test program for interactive shell commands is a record of a user performing miscellaneous shell operations during a period of approximately 30 minutes. In order to avoid the variability that the dynamic rendering engine of the Crusoe processor may generate, most performance test programs are executed at least twice to prepare the dynamic rendering cache memory, and all the generated simulation data except the last one is executed. be ignored. The design of the performance setting algorithm according to the present invention will not hinder the way the host platform controls the timer. For the purpose of this simulation, this technology provides a millisecond resolution timer without changing the way the Linux built-in 10ms resolution timer works. This goal is achieved by carrying a timer assignment routine (which checks for timer events) to parts of the core program that are often executed, such as schedulers and system calls. 29 200422942 Because the performance setting algorithm based on this technology is designed to have hooks to the core program so that it can break certain system calls to find interactive events and it is called at every task switch, it can be Attach some indicators to these hooks to manage timer assignments. Each hook is augmented by implementing a time-scaled & ten-counter read, a time stamp with the next timer event, a comparison, and a branch-to-timer assignment routine on success. It is found in practice that this strategy establishes a timer with sub-millisecond accuracy. 
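The hook-based timer check just described, a time-stamp-counter read, a comparison with the next timer event and a branch to the dispatch routine, can be sketched as follows. The function names, the single-pending-timer simplification and the cycles_t type are assumptions made for illustration.

```c
#include <stdint.h>

typedef uint64_t cycles_t;

/* Platform-specific counter read, assumed to cost a few tens of cycles. */
extern cycles_t read_timestamp_counter(void);

struct soft_timer {
    cycles_t expires;               /* time stamp of the next timer event  */
    void   (*handler)(void *data);  /* event handler to run when it is due */
    void    *data;
};

static struct soft_timer next_timer;  /* single pending timer, for simplicity */

/* Inserted into frequently executed kernel paths such as the scheduler
 * and system calls, giving sub-millisecond timer resolution in practice. */
static inline void timer_check(void)
{
    cycles_t now = read_timestamp_counter();

    if (next_timer.handler && now >= next_timer.expires) {
        void (*h)(void *) = next_timer.handler;
        next_timer.handler = NULL;   /* one-shot: clear before dispatch */
        h(next_timer.data);          /* the handler may re-arm the timer */
    }
}
```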
Table 2 below details timer statistics gathered during the simulation. In the worst case the resolution of the timer is limited by the 10 millisecond time unit of the scheduler (which may appear inconsistent with Table 2). However, since the measurements on which the performance-setting algorithms of the present technique are based usually occur close to timer triggers, the resolution achieved is considered sufficient. It has proved advantageous for the system's software timer to stop running when the processor is in sleep mode, because this means that the timer does not alter the sleep characteristics of the operating system and of the applications being executed. The timers used have high resolution and low overhead, and these properties facilitated the development of an implementation with both an active and a passive mode. In the active mode, control rests with the performance-setting algorithms of the present technique. In the passive mode the built-in LongRun power management program remains responsible for performance setting, while the intelligent energy management program of the present technique acts as an observer of performance levels and performance changes.

  Cost of accessing a time stamp counter                          30 to 40 cycles
  Average interval between timer checks                           ~0.1 ms
  Timer accuracy                                                  ~1 ms
  Average duration of a timer check and dispatch
  (including possible execution of an event handler)              100 to 150 cycles

                                   Table 2

Performance changes produced by LongRun are tracked in a manner similar to the timer-dispatch routine: the intelligent energy management program 120 of the present technique periodically reads the processor's performance level through a machine-specific register and compares the result with the previous value. If the two values differ, the change is recorded in a buffer. The intelligent energy management program of the present technique includes a tracing mechanism that keeps a record of significant events in a kernel buffer. This record contains the performance-level requests from the different policies, task preemptions, task IDs (identification tokens) and the performance level of the processor. When the simulation is run, LongRun can be compared with the performance-setting algorithms of the present technique over the same execution period: LongRun controls the performance setting, while the intelligent energy management program 120 of the present technique outputs the decisions it would have made for the same workload. This simulation strategy is used to evaluate objectively the difference between the known LongRun technique and the present technique on interactive benchmarks whose executions are not repeatable.

To evaluate the additional overhead of using the present measurement and performance-setting technique, the performance-setting algorithms of the present technique were instrumented with markers that track the time spent executing the performance-setting algorithm code at run time. Whereas the run-time overhead of the present technique on a Pentium II is about 0.1% to 0.5%, the overhead on the Transmeta Crusoe processor is 1% to 4%. Further measurements in virtual machines such as VMWare and user-mode Linux (UML) confirm that the overhead of the performance-setting algorithms of the present technique can be considerably greater in virtual machines than on conventional processor architectures. This overhead can, however, be reduced by optimizing the algorithms.
MPEG (Motion Picture Experts Group) video playback poses a difficult test for the performance setting algorithms of all tests. Although the performance setting algorithm typically places a periodic load on the system, the performance requirements may still vary depending on the type of MPEG frame. Therefore, if a performance setting algorithm responds to past (highly volatile) MPEG frame decoding events using a relatively long pane of time to predict future performance requirements, it may miss the execution of intensive computing frames (less representative Deadline). On the other hand, if the algorithm only considers a short interval, it will not converge to a single performance value and will quickly oscillate between multiple settings. Since each performance level change causes a conversion delay, rapid oscillation between different performance values is not satisfactory. The LongRun simulation results are confirmed in the oscillating behavior of the MPEG performance measurement program. This technology addresses the oscillating problem of this MPEG workload by setting algorithms based on interactive performance settings at the top of the hierarchy to limit the worst-case response. There are more traditional interval-based prediction algorithms at the bottom of the hierarchy that can take a longer-term view of performance-level requirements. Figure 8 is a table detailing the 'plaympeg' player (lL 本 · ίP: // www. I〇kipames. com / develor) ment / smpeg. php3) 32 200422942 Analog measurement results when playing various MPEG videos. Certain internal variables of the player have been disclosed to provide how the player is affected by the results of dynamically changing the processor performance level during execution. These numbers are shown in the MPEG decode column of the table. In particular, this 'Ahead' measures how close each frame is to the deadline. The closeness to the deadline is indicated by the cumulative number of seconds of playing each video. For maximum power efficiency, the early variable value should be as close to zero as possible, although the processor's slowest performance level places a lower limit on how early the value can be reduced. A 'just-in-time' block located on the far right of the form indicates the total number of frames that just met its deadline. The larger the number of just-in-time frames, the closer the performance setting algorithm is to the theoretical optimal value. The data of the execution statistics in the table in Figure 8 were collected by the smart energy management program 120 of the monitoring subsystem. In order to collect information about LongRun, the smart energy management program 120 is used in the passive mode to gather the trajectory of performance changes without controlling the processor performance level. The idle stop indicates the percentage of time spent in the idle loop of the core program (which may handle internal chores or only hovering), and the hibernation booth only consumes the processor when it is actually in a low-power sleep mode. Time proportion. It can be seen from Section 8 that for each of these performance measures the technology performs much better than LongRun. Figure 9 is a table listing the processor performance level statistics collected during the execution of each job load. Calculate the time ratio for each performance level as part of the total non-idle time during the execution of that workload. 
The ‘Average Performance, Levels’ column in the table indicates the average performance level (expressed as a percentage of peak performance) during the execution of each workload. Therefore, in all cases, the average performance level of each workload using this technology is lower than using LongRun, and the last block indicates that the average performance achieved with LongRun is reduced. The playback quality of the LongRun workload and the workload of the present invention is the same, i.e. the same frame speed and no missing frames. The results show that the technology can predict the necessary performance level more accurately than the known LongRun technology. The increased accuracy results in a 11% to 35% reduction in the average performance level of the processor during the performance test program. Since the amount of work performed between performing a workload should remain the same, this lower average performance level implies that reduced idle and hibernation times can be expected when the technology's smart energy management program is launched. The simulation results confirmed this expectation. Similarly, the number of frames that exactly meet its deadline will increase when the Smart Performance Manager of this technology starts, and the amount of time accumulated when decoding is earlier than its deadline will decrease. This intermediate performance level (emphasized by each column in the table in Figure 9) also shows significant reductions. According to the performance setting algorithm of the present invention, a single performance level below the peak is selected on most performance measurement programs to give the maximum execution time (> 88%), and LongRun usually sets the processor to run at full speed. The exception to this general rule is ‘Danse De Cable, workload, where the two lowest performance levels are selected based on the technology ’s performance setting algorithm and oscillate between these two levels. The reason for this oscillating behavior is due to the specific performance level of the Crusoe processor. The performance setting algorithm based on this technology will decide to choose a performance level that is only higher than 300 Mhz—a little bit. Because 34 200422942 and when the performance level prediction fluctuates up and down at 300 Mhz, the level will be set to the closest two performances. Level value. The most noteworthy difference between this technology and the performance of this technology is that when a large amount of processor activity is detected, it jumps very quickly and appears overly cautious. Regarding all workloads, the level level of using LongRun has never been lower than 80%, and the multiple 'Red's Nightmare Small' performance test programs set by this technology have dropped. The algorithm of this technology is more active than LongRun but is used by service agencies. Quick response when descending. Because LongRun did not have information on interactive performance, it was forced to act in a short time frame and the simulation results showed that this made it inefficient. Figure 10 contains two different MPEG meters called 'Legendary' (#Danse de Cable) (# 10B 圊). Each chart shows that the time spent for LongRun and the technology at the four-energy level (300, 400, 500, 600 MHz) is the same, but the playback quality is the same each time. The processor's peak performance time is much longer than the LongRun technology specifies for this condition. 
The results for playback of the 'Legendary' movie depicted in Figure 10A show that the algorithm of the present technique selects a 500 MHz performance level. The results drawn in Figure 10B for the 'Danse de Cable' movie show that, with the algorithm of the present technique, the processor switches between the two performance levels 300 MHz and 400 MHz. By contrast, for both movies the LongRun performance-setting algorithm selects the peak processor speed of 600 MHz for most of the execution time.

Figure 11 provides a qualitative insight into the two different performance-setting strategies. Whereas LongRun continually and rapidly switches the performance level up and down, the processor performance level of the system controlled by the present technique stays close to a single target performance level. The two graphs of Figure 11A (top row) show the processor performance level while a benchmark executes with LongRun enabled, while Figures 11B and 11C (middle and bottom rows) show the performance-level results for the same benchmark with the algorithm of the present invention enabled. Figure 11B shows the actual performance levels during execution, and Figure 11C reflects the performance level that the algorithm of the present technique would request on a processor able to run at any performance level (assuming the same maximum performance). Note that in some cases the required performance level calculated by the algorithm of the present technique is in fact lower than the lowest performance level achievable on the processor.

Now consider the simulation results used to compare the two techniques on interactive workloads. Since it is difficult to construct interactive benchmarks that execute repeatably, interactive workloads are much harder to evaluate than multimedia benchmarks. To overcome this difficulty, experimental measurement was combined with a simple simulation technique. More specifically, the interactive benchmark was run under the control of the native LongRun power manager, while the intelligent energy management program 120 of the present technique executed in passive mode, so that it only recorded the performance-setting decisions it would have made without actually changing the processor's performance level.

Figure 12 shows performance data collected during a simulation run for evaluating interactive workloads. Figure 12A is a graph of performance level (as a percentage) against time (in seconds) for the LongRun technique, and in this case the plotted results correspond to the actual performance level of the processor during the measurement. Figure 12B is a quantized performance-level graph, and Figure 12C is a graph, as a function of time, of the raw performance level that the performance-setting algorithm of the present technique would set were it controlling the processor. Note that if the algorithm of the present technique had in fact been in control, its performance-setting choices would have had a different effect on run time than the choices made by LongRun. Because of this factor, the time axes in the graphs of Figures 12B and 12C should be regarded as approximate.
In order to avoid the problem of time deviation of the statistics, the passive performance level trajectory of the simulation based on this technology is processed later to evaluate the impact of increased execution time that may be caused by using this technology instead of LongRun. Focus only on interactive events, not the entire performance-level trajectory. The interactive performance setting algorithm of this technology includes the function of finding an execution process that has a direct impact on the user. This technology gives effective readings regardless of the algorithm responsible for control and is therefore used to focus on our measurements. Once the execution scope of an interactive event is independent, both LongRun and the technology calculate the full speed equivalent to complete the work during the event. Since LongRun controls the CPU speed during the measurement period and its operation is faster than when controlled by this technology, the event process corresponding to the results of this technology must be extended. First, this technique calculates the remaining work based on the following formula:

Work_remaining(present technique) = Work(LongRun) - Work(present technique)

Next, the algorithm calculates the extent to which the length of the interactive event needs to be extended, on the assumption that the algorithm of the present technique would have continued to run at its predicted speed until it reached the urgency threshold, and at full speed thereafter. The statistics are adjusted accordingly. The results obtained with this correction were found to be very close to those observed when similar workloads (the same benchmarks, but with slightly different interactive load) were run with the processor actively controlled by the algorithm of the present technique. When the algorithm of the present technique is actually in control, however, the number of performance-setting changes is reduced and the performance levels are more accurate.

Figure 13 shows the statistics collected using the time-skew correction technique described above. Each of the six graphs in the figure contains two stacked bars, as described below following the illustrative sketch.
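The time-skew correction just described can be sketched as follows, assuming a simple two-phase model (predicted speed until the urgency threshold, full speed thereafter). The structure and function names are illustrative and not taken from the embodiment.

```c
/*
 * Sketch of stretching an interactive event recorded under LongRun so that
 * it reflects how long the present technique would have taken.
 */
struct interactive_event {
    double work_longrun_fse;   /* FSE work done under LongRun control           */
    double work_present_fse;   /* FSE work the present technique would have done */
    double predicted_speed;    /* predicted performance level, 0 < s <= 1        */
    double time_to_urgent;     /* time left before the urgency threshold (s)     */
};

/* Returns the amount by which the event duration must be stretched (seconds). */
double event_extension(const struct interactive_event *ev)
{
    double remaining = ev->work_longrun_fse - ev->work_present_fse;
    double extension;

    if (remaining <= 0.0)
        return 0.0;                      /* nothing outstanding, no stretch */

    /* Phase 1: run at the predicted speed until the urgency threshold. */
    double phase1_work = ev->predicted_speed * ev->time_to_urgent;
    if (remaining <= phase1_work)
        return remaining / ev->predicted_speed;

    extension = ev->time_to_urgent;
    remaining -= phase1_work;

    /* Phase 2: beyond the urgency threshold the processor runs at full speed. */
    extension += remaining;              /* at full speed, 1 FSE unit per second */
    return extension;
}
```

The statistics gathered after applying this correction are what is plotted in the stacked bars of Figure 13.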
On each graph the left-hand bar relates to LongRun and the right-hand bar to the present technique. Each bar is a stack representing the proportion of time spent, during interactive events, at each of the four performance levels supported by the computer. These performance levels run, from the bottom of the stack upwards, from 300 MHz in 100 MHz increments up to 600 MHz. Even at a glance it is apparent that the algorithm of the present technique spends more time at the lower performance levels than LongRun does. In some benchmarks, such as Emacs, there is hardly ever a need to execute quickly, and the machine meets its interactive deadlines while remaining at its lowest possible performance level. At the other end of the spectrum is the Acrobat Reader benchmark, which exhibits a bimodal behavior: the processor runs either at its peak level or at its minimum level. Even in this benchmark many interactive events can be completed on time at the processor's minimum performance level; however, when a page has to be rendered, even the processor's peak performance level is not sufficient to complete the work by a deadline within the user's perception threshold. Consequently, when a sufficiently long interactive event is encountered, the algorithm of the present technique switches the processor performance level to the peak value. By contrast, during execution of the Konqueror benchmark the algorithm of the present technique makes use of all four available performance levels of the processor. This can be compared with the LongRun strategy, which causes the processor to spend the bulk of its time at its peak level.

In summary, the simulation results detailed above with reference to Figures 8 to 13 show how two performance-setting strategies, implemented at different levels of the software hierarchy, behave on multimedia and interactive workloads. The Transmeta LongRun power management program, implemented in the processor's firmware, has been found to take more conservative decisions than the algorithms of the present technique implemented in the kernel of the operating system. On a set of multimedia benchmarks, the algorithms of the present technique achieved average performance levels 11% to 35% lower than those achieved using the known LongRun technique.

Because the performance-setting algorithms of the present technique sit higher in the software stack than LongRun, they can base their decisions on a richer set of run-time information, which translates into higher accuracy.

Although the firmware approach of LongRun has been shown to be less accurate than an algorithm implemented in the kernel, this does not diminish its usefulness: LongRun has the important advantage of requiring no knowledge of the operating system. It will be appreciated that the gap between low-level and high-level implementations can be bridged by providing a baseline performance-setting algorithm, such as LongRun, in firmware and exposing an interface to the operating system so that processor performance-setting choices can be refined. The performance-setting hierarchy of the present technique provides the mechanism to support such a design: the lowest performance-setting policy on the stack could in practice be implemented in the processor's firmware.
Although illustrative embodiments of the invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various changes and modifications can be effected therein by one skilled in the art without departing from the scope and spirit of the invention as defined by the appended claims.

[Brief Description of the Drawings]

Figure 1 schematically illustrates how a power management system according to the present technique may be implemented in a data processing system;
Figure 2 schematically illustrates the three hierarchical levels of performance-setting algorithms according to the present technique;
Figure 3 schematically illustrates the setting of processor performance levels during an interactive event;
Figure 4 schematically illustrates the execution of a workload on the processor and the calculation of a utilization-history window for a task A;
Figure 5 schematically illustrates an implementation of the three-level performance policy stack of Figure 2;
Figure 6 schematically illustrates a work tracking counter according to the present technique;
Figure 7 schematically illustrates an apparatus that can provide a number of different fixed performance levels according to the characteristics of the workload;
Figure 8 is a table detailing simulated measurement results for the 'plaympeg' video player playing a variety of MPEG videos;
Figure 9 is a table listing processor performance-level statistics gathered during the execution of each workload;
Figure 10 comprises two graphs showing the results of playing two different MPEG movies, 'Legendary' (Figure 10A) and 'Danse de Cable' (Figure 10B);
Figures 11A, 11B and 11C schematically illustrate the characteristics of two different performance-setting strategies;
Figures 12A, 12B and 12C schematically illustrate simulation results of testing different performance-setting algorithms on interactive workloads; and
Figure 13 schematically illustrates statistics gathered using a time-skew correction technique.

[Brief Description of the Reference Numerals]

100 kernel; 110 standard kernel functionality; 112, 132 system call modules; 114 scheduler; 116 conventional power management program; 120 intelligent energy management system; 122 policy coordinator; 124 performance-setting control module; 126 event tracing; 130 user process layer; 134 task management module; 136 application-specific data; 140 application monitoring module; 210 interactive application performance indicator; 220 application-specific performance indicator; 230 task-based processor utilization performance indicator; 510 performance indicator policy stack; 512, 516, 520, 524 commands; 514, 518, 522, 526 performance level indicators; 520 ignore instruction; 524 set-if-greater-than instruction; 530 policy event handler; 532 reset event; 534 task switch event; 536 performance change event; 540 target performance calculator; 600 counter; 610 increment value register; 620 software control module; 630 hardware control module; 640 accumulator module; 646 time base register; 650, 720 real-time timers; 660 control register; 710 CPU (central processing unit); 730 power supply control module


Claims (1)

Scope of patent application:

1. A method of calculating a target processor performance level for a processor from a utilization history of the processor in executing a plurality of processing tasks, the method comprising at least the steps of:
calculating a task work value indicative of the processor utilization in executing a particular processing task within a predetermined task time interval; and
calculating said target processor performance level in dependence upon said task work value.

2. The method according to item 1 of the scope of patent application, comprising at least calculating a plurality of task work values corresponding to respective previous executions of said particular processing task, and combining said plurality of task work values to calculate said target processor performance level for a future execution of said particular processing task.

3. The method according to item 2 of the scope of patent application, wherein said predetermined task time interval is set independently for each of said plurality of processing tasks.

4. The method according to item 3 of the scope of patent application, wherein said predetermined task time interval is set independently for each execution of said particular processing task.

5. The method according to item 4 of the scope of patent application, wherein said predetermined task time interval is a period extending from the start of a first scheduling of said particular processing task to the start of a subsequent scheduling of said particular task, said predetermined task time interval being associated with said first scheduling.

6.
The method according to item 2 of the scope of patent application, wherein the previously performed multiple task tasks corresponding to the special 43 200422942 scheduled processing task # are combined to calculate an exponential decay completion value for the specific processing task. . 7. The method according to item 1 of the scope of the patent application, which includes detecting at least a duration of idle time within a predetermined task interval, and the task working value and the duration of the idle time are used to calculate a task execution deadline for the upper task. . 8. The method as described in claim 7 of the patent scope, wherein the deadline is calculated for each of a plurality of each of the above-mentioned specific processing tasks, and the multiple task execution deadlines are calculated as an exponential decay The average task execution deadline. 9. The method as described in item 7 of the scope of the patent application, wherein the fixed processing is calculated based on the above-mentioned exponential decay average work completion value of the processing task & numerical agricultural reduction average task execution deadline. The above-mentioned target processor performance level of the task. 10. The method as described in item 11 of the scope of patent application, further comprising] during the processing period of the above-mentioned specific processing task, detecting at least a line period 'the at least one suspended execution period representative The period during which the specific processing task is switched to a further and non-tasking period before the completion of the above: and the calculation of the above-mentioned task work value for this specific processing task, because at least one period of suspended execution period includes processor utilization. The method described in item 10 of the scope of patent application, at least its value is subtracted and the average value is included in the above and based on the above characteristics The combination of execution before the execution of the scheduled task corresponds to this and the above-mentioned special steps: a suspension of the execution of the first task and the same processing. This sets an upper bound within the predetermined task interval described in 44 200422942 above, Therefore, if the specific processing task continues to be executed without detecting the suspended execution period within a duration period greater than or equal to the upper bound, the target processor performance level of the task is automatically recalculated. 1 2. As requested The scope of the patent is the widest ... Each of the tasks has a y, and if the corresponding task has been executed but has been executed, then the above flag value is displayed. 13: The method described in the second item of the scope, where When the value of any 2Γ is used to calculate the above-mentioned target processor performance of the above-mentioned future execution of the above-mentioned task, the processing performance of each task is described by a corresponding time interval on this U-test. To be normalized.% -Carrying- computer programs for controlling __Dianmao's DengTeng program products, which run from a processor to a variety of One of the tasks is to use the history to calculate = one of the processor's target processor performance levels. The above electric cat program calculates at least the task work value calculation function, which is operable to calculate-task work value. 
the task work value indicating the utilization of a processor executing a specific processing task within a predetermined task time interval; and target processor performance calculation code operable to calculate said target processor performance level in dependence upon said task work value.

15. The computer program product according to claim 14, wherein said task work value calculation code calculates a plurality of task work values corresponding to respective previous executions of said specific processing task, and combines said plurality of task work values to calculate said target processor performance level for a future execution of said specific processing task.

16. The computer program product according to claim 15, wherein said predetermined task time interval is set independently for each of a plurality of processing tasks.

17. The computer program product according to claim 16, wherein said predetermined task time interval is set independently for each execution of said specific processing task.

18. The computer program product according to claim 17, wherein said predetermined task time interval is a period extending from the start of a first scheduling of said specific processing task to the start of a subsequent scheduling of said specific processing task, said predetermined task time interval being associated with said first scheduling.

19. The computer program product according to claim 15, wherein said plurality of task work values corresponding to previous executions of said specific processing task are combined to calculate an exponentially decaying average work completion value for said specific processing task.

20. The computer program product according to claim 14, comprising detection code operable to detect an idle time duration value within said predetermined task time interval and to calculate a task execution deadline for said specific processing task in dependence upon said task work value and said idle time duration value.

21. The computer program product according to claim 20, wherein said task execution deadline is calculated for each of a plurality of previous executions of said specific processing task, and a plurality of task execution deadlines are combined to calculate an exponentially decaying average task execution deadline value.

22. The computer program product according to claim 20, wherein said target processor performance level for said specific processing task is calculated in dependence upon said exponentially decaying average work completion value and upon said exponentially decaying average task execution deadline value corresponding to said specific processing task.

23. The computer program product according to claim 14, further comprising:
suspended execution period detection code operable to detect, during processing of said specific processing task, at least one suspended execution period, said at least one suspended execution period representing a period elapsing upon switching from said specific processing task to a further and different processing task before completion of said first task; and
wherein said task work value calculation code is operable to calculate said task work value for said processing task such that it includes processor utilization during said at least one suspended execution period.

24. The computer program product according to claim 23, wherein an upper bound is set for said predetermined task time interval, such that if said specific processing task continues to execute without said suspended execution period being detected within a period greater than or equal to said upper bound, said target processor performance level for said task is automatically recalculated.

25. The computer program product according to claim 14, wherein a flag value is stored for each task, said flag value indicating that the corresponding task has started but has not yet completed execution.

26. The computer program product according to claim 15, wherein, when said task work values are combined to calculate said target processor performance level for said future execution of said task, each task work value from a respective previous execution of said specific processing task is normalized by a corresponding predetermined task time interval.

27. Apparatus for controlling a computer to calculate a target processor performance level of a processor from a utilization history of the processor in executing a plurality of processing tasks, said apparatus comprising:
task work value calculation logic operable to calculate a task work value indicating the utilization of the processor by a specific processing task within a predetermined task time interval; and
target processor performance calculation logic operable to calculate said target processor performance level in dependence upon said task work value.

28. The apparatus according to claim 27, wherein said task work value calculation logic calculates a plurality of task work values corresponding to respective previous executions of said specific processing task, and combines said plurality of task work values to calculate said target processor performance level for a future execution of said specific processing task.

29. The apparatus according to claim 28, wherein said predetermined task time interval is set independently for each of a plurality of processing tasks.

30. The apparatus according to claim 29, wherein said predetermined task time interval is set independently for each execution of said specific processing task.

31. The apparatus according to claim 30, wherein said predetermined task time interval is a period extending from the start of a first scheduling of said specific processing task to the start of a subsequent scheduling of said specific processing task, said predetermined task time interval being associated with said first scheduling.

32. The apparatus according to claim 28, wherein said plurality of task work values corresponding to previous executions of said specific processing task are combined to calculate an exponentially decaying average work completion value for said specific processing task.

33. The apparatus according to claim 28, comprising detection logic operable to detect an idle time duration value within said predetermined task time interval and to calculate a task execution deadline for said specific processing task in dependence upon said task work value and said idle time duration value.

34. The apparatus according to claim 33, wherein said task execution deadline is calculated for each of a plurality of previous executions of said specific processing task, and a plurality of task execution deadlines are combined to calculate an exponentially decaying average task execution deadline value.

35. The apparatus according to claim 33, wherein said target processor performance level for said specific processing task is calculated in dependence upon said exponentially decaying average work completion value and upon said exponentially decaying average task execution deadline value corresponding to said specific processing task.

36. The apparatus according to claim 28, further comprising:
suspended execution period detection logic operable to detect, during processing of said specific processing task, at least one suspended execution period, said at least one suspended execution period representing a period elapsing upon switching from said specific processing task to a further and different processing task before completion of said first task; and
wherein said task work value calculation logic is operable to calculate said task work value for said processing task such that it includes processor utilization during said at least one suspended execution period.

37. The apparatus according to claim 36, wherein an upper bound is set for said predetermined task time interval, such that if said specific processing task continues to execute without said suspended execution period being detected within a period greater than or equal to said upper bound, said target processor performance level for said task is automatically recalculated.

38. The apparatus according to claim 28, wherein a flag value is stored for each task, said flag value indicating that the corresponding task has started but has not yet completed execution.

39. The apparatus according to claim 28, wherein, when said task work values are combined to calculate said target processor performance level for said future execution of said task, each task work value from a respective previous execution of said specific processing task is normalized by a corresponding predetermined task time interval.
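The claims above describe the mechanism only in prose. The following C program is a minimal illustrative sketch of how the quantities they name could interact: a per-execution task work value normalized by its task time interval, exponentially decaying averages of work and of the execution deadline, a per-task flag, and a target performance level taken from the ratio of the two averages. All identifiers, the decay factor, and the sample numbers are assumptions made here for illustration; they are not taken from the patent text and are not a reference implementation of the claimed method.

/*
 * Illustrative sketch only: decaying averages of per-task work and deadline,
 * and a target performance level derived from their ratio. Names, the decay
 * factor and the sample data are assumptions, not part of the patent.
 */
#include <stdio.h>

#define DECAY 0.7 /* weight given to history (assumed value) */

struct task_history {
    double avg_work;      /* exponentially decaying average work value  */
    double avg_deadline;  /* exponentially decaying average deadline    */
    int    started;       /* flag: task has begun but not yet completed */
    int    seen;          /* any history recorded yet?                  */
};

/* Work value for one execution, normalized by its own task time interval:
 * full-speed-equivalent busy time divided by the interval length. */
static double work_value(double busy_time, double perf_level, double interval)
{
    return (busy_time * perf_level) / interval;
}

/* Fold one completed execution into the decaying averages. */
static void update_history(struct task_history *h, double work, double deadline)
{
    if (!h->seen) {
        h->avg_work = work;
        h->avg_deadline = deadline;
        h->seen = 1;
    } else {
        h->avg_work = DECAY * h->avg_work + (1.0 - DECAY) * work;
        h->avg_deadline = DECAY * h->avg_deadline + (1.0 - DECAY) * deadline;
    }
}

/* Target performance level for the next execution: the fraction of full
 * speed that would just meet the averaged deadline, clamped to [min, 1]. */
static double target_perf(const struct task_history *h, double min_perf)
{
    double p = h->seen ? h->avg_work / h->avg_deadline : 1.0;
    if (p > 1.0) p = 1.0;
    if (p < min_perf) p = min_perf;
    return p;
}

int main(void)
{
    struct task_history h = { 0 };
    /* Three assumed previous executions:
     * busy time (ms), performance level run at, interval (ms), idle time (ms). */
    double runs[3][4] = {
        { 6.0, 1.0, 20.0, 10.0 },
        { 5.0, 1.0, 20.0, 12.0 },
        { 7.0, 1.0, 20.0,  9.0 },
    };

    for (int i = 0; i < 3; i++) {
        double busy = runs[i][0], perf = runs[i][1];
        double interval = runs[i][2], idle = runs[i][3];
        double work = work_value(busy, perf, interval);
        /* Deadline as a fraction of the interval: the time that was actually
         * available (busy plus observed idle) before the next scheduling. */
        double deadline = (busy + idle) / interval;

        h.started = 1;                 /* execution in progress (cf. the per-task flag) */
        update_history(&h, work, deadline);
        h.started = 0;                 /* execution completed */
    }

    printf("target performance level: %.2f of full speed\n",
           target_perf(&h, 0.25));
    return 0;
}

With the assumed history above the sketch prints a target level of roughly 0.38 of full speed: the work fits comfortably inside the observed slack, so the next execution can be run well below maximum frequency while still meeting its deadline.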
TW092131595A 2002-11-12 2003-11-11 Performance level setting of a data processing system TW200422942A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
GB0226395A GB0226395D0 (en) 2002-11-12 2002-11-12 Automatic performance setting
GB0228546A GB0228546D0 (en) 2002-12-06 2002-12-06 Performance level setting of a data processing system
GB0305442A GB2402504A (en) 2002-11-12 2003-03-10 Processor performance calculation

Publications (1)

Publication Number Publication Date
TW200422942A true TW200422942A (en) 2004-11-01

Family

ID=26247125

Family Applications (1)

Application Number Title Priority Date Filing Date
TW092131595A TW200422942A (en) 2002-11-12 2003-11-11 Performance level setting of a data processing system

Country Status (2)

Country Link
GB (1) GB2402504A (en)
TW (1) TW200422942A (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4700969B2 (en) * 2005-01-06 2011-06-15 富士通株式会社 Monitoring information providing apparatus, monitoring information providing method, and monitoring information providing program
WO2007112781A1 (en) * 2006-04-04 2007-10-11 Freescale Semiconductor, Inc. Electronic apparatus and method of conserving energy
US7783906B2 (en) * 2007-02-15 2010-08-24 International Business Machines Corporation Maximum power usage setting for computing device
JP5547718B2 (en) * 2008-05-13 2014-07-16 スイノプスイス インコーポレーテッド Power manager and power management method
JP2009301500A (en) * 2008-06-17 2009-12-24 Nec Electronics Corp Task processing system and task processing method
CN101893927B (en) * 2009-05-22 2012-12-19 中兴通讯股份有限公司 Hand-held device power consumption management method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1182556B1 (en) * 2000-08-21 2009-08-19 Texas Instruments France Task based adaptive profiling and debugging
US7448025B2 (en) * 2000-12-29 2008-11-04 Intel Corporation Qualification of event detection by thread ID and thread privilege level

Also Published As

Publication number Publication date
GB0305442D0 (en) 2003-04-16
GB2402504A (en) 2004-12-08

Similar Documents

Publication Publication Date Title
US7512820B2 (en) Performance level selection in a data processing system by combining a plurality of performance requests
US7194385B2 (en) Performance level setting of a data processing system
US7321942B2 (en) Performance counter for adding variable work increment value that is dependent upon clock frequency
Flautner et al. Vertigo: Automatic performance-setting for linux
Benini et al. Monitoring system activity for OS-directed dynamic power management
US8364997B2 (en) Virtual-CPU based frequency and voltage scaling
US6986068B2 (en) Arithmetic processing system and arithmetic processing control method, task management system and task management method
Li et al. Performance directed energy management for main memory and disks
US8181047B2 (en) Apparatus and method for controlling power management by comparing tick idle time data to power management state resume time data
US8452999B2 (en) Performance estimation for adjusting processor parameter to execute a task taking account of resource available task inactive period
US20020007387A1 (en) Dynamically variable idle time thread scheduling
US20110010713A1 (en) Computer system, virtual machine monitor and scheduling method for virtual machine monitor
US20070074219A1 (en) Dynamically Variable Idle Time Thread Scheduling
GB2445167A (en) Managing performance of a processor
Sahin et al. MAESTRO: Autonomous QoS management for mobile applications under thermal constraints
Lorch et al. Operating system modifications for task-based speed and voltage
TW200422942A (en) Performance level setting of a data processing system
Liu et al. Chameleon: application level power management with performance isolation
GB2395309A (en) Performance level selection in a data processing system
Bi et al. IAMEM: Interaction-Aware Memory Energy Management
Gurun et al. Autodvs: an automatic, general-purpose, dynamic clock scheduling system for hand-held devices
Palopoli et al. Legacy real-time applications in a reservation-based system
GB2395310A (en) Data processing system performance counter
Kim et al. Power-Aware Resource Management Techniques for Low-Power Embedded Systems.
Wiedenhoft et al. Power management in the EPOS system