TWI502333B - Heterogeneous multiprocessor design for power-efficient and area-efficient computing - Google Patents

Heterogeneous multiprocessor design for power-efficient and area-efficient computing

Info

Publication number
TWI502333B
Authority
TW
Taiwan
Prior art keywords
core
new
cores
workload
determining
Prior art date
Application number
TW102127477A
Other languages
Chinese (zh)
Other versions
TW201418972A (en
Inventor
Gary D Hicok
Matthew Raymond Longnecker
Rahul Gautam Patel
Original Assignee
Nvidia Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US13/723,995 (US9569279B2)
Application filed by Nvidia Corp
Publication of TW201418972A
Application granted
Publication of TWI502333B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5094 Allocation of resources, e.g. of the central processing unit [CPU] where the allocation takes into account power or heat criteria
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5022 Workload threshold
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Description

Heterogeneous multiprocessor design for power-efficient and area-efficient computing

The present invention relates generally to multiprocessor computer systems and, more specifically, to a heterogeneous multiprocessor design for power-efficient and area-efficient computing.

Battery-powered mobile computing platforms have become increasingly important in recent years, intensifying the need for efficient, low-power systems that deliver highly scalable computational capacity at low cost. A typical mobile device may need to operate over a wide performance range, according to workload requirements. Different performance ranges are conventionally mapped to different operating modes, with power consumption proportionally related to performance within a given operating mode. In a low-power sleep mode, the mobile device may provide a small amount of computational capacity, for example to maintain radio contact with a cellular base station. In an active mode, the mobile device may provide low-latency response to user input, for example via a window manager. Many operations associated with typical applications execute with satisfactory performance in the active mode. In a high-performance mode, the mobile device needs to provide peak computational capacity, for example to execute a real-time game or perform transient user interface operations. Active mode and high-performance mode typically require progressively increasing power consumption.

A number of techniques have been developed to improve both the performance and the power efficiency of mobile devices. Such techniques include reducing device parasitic loads by reducing device size, reducing operating and threshold voltages, trading off performance for power efficiency, and adding different circuit configurations tuned to operate well under particular operating modes.

In one example, a mobile device processor complex comprises a low-power, low-performance processor and a high-performance, high-power processor. In idle and low-activity active modes, the low-power processor is more power efficient at lower performance levels and is therefore selected for execution, while in high-performance modes the high-performance processor is more power efficient and is therefore selected to execute larger workloads. In this scenario, the trade-off space includes a cost component, because the mobile device carries the cost burden of two processors while only one processor is active at any one time. Although such a processor complex enables both low-power operation and high-performance operation, it makes inefficient use of expensive resources.

As the foregoing illustrates, what is needed in the art is a more efficient technique for accommodating a wide range of different workloads.

One embodiment of the present invention sets forth a method for configuring one or more cores within a processing unit to execute different workloads. The method includes: receiving information related to a new workload; determining, based on the information, that the new workload is different from a current workload; determining, based on the information, how many of the one or more cores should be configured to execute the new workload; determining, based on how many of the one or more cores should be configured to execute the new workload, whether a new core configuration is needed; and, if a new core configuration is needed, transitioning the processing unit to the new core configuration, or, if a new core configuration is not needed, maintaining the current core configuration for executing the new workload.
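The decision sequence summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the `WorkloadInfo` type, the `cores_needed` policy (equal-capacity cores, ceiling division), and the default parameter values are all hypothetical, since the source only speaks of "information related to a new workload".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadInfo:
    # Hypothetical summary of a workload; the source only says "information".
    required_throughput: float  # arbitrary units, e.g. instructions per second


def cores_needed(info: WorkloadInfo, per_core_throughput: float, max_cores: int) -> int:
    """Assumed policy: enough equal-capacity cores to cover the demand, at least one."""
    needed = -(-info.required_throughput // per_core_throughput)  # ceiling division
    return int(min(max(needed, 1), max_cores))


def configure(current_cores: int, current: WorkloadInfo, new: WorkloadInfo,
              per_core_throughput: float = 100.0, max_cores: int = 2) -> int:
    """Return the number of cores to run with after receiving `new`."""
    if new == current:              # workload unchanged: nothing to decide
        return current_cores
    wanted = cores_needed(new, per_core_throughput, max_cores)
    if wanted != current_cores:     # a new core configuration is needed
        return wanted               # transition to it
    return current_cores            # otherwise keep the current configuration
```

With these assumed numbers, a jump in demand from 50 to 150 units moves a one-core configuration to two cores, while a change from 50 to 80 units leaves the configuration untouched.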

Other embodiments of the present invention include, without limitation, a computer-readable storage medium including instructions that, when executed by a processing unit, cause the processing unit to perform the techniques described herein, as well as a computing device that includes a processing unit configured to perform the techniques described herein.

One advantage of the disclosed technique is that it improves the power efficiency of a multi-core central processing unit over a wide workload range while making efficient use of processing resources.

100‧‧‧Computer system
102‧‧‧Central processing unit
104‧‧‧System memory
105‧‧‧Memory bridge
106‧‧‧Communication path
107‧‧‧Input/output (I/O) bridge
108‧‧‧User input device
110‧‧‧Display device
112‧‧‧Parallel processing subsystem
113‧‧‧Second communication path
114‧‧‧System disk
116‧‧‧Switch
118‧‧‧Network adapter
120‧‧‧Add-in card
140(0)‧‧‧First processor core / low power core
140(N)‧‧‧Second processor core / high performance core
150‧‧‧Operating system kernel
152‧‧‧Scheduler
154‧‧‧Device driver
156‧‧‧Device driver
210(0)‧‧‧VF domain
210(N)‧‧‧VF domain
212‧‧‧Programmable virtual identifier (ID)
212(0)‧‧‧Not described
212(N)‧‧‧Not described
220‧‧‧Core interconnect
222‧‧‧Cache
224‧‧‧Memory interface
226‧‧‧Interrupt distributor
230‧‧‧Cluster control unit
310‧‧‧Throughput
312‧‧‧Power
314‧‧‧Maximum throughput of low power core 140(0)
316‧‧‧Maximum throughput of high performance core 140(N)
320‧‧‧Power curve
322‧‧‧Power curve
324‧‧‧Power curve
330‧‧‧Low power core region
332‧‧‧High performance core region
334‧‧‧Dual core region
400‧‧‧Method
410-490‧‧‧Steps

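So that the above-recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, is given by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of the invention and are therefore not to be considered limiting of its scope, for the invention may admit other equally effective embodiments.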

Figure 1 is a block diagram of a computer system configured to implement one or more aspects of the present invention;
Figure 2 is a block diagram of a central processing unit (CPU) of the computer system of Figure 1, according to one embodiment of the present invention;
Figure 3 illustrates different operating regions of a CPU comprising multiple cores, according to one embodiment of the present invention; and
Figure 4 is a flow diagram of method steps for configuring a CPU comprising multiple cores to operate within a power-efficient region, according to one embodiment of the present invention.

In the following description, numerous specific details are set forth to provide a more thorough understanding of the present invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without one or more of these specific details.

System Overview

Figure 1 is a block diagram of a computer system 100 configured to implement one or more aspects of the present invention. Computer system 100 includes a central processing unit (CPU) 102 and a system memory 104 that communicate via an interconnection path, which may include a memory bridge 105. Memory bridge 105, which may be, for example, a Northbridge chip, is connected via a bus or other communication path 106 (for example, a HyperTransport link) to an input/output (I/O) bridge 107. I/O bridge 107, which may be, for example, a Southbridge chip, receives user input from one or more user input devices 108 (for example, a keyboard, a pointing device, or a capacitive touch tablet) and forwards the input to CPU 102 via communication path 106 and memory bridge 105. A parallel processing subsystem 112 is coupled to memory bridge 105 via a bus or second communication path 113 (for example, a Peripheral Component Interconnect (PCI), Accelerated Graphics Port, or HyperTransport link). In one embodiment, parallel processing subsystem 112 is a graphics subsystem that delivers pixels to a display device 110, which may be any conventional cathode ray tube, liquid crystal display, light-emitting diode display, or the like. A system disk 114 is also connected to I/O bridge 107 and may be configured to store content, applications, and data for use by CPU 102 and parallel processing subsystem 112. System disk 114 provides non-volatile storage for applications and data and may include fixed or removable hard disk drives, flash memory devices, and compact disc read-only memory (CD-ROM), digital versatile disc ROM (DVD-ROM), Blu-ray, high-definition DVD (HD-DVD), or other magnetic, optical, or solid-state storage devices.

A switch 116 provides connections between I/O bridge 107 and other components, such as a network adapter 118 and various add-in cards 120. Other components (not explicitly shown), including universal serial bus (USB) or other port connections, compact disc (CD) drives, digital versatile disc (DVD) drives, film recording devices, and the like, may also be connected to I/O bridge 107. The various communication paths shown in Figure 1, including the specifically named communication paths 106 and 113, may be implemented using any suitable protocol, such as PCI Express, Accelerated Graphics Port (AGP), HyperTransport, or any other bus or point-to-point communication protocol, and connections between different devices may use different protocols, as is known in the art.

In one embodiment, parallel processing subsystem 112 incorporates circuitry optimized for graphics and video processing, including, for example, video output circuitry, and constitutes a graphics processing unit (GPU). In another embodiment, parallel processing subsystem 112 incorporates circuitry optimized for general-purpose processing while preserving the underlying computational architecture, described in greater detail herein. In yet another embodiment, parallel processing subsystem 112 may be integrated with one or more other system elements in a single subsystem, for example by joining memory bridge 105, CPU 102, and I/O bridge 107 to form a system on chip (SoC).

It will be appreciated that the system shown herein is illustrative and that variations and modifications are possible. The connection topology, including the number and arrangement of bridges, the number of CPUs 102, and the number of parallel processing subsystems 112, may be modified as desired. For instance, in some embodiments, system memory 104 is connected to CPU 102 directly rather than through a bridge, and other devices communicate with system memory 104 via memory bridge 105 and CPU 102. In other alternative topologies, parallel processing subsystem 112 is connected to I/O bridge 107 or directly to CPU 102, rather than to memory bridge 105. In still other embodiments, I/O bridge 107 and memory bridge 105 may be integrated into a single chip instead of existing as one or more discrete devices. Most embodiments may include two or more CPUs 102 and two or more parallel processing subsystems 112. The particular components shown herein are optional; for instance, any number of add-in cards or peripheral devices might be supported. In some embodiments, switch 116 is eliminated, and network adapter 118 and add-in cards 120 connect directly to I/O bridge 107. In still other embodiments, computer system 100 comprises a mobile device, and network adapter 118 implements a digital wireless communication subsystem. In such embodiments, input devices 108 comprise a touch tablet subsystem, and display device 110 implements a mobile screen subsystem, such as a liquid crystal display module.

CPU 102 comprises at least two processor cores 140(0), 140(N). A first processor core 140(0) is designed for low-power operation, while a second processor core 140(N) is designed for high-performance operation. In one embodiment, a symmetric number of low-power and high-performance processor cores are implemented within CPU 102. An operating system kernel 150 residing in system memory 104 includes a scheduler 152 and device drivers 154, 156. Kernel 150 is configured to provide certain conventional kernel services, including services related to process and thread management. Scheduler 152 is configured to manage the assignment of threads and processes to the different processor cores 140 within CPU 102. Device driver 154 is configured to manage which processor cores 140 are enabled for use and which are disabled, for example by being powered down. Device driver 156 is configured to manage parallel processing subsystem 112, including processing and buffering the command and input data streams to be processed.

Heterogeneous multiprocessor

Figure 2 is a block diagram of CPU 102 of computer system 100 of Figure 1, according to one embodiment of the present invention. As shown, CPU 102 includes at least two cores 140(0), 140(N), a core interconnect 220, a cache 222, a memory interface 224, an interrupt distributor 226, and a cluster control unit 230.

Each core 140 may operate within a corresponding voltage-frequency (VF) domain, distinct from the other VF domains. For example, circuitry associated with core 140(0) may operate at a first voltage and a first operating frequency associated with VF domain 210(0), while circuitry associated with core 140(N) may operate at a second voltage and a second operating frequency associated with VF domain 210(N). In this example, each voltage and each frequency may be varied independently, within technically feasible ranges, to achieve particular power and performance goals.

In this example, core 140(0) is designed for low-power operation, while core 140(N) is designed for high-performance operation, and the two cores preserve mutual instruction set architecture (ISA) compatibility. Core 140(N) may achieve higher performance through any applicable technique, such as circuit design directed to high clock speeds, logic design directed to simultaneously issuing and processing multiple concurrent instructions, and architectural design directed to improved cache size and performance. Design trade-offs associated with core 140(N) may tolerate increased marginal power consumption in exchange for greater marginal execution performance. Core 140(0) may achieve lower-power operation through circuit design directed to reducing leakage current, crowbar (short-circuit) current, and parasitic loss, and through logic design directed to reducing the switching energy associated with processing an instruction. Design trade-offs associated with core 140(0) should generally favor reducing power consumption, even at the expense of clock speed and processing performance.

Each core 140(0), 140(N) includes a programmable virtual identifier (ID) 212(0), 212(N), which identifies the processor core. Each core 140(0), 140(N) may be programmed with an arbitrary core identifier via virtual ID 212(0), 212(N), and that virtual ID 212 may be associated with a particular thread or process maintained by scheduler 152. Each core 140 may include logic to facilitate replicating its internal execution state to another core 140.

In one embodiment, core interconnect 220 couples cores 140 to a cache 222, which is further coupled to a memory interface 224. Core interconnect 220 may be configured to facilitate state replication between cores 140. Interrupt distributor 226 is configured to receive an interrupt signal and transmit it to the appropriate core 140, identified by the value programmed within virtual ID 212. For example, an interrupt targeting core zero is directed to whichever core 140 has its virtual ID 212 programmed to zero.
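The routing rule for the interrupt distributor can be illustrated with a small Python model. This is only a sketch: the `Core` class, the `distribute` function, and the use of a list to record delivered interrupts are hypothetical stand-ins for hardware behavior that the source describes only at the block-diagram level.

```python
class Core:
    def __init__(self, name, virtual_id):
        self.name = name
        self.virtual_id = virtual_id  # programmable, not tied to the physical core
        self.received = []            # interrupts delivered to this core


def distribute(cores, target_id, interrupt):
    """Deliver `interrupt` to the core whose virtual ID equals `target_id`."""
    for core in cores:
        if core.virtual_id == target_id:
            core.received.append(interrupt)
            return core
    raise LookupError(f"no core is programmed with virtual ID {target_id}")
```

In this model, reprogramming virtual ID zero from the low power core to the high performance core redirects every subsequent "core zero" interrupt without the sender needing to know which physical core is active.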

Cluster control unit 230 manages the availability state of each core 140, each of which may be individually hotplugged in to become available or hotplugged out to become unavailable. Before a given core is hotplugged out, cluster control unit 230 may cause the execution state of that core to be replicated to another core so that execution can continue. For example, if execution should transition from the low power core to the high performance core, the execution state of the low power core may be replicated to the high performance core before the high performance core begins executing. Execution state is implementation specific and may include, without limitation, register data, translation buffer data, and cache state.
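The ordering constraint described here (replicate state first, then start the destination, then remove the source) can be sketched as follows. The `Core` dataclass, the dictionary used as a stand-in for register, translation buffer, and cache state, and the decision to clear the source state after hotplug-out are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    name: str
    online: bool = False
    state: dict = field(default_factory=dict)  # stand-in for registers, TLB, cache state


def migrate(src: Core, dst: Core) -> None:
    """Copy execution state to dst, bring dst online, then hotplug src out."""
    if not src.online:
        raise ValueError("source core is not running")
    dst.state = dict(src.state)  # replicate before the destination starts executing
    dst.online = True            # hotplug in
    src.online = False           # hotplug out
    src.state.clear()            # assumed: state is lost once the core is powered down
```

A transition from the low power core to the high performance core would then be a single call such as `migrate(low, high)`.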

In one embodiment, cluster control unit 230 is configured to power off one or more voltage supplies of a core that has been hotplugged out and to power on one or more voltage supplies of a core that has been hotplugged in. For example, cluster control unit 230 may power off the voltage supply associated with VF domain 210(0) in order to hotplug out core 140(0). Cluster control unit 230 may also implement frequency control circuitry for each core 140. Cluster control unit 230 receives commands from a cluster switch software module residing within device driver 154. The cluster switch manages transitions between core configurations. For example, the cluster switch can direct each core to save context, including virtual ID 212, and to load a saved context, including an arbitrary virtual ID 212. The cluster switch may include hardware support for saving and loading context via cluster control unit 230. Cluster control unit 230 may provide automatic detection of workload changes and indicate to the cluster switch that a new workload requires a new configuration. The cluster switch then directs cluster control unit 230 to move a workload from one core 140 to another core 140, or to enable additional cores by hotplugging them in.

Figure 3 illustrates different operating regions of a CPU comprising multiple cores, according to one embodiment of the present invention. The CPU, such as CPU 102 of Figure 1, includes at least a low power core 140(0) and a high performance core 140(N). As shown, a power curve 320 for low power core 140(0) is plotted as a function of throughput 310. Similarly, a power curve 322 is plotted for high performance core 140(N), and a power curve 324 is plotted for a dual-core configuration. Throughput 310 is defined here as instructions executed per second, while power 312 is defined as the power, in units such as watts (or a fraction thereof), needed to sustain the corresponding throughput 310.

A core clock frequency may be varied to achieve continuously different levels of throughput along the throughput 310 axis. As shown, the maximum throughput of low power core 140(0) is lower than the maximum throughput of high performance core 140(N). In one implementation scenario, high performance core 140(N) is able to operate at a higher clock frequency than low power core 140(0). In a dual-core mode associated with power curve 324, low power core 140(0) may be driven with a clock frequency in its associated upper operating range, while high performance core 140(N) may be driven with a different clock frequency in its associated moderate operating range. In one configuration, each core 140(0), 140(N) in dual-core mode is driven with an identical clock frequency that lies within the range of both cores. In a different configuration, each core 140(0), 140(N) in dual-core mode is driven with a different clock within that core's associated range. In one embodiment, each clock frequency may be selected to achieve similar forward execution progress for each core. In certain embodiments, cores 140 are configured to operate from a common voltage supply and may operate at independent clock frequencies.

Within a low power core region 330, low power core 140(0) is able to satisfy throughput requirements using the least power of the three core configurations (low power, high performance, dual core). Within a high performance core region 332, high performance core 140(N) is able to satisfy throughput requirements using the least power of the three core configurations, while extending throughput 310 beyond the maximum throughput 314 of low power core 140(0). Within a dual core region 334, operating low power core 140(0) and high performance core 140(N) simultaneously may achieve a throughput higher than the maximum throughput 316 of high performance core 140(N), thereby extending overall throughput, but at the expense of additional power consumption.
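The notion of an operating region can be made concrete with a short Python sketch that picks, for a given throughput, the configuration that can sustain it at the lowest power. The linear power curves, their coefficients, and the maximum throughput values below are invented for illustration; the source gives no numeric values for the curves in Figure 3.

```python
# Illustrative numbers only; the source gives no values for the curves in Figure 3.
CONFIGS = {
    # name: (maximum throughput, power as a function of throughput)
    "low_power":        (100.0, lambda t: 0.1 + 0.005 * t),
    "high_performance": (250.0, lambda t: 0.3 + 0.0025 * t),
    "dual_core":        (350.0, lambda t: 0.4 + 0.003 * t),
}


def best_config(throughput):
    """Return the feasible configuration that needs the least power."""
    feasible = {name: power(throughput)
                for name, (max_t, power) in CONFIGS.items()
                if throughput <= max_t}
    if not feasible:
        raise ValueError("no configuration can sustain this throughput")
    return min(feasible, key=feasible.get)
```

With these assumed curves, the low power core wins at low throughput, the high performance core takes over once its curve drops below the low power curve, and the dual-core configuration is chosen only when neither single core can reach the requested throughput.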

Given the three operating regions 330, 332, 334, a low power core 140(0), and a high performance core 140(N), six state transitions between different core configurations are supported. A first state transition is from region 330 to region 332; a second is from region 332 to region 330; a third is from region 330 to region 334; a fourth is from region 334 to region 330; a fifth is from region 332 to region 334; and a sixth is from region 334 to region 332. Persons skilled in the art will recognize that additional cores may add additional operating regions and additional potential state transitions between core configurations without departing from the scope and spirit of the present invention.
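The count of six follows from taking ordered pairs of distinct regions: with n regions there are n(n - 1) directed transitions. A brief Python check, using the region reference numerals as plain labels (an illustrative choice, not something the source prescribes):

```python
from itertools import permutations

REGIONS = (330, 332, 334)  # low power, high performance, and dual core regions

# Every ordered pair of distinct regions is one supported state transition.
transitions = list(permutations(REGIONS, 2))
```

Adding a fourth region to `REGIONS` would raise the number of possible transitions to twelve, which is one way to read the remark that additional cores add further potential state transitions.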

In one embodiment, cores 140 within CPU 102 are characterized in terms of power consumption and throughput as a function of voltage and frequency. The resulting characterization comprises a family of power curves and different operating regions having different power requirements. The different operating regions may be determined statically for a given CPU 102 design. The different operating regions may be stored in tables within device driver 154, which is then able to configure CPU 102 by hotplugging different cores 140 in and out based on prevailing workload requirements. In one embodiment, device driver 154 reacts to current workload requirements and reconfigures the different cores 140 within CPU 102 to best satisfy those requirements. In another embodiment, scheduler 152 is configured to schedule workloads according to the available cores 140. Scheduler 152 may direct device driver 154 to hotplug different cores in or out based on present and future knowledge of workload requirements.
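A statically determined region table of the kind described might be consulted as in the following sketch. The table contents, the region boundaries, and the helper names `cores_for` and `hotplug_plan` are hypothetical; the source states only that the regions may be stored in tables within the device driver.

```python
from bisect import bisect_left

# Hypothetical table: upper throughput bound of each region and the cores to keep online.
REGION_TABLE = [
    (80.0,  ("140(0)",)),           # low power core region 330
    (250.0, ("140(N)",)),           # high performance core region 332
    (350.0, ("140(0)", "140(N)")),  # dual core region 334
]
_BOUNDS = [bound for bound, _ in REGION_TABLE]


def cores_for(throughput):
    """Look up which cores should be online for the requested throughput."""
    i = bisect_left(_BOUNDS, throughput)
    if i == len(REGION_TABLE):
        raise ValueError("demand exceeds every tabulated region")
    return REGION_TABLE[i][1]


def hotplug_plan(online, throughput):
    """Return (cores to hotplug in, cores to hotplug out) for the new demand."""
    wanted = set(cores_for(throughput))
    return sorted(wanted - set(online)), sorted(set(online) - wanted)
```

Because the table is precomputed, the run-time cost of choosing a configuration is a binary search plus two set differences, which is consistent with keeping the decision inside a driver.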

FIG. 4 is a flow diagram of method steps for configuring a multi-core CPU to operate within a power-efficient region, according to one embodiment of the present invention. Although the method steps are described in conjunction with the systems of FIGS. 1-2, persons skilled in the art will understand that any system configured to perform the method steps, in any order, is within the scope of the present invention. In one embodiment, the method steps are performed by CPU 102 of FIG. 1.

As shown, method 400 begins at step 410, where cluster control unit 230 of FIG. 2 initializes the core configuration of CPU 102. In one embodiment, cluster control unit 230 initializes the core configuration of CPU 102 to reflect the availability of low-power core 140(0) of FIG. 1. In this configuration, core 140(0) executes an operating system boot sequence, including loading and initiating execution of kernel 150.

In step 412, device driver 154 receives workload information. The workload information may include, without limitation, CPU load statistics, latency statistics, and the like. The workload information may be received from cluster control unit 230 within CPU 102 or from conventional kernel task and thread services. If, in step 420, there is a change in the workload reflected by the workload information, then the method proceeds to step 422; otherwise, the method returns to step 412. In step 422, the device driver determines a matching core configuration to support the new workload information. The driver may use statically pre-computed workload tables that map power curve information to efficient core configurations that support the required workload reflected in the workload information.

If, in step 430, the matching core configuration represents a change to the current core configuration, then the method proceeds to step 432; otherwise, the method returns to step 412. In step 432, the device driver causes CPU 102 to transition to the matching core configuration. The transition may involve hot plugging one or more cores and may also involve hot unplugging one or more cores, as a function of the differences between the current core configuration and the matching core configuration.
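The dependence of step 432 on the differences between the two configurations can be sketched with plain set arithmetic. The function name `plan_transition` and the core labels "LP" and "HP" are assumptions made for illustration; the text does not prescribe this representation.

```python
def plan_transition(current, target):
    """Return (to_plug, to_unplug) for moving from current to target.

    Both arguments are collections of core labels. Cores present only in
    the target are hot plugged; cores present only in the current
    configuration are hot unplugged.
    """
    current, target = set(current), set(target)
    to_plug = sorted(target - current)
    to_unplug = sorted(current - target)
    return to_plug, to_unplug


# Low-power core only -> high-performance core only.
print(plan_transition({"LP"}, {"HP"}))
# Low-power core only -> both cores.
print(plan_transition({"LP"}, {"LP", "HP"}))
```

An empty result on both sides corresponds to the "no change" branch of step 430. The order in which a real driver would plug and unplug cores is not stated in this section, so the sketch deliberately returns the two lists without sequencing them.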

If, in step 440, the method should terminate, then the method proceeds to step 490; otherwise, the method returns to step 412. The method may need to terminate upon receiving a termination signal, for example during an overall shutdown event.
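The control flow of method 400 can be summarized in a short sketch. The names `run_method_400`, `choose_config`, and `apply_config` are hypothetical, workloads are reduced to a single number, and a sample of `None` stands in for the termination signal. As a simplification, termination is checked whenever a sample is read, whereas the flow described above reaches step 440 only after a transition in step 432.

```python
def run_method_400(samples, choose_config, apply_config,
                   initial_config=frozenset({"LP"})):
    """Drive the flow of FIG. 4 over a finite iterable of workload samples."""
    current_config = frozenset(initial_config)
    apply_config(current_config)                     # step 410: initialize
    last_workload = None
    for workload in samples:                         # step 412: receive info
        if workload is None:                         # step 440 -> step 490
            break
        if workload == last_workload:                # step 420: no change
            continue
        last_workload = workload
        target = frozenset(choose_config(workload))  # step 422: pick config
        if target != current_config:                 # step 430: differs?
            apply_config(target)                     # step 432: transition
            current_config = target
    return current_config


def choose(workload):
    # Hypothetical policy: light workloads run on the low-power core.
    return {"LP"} if workload < 1.0 else {"HP"}


log = []
final = run_method_400([0.2, 0.2, 3.0, 0.5, None, 9.0], choose, log.append)
print(final, log)
```

In this run the configuration is applied three times: once at initialization, once when the workload rises to 3.0, and once when it falls back to 0.5. The sample after `None` is never read.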

In sum, a technique for managing processor cores within a multi-core CPU is disclosed. The technique involves hot plugging and hot unplugging core resources as needed. Each core includes a virtual ID to allow the core execution context to be abstracted away from a particular physical core circuit. As system workload increases, the core configuration may be changed to support the increase. Similarly, as system workload decreases, the core configuration may be changed to reduce power consumption while supporting the reduced workload.
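The role of the virtual ID can be illustrated with a small mapping from the identifiers software sees to the physical core circuits behind them. The class `CoreMap` and its methods are hypothetical; this section does not describe how the identifiers are actually programmed, so the sketch shows only the idea of rebinding.

```python
class CoreMap:
    """Maps virtual core IDs (seen by software) to physical core circuits."""

    def __init__(self, mapping):
        self._virt_to_phys = dict(mapping)

    def physical(self, virtual_id):
        """Return the physical core currently behind a virtual ID."""
        return self._virt_to_phys[virtual_id]

    def migrate(self, virtual_id, new_physical):
        """Rebind a virtual ID to another physical core; return the old one."""
        if new_physical in self._virt_to_phys.values():
            raise ValueError("physical core is already bound to a virtual ID")
        old = self._virt_to_phys[virtual_id]
        self._virt_to_phys[virtual_id] = new_physical
        return old


# Software keeps addressing virtual core 0 while the circuit behind it changes.
cores = CoreMap({0: "LP"})
released = cores.migrate(0, "HP")
print(released, cores.physical(0))
```

The point of the indirection is that the execution context keeps the same identity across the move, so the released physical core can be powered down without software having to track which circuit it is running on.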

One advantage of the disclosed technique is that it improves the power efficiency of a multi-core central processing unit over a wide workload range while efficiently utilizing processing resources.

While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof. For example, aspects of the present invention may be implemented in hardware or software or in a combination of hardware and software. One embodiment of the invention may be implemented as a program product for use with a computer system. The program(s) of the program product define functions of the embodiments (including the methods described herein) and can be contained on a variety of computer-readable storage media. Illustrative computer-readable storage media include, but are not limited to: (i) non-writable storage media on which information is permanently stored (e.g., read-only memory devices within a computer, such as compact disc read-only memory (CD-ROM) disks readable by a CD-ROM drive, flash memory, read-only memory (ROM) chips, or any type of solid-state non-volatile semiconductor memory); and (ii) writable storage media on which alterable information is stored (e.g., floppy disks within a diskette drive or hard-disk drive, or any type of solid-state random-access semiconductor memory).

The invention has been described above with reference to specific embodiments. Persons skilled in the art, however, will understand that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The foregoing description and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Therefore, the scope of the present invention is determined by the claims that follow.

400‧‧‧Method

410-490‧‧‧Steps

Claims (10)

1. A method for configuring two or more cores within a processing unit for executing different workloads, the method comprising: receiving information related to a new workload; determining, based on the information, that the new workload is different from a current workload; retrieving characterization data associated with the power consumption characteristics of each core included in the two or more cores; determining, based on the information and the characterization data, how many of the two or more cores should be configured to execute the new workload; determining, based on how many of the two or more cores should be configured to execute the new workload, whether a new core configuration is needed; and if a new core configuration is needed, then transitioning the processing unit to the new core configuration, or, if a new core configuration is not needed, then maintaining the current core configuration for executing the new workload.

2. The method of claim 1, wherein only a low-power core executes work in the current core configuration, and determining how many of the two or more cores should be configured comprises determining that only a high-performance core should be configured to execute the new workload, and further comprising determining that a new core configuration is needed, and transitioning the processing unit by turning off the low-power core and turning on the high-performance core to execute the new workload.

3. The method of claim 1, wherein only a high-performance core executes work in the current core configuration, and determining how many of the two or more cores should be configured comprises determining that only a low-power core should be configured to execute the new workload, and further comprising determining that a new core configuration is needed, and transitioning the processing unit by turning off the high-performance core and turning on the low-power core to execute the new workload.

4. The method of claim 1, wherein only a low-power core executes work in the current core configuration, and determining how many of the two or more cores should be configured comprises determining that both the low-power core and a high-performance core should be configured to execute the new workload, and further comprising determining that a new core configuration is needed, and transitioning the processing unit by turning on the high-performance core to execute the new workload.

5. The method of claim 1, wherein only a high-performance core executes work in the current core configuration, and determining how many of the two or more cores should be configured comprises determining that both a low-power core and the high-performance core should be configured to execute the new workload, and further comprising determining that a new core configuration is needed, and transitioning the processing unit by turning on the low-power core to execute the new workload.

6. The method of claim 1, wherein both a low-power core and a high-performance core execute work in the current core configuration, and determining how many of the two or more cores should be configured comprises determining that only the high-performance core should be configured to execute the new workload, and further comprising determining that a new core configuration is needed, and transitioning the processing unit by turning off the low-power core to execute the new workload.

7. The method of claim 1, wherein both a low-power core and a high-performance core execute work in the current core configuration, and determining how many of the two or more cores should be configured comprises determining that only the low-power core should be configured to execute the new workload, and further comprising determining that a new core configuration is needed, and transitioning the processing unit by turning off the high-performance core to execute the new workload.

8. The method of claim 1, wherein the processing unit comprises a central processing unit or a graphics processing unit.

9. The method of claim 1, wherein each core included in the two or more cores is identifiable via a programmable identifier, and one or more programmable identifiers are used to transition the processing unit to the new core configuration.

10. A computing device, comprising: a central processing unit that includes at least one low-power core and at least one high-performance core, the central processing unit programmed to configure two or more cores for executing different workloads by: receiving information related to a new workload; determining, based on the information, that the new workload is different from a current workload; retrieving characterization data associated with the power consumption characteristics of each core included in the two or more cores; determining, based on the information and the characterization data, how many of the two or more cores should be configured to execute the new workload; determining, based on how many of the two or more cores should be configured to execute the new workload, whether a new core configuration is needed; and if a new core configuration is needed, then transitioning the processing unit to the new core configuration, or, if a new core configuration is not needed, then maintaining the current core configuration for executing the new workload.
TW102127477A 2012-07-31 2013-07-31 Heterogeneous multiprocessor design for power-efficient and area-efficient computing TWI502333B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201261678026P 2012-07-31 2012-07-31
US13/723,995 US9569279B2 (en) 2012-07-31 2012-12-21 Heterogeneous multiprocessor design for power-efficient and area-efficient computing
US201313931122A 2013-06-28 2013-06-28

Publications (2)

Publication Number Publication Date
TW201418972A TW201418972A (en) 2014-05-16
TWI502333B true TWI502333B (en) 2015-10-01

Family

ID=50625698

Family Applications (1)

Application Number Title Priority Date Filing Date
TW102127477A TWI502333B (en) 2012-07-31 2013-07-31 Heterogeneous multiprocessor design for power-efficient and area-efficient computing

Country Status (2)

Country Link
DE (1) DE102013108041B4 (en)
TW (1) TWI502333B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9958932B2 (en) * 2014-11-20 2018-05-01 Apple Inc. Processor including multiple dissimilar processor cores that implement different portions of instruction set architecture
US9898071B2 (en) 2014-11-20 2018-02-20 Apple Inc. Processor including multiple dissimilar processor cores
US9928115B2 (en) 2015-09-03 2018-03-27 Apple Inc. Hardware migration between dissimilar cores

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100146513A1 (en) * 2008-12-09 2010-06-10 Intel Corporation Software-based Thread Remapping for power Savings
TW201205274A (en) * 2010-06-30 2012-02-01 Via Tech Inc Microprocessor, method of operating microprocessor and computer program product
TW201211756A (en) * 2010-05-25 2012-03-16 Nvidia Corp System and method for power optimization
US8140876B2 (en) * 2009-01-16 2012-03-20 International Business Machines Corporation Reducing power consumption of components based on criticality of running tasks independent of scheduling priority in multitask computer

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6501999B1 (en) 1999-12-22 2002-12-31 Intel Corporation Multi-processor mobile computer system having one processor integrated with a chipset
US6981083B2 (en) 2002-12-05 2005-12-27 International Business Machines Corporation Processor virtualization mechanism via an enhanced restoration of hard architected states
US9063730B2 (en) 2010-12-20 2015-06-23 Intel Corporation Performing variation-aware profiling and dynamic core allocation for a many-core processor
US8910177B2 (en) 2011-04-14 2014-12-09 Advanced Micro Devices, Inc. Dynamic mapping of logical cores

Also Published As

Publication number Publication date
DE102013108041B4 (en) 2024-01-04
DE102013108041A1 (en) 2014-05-22
TW201418972A (en) 2014-05-16
