TW201245948A - Decentralized power management distributed among multiple processor cores - Google Patents

Decentralized power management distributed among multiple processor cores

Publication number
TW201245948A
Authority
TW
Taiwan
Prior art keywords
core
state
cores
block
power
Prior art date
Application number
TW100148084A
Other languages
Chinese (zh)
Other versions
TWI450084B (en
Inventor
G. Glenn Henry
Darius D. Gaskins
Original Assignee
Via Tech Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US13/299,059 (US8782451B2)
Priority claimed from US13/299,122 (US8635476B2)
Application filed by Via Tech Inc
Publication of TW201245948A
Application granted
Publication of TWI450084B

Abstract

A multi-core processor provides a configurable resource shared by two or more cores, wherein configurations of the resource affect the power, speed, or efficiency with which the cores sharing the resource are able to operate. Internal core power state management logic configures each core to participate in a de-centralized inter-core power state discovery process to discover a composite target power state for the shared resource that is a most restrictive or power-conserving state that will not interfere with any of the corresponding target power states of each core sharing the resource. The internal core power state management logic determines whether the core is a master core authorized to configure the resource, and if so, configures that resource in the discovered composite power state. The de-centralized power state discovery process is carried out between the cores on sideband, non-system bus wires, without the assistance of centralized non-core logic. A multi-core processor including microcode distributed in each core enabling each core to participate in a de-centralized inter-core state discovery process is also disclosed.
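
The selection rule in the abstract can be illustrated with a short sketch. It is not taken from the patent: it assumes power states are encoded as integers where 0 is fully active and larger values are more power-conserving, and the names `Core`, `composite_target_state`, and `state_to_apply` are hypothetical. Under that assumption, the most power-conserving state that interferes with no core's target is simply the minimum of the targets, and only the master core applies it.

```python
from dataclasses import dataclass


@dataclass
class Core:
    core_id: int
    target_state: int        # assumed encoding: 0 = fully active, higher = more power-conserving
    is_master: bool = False  # True only for the core authorized to configure the resource


def composite_target_state(sharing_cores):
    """Most power-conserving state that interferes with no core's own target.

    With the assumed encoding this is the shallowest (lowest) target among
    all cores sharing the resource.
    """
    return min(core.target_state for core in sharing_cores)


def state_to_apply(core, sharing_cores):
    """Return the state this core should apply to the shared resource, or None.

    Only the master core configures the resource; every other core merely
    takes part in discovery.
    """
    if not core.is_master:
        return None
    return composite_target_state(sharing_cores)


cores = [Core(0, target_state=4, is_master=True), Core(1, target_state=2)]
print(state_to_apply(cores[0], cores))  # master applies the composite state: 2
print(state_to_apply(cores[1], cores))  # non-master applies nothing: None
```

In the processor described here the comparison would be made by microcode on each core using values exchanged over sideband wires; the sketch only shows the arithmetic of the rule.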

Description

VI. Description of the Invention

[Cross-Reference to Related Applications]

This application claims priority based on U.S. Provisional Application Serial No. 61/426,470, filed 12/22/2010, entitled "Multi-Core Internal Bypass Bus," which is hereby incorporated by reference in its entirety.

This application is related to the following co-pending U.S. patent applications, each of which has the same filing date and each of which is incorporated by reference in its entirety:

Serial No. TBD (CNTR.2503), filed 12/22/2010: Multi-Core Microprocessor Internal Bypass Bus
Serial No. TBD (CNTR.2527), filed 12/22/2010: Decentralized Power Management Distributed Among Multiple Processor Cores
Serial No. TBD (CNTR.2528), filed 12/22/2010: Reticle Set Modification to Produce Multi-Core Dies
Serial No. TBD (CNTR.2533), filed 12/22/2010: Dynamic Multi-Core Microprocessor Configuration Discovery
Serial No. TBD (CNTR.2534), filed 12/22/2010: Distributed Management of a Shared Power Source to a Multi-Core Microprocessor
Serial No. TBD (CNTR.2536), filed 12/22/2010: Dynamic and Selective Core Disablement and Reconfiguration in a Multi-Core Processor

[Technical Field]

The present invention relates to the field of multi-core microprocessor design, and in particular to the management and implementation of specific operations of multiple cores and of the multi-core domains of a multi-core processor.

[Prior Art]

The primary way in which modern microprocessors reduce their power consumption is to reduce the frequency and/or voltage at which the microprocessor operates. In addition, in some instances the microprocessor may allow clock signals to be disabled to portions of its circuitry. Finally, in some instances the microprocessor may remove power altogether from portions of its circuitry.
Furthermore, the microprocessor sometimes requires peak performance, which means it must operate at its highest voltage and frequency. The microprocessor takes power management actions to control its voltage and frequency levels and the disabling of its clocks and power. Typically, the microprocessor takes these power management actions in response to direction from the operating system. The well-known x86 MWAIT instruction is an example of an instruction the operating system may execute to request entry into an implementation-dependent optimized state, which the operating system can use to perform advanced power management. The optimized state may be a sleeping, or idle, state. The well-known Advanced Configuration and Power Interface (ACPI) specification facilitates operating system-directed power management by defining operational or power-management-related states (such as "C-states" and "P-states").

Performing power management actions is complicated by the fact that most modern microprocessors are multi-core processors in which many processing cores share one or more power-management-related resources. For example, the cores may share voltage sources and/or clock sources. Furthermore, a computing system that includes a multi-core processor also typically includes a chipset, which includes bus bridges that bridge the processor bus to other buses of the system (for example, to peripheral I/O buses) and a memory controller that interfaces the multi-core processor to system memory. The chipset may be closely involved in the various power management actions, and a coordination mechanism may be required between the chipset and the multi-core processor.

More specifically, in some systems the chipset may, with the permission of the multi-core processor, disable a clock signal on the processor bus that the processor receives and uses to generate most of its own internal clock signals.
In the case of a multi-core processor, all of the cores that use the bus clock must be ready for the chipset to disable the bus clock. That is, the chipset cannot be given permission to disable the bus clock until all of the cores are ready.

Moreover, the chipset normally snoops the cache memories on the processor bus. For example, when a peripheral device generates a memory access on a peripheral bus, the chipset forwards the memory access to the processor bus so that the processor can snoop its cache memories to determine whether it holds the data at the snooped address. For example, devices are well known to poll memory locations periodically, which generates periodic snoop cycles on the processor bus. In some systems the multi-core processor may enter a deep sleep state in which it flushes its cache memories and disables the clock signals to the caches in order to save power. In that state it is wasteful for the multi-core processor to be woken by a snoop cycle on the processor bus, snoop its caches (which will never return a hit because they are empty), and then go back to sleep. Therefore, with the permission of the multi-core processor, the chipset may be authorized to refrain from generating snoop cycles on the processor bus in order to achieve additional power savings. Again, however, all of the cores must be ready before the chipset turns off snooping; that is, the chipset cannot be given permission to turn off snooping until all of the cores are ready.

U.S. Patent No. 7,451,333, issued to Naveh et al. (hereinafter "Naveh"), discloses a multi-core microprocessor that includes multiple processing cores, each of which can detect a command requesting that the core transition to an idle state. The multi-core processor also includes hardware coordination logic (HCL). The HCL receives idle state status from the cores and manages the power consumption of the cores based on the commands and the idle state status of the cores.
More specifically, the HCL determines whether all of the cores have detected a command requesting a transition to a common state. If not, the HCL selects the shallowest state among the commanded idle states as the idle state for each core. If, however, the HCL detects a command requesting a transition to a common state, the HCL may initiate shared power-saving features such as performance state reduction, shutdown of a shared phase-locked loop (PLL), or saving of the processor's execution context. The HCL may also prevent external break events from reaching the cores and may transition all of the cores to the common state. In addition, the HCL may carry out a handshake sequence with the chipset to transition the cores to the common state.

In an article by Alon Naveh et al. entitled "Power and Thermal Management in the Intel Core Duo Processor," published in the May 15, 2006 issue of the Intel Technology Journal, Naveh et al. describe a consistent C-state control structure that uses off-core hardware coordination logic (HCL), located in a shared region of the die or platform, as a layer between the individual cores and the shared resources on the die and platform. The HCL determines the required CPU C-state based on the individual requests of the cores, controls the state of the shared resources, and emulates a legacy single-core processor in order to carry out the C-state entry protocol with the chipset.

In the scheme disclosed by the Naveh references, the HCL is centralized non-core logic located outside the cores, and it performs power management actions on behalf of all of the cores. This centralized non-core logic solution has drawbacks, particularly when the HCL must reside on the same die as the cores, because the resulting die size may be unacceptable. The drawback is even more pronounced in architectures that aim to include more cores on the die.

[Summary of the Invention]

In one aspect of the present invention, a multi-core processor is provided that includes a plurality of physical processing cores and inter-core state discovery microcode in each core, which enables the core to participate in a decentralized inter-core power state discovery process. Relatedly, a decentralized, microcode-implemented method of discovering the power state of a multi-core processor is provided, where the multi-core processor includes at least two cores that participate in a decentralized inter-core state discovery process. The inter-core state discovery process is carried out through a combination of microcode executing on each participating core and signals exchanged between the cores over sideband, non-system-bus communication wires. The discovery process is not mediated by any centralized non-core logic. In addition, in most embodiments the inter-core state discovery process is carried out in accordance with an applicable or selected hierarchical coordination system that uses chained inter-core communication.

In other aspects, inter-core state discovery processes are provided to discover the microprocessor configuration, including the availability and distribution of resources, which cores are enabled and how many, and the hierarchical coordination structure and system of the microprocessor, including the identification of domains and domain masters.

In another aspect of the present invention, a multi-core processor is provided that includes a plurality of enabled physical processing cores and a configurable resource shared by two or more of the cores, wherein configurations of the resource affect the power, speed, or efficiency with which the cores sharing the resource can operate. For each core, the processor further includes internal core power state management logic that configures the core to participate in a decentralized inter-core power state discovery process carried out between the cores, without the assistance of centralized non-core logic. If the core is designated as the master core for purposes of configuring the shared resource to the composite target power state discovered through the decentralized inter-core power state discovery process, the internal core power management logic configures the core to drive implementation of the composite target power state for the shared resource. The composite target power state is the most power-conserving state for the shared resource that will not interfere with any of the corresponding target power states of the cores sharing the resource.

In another aspect, a decentralized method of power management is provided for a multi-core processor, in which a target power state defines a configuration of a resource that affects the power, speed, or efficiency with which the cores sharing the resource operate. A core participates in an inter-core power state discovery process, which includes exchanging power states with at least one other core sharing the resource without the mediation of any centralized non-core logic. If the core is designated as the master for purposes of configuring the shared resource to the composite target power state discovered through the decentralized inter-core power state discovery process, the core drives implementation of the composite target power state to configure the shared resource.

In another aspect, the present invention provides a multi-core processor in which each core includes power state management microcode that configures the core to participate in a decentralized inter-core composite power state discovery process. The power state management microcode enables each core to receive a state transition request to configure itself in accordance with any requested target among a plurality of predefined power states, including an active operating state and one or more progressively less responsive states. When a core receives a request to transition to a restricted power state (for example, a power state that would interfere with resources shared by other cores), its power state management microcode initiates a decentralized inter-core composite power state discovery process to determine whether all other affected cores are ready to accept the restricted power state.
If the cores participating in the discovery process confirm that the restricted power state is the composite power state, an authorized one of the cores implements, or enables implementation of, the restricted power state through its power state management microcode. Specifically, the authorized core implements the most restrictive or power-conserving operating state that the core can implement without interfering with the corresponding target operating states of the other cores.

In another aspect, a portion or routine of each core's power management microcode is synchronization logic, which is configured to exchange power state information with other nodally connected cores in order to determine a compound power state. Each invoked instance of the synchronization logic is designed to spawn, at least conditionally, dependent instances of the synchronization logic on not-yet-synchronized nodally connected cores (that is, cores that are nodally connected to itself and on which a synchronized instance of the synchronization logic has not yet been invoked) as part of the composite power state discovery process.

In one embodiment, the core power management microcode is designed to implement a target power state without invoking a local instance of its synchronization logic if the core's target power state is not a restricted power state requiring coordination with other cores. Otherwise, the power management logic configures the core to implement the non-restricted aspects of the target power state, or of an alternative power state (for example, local power-saving actions on the core), and invokes a local instance of its synchronization logic to begin the composite power state discovery process for the largest domain of cores to which the restricted power state applies.
Upon discovery of a composite power state that corresponds to the targeted restricted power state, the power management microcode of a core authorized to implement the composite power state (typically the master core of the largest affected domain) enables and/or carries out implementation of the composite power state.

In another aspect, the present invention provides a decentralized method of managing power for a multi-core processor such as the one described above. The method includes receiving a state transition request for any one of the cores to configure that core (the "local core") in accordance with a target power state. If the target power state is a restricted power state, power management logic executing on the local core invokes a local instance of synchronization logic to initiate a decentralized inter-core composite power state discovery process in which the core exchanges power states with other cores. The method further includes evaluating the discovered power state and conditionally enabling or carrying out implementation of the restricted power state in response.

Each local instance of the synchronization logic spawns one or more dependent instances of the synchronization logic on one or more nodally connected cores, and those dependent instances operate in turn to spawn additional dependent instances of their own. Each instance of the synchronization logic determines at least a compound power state and recursively (unless stopped by a termination condition, if any) spawns further dependent instances on not-yet-synchronized, nodally distal cores, until a synchronized instance of the synchronization logic exists on every core that may be affected. Upon discovering that the composite power state equals the restricted power state, power management logic executing on an authorized core enables and/or carries out its implementation.
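
The recursive spawning of dependent instances described above can be modeled, very loosely, as a depth-first walk over the graph of nodally connected cores. The sketch below is a sequential simulation under stated assumptions, not the patent's microcode: states are integers where larger means more power-conserving, `links` and `targets` are hypothetical dictionaries, and the optional termination condition is omitted.

```python
def discover_compound_state(core, links, targets, synced=None):
    """Simulate one instance of the synchronization logic running on `core`.

    links   -- dict: core id -> ids of nodally connected cores (assumed layout)
    targets -- dict: core id -> that core's target power state
    synced  -- set of cores that already have a synchronized instance
    """
    if synced is None:
        synced = set()
    synced.add(core)
    compound = targets[core]
    for neighbour in sorted(links[core]):
        if neighbour not in synced:
            # "Spawn a dependent instance" on a not-yet-synchronized neighbour.
            remote = discover_compound_state(neighbour, links, targets, synced)
            # Keep the shallowest state so no core's target is violated.
            compound = min(compound, remote)
    return compound


def may_implement(requested_state, core, links, targets):
    """True if the discovered compound state equals the requested restricted state."""
    return discover_compound_state(core, links, targets) == requested_state


# Four cores connected in a chain: 0 - 1 - 2 - 3 (hypothetical topology).
links = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
targets = {0: 4, 1: 4, 2: 2, 3: 4}
print(discover_compound_state(0, links, targets))  # 2: core 2 holds the others back
print(may_implement(4, 0, links, targets))         # False
```

Core 0 asks for state 4, but the walk reaches core 2, whose target is only 2, so the restricted state is not implemented. In hardware each "call" would correspond to signals exchanged over inter-core wires rather than a function invocation.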

In yet another aspect, the present invention provides microcode encoded in a computer-readable storage medium of the physical cores of a multi-core processor, the microcode including the decentralized inter-core state discovery and power management logic described above.

[Embodiments]

Described herein are embodiments of systems and methods for coordinating, synchronizing, managing, and implementing power, sleep, or operating states on a multi-core processor by using decentralized, distributed logic that is resident and replicated on each core. Before describing each of the figures, which depict detailed embodiments, more general applicable concepts of the invention are introduced below.

I. Multi-Layer Multi-Core Processor Concepts

As used herein, a multi-core processor generally refers to a processor that includes a plurality of enabled physical cores, each of which is designed to fetch, decode, and execute instructions conforming to an instruction set architecture. In general, the multi-core processor is coupled by a system bus (ultimately shared by all of the cores) to a chipset, which provides access to peripheral buses and to various devices. In some embodiments, the system bus is a front-side bus, which is the external interface from the processor to the rest of the computer system. In some embodiments, the chipset also centralizes access to a shared main memory and a shared graphics controller.

The cores of the multi-core processor may be packaged in one or more dies that include multiple cores, as described in the section of Serial No. 61/426,470, filed December 22, 2010, entitled "Multi-Core Processor Internal Bypass Bus," and in its concurrently filed nonprovisional application (CNTR.2503), which are incorporated herein by reference. As set forth therein, a typical die is a piece of semiconductor wafer that has been diced or cut into a single physical entity, and it generally has at least one set of physical I/O landing pads. For example, some dual-core dies have two sets of I/O pads, one for each of their cores.
Other dual-core dies have a single set of I/O pads that is shared between their two cores. Some quad-core dies have two sets of I/O pads, one for each of two pairs of cores. Multiple configurations are possible.

Furthermore, a multi-core processor may also provide a package that hosts multiple dies. A "package" is a substrate on which dies are placed or mounted. The package may provide a single set of pins for connection to a motherboard and the associated processor bus. The substrate of the package includes wire nets or traces that connect the pads of the dies to the shared pins of the package.

Further levels of stratification are possible. For example, an additional layer (hereinafter a "platform") may be provided between the packages and the underlying motherboard, with multiple packages mounted on the platform. The platform may resemble the package described above in that it includes a substrate with wire nets or traces connecting the pins of each package to the shared pins of the platform.

Applying the above concepts, in one embodiment a multi-package processor can be characterized as a platform carrying N2 packages, each package having N1 dies and each die having N0 cores. The numbers N2, N1, and N0 are each greater than or equal to one, and at least one of N2, N1, and N0 is greater than or equal to two.

II. Inter-Core Transmission Structures

As noted above, using off-core but on-die hardware coordination logic (HCL) to implement restricted activities that require inter-core coordination has several drawbacks, including more complicated, less symmetric, and lower-yielding die designs, as well as scaling challenges. An alternative is to perform all such coordination using the chipset itself, but that approach is likely to require transactions between each core and the chipset on the system bus in order to communicate the applicable values to the chipset.
Such coordination also typically has to be implemented through system software such as the BIOS, over which the manufacturer may have limited or no control. To overcome the drawbacks of both conventional approaches, some embodiments of the present invention use sideband connections between the cores of the multi-core processor. These sideband connections are not connected to the physical pins of the package; therefore, they do not carry signals off the package, and communications exchanged over them do not require corresponding transactions on the system bus.

For example, as described in CNTR.2503, each die may provide a bypass bus between the cores of the die. The bypass bus is not connected to the physical pads of the die, so it does not carry signals off the die. The bypass bus also provides improved signal quality between the cores and enables the cores to communicate or coordinate with each other without using the system bus. Multiple variations are contemplated. For example, as described in CNTR.2503, a quad-core die may provide a bypass bus between two pairs of cores. Alternatively, as described in one embodiment below, a quad-core die may provide a bypass bus within each of two sets of cores on the die and another bypass bus between selected cores from the two sets. In another embodiment, a quad-core die may provide inter-core bypass buses between each pair of cores, as described below with reference to Figure 16. In yet another embodiment, a quad-core die may provide inter-core bypass buses between the first and second cores, the second and third cores, the third and fourth cores, and the first and fourth cores, without providing inter-core bypass buses between the first and third cores or between the second and fourth cores.
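
The last variation above forms a ring of bypass buses. A small sketch, with cores numbered 0 to 3 as an assumption (the text says "first" through "fourth") and a hypothetical helper `hops`, shows that cores without a direct bypass bus can still reach each other by relaying through one intermediate core, which is the kind of chained communication a decentralized discovery process would rely on.

```python
from collections import deque

# Ring of bypass buses: 0-1, 1-2, 2-3, and 0-3; no direct 0-2 or 1-3 link.
RING_LINKS = {0: (1, 3), 1: (0, 2), 2: (1, 3), 3: (2, 0)}


def hops(links, source, destination):
    """Breadth-first search: number of bypass-bus hops between two cores."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        core, distance = queue.popleft()
        if core == destination:
            return distance
        for neighbour in links[core]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None  # destination unreachable over the given links


print(hops(RING_LINKS, 0, 1))  # 1: directly connected
print(hops(RING_LINKS, 0, 2))  # 2: relayed through core 1 or core 3
```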
A similar sideband implementation (in that case distributed between the cores of two dual-core dies) is disclosed in the section of application Serial No. 61/426,470, filed December 22, 2010, entitled "Distributed Management of a Shared Power Source to a Multi-Core Microprocessor", and in its concurrently filed nonprovisional application (CNTR.2534), which are also incorporated herein by reference.

The present invention also contemplates sets of inter-core communication wires that are less extensive than the bypass bus of CNTR.2503, such as the alternative embodiments described in the section of application Serial No. 61/426,470, filed December 22, 2010, entitled "Reticle Set Modification to Produce Multi-Core Dies", and in its concurrently filed nonprovisional application (CNTR.2528), which are also incorporated herein by reference.
One less extensive form of inter-core communication wiring is described in CNTR.2534, which is incorporated herein by reference. The set of inter-core communication wires may contain as few wires as possible, provided it is sufficient to enable the coordination activities described herein. Inter-core communication wires running between cores may also be designed or arranged between the cores in a manner similar to the inter-die communication wires described further below.

Furthermore, a package may provide inter-die communication wires between the dies of the package, and a platform may provide inter-package communication wires between the packages of the platform. As explained more fully below, an implementation of inter-die communication wires may require at least one additional physical output pad on each die. Likewise, an implementation of inter-package communication wires may require at least one additional physical output pad on each package. In addition, as explained further below, some embodiments provide more output pads than the minimally sufficient number, to allow greater flexibility in coordinating cores. For all of these possible inter-core communication implementations, it is preferable that none of them requires any active logic outside the cores. In this way, various embodiments of the invention provide the advantages described herein relative to implementations that rely on active non-core logic to coordinate the cores.

III. Hierarchical Concepts

To reiterate, unless otherwise specified, the description of the invention is not limited to the several multi-core multiprocessor embodiments that provide sideband communication wires and use them, in preference to the system bus, to coordinate cores in order to carry out or enable certain structured or restricted activities. In many embodiments, the physical implementation is paired with a hierarchical coordination system to carry out the required hardware coordination.
Some of the hierarchical coordination systems described herein are quite complex. For example, Figures 1, 9, 11, 12, 14, 15, 16, 18, 19, 20, 21 and 22 depict multi-core processor embodiments of various hierarchical coordination systems that structure and facilitate inter-core coordination activities such as power state management. This specification also provides several more in-depth and abstract characterizations of hierarchical coordination systems, as well as examples of even more elaborate and complex hierarchical coordination systems. Therefore, before describing specific examples of the inter-core coordination processes used to carry out or enable a structured or restricted activity, it is helpful to explain various aspects of the hierarchical coordination systems contemplated herein.

As used herein, a hierarchical coordination system is a system in which the cores are configured to coordinate with each other, for some pertinent or predefined activity or purpose, in an at least partially restricted or structured hierarchical manner. Such a system differs from an equipotent peer-to-peer coordination system, in which every core is equally privileged and can coordinate directly with any other core (and with the chipset) to perform a pertinent activity. For example, a nodal tree in which the cores coordinate, for certain restricted activities, only with nodally connected cores ranked above or below them, and in which there is only a single path between any two nodes, would constitute a strictly hierarchical coordination system. As used herein, unless more strictly defined, a hierarchical coordination system also encompasses more loosely hierarchical coordination systems, such as a system that permits peer-to-peer coordination within at least one group of cores but requires hierarchical coordination between at least two core groups. Examples of both strictly and loosely hierarchical coordination systems are presented herein.
In one embodiment, a hierarchical coordination system corresponds to an arrangement of cores in a microprocessor that has multiple packages, each package having multiple dies and each die having multiple cores. It is useful to treat each layer as a "domain". For example, a dual-core die may be regarded as a domain consisting of its cores, a dual-die package may be regarded as a domain consisting of its dies, and a dual-package platform or microprocessor may be regarded as a domain consisting of its packages. It is also useful to describe the core itself as a domain. This conceptualization of "domains" is also relevant when referring to a resource, such as a cache, a voltage source or a clock source, that is shared by the cores of a domain but is otherwise local to that domain (that is, not shared by cores outside the domain). Of course, the domain depth suitable for any given multi-core processor, and the number of constituents of each domain (for example, where a die is a domain, a package is a domain, and so on), may vary and scale up or down depending on the number of cores, their stratification, and the manner in which various resources are shared by the cores.

It is also useful to name the relationships between different types of domains. As used herein, all enabled physical cores on a multi-core die are regarded as "constituents" of that die and as "co-constituents" of each other. Likewise, all enabled physical dies on a multi-die package are regarded as constituents of that package and as co-constituents of each other. Likewise again, all enabled physical packages on a multi-package processor are regarded as constituents of that processor and as co-constituents of each other. This terminology may be extended to as many levels of domain depth as the multi-core processor provides. In general, each non-terminal domain level is defined by one or more constituents, each of which comprises the next lower domain level of the hierarchical structure.
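The nesting of domains and constituents described above can be sketched as a small recursive data structure. This is a minimal Python sketch under stated assumptions: the `Domain` class, the helper `dual_core_die`, and the example package with two dual-core dies are all illustrative names and values, not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """A domain: a core, a die, a package or a platform."""
    name: str
    constituents: list["Domain"] = field(default_factory=list)

    def cores(self) -> list[str]:
        """All cores in this domain (a core is its own terminal domain)."""
        if not self.constituents:
            return [self.name]
        return [c for d in self.constituents for c in d.cores()]

    def depth(self) -> int:
        """Number of domain levels at and below this one."""
        if not self.constituents:
            return 1
        return 1 + max(d.depth() for d in self.constituents)


def dual_core_die(name: str, first_core: int) -> Domain:
    """Build a die domain whose constituents are two core domains."""
    return Domain(name, [Domain(f"core{first_core}"),
                         Domain(f"core{first_core + 1}")])


# Assumed example: one package holding two dual-core dies.
package = Domain("package0", [dual_core_die("die0", 0), dual_core_die("die1", 2)])
```

Here `package.cores()` walks down through the dies to the four cores, and `package.depth()` counts the package, die and core levels.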
In some multi-core processor embodiments, for each multi-core domain (for example, each die, each package, each platform, and so on), one and only one core is designated as a "master" for that domain and is given the corresponding functional gatekeeping or coordination role. For example, a single core of each multi-core die (if any) is designated as the "die master" of that die, a single core of each package is designated as the "package master" (PM) of that package, and (for a processor so layered) a single core of each platform is designated as the "platform master" of that platform, and so on. In general, the master core of the highest domain of the hierarchy serves as the sole "bus service processor" (BSP) core of the multi-core processor, and only the BSP is authorized to coordinate certain types of activities with the chipset. Note that terms such as "master" are used here for convenience, and that labels other than "master" (for example, "delegate") could be used to describe such functional roles.

Further relationships are defined between each domain master core and the cores with which it coordinates directly for the pertinent activity. At the lowest level, the die master core of a multi-core die may be characterized as a "pal" to each of the other enabled cores of that die. In general, each core of a die is regarded as a pal to any of the other cores of the same die. In an alternative characterization, however, the pal designation is restricted to the subordinate relationships between the die master core and the other cores of a multi-core die. Applying this alternative characterization to a quad-core die, the die master core would have three pals, but each of the other cores would be regarded as having only a single pal (the die master core).

At the next domain level, the PM core of a package may be regarded as a "buddy" to each of the other master cores on the same package. In general, each die master core of a package is regarded as a buddy to each other die master core of the same package. In an alternative characterization, however, the buddy designation is restricted to the subordinate relationships between the package master core and the other master cores of that package.
Applying this alternative characterization to a four-die package, the PM core would have three buddies, but each of the other die master cores would be regarded as having only a single buddy (the PM core). In yet another alternative characterization (such as the one presented in Figure 11), a master core is regarded as a "buddy" to each of the other master cores of the processor, including master cores on a different package of the processor.

At the next domain level (for example, the platform of a multi-core processor that has such depth), the BSP (or platform master) core is regarded as a "chum" to each of the other PM cores of the platform. In general, each PM core is a chum to each other PM core of the same platform. In an alternative characterization, however, the chum designation is restricted to the subordinate relationships between the BSP package master core and the other PM cores of a platform. Applying this alternative characterization to a four-package platform, the BSP core would have three chums, but each of the other PM cores would be regarded as having only a single chum (the BSP).

The pal/buddy/chum relationships described above are more generally characterized herein as "kinship" relationships. Each "pal" core belongs to one kinship group, each "buddy" core belongs to a higher-level kinship group, and each "chum" core belongs to a still higher-level kinship group. In other words, the various domains of the hierarchical coordination system described above define corresponding "kinship" groups (for example, one or more groups of pals, groups of buddies, and a group of chums). Moreover, each "pal", "buddy" and "chum" core (if any) of a particular core may more generally be regarded as a "kin" core.

As used herein, the concept of a kinship group is slightly different from the concept of a domain. As stated above, a domain consists of all of the cores in that domain.
For example, a package domain generally consists of all of the cores on the package. By contrast, a kinship group generally consists of selected cores of the corresponding domain. A package domain, for example, would define a corresponding kinship group that consists only of the master cores on the package (one of which is also the package master), and not of any of the pal cores on the package. In general, only a terminal multi-core domain (that is, a domain that has no constituent domains) defines a corresponding kinship group that includes all of its cores. For example, a dual-core die would generally define a terminal multi-core domain whose corresponding kinship group includes both of the die's cores. It is also convenient to describe each core as defining its own domain, because each core generally includes resources that are local to itself, are not shared by other cores, and can be configured by various operating states.

In the pal/buddy/chum hierarchy described above, every core that is not a master core is solely a pal and belongs to a single kinship group consisting only of the cores on the same die. Every die master core belongs, first, to the lowest-level kinship group consisting of the pal cores on the same die and, second, to a kinship group consisting of the buddy cores on the same package. Every package master core belongs, first, to the lowest-level kinship group consisting of the pal cores on the same die; second, to a kinship group consisting of the buddy cores on the same package; and third, to a kinship group consisting of the chum cores on the same platform. In short, each core belongs to W kinship groups, where W equals the number of kinship groups for which the core is a master core, plus 1.
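The membership count W can be checked with a few lines of Python. This is a hedged sketch: the function name and parameters are hypothetical, and the cap at the number of hierarchy levels is an inference added for the example (a core that masters every level, such as the root core, cannot belong to more groups than there are levels); the text itself states only the "plus 1" rule.

```python
def kinship_group_count(groups_mastered: int, hierarchy_levels: int) -> int:
    """Return W, the number of kinship groups a core belongs to.

    groups_mastered: how many kinship groups the core is master of.
    hierarchy_levels: how many kinship-group levels the hierarchy has
        (assumed here: 3 for pals, buddies and chums).
    """
    if not 0 <= groups_mastered <= hierarchy_levels:
        raise ValueError("groups_mastered must lie between 0 and hierarchy_levels")
    # "Plus 1" rule from the text, capped for a core that masters every level
    # (the cap is an assumption of this sketch).
    return min(groups_mastered + 1, hierarchy_levels)
```

With three levels, a non-master core gets W = 1, a die master W = 2, and a package master W = 3, matching the enumeration above.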
To further characterize the hierarchical nature of kinship groups, the "closest" or "most immediate" kinship group of any given core corresponds to the lowest-level multi-core domain of which that core is a part. In one example, regardless of a particular core's master designations, its most immediate kinship group comprises its pals on the same die. A master core also has a second-closest kinship group comprising its buddies on the same package. A package master core also has a third-closest kinship group comprising its chums. For such a processor the kinship groups are semi-exclusive: no given kinship group includes all of the cores of the processor.

The kinship group concept can be further characterized by the coordination model that a kinship group uses between its constituent cores. As used herein, in a "master-mediated" kinship group, direct coordination between cores is restricted to coordination between the master core and its non-master cores. Non-master cores within the kinship group cannot coordinate with each other directly, only indirectly through the master core. In a "peer-collaborative" kinship group, by contrast, any two cores of the kinship group may coordinate with each other directly, without mediation by the master core. In a peer-collaborative kinship group, a more functionally consistent term for the master would be "delegate", because it acts as a coordination gatekeeper only for coordination with higher-level domains, not for coordination among the peers of the kinship group. The distinction drawn here between master-mediated and peer-collaborative kinship groups is meaningful only for groups of three or more cores, because a two-core group contains no pair of non-master cores. In general, for certain predefined activities, a core can coordinate only with constituents or co-constituents of its kinship groups and, with respect to any master-mediated kinship group of which it is a part, only with its superior "co-constituent" or its inferior constituents, as applicable.
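The two coordination models can be contrasted with a short Python sketch. The class `KinshipGroup`, its fields, and the core names used in the usage note are assumptions introduced for illustration; the patent text does not define such an interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KinshipGroup:
    """Hypothetical model of one kinship group and its coordination model."""
    master: str
    members: frozenset      # every core in the group, master included
    peer_collaborative: bool  # False means master-mediated

    def can_coordinate_directly(self, a: str, b: str) -> bool:
        """True if cores a and b may coordinate without an intermediary."""
        if a == b or a not in self.members or b not in self.members:
            return False
        if self.peer_collaborative:
            return True  # any two cores of the group may talk directly
        # Master-mediated: one side of every direct exchange is the master.
        return self.master in (a, b)
```

For a group of cores `c0`..`c3` with master `c0`, the master-mediated model rejects a direct `c1`/`c2` exchange, while the peer-collaborative model accepts it.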
It is also appropriate to describe the hierarchical coordination system in terms of nodes and nodal connections. A nodal hierarchy is one in which each node is a unique one of the cores of the multi-core processor, one of the cores (for example, the BSP core) is the root node, and there is a continuous coordination "path" (including intermediate nodes, where applicable) between any two nodes. Each node is "nodally connected" to at least one other node, but not to all other nodes, and, for the activities to which the coordination system applies, can coordinate only with "nodally connected" cores. To further distinguish these nodal connections, the subordinate nodally connected cores of a master core are characterized herein as "constituent" cores, or alternatively as "subordinate kin" cores. They are distinct from a core's nodally connected "co-constituent" cores, which are nodally connected cores that are not subordinate to it. More precisely, a core's nodally connected "co-constituent" cores consist of its master core, if any, and any equally ranked cores to which it is nodally connected (for example, in a peer-collaborative kinship group of which the core is a part). In addition, any core that has no subordinate kin cores is characterized herein as a "terminal" node or "terminal" core.

So far, hierarchical coordination systems have been described in which the domains correspond to distinct nested physical arrangements of cores (for example, a distinct domain for each applicable die, package and platform). Some of the illustrated hierarchical coordination systems are consistent in this way with the nested physical arrangement of the cores of the processor. Figure 22, for example, illustrates a multi-core processor with two asymmetric packages, one of which has three dies and the other of which has a single die. Consistent with the nested physical arrangement of the packaged cores, the sideband wires define a corresponding three-level hierarchical coordination system, with the package masters related as chums, the die masters related as buddies, and the die cores related as pals.

However, depending on the configuration of a processor's inter-core, inter-die and inter-package sideband wires (if any), hierarchical coordination systems may be established between the cores that have a different depth and stratification than the nested physical arrangement in which the cores are packaged. Several such examples are provided in Figures 11, 14, 15 and 21. Figure 11 shows an eight-core processor with two packages, each package having two dies and each die having two cores. In Figure 11, sideband wires are provided for a two-level hierarchical coordination system, such that all of the master cores are part of a single higher-level kinship group, and each master core also belongs to a distinct lowest-level kinship group comprising itself and its pals. Figure 14 shows an eight-core processor with four dual-core dies on a single package. In Figure 14, the sideband wires provided require a three-level hierarchical coordination system. Figure 15 shows a processor with two quad-core dies, in which the inter-core wires within each die require a two-level hierarchical coordination system, and inter-die wires between the top-ranked masters of the two dies (that is, the chums) provide a third level of coordination. Figure 21, like Figure 22, shows a processor with two asymmetric packages, one of which has three dual-core dies and the other of which has a single dual-core die. As in Figure 11, however, inter-die and inter-package sideband wires are provided to facilitate a two-level hierarchical coordination system between the cores, in which all of the master cores on both packages are part of the same kinship group.

As illustrated above, hierarchical coordination systems of different depths and coordination models may be applied, as desired, to manage the resources shared by the cores of a multi-core processor, provided that they are consistent with the structural capabilities and limitations of the multi-core processor.
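The two-level arrangement described for Figure 11 (every master core in one higher-level group, each master also grouped with the other core of its die) can be sketched as follows. The core numbering 0 to 7, the choice of the even-numbered core as each die's master, and the function names are assumptions made for this illustration.

```python
# Assumed numbering: four dual-core dies, first core of each die is its master.
DIES = [(0, 1), (2, 3), (4, 5), (6, 7)]


def two_level_groups(dies):
    """Return (lowest-level groups, single higher-level group of masters)."""
    lowest = [set(die) for die in dies]   # each master plus its die-mate
    masters = {die[0] for die in dies}    # all masters, across both packages
    return lowest, masters


def groups_of(core, dies):
    """List every group the given core belongs to, lowest level first."""
    lowest, masters = two_level_groups(dies)
    found = [group for group in lowest if core in group]
    if core in masters:
        found.append(masters)
    return found
```

A non-master core such as core 1 appears only in its die-level group, while a master such as core 4 appears in both its die-level group and the group of all masters, regardless of which package it sits on.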
To illustrate further, Figure 16 shows a processor that provides enough sideband communication wires to support a peer-collaborative coordination model among all of the cores of each quad-core die. In another embodiment, however, a more restrictive master-mediated coordination model is established for the cores of each quad-core die. Moreover, as shown in Figure 15, a multi-level coordination hierarchy with two pal kinship groups and one master kinship group could also be established, if desired, for the cores of the quad-core microprocessor of Figure 16 by using (for the activities to which the coordination system applies) fewer than all of the available inter-core wires. Because each quad-core die in Figure 16 provides sideband wires between each of its cores, the die is capable of supporting all three types of hierarchical coordination system.

In general, regardless of the nature and number of the domains, kinship groups and nodes of a multi-core processor, one and only one core of each domain is designated as the master of that domain and of the corresponding kinship group. Domains may have constituent domains, and again only one core of each domain and corresponding kinship group is designated as the master of that domain. The highest-ranking core of the coordination system is also referred to as the "root node".

IV. Power State Management

Having introduced various concepts relating to multi-core configurations, sideband communication capabilities and hierarchical relationships, this specification now introduces certain concepts relating to specifically contemplated embodiments of power state management systems. It should be understood, however, that the invention is applicable to the coordination of a wide variety of activities other than power state management.

In the distributed multi-core power management embodiments described herein, each core of the multi-core processor includes decentralized and distributed scalable power management logic, duplicated in one or more microcode routines resident on each core. The power management logic is operable to receive a target power state, determine whether it is a restricted power state, initiate a composite power state discovery process that includes inter-core coordination, and respond appropriately.

In general, a target state is any requested or desired one of a class of predefined operating states (such as C-states, P-states, voltage ID (VID) values or clock ratio values). A predefined group of operating states defines multiple processor operating states that are ordered on the basis of one or more power, voltage, frequency, performance, operational, responsiveness, shared resource or restricted implementation characteristics. The operating states may be provided to manage power optimally relative to other desired operating characteristics of a processor.

In one embodiment, the predefined operating states include an active operating state (such as the C0 state) and multiple progressively less active or responsive states (such as the C1, C2 and C3 states). As used herein, a progressively less responsive or active state is a configuration or operating state that saves power relative to a more active or responsive state, or that is in some way relatively less responsive (for example, slower, less fully enabled, impeded in operations such as accessing cache memory resources, or sleepier and harder to wake). In some embodiments, the predefined operating states constitute, but are not necessarily limited to, C-states or sleep states that are based on, derived from or compatible with the ACPI specification. In other embodiments, the predefined operating states constitute or include various voltage and frequency states (for example, progressively lower voltage and/or lower frequency states), or both. A set of predefined operating states may also include, or consist of, various programmable operating configurations, such as forcing instructions to execute in program order, forcing only one instruction to issue per clock cycle, formatting only a single instruction per clock cycle, translating only a single microinstruction per clock cycle, retiring only a single instruction per clock cycle, and/or accessing various cache memories serially, using techniques such as those described in U.S. application Serial No. 61/469,515, filed March 30, 2011, entitled "Running State Power Saving Via Reduced Instructions Per Clock Operation" (CNTR.2550), which is incorporated herein by reference.

A microprocessor may be configurable according to different, and independent or partially independent, sets of operating states. Various operating configurations that affect power consumption, performance and/or responsiveness can be assigned to different classes of power states, and each class may be implemented independently according to a corresponding hierarchical coordination system, each such system having its own independently defined domains, domain masters and kinship group coordination models.

In general, a class of predefined operating states can be divided into at least two categories: (1) predominantly local operating states, which affect only resources local to the core or which, in common practical applications, predominantly affect only the performance of the particular core; and (2) restricted operating states, which affect one or more resources shared by other cores or which, in common practical applications, are relatively more likely to interfere with other cores.
Operating states that affect shared resources carry a relatively greater likelihood of interfering with the power, performance, efficiency or responsiveness of the other cores that share the resource. Implementing a predominantly local operating state generally does not require coordination with other cores or their prior permission. Implementing a restricted operating state, by contrast, requires coordination with other cores and their permission.

More elaborately, the predefined operating states can be divided into further hierarchical categories, depending on how, and to what extent, various resources are shared. For example, a first set of

轉變指令實例所執行之微碼係在該核心106上執行。^ =062=細’因的們每個射__令餘構並被科 執夕^來自指令集架構指令之使用者程式。除了核心106以 t夕k微處理器搬可能包含一附屬或服務處理器(未顯示), 八亚不具有與核心106相同的指令集架構。然而,在本發明中, 核心⑽本身(並非附屬或服務處理器且非任何其他非核心邏輯 兀件)執行分配在多核心微處理器102之多重處理核。廳間的 分散式電源管理,以因應狀_變指令,其較—種代表核心執行 電源管理之專用硬體設収有利地提供更_ (尺寸之)能 力、可重組性、良率特性、電源減少及/或W實際面積之減 優點。 、 "電源管理邏輯微碼細指令係因應至少兩個條件而被實施。 首先,電源管理邏輯微碼208可被喚起以實行核心、1〇6之指令华 架構之一指令。於一實施例中,χ86職订與m指令等;: 在微碼208中。亦即,當指令譯碼器2〇4遇到—_娜八汀或沉 指令時’指令譯碼器2〇4停止提取目前執行的使用者程式指令, 並將控制雜送至微㈣H 2〇6 _始提取實行 MWait或取 39 201245948 指令之電源管理邏輯微碼中的—常式。其次,電源管理邏輯 微碼208可能因應—中斷事件而被喚起。亦即,當一中斷事件產 生時,核〜106停止提取目前的使用者程式指令,並將控制權傳 运至微序列$ 206以開始提取掌控中斷事件之電源管理邏輯微碼 208中的吊式。中斷事件包含架構中斷、例外、錯誤或陷啡 (traps),例如由χ86指令集架構所界定者。一中斷事件之例子為 匯机排116上之一個對於與電源管__^胃 -者之I/O讀取傳輸偵測。中斷事件亦包含非架構界定的事件。於 -實補巾’非架構界定的巾斷事件包含:經由圖丨之核心間通 訊配線118 (例如圖5、6所描述之連結)發送信號或經由圖】之 晶片間通籠,線118發送信號(或經由圖u之封裝體間通訊配線 1133發送^虎,以下所討論的)之一核心、間中斷需求(例如與圖 5與6相關所說明的);以及藉由晶片組之一 STpcLK設置或解除 設置之侧。於—實施财,f源管理糖微碼208齡為核心 觸微架構指令組之指令。在另一實施例中,微碼識齡為不同 的♦曰令組之齡’其將轉變成核^胤之微架構指令組之指令。 圖1之系統100執行分配在多重處理核心1〇6之間的分散式 電源管理。更明確1?i3t•’每健^實施其本地魏管理邏輯微碼 208以響應—狀態轉變需求,並轉變成目標電源狀態。目標電源狀 態為多個預定電源狀態(例如C-狀態)之任何一個所需求者。預 疋電源狀態包含一參考或主動操作狀態(例如ACpi2 狀熊) 以及多個漸進地且相對不太敏感的狀態(例如ACn之、c2、 201245948 C3等狀態)。 現在參考圖3所g目 "·貝不之流程圖’其依據本發明顯示圖1之系 統100之操作’用以勃 核心廳間的分散式^在多核心微處理器1〇2之多重處理 : 式電源官理。具體言之,流程圖顯示電源管理· 邏輯微碼208之一邱八4。仏 · _ 刀钿作,係因應於遭遇一 MWAIT指令戍類 似的^令’缚變成—新電職態。更明麵言,® 3所顯示之 電源5里痛微碼之部分係為電源管理邏輯之—狀態轉變需 求處理邏輯(STRHL)常式。 〜為!促輯圖3之更佳理解,MWAIT指令與C_狀態架構之 實施樣態係在朗每—_ 3之_方塊馳制。MWAIT指令 可包含在作f _例如,職d_、Lin_、Ma_)或苴 j統軟體中。舉例而言,如果系統軟體知道系統上之工作量目、 所是低或林在的,_、錄體可錄行-MWAIT齡以允許 核心106進入-低電源狀態,直到一事件(例如從一周邊裝置之 中斷)要求由核心1〇6服務為止。另一例子為,在核心1〇6上執 打的軟體可能與在另一核心1〇6上執行的軟體之共享資料,是以 在存取由兩個核心106所共用資料時便需要經由例如-作號 (s酿幽〇之同步;如果在另一核心1〇6戶斤執行之儲存至錢 U〇ret〇s嶋ph〇re)前已經過1顯著的時間量時,則在目前核 心106上執行之軟體將致使目前核心1〇6經由mwait指令進^ 低電源狀態,直到儲存至信號發生為止。 MWAIT指令係詳細說明於2_年3月之IntdR 64與iA々 41 201245948 架構軟件開發人員手冊(Architectures 3〇肠啦D㈣ Manual) ’卷2A :指令集參考(A_M)之第3_761至3_764頁,而 監視(MONITOR)指令係詳細說明於相同文件之第3_637經由 3-639頁,其全部在此皆併入作參考。 MWAIT才曰令可能指定-目標c-狀態。依據_個實施例,c_ 狀態0係為-執行狀態,而大於〇之c•狀祕為休眠狀態;丄及 較南之c-狀態係為停止狀態,於其中核心1〇6不提取與執行指 令,而2及較南之C-狀態係核心1〇6可能執行額外動作以減少其 
電源消耗,例如禁能其快取記憶體並降低其電麼及/或頻率之狀態。 依據-個實關,2絲高之祕被視為並預先決定成 為-受限制的電源狀態。在2或較高之〇_狀態中,晶片組ιΐ4可 能移除匯流排116時脈,藉以有效地禁能核心1Q6時脈,以便大 幅地減少由核心廳之電源消耗。關於每個後段較高的c·狀態, 將允許核^ 106執行更積極的電源節約動作,__皆需要較 長的時間恢復至執行狀態。可能使核心觸退出低電源狀^之^ 件之實例為-帽以及藉由另——之贿至―_指定的位 址範圍(由先前所執行的監視(M〇NIT〇R)指令所指定)。 明顯地,對C-狀態之ACPI編號機制使用較高的c號碼以表 示漸進地較不敏感、較深的休眠狀態。藉由使用這種編號機制,、 任^^駐顧群組(亦即:“、狀體、平台)之複合電源 狀怨將是該組成群組之所有啟動核心之應用c_狀態最小值,每個 核心的應肖C-狀態最小值係最近的有效要求c_狀態(如果有的 42 201245948 話)=是零(如果核心不具備有效的最近要求應用c_狀態的話。 然而,其他敎之電源狀態使用漸錄高的號碼以表示漸進 更敏感的狀態。舉例而言,CNTR.2534說明—種指示—期望的電 •,壓識別馬(VID)至一電壓調節器模組之協調系統。較 .高的VID對鼓較高電壓辦,_對應錄_ (所以是更敏 感的)性能狀態。但協調—複合VID涉及決定核心所請求彻值 最大值因為冑源狀態編號機制可依上升或下降次序被指 疋’所以此說明書之部分將複合電源狀態界定為―"極值",其係相 關l、之應用m嗤之最小值或最大值。然而,吾人明白即使 二凊求的VID及時脈比率值翻與3|知卿滅的方向"予以訂 疋(orderable)n(譬如使用從原始值開始之負計數);因此不管傳 統上界疋的方向為何’描述於此之更縣界定的階層式協調系統 通常亦適用這些電源狀態。 人雖然圖3說明—實施例,於其巾核心106響應-MWAIT指 7以執仃分散式電源管理’但是核心廳亦可能響應其他形式之 輸而通知核心1〇6其可能進入一低電源狀態。舉例而言,匯流 排"面爭几224可能產生一信號,以因應偵測到匯流排116上之 1/〇項取傳輸至一預先決定的ί/〇 4範圍時,用以使核心1〇6進 =陷味而執行㈣208。再者,核^觸因應所接收之其他外部信 7而進人執行微碼之實施例亦被本發明所考量,且實施 例亚未受限於x86指令集架構實施例或受限於包含一 ㈣4型 式處理器匯流排之系統實施例。再者,-核心106之既定目標狀 43 201245948 1可此内。卩地被產生,如經常出現具有期望的電壓與時脈數值之 情況。 現在把焦點放在圖3之個別功能方塊上,流程於方塊3〇2開 始。於方塊302,圖2之指令譯碼器· 204遇到_ Μ·ΙΤ指令並進 入p£3拼而執行電源S理邏輯微碼罵,且制沒實現指令 之STRHL常式。MWAIT指令載明以"X”表示之一目標c_狀態, 並在核心1()6等待-事件發生之同時通知其可能進入一最佳化狀 態。具體&之,最佳化狀態可能是一低電源狀態,於其中核心1〇6 將消耗比核心106遇到MWAIT指令之執行狀態下更少的電源。 流程繼續至方塊303。微碼將"X"儲存成為核心之應用或最近 的有效要求的電源狀態,以"Y”表示。吾人可注意到,如果核心1〇6 尚未遇到-MWAIT齡、或如果目為從料起該齡已被取代 或變成陳舊的(譬如藉由一後來的STPCLK解除設置)且核心係 處於一正常執行狀態,則儲存為核心之應用或最近的有效要求電 源狀態之數值"Y"係為〇。 流程繼續至方塊304。於方塊304 ’微碼208 (更詳細而言是 STRHL常式)檢驗"X” ’其為對應於目標c_狀態之一數值。如果 •’X"小於2(亦即’目標C_狀態為〇,則流程繼續至方塊3〇6;而, 如果目標C-狀態大於或等於2 (亦即,"χ"對應至一受限制的電 源狀態),則流程繼續至方塊308。於方塊306,微碼208將核心 106置於休眠。亦即,微碼2〇8之STRHL常式將控制暫存器寫入 在核心106之内’用以使其停止提取並執行指令。因此,核心 201245948 消耗比其處於執行狀態時更少的電源。最好的狀況是,當核心106 正休眠時,微序列206亦沒有提取並執行微碼2〇8指令。流程 於方塊306結束。圖5說明為因應從休眠被喚醒之核心ι〇6之操 作。 方塊308表示一條路徑,其係"乂,,為2或更多之對應於一受限 制的電源狀態時,微碼208之STRHL常式所執行的操作。如上所 迷’於一貫施例中,2或更多之一種c·狀態涉及移除匯流排116 時脈。匯流排116時脈係由核心1〇6所共用之一資源,因此當一 核心設有2錄高的-目標〔狀態時,較佳的方式是核心1〇6透 過於此所綱的以-齡配式麵财式進行通訊,用以確認每 個核心106已被通知其可以在通知晶月 
'组114(其可能移除匯流排 116時脈)之前轉變成2或更大之C-狀態。 在方塊308中,微碼208之STRHL常式基於由於方塊3〇2所 遇到的MWAIT齡特別指定之目標C-狀態,執行相關的電源節 約動作(PSA)。-般而言,由核心1〇6所採取之psA &含獨立於 其他核〜106之動作。舉例而言,每個核心1〇6包含其自己的快 取記憶體’其係位於核心舰本身(例如,指令快取搬與資料 快取222)之近端,而psA包含刷新局部快取、移除它們的時脈 乂及使匕們斷電。在另一實施例中,多核心微處理器逝可能包 含由多重核心106所共用之快取。於本實施例中,共用的快取無 法被刷新、使它們的時脈被移除、或被斷電,直到核心鄕彼此 溝通以決定所有核心106已接收指定一適當的目標C-狀態之— 45 201245948 MWAIT為止’在這種情況Τ,它們可能在通知晶片組ιΐ4其可貪匕 需求移除匯流排U6 a寺脈及/或抑制在匯流排116上產生窺探循= 之允許之前,刷新共用的快取、移除它們的時脈並使它們斷電(參 見方塊322)。於-實施例中,核心1〇6共用一電壓調節器模組 (VRM)。CNTR.2534說明-種利用—種分配式之分散方式以管 理由多重核心所共用之- WM之設備及方法。於一實施例中,^ 個核心106具有其本身的PLL 226,如於圖2之本實施例中,以使 核心106可減少其頻率或禁能PLL 226以節省電源而不會影響其 他核心106。然而,在其他實施例中,一晶片1〇4上之核心 可能共用- PLL。CNTR.2534說明-種利用一種分配式之分散方 式以管理由多重核心所共用之PLL之裝置及方法。於此所說明之 電源狀態管理與相關的同步邏輯之實施例,亦可能(或選擇地) 被應用以利用-種分配式之分散方式來管理由多重核心所共用之 ~ PLL ° 流程繼續至方塊312。於方塊312,電源狀態管理微碼2〇8之 STRHL t式呼叫以syne_c·狀態表示之另—電源狀態管理微碼 208 *式(其係相關於圖4而詳細說明的),用以與其他節點地連 接核心106溝通並為多核心微處理器1〇2獲得一合成c_狀態,在 圖3中以Z表不。相對於正在核心上執行的實例,c_狀態常 式之每個被伽實例於此稱為syne—c_狀態常式之—〃本地„實例。 微馬208之STRHL常式喚起具有一輸入參數或探測(pr〇be) 電源狀態數值之syncLC•狀態常式,探測錢狀態數值等於核心之 46 201245948 應用電源狀態⑻卩,其最近的有效要求的目標電源狀態),其係 由MWAIT指令所特別指定之在方塊302中所接收之τ之數值。 喚起啊―C-狀態常式開始一複合電源狀態發現過程,如與圖4相 ,關而做更進一步說明者。 • 母個被喚醒sync—(:_狀態常式計算一"混合"c_狀態並使"混合 π狀態回復至呼叫或實施它(於此是strhl料)之任何程序。 1合"〔狀態為所探測C_狀態數值中的最小值,而所探測&狀態 數值係由被喚醒程序所接收、在核心上執行辦〇狀態常式之應 用C-狀態、以及由與sync_c-狀態常式的相關被引發實例所接收之 C-狀態數值。町將朗在某些情況之下,混合c_狀態為共通於 本地SynC_C-狀態常式與同步化sync_c_狀態常式兩者之域之複合 電源狀態相關。以下亦說明在其他情況中,混合^狀態可能只是 域之一局部合成C-狀態。S 29 201245948 The operational state may define the configuration of a local resource located in a core, a second set of operational states may be defined by a core of a chip but not located in the local resource configuration of the chip. The second set of operational states may be defined by a package. The configuration of resources shared by the core...etc. The implementation of an operational state requires coordination with and approval of the core of the shared resources under the operational state configuration of the application. 
In general, a composite operating state for any given domain is an extremum (that is, the maximum or minimum) of the applicable operating states of every enabled physical core belonging to that domain. In one embodiment, a physical core's applicable operating state is the core's most recent and still valid target or requested operating state, if any, or some default value if the core has no such state. The default may be zero (for example, where the composite operating state is computed as a minimum), the maximum of the predefined operating states (for example, where the composite operating state is computed as a maximum), or the core's currently implemented operating state. In one embodiment, a core's applicable operating state is a power or operating state, such as a voltage ID (VID) or clock ratio value, desired or requested by the core. In another embodiment, a core's applicable operating state is the most recent valid C-state the core has received from applicable system software.

In yet another embodiment, a physical core's applicable operating state is an extremum of the core's most recent and still valid target or requested operating state, if any, and the most extreme operating state that would affect resources local to the highest domain, if any, for which the core holds master credentials.
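The extremum rule above can be illustrated with a minimal Python sketch. The function name `composite_state`, the `use_min` flag, and the `max_state` parameter are hypothetical names introduced only for illustration; the default-value handling follows the zero and maximum defaults described above, and the "currently implemented state" default is not modeled.

```python
from typing import Iterable, Optional


def composite_state(requests: Iterable[Optional[int]],
                    use_min: bool = True,
                    max_state: int = 3) -> int:
    """Composite operating state of a domain's enabled cores (illustrative only).

    Each entry is one enabled core's most recent valid request,
    or None if the core has no valid request.
    """
    # Default for a core with no valid request: zero when the composite is a
    # minimum, the largest predefined state when the composite is a maximum.
    default = 0 if use_min else max_state
    applicable = [default if r is None else r for r in requests]
    if not applicable:
        raise ValueError("a domain needs at least one enabled core")
    return min(applicable) if use_min else max(applicable)
```

With C-state style numbering, `composite_state([3, 2, 4])` yields the shallowest request, and a single core without a valid request pulls the composite down to the default.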
Accordingly, the composite operating state for the processor as a whole is the maximum or minimum of the applicable power states of all enabled physical cores of that processor. The composite power state of a package is the maximum or minimum of the applicable power states of all enabled physical cores of that package. The composite power state of a die is the maximum or minimum of the applicable power states of all enabled physical cores of that die.

In the decentralized power management embodiments described here, a portion or routine of each core's power management logic is synchronization logic, configured to at least conditionally exchange power state information with other nodally connected cores (that is, other cores of the same kin group) to determine a compound power state. A compound power state is an extremum of the applicable power states of at least the cores corresponding to the native instance of the synchronization logic and at least one nodally linked instance of it. In some, but not necessarily all, circumstances, the compound power state computed and returned by a synchronization routine corresponds exactly to the composite power state for an applicable domain.

Each invoked instance of the synchronization logic is configured to at least conditionally induce dependent instances of the synchronization logic on not-yet-synchronized nodally connected cores, starting with the nodally connected cores of the most immediate kin group and continuing with the nodally connected cores of progressively higher-level kin groups, if any, to which the core running the synchronization logic instance belongs. A not-yet-synchronized nodally connected core is a core, nodally connected to the core itself, on which a synchronized instance of the synchronization logic has not yet been invoked as part of a composite power state discovery process.
This discovery process proceeds, with each instance of the synchronization logic recursively and at least conditionally inducing further dependent instances of the synchronization logic on not-yet-synchronized nodally distal cores, until a synchronized instance of the synchronization logic is running on every core of the applicable potentially impacted domain. Once the composite power state for the applicable domain has been discovered, an instance of the power management logic running on a core designated as authorized to enable or carry out implementation of the composite power state for that domain enables and/or carries out that implementation.

V. Specifically Illustrated Embodiments

Attention now turns to the specific embodiments shown in the figures.

In one embodiment, each instance of the synchronization logic communicates with synchronized instances of the logic on other cores over sideband (bypass) wires that are distinct from the system bus (inter-core communication wires 112, inter-die communication wires 118, and inter-package communication wires 1133) to perform power management in a decentralized, distributed manner. This allows the cores to be physically located on multiple dies or in multiple packages, potentially reducing die size and improving yield, and provides a high degree of scalability in the number of cores in the system without running into the pad and pin limits of modern microprocessor dies and packages.

Refer now to Figure 1, a block diagram illustrating an embodiment of a computer system 100 that performs decentralized power management distributed among the multiple processing cores 106 of a multi-core microprocessor 102 according to the present invention. The system includes a single chipset 114 coupled to the multi-core microprocessor 102 by a system bus 116.
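The recursive discovery process can be pictured with a small Python simulation. This is a simplified sketch under stated assumptions, not the patented routine: ordinary function calls stand in for messages on the sideband wires, the "at least conditionally" qualification is ignored (every unsynchronized neighbor is always visited), the composite is taken as a minimum, and the `BUDDY` and `PALS` tables are hypothetical encodings of a four-core, two-die part.

```python
# Hypothetical topology tables for a four-core, two-die processor.
BUDDY = {0: 1, 1: 0, 2: 3, 3: 2}   # the other core on the same die
PALS = {0: [2], 2: [0]}            # master cores on other dies (masters only)


def sync_state(core, probe, applicable, synced):
    """Return the compound state seen from `core`.

    Induces dependent instances on neighbors that are not yet synchronized,
    nearest kin group (same die) first, then higher-level kin groups.
    """
    synced.add(core)
    compound = min(probe, applicable[core])
    neighbors = [BUDDY[core]] + PALS.get(core, [])
    for other in neighbors:
        if other not in synced:
            compound = min(compound,
                           sync_state(other, compound, applicable, synced))
    return compound


def discover(start, applicable):
    """Run discovery from `start`; return (composite state, cores visited)."""
    synced = set()
    result = sync_state(start, applicable[start], applicable, synced)
    return result, synced
```

Starting from a non-master core, the call chain reaches its buddy, then the buddy's pal on the other die, then that pal's buddy, so every core of the domain ends up running an instance before the result propagates back.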
The multi-core microprocessor 102 package contains two dual-core dies 104, denoted die 0 and die 1. The dies 104 are mounted on a substrate of the package. The substrate includes wire nets (or simply "wires"), or traces, that connect the pads of the dies 104 to the pins of the package 102. The pins may be connected to the bus 116, among other things. The substrate wires also include inter-die communication wires 118 (discussed further below) that interconnect the dies 104 to facilitate communication between them for performing the decentralized power management distributed among the cores 106 of the multi-core microprocessor 102.

Each dual-core die 104 contains two processing cores 106: die 0 contains core 0 and core 1, and die 1 contains core 2 and core 3. Each die 104 has a designated master core 106. In the embodiment of Figure 1, core 0 is the master core 106 of die 0 and core 2 is the master core 106 of die 1. In one embodiment, each core 106 includes configuration fuses, which the manufacturer of the die 104 may blow to designate which of the cores 106 is the master core of the die 104. The manufacturer may also blow configuration fuses to assign each core 106 its instance, that is, whether the core 106 is core 0, core 1, core 2, or core 3. As discussed above, the term "buddy" refers to cores 106 on the same die 104 that communicate with each other; thus, in the embodiment of Figure 1, core 0 and core 1 are buddies, and core 2 and core 3 are buddies. The term "pal" refers here to master cores 106 on different dies 104 that communicate with each other; thus, in the embodiment of Figure 1, core 0 and core 2 are pals. In one embodiment, the even-numbered core 106 is the master core of each die 104.
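The buddy and pal relationships of Figure 1 follow directly from the core numbering, as the short Python sketch below shows. It assumes the embodiment in which each die holds two consecutively numbered cores and the even-numbered core is the master; the helper names are hypothetical.

```python
CORES_PER_DIE = 2   # dual-core dies, as in Figure 1
NUM_CORES = 4       # die 0: cores 0 and 1; die 1: cores 2 and 3


def die_of(core: int) -> int:
    """Die on which a core resides."""
    return core // CORES_PER_DIE


def is_master(core: int) -> bool:
    """Assumes the even-numbered core of each die is its master."""
    return core % CORES_PER_DIE == 0


def buddies(core: int) -> list:
    """Other cores on the same die."""
    return [c for c in range(NUM_CORES)
            if c != core and die_of(c) == die_of(core)]


def pals(core: int) -> list:
    """Master cores on other dies; empty unless `core` is itself a master."""
    if not is_master(core):
        return []
    return [c for c in range(NUM_CORES)
            if is_master(c) and die_of(c) != die_of(core)]
```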
In one embodiment, core 0 is designated the boot service processor (BSP) of the multi-core microprocessor 102, which alone is authorized to coordinate certain restricted activities with the chipset 114, including enabling implementation of certain composite power states. In one embodiment, the BSP core 106 informs the chipset 114 that it may request permission to remove the bus 116 clock to reduce power consumption and/or refrain from generating snoop cycles on the bus 116, as discussed below with respect to block 322 of Figure 3. In one embodiment, the BSP is the core 106 whose bus request output is coupled to the BREQ0 signal on the bus 116.

The two cores 106 within each die 104 communicate over inter-core communication wires 112 internal to the die 104. More specifically, the inter-core communication wires 112 enable the cores 106 within a die 104 to interrupt one another and to send one another messages to perform the decentralized power management distributed among the cores 106 of the multi-core microprocessor 102. In one embodiment, the inter-core communication wires 112 comprise parallel buses. In one embodiment, the inter-core communication wires 112 are similar to those described in CNTR.2528.

In addition, the cores 106 communicate over the inter-die communication wires 118. More specifically, the inter-die communication wires 118 enable the master cores 106 on distinct dies 104 to interrupt one another and to send one another messages to perform the decentralized power management distributed among the cores 106 of the multi-core microprocessor 102. In one embodiment, the inter-die communication wires 118 run at the bus 116 clock frequency. In one embodiment, the cores 106 transmit 32-bit messages to one another.
To transmit or broadcast, a core 106 asserts the single wire of the inter-die communication wires 118 for one bus 116 clock cycle to indicate that it is about to transmit a message, and then sends a sequence of 31 bits over the next 31 bus 116 clock cycles. At the end of each inter-die communication wire 118 is a 32-bit shift register that accumulates the single bits as they are received into a 32-bit message. In one embodiment, the 32-bit message comprises multiple fields. One field specifies a 7-bit requested VID value used according to the shared VRM distributed management mechanism described in CNTR.2534. Other fields carry messages related to power state (for example, C-state) synchronization, such as C-state request values and acknowledgements, which the cores 106 exchange as discussed here. In addition, a special message value enables a transmitting core 106 to interrupt a receiving core 106.

In the embodiment of Figure 1, each die 104 includes four pads 108 coupled to four respective pins, denoted "P1", "P2", "P3", and "P4". Of the four pads 108, one is an output pad (denoted "OUT") and three are input pads (denoted "IN 1", "IN 2", and "IN 3"). The inter-die communication wires 118 are configured as follows.
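The single-wire framing described above (one asserted cycle followed by 31 data bits, collected by a 32-bit shift register) can be sketched in Python as follows. The bit order (most significant bit first) and the position of the 7-bit VID field in the low bits are assumptions made only for this illustration; the text does not specify them.

```python
from typing import List

PAYLOAD_BITS = 31
VID_MASK = 0x7F   # 7-bit requested-VID field; its position is an assumption


def serialize(payload: int) -> List[int]:
    """Wire samples for one message: a start bit, then 31 payload bits."""
    if not 0 <= payload < (1 << PAYLOAD_BITS):
        raise ValueError("payload must fit in 31 bits")
    bits = [(payload >> i) & 1 for i in range(PAYLOAD_BITS - 1, -1, -1)]
    return [1] + bits   # wire asserted for one cycle, then the data


def receive(samples: List[int]) -> int:
    """Model of the 32-bit shift register at the receiving end of the wire."""
    reg = 0
    for bit in samples:
        reg = ((reg << 1) | bit) & 0xFFFFFFFF
    return reg
```

After 32 bus clocks the register holds the start bit in its top position and the 31 transmitted bits below it, from which the receiver can pick out individual fields.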
The OUT pad of die 0 and the IN 1 pad of die 1 are coupled to pin P1 by a single wire net; the OUT pad of die 1 and the IN 3 pad of die 0 are coupled to pin P2 by a single wire net; the IN 2 pad of die 0 and the IN 3 pad of die 1 are coupled to pin P3 by a single wire net; and the IN 1 pad of die 0 and the IN 2 pad of die 1 are coupled to pin P4 by a single wire net. In one embodiment, a core 106 includes an identifier in each message it transmits out of its OUT pad 108 onto the inter-die communication wires 118 (or the inter-package communication wires 1133 described below with respect to Figure 11). The identifier uniquely identifies the destination core 106 for which the message is intended, which is useful in the embodiments described here in which a message is broadcast to multiple recipient cores 106. In one embodiment, each die 104 is assigned one of the four pads 108 as its output pad (OUT) according to configuration fuses blown during manufacture of the multi-core microprocessor 102.

When the master core 0 of die 0 wants to communicate with the master core 2 of die 1, it transmits the information on its OUT pad to the IN 1 pad of die 1; likewise, when the master core 2 of die 1 wants to communicate with the master core 0 of die 0, it transmits the information on its OUT pad to the IN 3 pad of die 0. Thus, in the embodiment of Figure 1, each die 104 needs only one input pad 108 rather than three. An advantage of manufacturing the dies 104 with three input pads 108, however, is that the same die 104 can be configured into both the quad-core multi-core microprocessor 102 of Figure 1 and an eight-core multi-core microprocessor such as the microprocessor 902 shown in Figure 9.
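The pad and pin wiring of Figure 1 can be written down as a small lookup table, sketched below in Python. The table name `NETS` and the helper `receiving_pad` are hypothetical. The die 1 pad on pin P4 is illegible in the source text and is inferred here by elimination (it is the only die 1 input pad not used on another pin), so treat that entry as an assumption.

```python
# pin -> {die number: pad on that die}, following the wiring described above.
NETS = {
    "P1": {0: "OUT", 1: "IN 1"},
    "P2": {1: "OUT", 0: "IN 3"},
    "P3": {0: "IN 2", 1: "IN 3"},
    "P4": {0: "IN 1", 1: "IN 2"},   # die 1 pad inferred by elimination
}


def receiving_pad(sender: int, receiver: int):
    """Return (pin, pad) on which `receiver` hears `sender`'s OUT pad."""
    for pin, pads in NETS.items():
        if pads.get(sender) == "OUT":
            return pin, pads[receiver]
    raise ValueError("sender has no OUT pad in the table")
```

The lookup reproduces the two paths used in the four-core part: die 0 drives pin P1 and is heard on die 1's IN 1 pad, and die 1 drives pin P2 and is heard on die 0's IN 3 pad.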
Furthermore, in the embodiment of Figure 1, two of the pins P are not needed. An advantage of manufacturing the dies 104 with four pins P, however, is that the same quad-core microprocessor 102 of Figure 1 can be configured either as a single quad-core microprocessor 102 or into an eight-core system such as the system 1100 of Figure 11, which has two quad-core microprocessors 1102. Nevertheless, as shown in the quad-core embodiments of Figures 12 and 14 through 16, the unused pins P and pads 108 may be removed to reduce pad and pin count where desired. Likewise, in dual-core embodiments such as those shown in Figures 19 and 20, the unused pins P and pads 108 may be removed to reduce pad and pin count, or be deployed for other purposes, as desired.

In one embodiment, the bus 116 includes signals that enable the chipset 114 and the multi-core microprocessor 102 to communicate via a bus protocol similar to the well-known Pentium 4 bus protocol. The bus 116 includes a bus clock signal, supplied by the chipset 114 to the multi-core microprocessor 102, which the cores 106 use to generate their internal core clock signals, whose frequencies are typically a ratio of the bus clock frequency. The bus 116 also includes a STPCLK signal, which the chipset 114 asserts to request permission from the cores 106 to remove the bus clock signal, that is, to stop supplying it. The multi-core microprocessor 102 indicates to the chipset 114 that it may assert STPCLK by performing an I/O read transaction on the bus 116 from a predetermined I/O port address; only one of the cores 106 performs it. As discussed below, the multiple cores 106 advantageously communicate with one another over the inter-core communication wires 112 and the inter-die communication wires 118 to determine when the single core 106 may perform the I/O read transaction.
In one embodiment, after the chipset 114 asserts STPCLK, each core 106 issues a STOP GRANT message to the chipset 114; once every core 106 has issued a STOP GRANT message, the chipset 114 may remove the bus clock. In another embodiment, the chipset 114 has a configuration option under which it expects only a single STOP GRANT message from the multi-core microprocessor 102 before it removes the bus clock.

Refer now to Figure 2, a block diagram showing in detail a representative one of the cores 106 of Figure 1 according to the present invention. According to one embodiment, the core 106 microarchitecture comprises a superscalar, out-of-order execution pipeline of functional units. An instruction cache 202 caches instructions fetched from a system memory (not shown). An instruction decoder 204 is coupled to receive instructions, such as x86 instruction set architecture instructions, from the instruction cache 202. A register alias table (RAT) 212 is coupled to receive decoded microinstructions from the instruction decoder 204 and from a microsequencer 206, and generates dependency information for the decoded microinstructions. Reservation stations 214 are coupled to receive the decoded microinstructions and dependency information from the RAT 212. Execution units 216 are coupled to receive the decoded microinstructions from the reservation stations 214 and to receive the instruction operands for the decoded microinstructions. The operands may come from registers of the core 106, such as general purpose registers and readable and writable model-specific registers (MSR) 238, and from a data cache 222 coupled to the execution units 216. A retire unit 218 is coupled to receive instruction results from the execution units 216 and retire them to the architectural state of the core 106. The data cache 222 is coupled to a bus interface unit (BIU) 224, which interfaces the core 106 to the bus 116 of Figure 1.
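The two STOP GRANT variants described above reduce to a simple condition on the chipset side, sketched here in Python. The function and parameter names are hypothetical, and the sketch models only the decision, not the bus transactions themselves.

```python
def may_remove_bus_clock(stpclk_asserted: bool,
                         granting_cores: set,
                         num_cores: int,
                         single_grant_mode: bool = False) -> bool:
    """Decide whether the chipset may remove the bus clock (illustrative).

    granting_cores: identifiers of cores that have issued STOP GRANT.
    single_grant_mode: models the configuration option in which a single
    STOP GRANT from the whole microprocessor is enough.
    """
    if not stpclk_asserted:
        return False
    if single_grant_mode:
        return len(granting_cores) >= 1
    return len(granting_cores) == num_cores
```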
A phase-locked loop (PLL) 226 receives the bus clock signal from the bus 116 and from it generates a core clock signal 242 for the various functional units of the core 106. The PLL 226 can be controlled, for example disabled, via the execution units 216.

The execution units 216 receive a BSP indicator 228 and a master indicator 232, which indicate, respectively, whether the core 106 is the BSP core of the multi-core microprocessor 102 and whether it is the master core of its die 104. As noted above, the BSP indicator 228 and master indicator 232 may comprise programmable fuses. In one embodiment, the BSP indicator 228 and master indicator 232 are stored in a model-specific register (MSR) 238 that is initially populated from the programmable fuse values but may be updated by software writes to the MSR 238. The execution units 216 also read and write control and status registers (CSR) 234 and 236 to communicate with other cores 106. In particular, the core 106 uses CSR 236 to communicate with cores 106 on the same die 104 over the inter-core communication wires 112, and uses CSR 234 to communicate with cores 106 on other dies 104 through the pads 108 over the inter-die communication wires 118, as described in detail below.

The microsequencer 206 includes a microcode memory 207 configured to store microcode, including power management logic microcode 208. For purposes of this disclosure, the term "microcode" as used here refers to instructions that are executed by the same core 106 that executes the architectural instruction (for example, an MWAIT instruction) instructing the core 106 to transition to a power management-related state (referred to here as a sleep state, C-state, or power state). That is, an instance of a state transition instruction is specific to a core 106, and the microcode executed in response to that state transition instruction instance runs on that core 106. The processing cores 106 are symmetric in that each has the same instruction set architecture and is configured to execute user programs comprising instructions from that instruction set architecture.
In addition to the cores 106, the multi-core microprocessor 102 may include an adjunct or service processor (not shown) that does not have the same instruction set architecture as the cores 106. According to the present invention, however, it is the cores 106 themselves, rather than an adjunct or service processor or any other non-core logic, that perform the decentralized power management distributed among the multiple processing cores 106 of the multi-core microprocessor 102 in response to state transition instructions. Compared with dedicated hardware that performs power management on behalf of the cores, this approach may advantageously provide greater scalability, configurability, yield properties, power reduction, and/or die real estate reduction.

The power management logic microcode 208 instructions are invoked in response to at least two conditions. First, the power management logic microcode 208 may be invoked to implement an instruction of the instruction set architecture of the core 106. In one embodiment, the x86 MWAIT and IN instructions, among others, are implemented in microcode 208. That is, when the instruction decoder 204 encounters an x86 MWAIT or IN instruction, it stops fetching instructions of the currently running user program and transfers control to the microsequencer 206, which begins fetching the routine in the power management logic microcode 208 that implements the MWAIT or IN instruction. Second, the power management logic microcode 208 may be invoked in response to an interrupting event. That is, when an interrupting event occurs, the core 106 stops fetching the current user program instructions and transfers control to the microsequencer 206, which begins fetching the routine in the power management logic microcode 208 that handles the interrupting event. Interrupting events include architectural interrupts, exceptions, faults, and traps, such as those defined by the x86 instruction set architecture. One example of an interrupting event is detection of an I/O read transaction on the bus 116 to an I/O address associated with power management. Interrupting events also include events that are not architecturally defined.
In one embodiment, the non-architecturally defined interrupting events include: an inter-core interrupt request signaled over the inter-core communication wires 112 of Figure 1 or over the inter-die communication wires 118 of Figure 1 (or over the inter-package communication wires 1133 of Figure 11, discussed below), such as those described with respect to Figures 5 and 6; and detection of an assertion or deassertion of STPCLK by the chipset. In one embodiment, the power management logic microcode 208 instructions are instructions of the microarchitectural instruction set of the core 106. In another embodiment, the microcode 208 instructions are instructions of a different instruction set that are translated into instructions of the microarchitectural instruction set of the core 106.

The system 100 of Figure 1 performs decentralized power management distributed among the multiple processing cores 106. More specifically, each core invokes its native power management logic microcode 208 to respond to a state transition request to transition to a target power state. The target power state is a requested one of a plurality of predefined power states (for example, C-states). The predefined power states include a reference or active operating state (for example, the ACPI C0 state) and a plurality of progressively and relatively less responsive states (for example, the ACPI C1, C2, and C3 states).

Refer now to Figure 3, a flowchart illustrating operation of the system 100 of Figure 1 to perform decentralized power management distributed among the multiple processing cores 106 of the multi-core microprocessor 102 according to the present invention. Specifically, the flowchart shows operation of a portion of the power management logic microcode 208 that transitions to a new power state in response to encountering an MWAIT instruction or similar instruction. More specifically, the portion of the power management logic shown in Figure 3 is its state transition request handling logic (STRHL) routine.
To facilitate a better understanding of FIG. 3, aspects of the MWAIT instruction and the C-state architecture are explained first. The MWAIT instruction may be included in the operating system (for example, Windows, Linux, or MacOS) or in other system software. For example, if the system software knows that the workload on the system is currently low or non-existent, it may execute an MWAIT instruction to allow the core 106 to enter a low power state until an event, such as an interrupt from a peripheral device, requires servicing by the core 106. As another example, software executing on one core 106 may share data with software executing on another core 106, so that accesses to the shared data must be synchronized, for example via a semaphore. If a significant amount of time may pass before the other core 106 performs the store to the semaphore, the software executing on the current core 106 may use the MWAIT instruction to let the current core 106 enter the low power state until the store to the semaphore occurs.

The MWAIT instruction is described in detail in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2A: Instruction Set Reference, A-M, pages 3-761 to 3-764. The MONITOR instruction is described in detail in the same document, the entire disclosure of which is incorporated herein by reference.

The MWAIT instruction may specify a target C-state. According to one embodiment, C-state 0 is the running state and C-states greater than 0 are sleeping states; C-states of 1 and higher are halt states in which the core 106 does not fetch and execute instructions; and in C-states of 2 and higher the core 106 may perform additional actions to reduce its power consumption, such as disabling its cache memories and lowering its voltage and/or frequency.
According to one embodiment, C-states of 2 or higher are considered, and predetermined to be, restricted power states. In C-states of 2 or higher, the chipset 114 may remove the bus 116 clock, thereby effectively disabling the core 106 clocks, in order to greatly reduce the power consumed by the cores 106. With each successively higher C-state, the core 106 is allowed to perform more aggressive power saving actions, which in turn require a longer time to return to the running state. Examples of events that may cause the core 106 to exit the low power state are an interrupt and a store by another processor to an address range specified by a previously executed MONITOR instruction.

Notably, the ACPI numbering scheme for C-states uses higher C numbers to indicate progressively less responsive, deeper sleep states. Under this numbering scheme, the composite power state of any given group (i.e., a die, a package, or a platform) is the minimum of the applicable C-states of all the enabled cores of that group. Each core's applicable C-state is its most recent valid requested C-state, if it has one, and zero if the core has no valid most recent requested C-state.

Other classes of power states, however, use progressively higher numbers to indicate progressively more responsive states. For example, CNTR.2534 describes a coordination system for indicating a desired voltage identifier (VID) to a voltage regulator module (VRM). Higher VIDs correspond to higher voltage levels, which in turn correspond to faster (and therefore more responsive) performance states. Coordinating a composite VID, however, involves determining the maximum of the VID values requested by the cores. Because a power state numbering scheme can be defined in either ascending or descending order, portions of this specification define the composite power state as an "extremum," which is either the minimum or the maximum of the applicable power states of the relevant cores.
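The extremum rule above can be illustrated with a minimal Python sketch. The function names are hypothetical and do not come from the specification; the sketch only assumes that a core with no valid request is represented by `None` and contributes an applicable C-state of 0, and that a composite VID is the maximum of the requested VIDs.

```python
from typing import Iterable, Optional


def applicable_cstate(most_recent_valid_request: Optional[int]) -> int:
    """A core with no valid most recent request counts as C-state 0 (running)."""
    return 0 if most_recent_valid_request is None else most_recent_valid_request


def composite_cstate(requests: Iterable[Optional[int]]) -> int:
    """Composite C-state of a group: minimum of the cores' applicable C-states."""
    return min(applicable_cstate(r) for r in requests)


def composite_vid(requested_vids: Iterable[int]) -> int:
    """Composite VID of a group: maximum of the cores' requested VIDs."""
    return max(requested_vids)


if __name__ == "__main__":
    # Four cores that all requested a sleeping state: the shallowest request wins.
    print(composite_cstate([3, 2, 5, 4]))
    # One core has no valid request, so the group stays in the running state.
    print(composite_cstate([3, None, 5, 4]))
    # For VIDs the direction is reversed: the highest request wins.
    print(composite_vid([10, 14, 12]))
```

Both functions raise `ValueError` on an empty group, which seems a reasonable choice for a sketch because a group with no enabled cores has no defined composite state.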
It will be appreciated, however, that even requested VID and clock ratio values are "orderable" in the direction opposite to their conventionally defined direction (for example, by using the negatives of the original values). The hierarchical coordination systems described herein are therefore generally applicable to power states regardless of their conventionally defined direction.

Although FIG. 3 describes an embodiment in which the cores 106 perform the decentralized power management in response to an MWAIT instruction, the cores 106 may also respond to other forms of input indicating that the core 106 may enter a low power state. For example, the bus interface unit 224 may generate a signal that causes the core 106 to trap to the microcode 208 in response to detecting an I/O read transaction on the bus 116 to a predetermined I/O port range. Embodiments in which the core 106 traps to the microcode 208 in response to other external signals received by the core 106 are also contemplated, and the embodiments are not limited to x86 instruction set architecture embodiments or to systems that include a Pentium 4-style processor bus. Furthermore, a given target state of a core 106 may be generated internally, as is frequently the case with desired voltage and clock values.

Turning now to the individual functional blocks of FIG. 3, flow begins at block 302.

At block 302, the instruction decoder 204 of FIG. 2 encounters an MWAIT instruction and traps to the power state management microcode 208, and more specifically to the STRHL routine that implements the MWAIT instruction. The MWAIT instruction specifies a target C-state, denoted "X," and informs the core 106 that it may enter an optimized state while waiting for an event to occur. Specifically, the optimized state may be a low power state in which the core 106 consumes less power than in the running state in which the core 106 encountered the MWAIT instruction. Flow proceeds to block 303.

At block 303, the microcode stores "X" as the core's applicable, or most recent valid requested, power state, denoted "Y." Note that if the core 106 has not yet encountered an MWAIT instruction, or if that instruction has since been superseded or become stale (for example, by a later STPCLK deassertion) and the core is in the normal running state, then the value "Y" stored as the core's applicable or most recent valid requested power state is 0. Flow proceeds to decision block 304.

At decision block 304, the microcode 208 (more specifically, the STRHL routine) examines "X," the value corresponding to the target C-state. If "X" is less than 2 (that is, the target C-state is 1), flow proceeds to block 306; if the target C-state is greater than or equal to 2 (that is, "X" corresponds to a restricted power state), flow proceeds to block 308.

At block 306, the microcode 208 puts the core 106 to sleep. That is, the STRHL routine of the microcode 208 writes control registers within the core 106 to cause it to stop fetching and executing instructions. The core 106 therefore consumes less power than in the running state. Preferably, while the core 106 is sleeping, the microsequencer 206 also does not fetch and execute microcode 208 instructions. Flow ends at block 306. FIG. 5 describes the operation of the core 106 in response to being awakened from sleep.

Block 308 represents the path taken by the STRHL routine of the microcode 208 when "X" is 2 or more, corresponding to a restricted power state. As noted above, in one embodiment a C-state of 2 or more involves removing the bus 116 clock. The bus 116 clock is a resource shared by the cores 106. Therefore, when a core has a target C-state of 2 or higher, the cores 106 preferably communicate with one another in the distributed and coordinated manner described herein to verify that each core 106 has been informed that it may transition to a C-state of 2 or greater before notifying the chipset 114 that it may remove the bus 116 clock.

At block 308, the STRHL routine of the microcode 208 performs the relevant power saving actions (PSA) based on the target C-state specified by the MWAIT instruction encountered at block 302. In general, the PSA taken by a core 106 include actions that are independent of the other cores 106. For example, each core 106 includes its own cache memories, which are local to the core 106 itself (e.g., the instruction cache and data cache 222), and the PSA include flushing the local caches, removing their clocks, and powering them down. In another embodiment, the multi-core microprocessor 102 may include caches shared by multiple cores 106. In that embodiment, the shared caches cannot be flushed, have their clocks removed, or be powered down until the cores 106 have communicated with one another to determine that all the cores 106 have received an MWAIT specifying an appropriate target C-state. In that case, the cores may flush the shared caches, remove their clocks, and power them down before informing the chipset 114 that it may request permission to remove the bus 116 clock and/or refrain from generating snoop cycles on the bus 116 (see block 322). In one embodiment, the cores 106 share a voltage regulator module (VRM). CNTR.2534 describes an apparatus and method for managing a VRM shared by multiple cores in a distributed, decentralized fashion. In one embodiment, each core 106 has its own PLL 226, as in the embodiment of FIG. 2, so that a core 106 can reduce its frequency or disable its PLL 226 to save power without affecting the other cores 106. In other embodiments, however, the cores 106 on a die 104 may share a PLL. CNTR.2534 describes an apparatus and method for managing a PLL shared by multiple cores in a distributed, decentralized fashion.
Embodiments of the power state management and associated synchronization logic described herein may also (or alternatively) be applied to manage a PLL shared by multiple cores in a distributed, decentralized fashion. Flow proceeds to block 312.

At block 312, the STRHL routine of the power state management microcode 208 calls another routine of the power state management microcode 208, denoted sync_C-state and described in detail with respect to FIG. 4, to communicate with the other nodally connected cores 106 and obtain a composite C-state for the multi-core microprocessor 102, denoted "Z" in FIG. 3. Each invoked instance of the sync_C-state routine is referred to herein as a "local" instance of the sync_C-state routine with respect to the core on which it is running.

The STRHL routine of the microcode 208 invokes the sync_C-state routine with an input parameter, or probe power state value, equal to the core's applicable power state (that is, its most recent valid requested target power state), which is the value "X" specified by the MWAIT instruction and received at block 302. Invoking the sync_C-state routine starts a composite power state discovery process, as described further in connection with FIG. 4.

Each invoked sync_C-state routine calculates a "compound" C-state and returns it to whatever process called or invoked it (here, the STRHL routine). The compound C-state is the minimum of the following C-state values: the probe C-state value received from the invoking process, the applicable C-state of the core on which the sync_C-state routine is running, and any compound C-states received from dependently induced instances of the sync_C-state routine. In some circumstances, the compound C-state is the composite power state of the domain common to both the local sync_C-state routine and the synchronized sync_C-state routine on which it depends.
In other circumstances, the compound C-state may be only a partial composite C-state of the domain.
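As a rough illustration of the calculation just described, the following Python sketch computes a compound C-state as the minimum of the probe value, the local core's applicable C-state, and any values returned by dependently induced instances. The function name and argument layout are assumptions made for this example and are not taken from the microcode.

```python
from typing import Sequence


def compound_cstate(probe: int, applicable: int, returned: Sequence[int] = ()) -> int:
    """Minimum of the probe C-state, the local core's applicable C-state,
    and any compound C-states returned by dependently induced instances."""
    return min(probe, applicable, *returned)


if __name__ == "__main__":
    # No dependent instance was induced: only probe and applicable values count.
    print(compound_cstate(3, 3))
    # A dependent instance on another core returned 2, which lowers the result.
    print(compound_cstate(3, 4, [2]))
```

Whether the result is a full composite state or only a partial one depends on whether every enabled core of the domain contributed a value, which this sketch does not track.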

In general, the composite power state of a domain is the extremum (in the ACPI power state scheme, the minimum) of the applicable power states of all the enabled cores of that domain. For example, the composite C-state of a die 104 is the minimum of the applicable C-states (e.g., the most recent valid requested C-states, if all the cores have such values) of all the cores 106 of the die. The composite C-state of the multi-core microprocessor 102 as a whole is the minimum of the applicable C-states of all the cores 106 of the multi-core microprocessor 102.

A compound power state, however, may be either the composite power state of an applicable domain or only a partial composite power state. A partial composite power state is the extremum of the applicable power states of two or more, but fewer than all, of the cores of an applicable domain. In places, this specification refers to an "at least partial composite power state" to encompass calculated compound power states of either kind. The potential (if subtle) distinction between a compound power state and a composite power state will become clearer in connection with FIGS. 4C, 10 and 17.
It is noted in advance that a non-zero composite C-state of the multi-core microprocessor 102 indicates that every core 106 has seen an MWAIT specifying a non-running C-state (that is, a C-state with a value of 1 or greater), whereas a composite C-state of zero indicates that not every core 106 has seen an MWAIT. Furthermore, a value greater than or equal to 2 indicates that all the cores 106 of the multi-core microprocessor 102 have received an MWAIT instruction specifying a C-state of 2 or greater.

Flow proceeds to decision block 314. At decision block 314, the STRHL routine of the microcode 208 examines the compound C-state "Z" determined at block 312. If "Z" is greater than or equal to 2, flow proceeds to decision block 318. Otherwise, flow proceeds to block 316.

At block 316, the STRHL routine of the microcode 208 puts the core 106 to sleep. Flow ends at block 316.

At decision block 318, the STRHL routine of the microcode 208 determines whether the core 106 is the BSP. If so, flow proceeds to block 322; otherwise, flow proceeds to block 324.

At block 322, the BSP 106 informs the chipset 114 that it may request permission to remove the bus 116 clock and/or refrain from generating snoop cycles on the bus 116.

In one embodiment, in accordance with the well-known Pentium 4 bus protocol, the BSP 106, which is the only core authorized to enable higher power management states, informs the chipset 114 that it may request permission to remove the bus 116 clock and/or refrain from generating snoop cycles on the bus 116 by initiating an I/O read transaction on the bus 116 to a predetermined I/O port. The chipset 114 then asserts the STPCLK signal on the bus 116 to request permission to remove the bus 116 clock.
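The decision structure of the STRHL routine of FIG. 3 can be summarized in a short Python sketch. This is only an illustration under stated assumptions: the `Core` class, its `log` list, and the action names are hypothetical, and the sync_C-state routine of FIG. 4 is replaced by a caller-supplied stand-in function that simply returns a compound C-state.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Core:
    is_bsp: bool
    sync_cstate: Callable[[int], int]  # stand-in for the sync_C-state routine
    applicable: int = 0                # "Y", the most recent valid request
    log: List[str] = field(default_factory=list)


def strhl(core: Core, x: int) -> None:
    """Sketch of blocks 303 to 324 for an MWAIT with target C-state x."""
    core.applicable = x                        # block 303: store X as Y
    if x < 2:                                  # decision block 304
        core.log.append("sleep")               # block 306
        return
    core.log.append("power_saving_actions")    # block 308
    z = core.sync_cstate(x)                    # block 312: obtain Z
    if z >= 2 and core.is_bsp:                 # decision blocks 314 and 318
        core.log.append("notify_chipset")      # block 322
    core.log.append("sleep")                   # block 316 or block 324


if __name__ == "__main__":
    bsp = Core(is_bsp=True, sync_cstate=lambda probe: min(probe, 2))
    strhl(bsp, 3)
    print(bsp.log)
```

The sketch folds blocks 316 and 324 into a single final step because both put the core to sleep; only the BSP with a compound C-state of at least 2 performs the chipset notification first.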
In one embodiment, after informing the chipset 114 at block 322 (or block 608) that it may assert STPCLK, the STRHL routine of the microcode 208 running on the BSP core 106 waits for the chipset 114 to assert STPCLK rather than going to sleep (at block 324 or block 614). It then notifies the other cores 106 of the STPCLK assertion, issues its STOP GRANT message, and goes to sleep. Depending on the predetermined I/O port address specified by the I/O read transaction, the chipset 114 may subsequently refrain from generating snoop cycles on the bus 116.

Flow proceeds to block 324. At block 324, the microcode 208 puts the core 106 to sleep. Flow ends at block 324.

Referring now to FIG. 4, a flowchart is shown illustrating operation of another component of the system 100 of FIG. 1 that performs decentralized power management distributed among the multiple processing cores 106 of the multi-core microprocessor 102. More specifically, the flowchart illustrates operation of an instance of the sync_C-state routine of the power state management microcode 208 of FIG. 3 (and FIG. 6). Although FIG. 4 is a flowchart of the functionality of a single instance of the sync_C-state routine of the microcode 208, it will be understood from the discussion below that a composite C-state discovery process is carried out through multiple synchronized instances of that routine. Flow begins at block 402.

At block 402, an instance of the sync_C-state routine of the microcode 208 ("sync_C-state microcode 208") on a core 106 is invoked and receives an input probe C-state, denoted "A" in FIG. 4. An instance of the sync_C-state routine may be invoked locally from the MWAIT instruction microcode 208, as described with respect to FIG. 3, in which case it constitutes an initial instance of the sync_C-state routine.
Alternatively, an instance of the sync_C-state routine may be induced by a synchronization request originating from another core (referred to herein as an externally generated synchronization request), in which case it constitutes a dependent instance of the sync_C-state routine. More particularly, an instance of the sync_C-state routine already running on another, nodally connected core may induce the local instance of the sync_C-state routine by sending an appropriate inter-core interrupt to the local core. As described in more detail with respect to FIG. 6, an inter-core interrupt handler (ICIH) of the power state management microcode 208 handles the inter-core interrupts received from nodally connected cores 106.

Flow proceeds to decision block 404. At decision block 404, if this instance of the sync_C-state routine (the "local instance") is an initial instance, that is, if it was invoked from the MWAIT instruction microcode 208 of FIG. 3, flow proceeds to block 406. Otherwise, the local instance is a dependent instance induced by an external, non-local instance of the sync_C-state routine running on a nodally connected core, and flow proceeds to decision block 432.

At block 406, the sync_C-state microcode 208 induces a dependent instance of the sync_C-state routine on its partner core by programming the CSR 236 of FIG. 2 to send the "A" value received at block 402 to its partner and to interrupt the partner. This requests the partner to calculate a compound C-state and return it to the local core 106, as described in more detail below. Flow proceeds to block 408.
At block 408, the sync_C-state microcode 208 programs the CSR 236 to detect that the partner has returned a compound C-state to the core 106 and, if so, obtains the partner's compound C-state, denoted "B" in FIG. 4. Note that if the partner is in its most active running state, the value of "B" will be zero. In one embodiment, the microcode 208 waits for the partner to respond to the request made at block 406 in a loop that polls the CSR 236 for a predetermined value in order to detect that the partner has returned a compound C-state. In one embodiment, the loop includes a timeout counter. If the timeout counter expires, the microcode 208 assumes that the partner core 106 is no longer enabled and operational, does not include an applicable or assumed C-state for that partner in any subsequent sync_C-state calculation, and does not subsequently attempt to communicate with the partner core 106. The microcode 208 operates in a similar manner in its communications with other cores 106 (that is, companion cores and friend cores), regardless of whether it communicates with the other core 106 via the inter-core communication wires 112 or the inter-die communication wires 118 (or the inter-package communication wires 1133 described below).

Flow proceeds to block 412. At block 412, the sync_C-state microcode 208 computes a compound C-state for the die 104 of which the core 106 is a part by computing the minimum of the "A" and "B" values, which is denoted "C." In a dual-core die, "C" will necessarily be the composite C-state, because the "A" and "B" values represent the applicable C-states of all (two) of the cores on the die.

Flow proceeds to decision block 414. At decision block 414, if the "C" value computed at block 412 is less than 2, or the local core 106 is not the master core 106, flow proceeds to block 416.
Otherwise, the "C" value is at least 2 and the local core 106 is the master core, and flow proceeds to block 422.

At block 416, the routine returns the "C" value computed at block 412 to the calling process that invoked it (here, the STRHL routine). Flow ends at block 416.

At block 422, the sync_C-state microcode 208 induces a dependent instance of the sync_C-state routine on its companion core by programming the CSR 234 of FIG. 2 to send the "C" value computed at block 412 to its companion and to interrupt the companion. This requests the companion to calculate a compound C-state and return it to this core 106, as described in more detail below.

At this point, it should be noted that the sync_C-state microcode 208 does not induce a dependent instance of the sync_C-state routine on the companion core until it has determined the composite C-state of its own die. In fact, all of the sync_C-state routines described in this specification operate according to a consistent nested-domain traversal order. That is, each sync_C-state routine progressively and conditionally discovers composite C-states, beginning with the lowest domain of which it is a part (e.g., the die) and then, if it is the master of that domain, proceeding to the next higher-level domain in which that domain is nested (e.g., in the case of FIG. 1, the processor itself), and so on. FIG. 13, discussed later, further illustrates this traversal order, in which the sync_C-state routine conditionally and progressively discovers the composite C-state first of the die of which the core is a part, then of the package of which it is a part (if the core is also the master of the die), and finally of the entire processor or system (if the core is also the BSP of the processor).

Flow proceeds to block 424. At block 424, the sync_C-state microcode 208 programs the CSR 234 to detect that the companion has returned a compound C-state.
The sync_C-state microcode 208 obtains that compound C-state, denoted "D" in FIG. 4. In some circumstances, but not necessarily all (as explained below in connection with the corresponding value "L" in FIG. 4C), "D" constitutes the composite C-state of the companion's die.

Flow proceeds to block 426. At block 426, the sync_C-state microcode 208 computes a compound C-state for the multi-core microprocessor 102 by computing the minimum of the "C" and "D" values, which is denoted "E." Assuming that "D" is the composite C-state of the companion's die, "E" will necessarily constitute the composite C-state of the processor, because "E" is the minimum of "C" (which, as explained above, is known to be the composite C-state of this die) and "D" (the composite C-state of the companion's die), with no core on the processor omitted from the calculation. If not, "E" may constitute only a partial composite C-state of the processor (that is, the minimum of the applicable C-states of the cores on this die and of the companion core, but not also of the companion's partner). Flow proceeds to block 428.

At block 428, the routine returns the "E" value computed at block 426 to its caller. Flow ends at block 428.

At decision block 432, if the inter-core interrupt handler of FIG. 6 invoked the sync_C-state routine in response to an interrupt from the core's partner (that is, a partner induced the routine), flow proceeds to block 434. Otherwise, the inter-core interrupt handler invoked the sync_C-state routine in response to an interrupt from the core's companion (that is, the companion induced the routine), and flow proceeds to block 466.

At block 434, the core 106 was interrupted by its partner, so the sync_C-state microcode 208 programs the CSR 236 to obtain the probe C-state passed by the partner and its inducing routine, denoted "F" in FIG. 4. Flow proceeds to block 436.
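The initial-instance path of FIG. 4 (blocks 402 through 428) can be sketched in Python as follows. The inter-core exchange through the CSRs is abstracted into two caller-supplied functions, `ask_partner` and `ask_companion`; these names, and the function `sync_initial` itself, are hypothetical and only model "send a probe value, receive a compound C-state back."

```python
from typing import Callable


def sync_initial(a: int,
                 is_master: bool,
                 ask_partner: Callable[[int], int],
                 ask_companion: Callable[[int], int]) -> int:
    """Sketch of an initial sync_C-state instance with probe value A."""
    b = ask_partner(a)            # blocks 406 and 408: partner returns B
    c = min(a, b)                 # block 412: compound C-state of this die
    if c < 2 or not is_master:    # decision block 414
        return c                  # block 416
    d = ask_companion(c)          # blocks 422 and 424: companion returns D
    e = min(c, d)                 # block 426: compound C-state of the processor
    return e                      # block 428


if __name__ == "__main__":
    # The partner's applicable C-state is 3 and the companion's die is at 2.
    result = sync_initial(
        4,
        is_master=True,
        ask_partner=lambda probe: min(probe, 3),
        ask_companion=lambda probe: min(probe, 2),
    )
    print(result)
```

Note that the companion is contacted only after the die-level value has been computed and only by the master core, which mirrors the nested-domain traversal order described above.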
At block 436, the sync_C-state microcode 208 computes a compound C-state for its own die 104 as the minimum of its own applicable C-state "Y" and the probe C-state "F" received from its partner; the result is denoted "G". In a dual-core die, "G" is the composite C-state of the die 104 containing the core 106, because "Y" and "F" together account for the applicable C-states of all (two) cores on the die. Flow continues to decision block 438.

At decision block 438, if the "G" value calculated at block 436 is less than 2 or the core 106 is not the master core, flow continues to block 442. Otherwise, "G" is at least 2 and the core is the master core, and flow continues to block 446.

At block 442, in response to the interrupt request from its partner, the sync_C-state microcode 208 programs the CSR 236 to send the "G" value calculated at block 436 to its partner. Flow continues to block 444.

At block 444, the sync_C-state microcode 208 returns the "G" value calculated at block 436 to the process that invoked it. Flow ends at block 444.

At block 446, the sync_C-state microcode 208 programs the CSR 234 of Figure 2 to spawn a dependent instance of the sync_C-state routine on its companion core, passing the "G" value calculated at block 436 to the companion and interrupting it. This asks the companion to compute a compound C-state and return it to this core 106, as described in more detail below. Flow continues to block 448.

At block 448, the sync_C-state microcode 208 programs the CSR 234 to detect that the companion has returned a compound C-state to the core 106 and obtains that compound C-state, denoted "H" in Figure 4.
In at least some cases, but not all (see the corresponding discussion of "L" in Figure 4), "H" constitutes the composite C-state of the companion's die. Flow continues to block 452.

At block 452, the sync_C-state microcode 208 computes a compound C-state for the multi-core microprocessor 102 as the minimum of the "G" and "H" values, denoted "J". If "H" is the companion's die composite C-state, then "J" is the composite C-state of the processor, because "J" is the minimum of "G" (which, as noted above, is the composite C-state of this die) and "H" (the composite C-state of the companion's die), so no core on the processor is omitted from the calculation. If not, "J" may be only a partial composite C-state of the processor (i.e., the minimum of the applicable C-states of both cores on this die and of the companion, but not of the companion's partner). "H" therefore constitutes an "at least partial composite" C-state of the processor. Flow continues to block 454.

At block 454, in response to the interrupt request from its partner, the sync_C-state microcode 208 programs the CSR 236 to send the "J" value calculated at block 452 to its partner. Flow continues to block 456.

At block 456, the routine returns the "J" value calculated at block 452 to the process that invoked it. Flow ends at block 456.

At block 466, the core 106 was interrupted by its companion, so the sync_C-state microcode 208 programs the CSR to obtain the input probe C-state passed by the companion through the routine it spawned, denoted "K" in Figure 4. Because of the hierarchical traversal order of the sync_C-state routines, the companion does not interrupt this core until it has discovered the composite C-state of its own die, so "K" is the composite C-state of the companion's die.
Note also that, because the core 106 was interrupted by a companion, the core 106 must be the master core of its die 104. Flow continues to block 468.

At block 468, the sync_C-state microcode 208 computes an at least partial composite C-state as the minimum of its own applicable C-state "Y" and the received companion C-state "K"; the result is denoted "L". If "L" is 1, "L" may not be the composite C-state of the processor, because it does not incorporate the applicable C-state of this core's partner. If the partner's applicable C-state is 0, the composite C-state of the processor (which is not precisely discovered) is 0. Even so, although it is not precisely discovered, the composite C-state of the processor is known to be no greater than "L". In the power management logic of this threshold-triggered embodiment, once a compound C-state is found to be less than 2, the composite C-state of the processor is known to be less than 2 as well. Because implementing a C-state less than 2 has only local effects, a more precise determination of the composite C-state is unnecessary, so the composite C-state discovery process may unwind and terminate, as shown here. If "L" is 0, on the other hand, it must be the composite C-state of the processor, because (as described above) the composite C-state of the processor cannot exceed any compound C-state of the processor. This subtlety is why the specification describes the sync_C-state routine as computing an "at least partial composite value". Flow continues to decision block 472.

At decision block 472, if the "L" value calculated at block 468 is less than 2, flow continues to block 474. Otherwise, flow continues to block 478. Other embodiments of the invention may omit such a threshold condition (e.g., is "L" less than 2?)
for continuing the composite C-state discovery process. In such an embodiment, each master core of the processor would determine the composite C-state of the processor unconditionally.

At block 474, in response to the inter-core interrupt request from the companion, the sync_C-state microcode 208 programs the CSR 234 to send the "L" value calculated at block 468 to its companion. Again, the "L" that the companion receives may be only a partial composite value for the processor. However, because "L" is less than 2, the composite value of the processor must also be less than 2, which removes any need to determine the processor's composite value more precisely (when "L" is 1). Flow continues to block 476.

At block 476, the routine returns the "L" value calculated at block 468 to its caller. Flow ends at block 476.

At block 478, the sync_C-state microcode 208 programs the CSR 236 to spawn a dependent sync_C-state routine on its partner core, passing the "L" value calculated at block 468 to the partner and interrupting it. This asks the partner to compute a compound C-state and provide it to the core 106. In the quad-core embodiment of Figure 1, as handled by the sync_C-state microcode 208 of Figure 4, this amounts to asking the partner to provide its most recent requested C-state (if any). Flow continues to block 482.

At block 482, the sync_C-state microcode 208 programs the CSR 236 to detect that the partner has returned a compound C-state to the core 106 and obtains the partner's compound C-state, denoted "M" in Figure 4. If the partner is in its most active running state, the value of "M" is zero. Flow continues to block 484.
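The early-exit argument for blocks 468 through 476 (a partial minimum below 2 already bounds the processor's composite C-state) can be checked exhaustively for a four-core processor. The sketch below is illustrative only: the function name and the range of C-state values (0 through 5) are assumptions, and "K" is taken to be the companion's die composite, as the text states for block 466.

```python
from itertools import product


def check_partial_composite_bounds(max_c_state: int = 5) -> bool:
    """Brute-force check of the bounds claimed for the partial value "L"."""
    for own, partner, comp, comp_partner in product(range(max_c_state + 1), repeat=4):
        full = min(own, partner, comp, comp_partner)  # true processor composite
        k = min(comp, comp_partner)   # companion's die composite "K"
        l_val = min(own, k)           # block 468: "L" omits this core's partner
        assert full <= l_val          # the composite never exceeds a partial value
        if l_val < 2:
            assert full < 2           # so stopping early at blocks 474-476 is safe
        if l_val == 0:
            assert full == 0          # "L" of 0 is already exact
        m = min(partner, l_val)       # blocks 478-482: partner folds in its own value
        assert min(l_val, m) == full  # block 484: "N" is the full composite
    return True
```

Running `check_partial_composite_bounds()` raises no assertion error, which matches the reasoning that discovery only needs to continue when "L" is at least 2.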
At block 484, the sync_C-state microcode 208 computes a compound C-state for the multi-core microprocessor 102 as the minimum of the "L" and "M" values, denoted "N". In the quad-core embodiment of Figure 1, as handled by the sync_C-state microcode 208 of Figure 4, "N" must be the composite C-state of the processor, because it incorporates the companion's die composite C-state "K", the core's own applicable C-state "Y", and the partner's applicable C-state (the last of these being incorporated into the compound power state "M" returned by the partner); together these cover the applicable C-states of all four cores. Flow continues to block 486.

At block 486, in response to the inter-core interrupt request from the companion, the sync_C-state microcode 208 programs the CSR to send the "N" value calculated at block 484 to its companion. Flow continues to block 488.

At block 488, the routine returns the "N" value calculated at block 484 to its caller. Flow ends at block 488.

Referring now to the flowchart of Figure 5, the operation of the system 100 of Figure 1 to perform decentralized power management distributed among the multiple processing cores 106 of the multi-core microprocessor 102 is shown. More specifically, the flowchart shows the operation of a wake-and-resume routine of the power state management microcode 208, which runs after the core 106 is awakened from a sleep state by an event (e.g., a sleep state entered at block 306, 316, or 324 of Figure 3, or at block 614 of Figure 6). Flow begins at block 502.

At block 502, the core 106 wakes from its sleep state in response to an event and resumes by fetching and executing an instruction handler of the microcode 208.
The event may include, but is not limited to: an inter-core interrupt, that is, an interrupt from another core 106 via the inter-core communication wires 112 or the inter-die communication wires 118 (or the inter-package communication wires 1133 of the embodiment of Figure 11); assertion of the STPCLK signal on the bus 116 by the chipset 114; deassertion of the STPCLK signal on the bus 116 by the chipset 114; and another type of interrupt, such as assertion of an external interrupt request signal, which may be generated, for example, by a peripheral device (e.g., a USB device). Flow continues to decision block 504.

At decision block 504, the wake-and-resume routine determines whether the core 106 was awakened by an interrupt from another core 106. If so, flow continues to block 506; otherwise, flow continues to decision block 508.

At block 506, an inter-core interrupt routine handles the inter-core interrupt, as described in detail with respect to Figure 6.

At decision block 508, the wake-and-resume routine determines whether the core 106 was awakened by the chipset 114 asserting the STPCLK signal on the bus 116. If so, flow continues to block 512; otherwise, flow continues to decision block 516.

At block 512, the chipset 114 has asserted STPCLK in response to the I/O read transaction performed at block 322 of Figure 3 or block 608 of Figure 6, which requested permission to remove the bus 116 clock. In response, the microcode 208 of the core 106 issues a STOP GRANT message on the bus 116 to inform the chipset 114 that it may remove the bus 116 clock. In one embodiment, the chipset 114 waits until all cores 106 have issued a STOP GRANT message before removing the bus 116 clock.
In another embodiment, the chipset 114 may remove the bus 116 clock after a single core 106 has issued the STOP GRANT message. Flow continues to block 514.

At block 514, the core 106 returns to sleep. The chipset 114 then removes the bus 116 clock to reduce the power consumption of the multi-core microprocessor 102, as described above. Eventually, the chipset 114 restores the bus 116 clock and deasserts STPCLK to return the cores 106 to their running state so that they can execute user instructions. Flow ends at block 514.

At decision block 516, the wake-and-resume routine determines whether the core 106 was awakened by the chipset 114 deasserting the STPCLK signal on the bus 116. If so, flow continues to block 518; otherwise, flow continues to block 526.

At block 518, in response to an event (e.g., a system timer interrupt or a peripheral interrupt), the chipset 114 has restored the bus 116 clock and deasserted STPCLK so that the cores 106 resume execution. In response, the wake-and-resume routine undoes the power-saving actions performed at block 308. For example, the microcode 208 may restore power to the local caches of the core 106, increase the core 106 clock frequency, or increase the core 106 operating voltage. In addition, the core 106 may restore power to a shared cache, for example, if the core 106 is the BSP. Flow continues to block 522.

At block 522, the wake-and-resume routine reads and writes the CSRs 234 and 236 to notify all other cores 106 that this core 106 has woken and is running again. The wake-and-resume routine may also store "0" as the core's applicable, or most recent valid requested, C-state. Flow continues to block 524.

At block 524, the wake-and-resume routine exits and returns control to the instruction decoder 204, which resumes decoding fetched user program instructions (e.g., x86 instructions).
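The decision chain of the wake-and-resume routine (decision blocks 504, 508, and 516) can be summarized as a small dispatch function. This is a hedged sketch rather than the actual microcode: the `WakeEvent` enumeration, the function name, and the action strings are hypothetical labels for the steps the text describes.

```python
from enum import Enum, auto
from typing import List


class WakeEvent(Enum):
    INTER_CORE_INTERRUPT = auto()
    STPCLK_ASSERTED = auto()
    STPCLK_DEASSERTED = auto()
    OTHER_INTERRUPT = auto()


def wake_and_resume(event: WakeEvent) -> List[str]:
    """Return the actions a core takes after waking, keyed by the wake event."""
    if event is WakeEvent.INTER_CORE_INTERRUPT:   # block 504 -> block 506
        return ["handle inter-core interrupt (Figure 6)"]
    if event is WakeEvent.STPCLK_ASSERTED:        # block 508 -> blocks 512, 514
        return ["issue STOP GRANT", "return to sleep"]
    if event is WakeEvent.STPCLK_DEASSERTED:      # block 516 -> blocks 518-524
        return ["undo power-saving actions",
                "notify other cores via CSRs",
                "resume user instructions"]
    return ["handle other interrupt"]             # block 526
```

For example, `wake_and_resume(WakeEvent.STPCLK_ASSERTED)` yields the STOP GRANT handshake followed by a return to sleep.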
Specifically, user instruction fetch and execution typically resume at the instruction following the MWAIT instruction. Flow ends at block 524.

At block 526, the wake-and-resume routine handles other interrupt events, such as those mentioned above with respect to block 502. Flow ends at block 526.

Referring now to the flowchart of Figure 6, the operation of the system 100 of Figure 1 to perform decentralized power management distributed among the multiple processing cores 106 of the multi-core microprocessor 102 is shown. More specifically, the flowchart shows the operation of the inter-core interrupt handling routine (ICIHR) of the microcode 208, which runs in response to an inter-core interrupt, that is, an interrupt from another core 106 via the inter-core communication wires 112 or the inter-die communication wires 118 (such as may be generated at block 406, 422, 446, or 478 of Figure 4). The microcode 208 may take an inter-core interrupt by polling (if the microcode 208 is already running), as a true interrupt between user program instructions, or as an interrupt that wakes the microcode 208 from a state in which the core 106 is sleeping.

Flow begins at block 604. At block 604, the ICIHR of the interrupted core 106 calls a local sync_C-state routine according to Figure 4 to continue the synchronized power state discovery process begun by another core. In response, it obtains an at least partial composite C-state for the multi-core microprocessor 102, denoted "PC" in Figure 6. The ICIHR calls the sync_C-state microcode 208 with an input value "Y", which is the probe C-state passed by the external sync_C-state routine on which the local sync_C-state routine depends.
Moreover, a "PC" value of 2 or greater indicates that "PC" is a complete, not merely partial, composite C-state of all the cores 106 of the multi-core microprocessor 102, and that every core 106 of the processor has received an MWAIT instruction specifying a C-state of "PC" or greater. Flow continues to decision block 606.

At decision block 606, the microcode 208 determines whether the "PC" value obtained at block 604 is greater than or equal to 2 and whether the core is authorized to implement, or to enable implementation of, the "PC" C-state (e.g., the core 106 is the BSP). If so, flow continues to block 608; otherwise, flow continues to decision block 612.

At block 608, the core 106 (e.g., the BSP core 106, which is authorized to do so) informs the chipset 114 that it may request permission to remove the bus 116 clock, as at block 322 described above. Flow continues to decision block 612.

At decision block 612, the microcode 208 determines whether it was awakened from sleep. If so, flow continues to block 614; otherwise, flow continues to block 616.

At block 614, the core 106 returns to sleep. Flow ends at block 614.

At block 616, the ICIHR exits and returns control to the instruction decoder 204, which resumes decoding the fetched user program instructions. Flow ends at block 616.
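The ICIHR decisions of Figure 6 reduce to two independent checks, which the following Python sketch makes explicit. It is an illustration under stated assumptions: the function name and action strings are hypothetical, `pc` stands for the value obtained at block 604, and the behavior at block 614 (returning to sleep) is inferred from the worked examples rather than quoted from the flow description.

```python
from typing import List


def icihr_actions(pc: int, is_bsp: bool, woke_from_sleep: bool) -> List[str]:
    """Actions taken by the inter-core interrupt handler after block 604."""
    actions = []
    if pc >= 2 and is_bsp:      # decision block 606
        # block 608: only an authorized core (e.g., the BSP) contacts the chipset
        actions.append("request permission to remove bus clock")
    if woke_from_sleep:         # decision block 612
        actions.append("return to sleep")     # block 614 (inferred behavior)
    else:
        actions.append("resume user code")    # block 616
    return actions
```

A non-BSP core that was woken by the interrupt therefore just goes back to sleep, while a running core returns to user code regardless of the "PC" value.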

Figure 7 shows a first example of the operation of the system 100 of Figure 1 according to the flows of Figures 3 through 6. In the example of Figure 7, the user programs running on the cores 106 each execute an MWAIT instruction at effectively the same time. In the example of Figure 8, by contrast, the user programs running on the cores 106 each execute an MWAIT instruction at a different time, that is, after another core has executed an MWAIT instruction and gone to sleep. Together, the examples illustrate features of the microcode 208 of the cores 106. Figure 7 includes four columns, one for each of the four cores 106 of Figure 1. As shown in and described above with respect to Figure 1, core 0 and core 2 are the master cores of their dies 104, and core 0 is the BSP of the multi-core microprocessor 102. Each column of Figure 7 shows the actions taken by the respective core 106, and the downward flow of actions in each column indicates the passage of time.

First, each core 106 encounters an MWAIT instruction specifying some C-state (block 302). In the example of Figure 7, the MWAIT instructions of core 0 and core 3 specify a C-state of 4, and those of core 1 and core 2 specify a C-state of 5. Each core 106 responsively performs its associated power-saving actions (block 308) and stores the received target C-state ("X") as its applicable, most recent valid requested C-state "Y".

Next, each core 106 sends its applicable C-state "Y" as a probe C-state to its partner (block 406), as shown by the arrows labeled "A". Each core 106 then receives its partner's probe C-state (block 408) and computes its die 104 composite C-state "C" (block 412). In this example, the "C" value computed by every core 106 is 4. Because core 1 and core 3 are not master cores, they both go to sleep (block 324).

Because core 0 and core 2 are master cores, they send their respective "C" values to each other (i.e., to their companions) (block 422), as shown by the arrows labeled "C".
Each of them receives its companion's die composite C-state (block 424) and computes the composite C-state "E" of the multi-core microprocessor 102 (block 426). In this example, the "E" value computed by both core 0 and core 2 is 4. Because core 2 is not the BSP core 106, it goes to sleep (block 324).

Because core 0 is the BSP, it informs the chipset 114 that it may request permission to remove the bus 116 clock, e.g., by asserting STPCLK (block 322). More specifically, core 0 informs the chipset 114 that the composite C-state of the multi-core microprocessor 102 is 4, and core 0 then goes to sleep (block 324). Depending on the predetermined I/O port address specified by the I/O read transaction initiated at block 322, the chipset 114 may subsequently refrain from generating snoop cycles on the bus 116.

While all of the cores 106 are sleeping, the chipset 114 asserts STPCLK, which wakes each core 106 (block 502). Each core 106 responsively issues a STOP GRANT message to the chipset 114 (block 512) and returns to sleep (block 514). The cores 106 may sleep for an indefinite amount of time, consuming less power than they would in normal operation without the benefit of the power-saving actions and sleep.

Eventually, a wake-up event occurs. In this example, the chipset 114 deasserts STPCLK, which wakes each core 106 (block 502). Each core 106 responsively undoes its previous power-saving actions (block 518), exits its microcode 208, and resumes fetching and executing user code (block 524).

Referring now to the flowchart of Figure 8, a second example of the operation of the system 100 of Figure 1 according to the flows of Figures 3 through 6 is shown.
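Before the second example, the numbers in the Figure 7 walkthrough can be reproduced with a few lines of Python. This is only an arithmetic check: the variable names are hypothetical, and the die layout (cores 0 and 1 on one die, cores 2 and 3 on the other) is taken from the partner relationships used in the examples.

```python
# Requested C-states from the Figure 7 example: cores 0 and 3 ask for 4,
# cores 1 and 2 ask for 5.
requested = {0: 4, 1: 5, 2: 5, 3: 4}

# Assumed die layout: each die holds one master core and its partner.
dies = [(0, 1), (2, 3)]

# Die composite "C" (block 412): the most restrictive request on each die.
die_composite = [min(requested[core] for core in die) for die in dies]

# Processor composite "E" (block 426): the minimum across the two dies.
processor_composite = min(die_composite)
```

Both die composites come out as 4, so the processor composite that core 0 reports to the chipset is 4, as stated in the walkthrough.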
The flowchart of Figure 8 is similar to that of Figure 7. In the example of Figure 8, however, the user programs running on the cores 106 each execute an MWAIT instruction at a different time, that is, after another core has gone to sleep following execution of an MWAIT instruction.

Core 3 first encounters an MWAIT instruction specifying a target C-state "X" of 4 (block 302). Core 3 responsively performs its associated power-saving actions (block 308) and stores "X" as its applicable C-state, denoted "Y" below. Core 3 then sends its applicable C-state as a probe C-state to its partner, core 2 (block 406), as shown by the arrow labeled "A", which interrupts core 2.

Core 2 is interrupted by its partner, core 3 (block 604). Because core 2 is still in a running state, its own applicable C-state, denoted "Y", is 0 (block 604). Core 2 receives core 3's probe C-state (block 434), denoted "F", which has a value of 4. Core 2 then computes its die 104 composite C-state "G" (block 436) and returns the "G" value of 0 to its partner, core 3 (block 442). Core 2 then exits its microcode 208 and returns to user code (block 616).

Core 3 receives the compound C-state "B" of 0 from its partner, core 2 (block 408). Core 3 then computes its die 104 composite C-state "C" (block 412). Because the value of "C" is 0, core 3 goes to sleep (block 316).

Core 2 subsequently encounters an MWAIT instruction specifying a target C-state "X" of 5 (block 302). Core 2 responsively performs the associated power-saving actions (block 308) and stores "X" as its applicable C-state, subsequently denoted "Y" for core 2. Core 2 then sends "Y" (which is 5) as a probe C-state to its partner, core 3 (block 406), as shown by the arrow labeled "A", which interrupts core 3.
Core 3 is interrupted by its partner, core 2, which wakes core 3 (block 502). Because core 3 previously encountered an MWAIT instruction specifying a C-state of 4, and that value is still valid, its applicable C-state, denoted "Y", is 4 (block 604). Core 3 receives core 2's probe C-state (block 434), denoted "F", which has a value of 5. Core 3 then computes its die 104 composite C-state "G" (block 436) as the minimum of the probe C-state (i.e., 5) and its own applicable C-state (i.e., 4), and returns the "G" value of 4 as a compound C-state to its partner, core 2 (block 442). Core 3 then returns to sleep (block 444).

Core 2 receives the compound C-state of its partner, core 3 (block 408), denoted "B", which has a value of 4, and then computes its die 104 composite C-state "C" (block 412) as the minimum of that compound C-state (i.e., 4) and its own applicable C-state (i.e., 5). Because core 2 has found the composite C-state of its lowest-level domain to be at least 2, and because core 2, as master of that domain, also belongs to a higher-level domain, core 2 then sends its "C" value (which is 4) to its companion, core 0 (block 422), which interrupts core 0.

Core 0 is interrupted by its companion, core 2 (block 604). Because core 0 is in a running state, its applicable C-state, denoted "Y", is 0 (block 604). Core 0 receives core 2's probe C-state (block 466), denoted "K", which has a value of 4. Core 0 then computes its compound C-state "L" (block 468) and sends the "L" value of 0 to its companion, core 2 (block 474). Core 0 then exits its microcode 208 and returns to user code (block 616).

Core 2 receives the compound C-state of its companion, core 0 (block 424), denoted "D", which has a value of 0, and then computes its own compound C-state (block 426), denoted "E".
Because the "E" value is 0, core 2 goes to sleep (block 316).

Core 0 then encounters an MWAIT instruction specifying a target C-state "X" of 4 (block 302). Core 0 responsively performs the associated power-saving actions (block 308) and stores "X" as its applicable C-state, denoted "Y". Core 0 then sends "Y" (which is 4) as a probe C-state to its partner, core 1 (block 406), as shown by the arrow labeled "A", which interrupts core 1.

Core 1 is interrupted by its partner, core 0 (block 604). Because core 1 is still in a running state, its applicable C-state, denoted "Y", is 0 (block 604). Core 1 receives core 0's probe C-state (block 434), denoted "F", which has a value of 4. Core 1 then computes its die 104 composite C-state "G" (block 436) and returns the "G" value of 0 to its partner, core 0 (block 442). Core 1 then exits its microcode 208 and returns to user code (block 616).

Core 0 receives the compound C-state "B" of 0 from its partner, core 1 (block 408). Core 0 then computes its die 104 composite C-state "C" (block 412). Because the value of "C" is 0, core 0 goes to sleep (block 316).

Core 1 subsequently encounters an MWAIT instruction specifying a target C-state "X" of 3 (block 302). Core 1 responsively stores "X" as its applicable power state "Y" and performs the associated power-saving actions (block 308). Core 1 then sends its applicable C-state "Y" (which is 3) to its partner, core 0 (block 406), as shown by the arrow labeled "A", which interrupts core 0.

Core 0 is interrupted by its partner, core 1, which wakes core 0 (block 502). Because core 0 previously encountered an MWAIT instruction specifying a target C-state of 4, its applicable C-state, denoted "Y", is 4 (block 604).
Core 0 receives the probe C-state of core 1 (at block 434), denoted "F", which has a value of 3. Core 0 then computes its die 104 composite C-state "G" (at block 436) and sends the "G" value of 3 to its companion core 2 (at block 446), which interrupts core 2. Core 2 is interrupted by its companion core 0 (at block 604), which wakes core 2 (at block 502). Because core 2 previously encountered an MWAIT instruction specifying a C-state of 5, its applicable C-state is 5, denoted "Y" (at block 604). Core 2 receives the probe C-state of core 0 (at block 466), denoted "K", which has a value of 3. Core 2 then computes a compound C-state "L" (at block 468) and sends the "L" value of 3 to its partner core 3 (at block 474), which interrupts core 3. Core 3 is interrupted by its partner core 2, which wakes core 3 (at block 502). Because core 3 previously encountered an MWAIT instruction specifying a C-state of 4, its applicable C-state is 4, denoted "Y" (at block 604). Core 3 receives the C-state of core 2 (at block 434), denoted "F", which has a value of 3. Core 3 then computes a compound C-state "G" (at block 436) and sends the "G" value of 3 to its partner core 2 (at block 442). "G" now accounts for the applicable C-state of every core, so "G" constitutes the multi-core microprocessor 102 composite C-state. However, because core 3 is not the BSP and was awakened from sleep, core 3 goes back to sleep (at block 614). Core 2 receives the compound C-state "M" of its partner core 3, which has a value of 3 (at block 482). Core 2 then computes a compound C-state "N" (at block 484) and sends the "N" value of 3 to its companion core 0 (at block 486). Again, because "N" accounts for the applicable C-state of every core, "N" also constitutes the multi-core microprocessor 102 composite C-state.
However, because core 2 is not the BSP and was awakened from sleep, core 2 goes back to sleep (at block 614). Core 0 receives the C-state "H" of its companion core 2, which has a value of 3 (at block 448). Core 0 then computes a compound C-state "J" with a value of 3 (at block 452) and sends it to its partner core 1 (at block 454). Again, because "J" accounts for the applicable C-state of every core, it also constitutes the multi-core microprocessor 102 composite C-state. In addition, because core 0 is the BSP, it notifies chipset 114 that it requests permission to remove the bus 116 clock (at block 608). More specifically, core 0 notifies chipset 114 that the multi-core microprocessor 102 composite C-state is 3. Core 0 then goes to sleep (at block 614). Core 1 receives the C-state "B" of its partner core 0, which has a value of 3 (at block 408). Core 1 also computes a compound C-state "C" (at block 412), which is 3 and which also constitutes the multi-core microprocessor 102 composite C-state. Because core 1 is not the BSP, core 1 goes to sleep (at block 316).

Now all cores 106 are asleep, as they are in the example of Figure 7, and events proceed as they do in Figure 7; that is, chipset 114 asserts STPCLK, wakes the cores 106, and so on. Notably, by the time this final synchronized power state discovery process completes, every core has separately computed the multi-core microprocessor 102 composite C-state.

In one embodiment, microcode 208 is designed to be uninterruptible. Thus, in the example of Figure 7, when the microcode 208 of each core 106 is invoked to process its respective MWAIT instruction, it is not interrupted when another core 106 attempts to interrupt it.
Instead, for example, core 0 sees that core 1 has sent its C-state and obtains the C-state from core 1 at block 408, treating it as if core 1 had sent its C-state in response to core 0 interrupting core 1 at block 406. Likewise, core 1 sees that core 0 has sent its C-state and obtains the C-state from core 0 at block 408, treating it as if core 0 had sent its C-state in response to core 1 interrupting core 0 at block 406. Because core 0 and core 1 each take the C-state of the other core 106 into account when computing an at least partial composite C-state, each core 106 computes an at least partial composite C-state. Thus, for example, core 1 computes an at least partial composite C-state regardless of whether core 0 sent its C-state to core 1 in response to receiving an interrupt from core 1 or in response to encountering an MWAIT instruction, in which case the two C-states may have crossed simultaneously on the inter-core communication wiring 112 (or on the inter-die communication wiring 118, or on the inter-package communication wiring 1133 in the embodiment of Figure 11). Accordingly, microcode 208 advantageously operates correctly to perform decentralized power management among the cores 106 of multi-core microprocessor 102 regardless of the order in which the various cores 106 receive their MWAIT instructions.
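The Figure 8 sequence walked through above can be replayed with a short simulation. This is a minimal sketch rather than the microcode 208 itself: it assumes the composite C-state is simply the minimum of every core's applicable C-state, with a core that is still running user code contributing 0. The names `composite_c_state` and `replay` are hypothetical and exist only for illustration.

```python
# Illustrative sketch only: replays the MWAIT order of the Figure 8 example.
# Assumption: composite C-state = min over all cores' applicable C-states,
# where a core still running user code has an applicable C-state of 0.

def composite_c_state(applicable):
    """Return the composite C-state for a mapping core -> applicable C-state."""
    return min(applicable.values())


def replay(events, num_cores=4):
    """Apply MWAIT events in order and record the composite after each one."""
    applicable = {core: 0 for core in range(num_cores)}  # all cores running
    history = []
    for core, target in events:
        applicable[core] = target  # the core stores its MWAIT target "X" as "Y"
        history.append(composite_c_state(applicable))
    return history


# Order of MWAIT instructions in the Figure 8 example: (core, target C-state).
figure8_events = [(3, 4), (2, 5), (0, 4), (1, 3)]
print(replay(figure8_events))  # [0, 0, 0, 3]
```

The composite stays at 0 while any core is still running and only reaches 3 once core 1 executes its MWAIT, which matches the point in the walkthrough at which core 0, as the BSP, notifies chipset 114.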

As can be observed from the foregoing, broadly speaking, when a core 106 encounters an MWAIT instruction, it first exchanges C-state information with its partner, and the two cores 106 compute an at least partial composite C-state for the die 104 based on the C-states of both cores 106; in the case of a dual-core die, for example, the two cores compute the same value. Only after computing the die 104 composite C-state do the manager cores 106 exchange C-state information with their companions, and the composite C-state that the two managers compute for the multi-core microprocessor 102, based on the composite C-states of the two dies 104, is the same value.
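The two-stage exchange summarized above (partners first, then die managers with their companions) amounts to a nested minimum. The following is a hedged sketch that assumes the four-core, two-die layout of Figure 1; `DIES`, `die_composite` and `processor_composite` are hypothetical names chosen for illustration, not part of microcode 208.

```python
# Sketch of the two-stage composite computation, under an assumed layout:
# die 0 holds cores 0 and 1, die 1 holds cores 2 and 3 (as in Figure 1).
DIES = {0: (0, 1), 1: (2, 3)}


def die_composite(applicable, die):
    """Stage 1: partners on one die exchange C-states and take the minimum."""
    return min(applicable[core] for core in DIES[die])


def processor_composite(applicable):
    """Stage 2: die managers exchange die composites and take the minimum."""
    return min(die_composite(applicable, die) for die in DIES)


# Applicable C-states once every core in the Figure 8 example has hit MWAIT.
applicable = {0: 4, 1: 3, 2: 5, 3: 4}

print(die_composite(applicable, 0))     # 3
print(die_composite(applicable, 1))     # 4
print(processor_composite(applicable))  # 3
```

Because a minimum of minimums equals the minimum over all cores, each manager ends up with the same processor-wide value no matter which die finished its local exchange first.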
With this approach, all cores 106 advantageously compute the same composite C-state regardless of the order in which the cores 106 receive their MWAIT instructions. Furthermore, regardless of that order, the cores 106 advantageously coordinate with one another in a distributed fashion so that the multi-core microprocessor 102 can communicate with chipset 114 as a single entity when requesting permission to engage in power-saving actions that are global to the multi-core microprocessor 102, such as removing the bus 116 clock. Advantageously, this distributed C-state synchronization for power management is performed without dedicated power management hardware that resides on the die 104 but outside the cores 106, which may provide the following advantages: scalability, configurability, yield characteristics, power reduction, and reduced die size.

It is noted that each core 106 of other multi-core microprocessor embodiments having different numbers and arrangements of cores 106 may employ microcode 208 similar to that described with respect to Figures 3 through 6. For example, each core 106 of a dual-core microprocessor embodiment having two cores 106 in a single die 104 (such as shown in Figure 18) may employ microcode 208 similar to that described with respect to Figures 3 through 6, which recognizes that each core 106 has only a partner and no companion. Likewise, each core 106 of a dual-core microprocessor embodiment having two single-core dies 104 (such as shown in Figure 19) may employ microcode 208 similar to that described with respect to Figures 3 through 6, which recognizes that each core 106 has only a companion and no partner (or redesignates the cores 106 as companions).
Likewise, each core 106 of a dual-core microprocessor embodiment having single-core, single-die packages 104 (such as shown in Figure 20) may employ microcode 208 similar to that described with respect to Figures 3 through 6, which recognizes that each core 106 has only a friend and no companion or partner (or redesignates the cores 106 as companions).

Furthermore, each core 106 of other multi-core microprocessor embodiments having asymmetric arrangements of cores 106 (such as those shown in Figures 21 and 22) may employ similar microcode 208 modified relative to Figures 3 through 6, such as described below with respect to Figures 10, 13 and 17. In addition, system embodiments other than those described herein, having different numbers and arrangements of cores 106, dies 104 and/or packages, which employ combinations of the operation of the microcode 208 of the cores 106 described with respect to Figures 3 through 6 and Figures 10, 13 and 17, are also contemplated by the present invention and may be equivalently modified to suit the actual application.

Referring now to Figure 9, a block diagram is shown of a computer system 900 of the present invention that performs decentralized power management distributed among the multiple processing cores 106 of a multi-core microprocessor 902, according to an alternative embodiment. System 900 is similar to the system of Figure 1, and multi-core microprocessor 902 is similar to the multi-core microprocessor 102 of Figure 1; however, multi-core microprocessor 902 is an eight-core microprocessor 902 comprising four dual-core dies 104 organized on a single microprocessor package, denoted die 0, die 1, die 2 and die 3. Die 0 includes core 0 and core 1, and die 1 includes core 2 and core 3, similar to Figure 1; in addition, die 2 includes core 4 and core 5, and die 3 includes core 6 and core 7. Within each die, the cores are partners of one another, but each die designates one core as the manager of that die.

The die managers on the package have inter-die communication wiring that connects each die to each of the other dies. This enables a coordination system in which the die managers are members of a peer-collaborative same-attribute group; that is, each die manager is able to coordinate with any other die manager on the package. The inter-die communication wiring 118 is arranged as follows. The OUT contact pad of die 0, the IN 1 contact pad of die 1, the IN 2 contact pad of die 2 and the IN 3 contact pad of die 3 are coupled via a single wire net to pin P1; the OUT contact pad of die 1, the IN 1 contact pad of die 2, the IN 2 contact pad of die 3 and the IN 3 contact pad of die 0 are coupled via a single wire net to pin P2; the OUT contact pad of die 2, the IN 1 contact pad of die 3, the IN 2 contact pad of die 0 and the IN 3 contact pad of die 1 are coupled via a single wire net to pin P3; and the OUT contact pad of die 3, the IN 1 contact pad of die 0, the IN 2 contact pad of die 1 and the IN 3 contact pad of die 2 are coupled via a single wire net to pin P4.

When a manager core 106 wants to communicate with the other dies 104, it transmits information on its OUT contact pad 108, and the information is broadcast to the other dies 104 and received by their respective manager cores 106 via the appropriate IN contact pad 108. As may be observed from Figure 9, the number of contact pads 108 on each die 104 and the number of pins P on the package 902 (that is, the contact pads and pins related to the decentralized power management distributed among multiple cores described herein; the multi-core microprocessor 102 may of course include other contact pads and pins for other purposes, such as data, address and control buses) is advantageously no larger than the number of dies 104, which is a relatively small number. This is particularly advantageous in pad-limited and pin-limited designs, which may be common because the number of contact pads and pins on a standard die or package is constrained by specification, and a microprocessor manufacturer gains an economic benefit from conforming to those standard numbers, in which case most of the contact pads and pins may already be in use. Furthermore, alternative embodiments are described below in which the number of contact pads 108 on each die 104 is, or may be, smaller than the number of dies 104.

Referring now to Figure 10, a flowchart is shown illustrating operation of the system 900 of Figure 9 to perform decentralized power management distributed among the multiple processing cores 106 of the eight-core microprocessor 902 according to the present invention. More specifically, the flowchart of Figure 10 illustrates operation of the sync_C-state microcode 208 of Figure 3 (and Figure 6); it is similar in many respects to the flowchart of Figure 4, and like-numbered blocks are similar. However, the sync_C-state microcode 208 of the cores 106 described by the flowchart of Figure 10 differs in some respects, and the differences are now described. In particular, each manager core 106 of a die 104 has three companion cores 106 rather than one. In addition, the manager cores 106 coordinate with one another as peers rather than through arbitration by a package manager or the BSP.

Flow begins at block 402 of Figure 10 and proceeds through block 416 as described with respect to Figure 4. However, Figure 10 does not include blocks 422, 424, 426 or 428. Instead, flow proceeds from the "NO" branch of decision block 414 to decision block 1018.

At decision block 1018, the sync_C-state microcode 208 determines whether all of its companions have been visited, that is, whether the core 106 has exchanged a C-state with each of its companions via blocks 1022 and 1024. If so, flow proceeds to block 416; otherwise, flow proceeds to block 1022.

At block 1022, the sync_C-state microcode 208 invokes a new instance of sync_C-state on its next companion by programming the CSR 234 of Figure 2 to send the "C" value to the next companion and to interrupt the companion.
In the case of the first companion, the "C" value sent was computed at block 412; in the case of the remaining companions, the "C" value was computed at block 1026. In the loop comprising blocks 414, 1018, 1022, 1024, and 1026, the microcode 208 keeps track of which companions it has visited to ensure that it visits each of them (unless the condition at decision block 414 is found to be true). Flow proceeds to block 1024.
At block 1024, the sync_C-state microcode 208 programs the CSR 234 to detect that the next companion has returned a mixed C-state, and obtains that mixed C-state, denoted "D". Flow proceeds to block 1026.
At block 1026, the sync_C-state microcode 208 computes a newly calculated local mixed C-state, denoted "C", as the minimum of the "C" and "D" values. Flow returns to decision block 414.
Flow proceeds from block 434 of FIG. 10 through block 444 as described with respect to FIG. 4. However, FIG. 10 does not include blocks 446, 448, 452, 454, or 456. Instead, flow proceeds from the "NO" branch of decision block 438 to decision block 1045.
At decision block 1045, the sync_C-state microcode 208 determines whether all of its companions have been visited, i.e., whether the core 106 has exchanged C-states with each of its companions via blocks 1046 and 1048. If so, flow proceeds to block 442; otherwise, flow proceeds to block 1046.
At block 1046, the sync_C-state microcode 208 programs the CSR 234 to induce a new instance of the sync_C-state routine on its next companion, sending the "G" value to that companion and interrupting it. In the case of the first companion, the "G" value sent was computed at block 436; in the case of the remaining companions, the "G" value was computed at block 1052. Flow proceeds to block 1048.
At block 1048, the microcode 208 programs the CSR 234 to detect that the next companion has returned a mixed C-state to the core 106, and obtains that mixed C-state, denoted "H". Flow proceeds to block 1052.
At block 1052, the sync_C-state microcode 208 computes a newly calculated local mixed C-state, denoted "G", as the minimum of the "G" and "H" values. Flow returns to decision block 438.
Flow proceeds from block 466 of FIG. 10 through block 476 as described with respect to FIG. 4. Note that at block 474, the companion to which the core 106 sends the "L" value is the companion that interrupted the core 106. In addition, flow proceeds from the "NO" branch of decision block 472 of FIG. 10 through block 484 as described with respect to FIG. 4. However, FIG. 10 does not include blocks 486 or 488. Instead, flow proceeds from block 484 to decision block 1085.
At decision block 1085, if the "L" value is less than 2, flow proceeds to block 474; otherwise, flow proceeds to decision block 1087. When flow arrives at decision block 1085 from block 484, the "L" value was computed at block 484; when flow arrives at decision block 1085 from block 1093, the "L" value was computed at block 1093.
At decision block 1087, the sync_C-state microcode 208 determines whether all of its companions have been visited, i.e., whether the core 106 has exchanged a C-state with each of its companions or received a C-state from each of them. In the case of the companion that interrupted the core 106, the C-state was received via block 466 (and will be sent via block 474); the interrupting companion is therefore considered to have been visited already. In the case of the remaining companions, the C-state is exchanged via blocks 1089 and 1091.
If all of the companions have been visited, flow proceeds to block 474; otherwise, flow proceeds to block 1089.
At block 1089, the microcode 208 programs the CSR 234 to induce a new instance of the sync_C-state routine on its next companion, sending the "L" value to that companion and interrupting it. In the case of the first companion, the "L" value sent was computed at block 484; in the case of the remaining companions, the "L" value was computed at block 1093. Flow proceeds to block 1091.
At block 1091, the microcode 208 programs the CSR 234 to detect that the next companion has returned a mixed C-state to the core 106, and obtains that mixed C-state, denoted "M". Flow proceeds to block 1093.
At block 1093, the sync_C-state microcode 208 computes a newly calculated local mixed C-state, denoted "L", as the minimum of the "L" and "M" values. Flow returns to decision block 1085.
Referring now to the block diagram of FIG. 11, an alternate embodiment of a computer system 1100 that performs decentralized power management distributed among the multiple processing cores 106 of two multi-core microprocessors 102 according to the present invention is shown. The system 1100 is similar to the system 100 of FIG. 1, and the two multi-core microprocessors 102 are each similar to the multi-core microprocessor 102 of FIG. 1; however, the system includes two of the multi-core microprocessors 102 coupled together to provide an eight-core system 1100. The system 1100 of FIG. 11 is therefore also similar to the system 900 of FIG. 9 in that it includes four dual-core chips 104, denoted chip 0, chip 1, chip 2, and chip 3. Chip 0 includes core 0 and core 1, chip 1 includes core 2 and core 3, chip 2 includes core 4 and core 5, and chip 3 includes core 6 and core 7.
However, chip 0 and chip 1 are included in the first multi-core microprocessor package 102, and chip 2 and chip 3 are included in the second multi-core microprocessor package 102. Thus, although the cores 106 are distributed among multiple multi-core microprocessor packages 102 in the embodiment of FIG. 11, the cores 106 nevertheless share certain power-management-related resources, namely the bus 116 clock supplied by the chipset 114 and the policy of the chipset 114 to snoop or not snoop caches on the processor bus, such that the chipset 114 expects a single I/O read transaction on the bus 116 from a predetermined I/O port address. In addition, the cores 106 of the two packages 102 potentially share a VRM, and the cores 106 of a chip 104 may share a PLL, as described above. Advantageously, the cores 106 of the system 1100 of FIG. 11 (in particular, the microcode 208 of the cores 106) are configured to communicate with one another to coordinate control of the shared power-management-related resources in a decentralized manner using the inter-core communication wiring 112, the inter-chip communication wiring 118, and the inter-package communication wiring 1133, as described herein and in CNTR.2534.
The inter-chip communication wiring 118 of the first multi-core microprocessor 102 is configured as shown in FIG. 1. However, the pins of the second multi-core microprocessor 102 are denoted "P5", "P6", "P7", and "P8", and the inter-chip communication wiring 118 of the second multi-core microprocessor 102 is configured as follows: one contact pad of chip 2 and one contact pad of chip 3 are coupled to pin P5 via a single wiring net; another contact pad of chip 2 and the IN 2 contact pad of chip 3 are coupled to pin P6 via a single wiring net; the OUT contact pad of chip 2 and the IN 1 contact pad of chip 3 are coupled to pin P7 via a single wiring net; and the OUT contact pad of chip 3 and the IN 3 contact pad of chip 2 are coupled to pin P8 via a single wiring net.
Further, via the inter-package communication wiring 1133 of the motherboard of the system 1100, pin P1 of the first multi-core microprocessor 102 is coupled to pin P7 of the second multi-core microprocessor 102, such that the OUT contact pad of chip 0, the IN 1 contact pad of chip 1, the IN 2 contact pad of chip 2, and the IN 3 contact pad of chip 3 are all coupled together via a single wiring net; pin P2 of the first multi-core microprocessor 102 is coupled to pin P8 of the second multi-core microprocessor 102, such that the OUT contact pad of chip 1, the IN 1 contact pad of chip 2, the IN 2 contact pad of chip 3, and the IN 3 contact pad of chip 0 are all coupled together via a single wiring net; pin P3 of the first multi-core microprocessor 102 is coupled to pin P5 of the second multi-core microprocessor 102, such that the OUT contact pad of chip 0, the IN 1 contact pad of chip 1, the IN 2 contact pad of chip 2, and the IN 3 contact pad of chip 3 are all coupled together via a single wiring net; and pin P4 of the first multi-core microprocessor 102 is coupled to pin P6 of the second multi-core microprocessor 102, such that the OUT contact pad of chip 0, the IN 1 contact pad of chip 1, the IN 2 contact pad of chip 2, and the IN 3 contact pad of chip 3 are all coupled together via a single wiring net.
The CSR 234 of FIG. 2 is also coupled to the inter-package communication wiring 1133, so that the microcode 208 can program the CSR 234 to communicate with other cores 106 via the inter-package communication wiring 1133.
Thus, the manager core 106 of each chip 104 is able to communicate with the manager cores 106 of the other chips 104 (i.e., its companions) via the inter-package communication wiring 1133 and the inter-chip communication wiring 118. When a manager core 106 wants to communicate with the other chips 104, it transmits information on its OUT contact pad 108; the information is broadcast to the other chips 104 and received by their respective manager cores 106 via the appropriate IN contact pads 108. As may be observed from FIG. 11, with respect to each multi-core microprocessor 102, the number of contact pads 108 on each chip 104 and the number of pins P on the package 102 are advantageously no greater than the number of chips 104, which is a relatively small number.
Note also that, for the manager core 106 of any one of the chips 104, the manager cores 106 of the other chips 104 are its "companions". As may be observed from FIG. 11, core 0, core 2, core 4, and core 6 are companions, similar to the configuration of FIG. 9, even though in FIG. 9 all four chips are included in a single eight-core microprocessor package 902, whereas in FIG. 11 the four chips are included in two separate quad-core microprocessor packages 102. Thus, the microcode 208 described with respect to FIG. 10 is also configured to operate in the system 1100 of FIG. 11. All four companion cores 106 together form a peer-collaborative attribute group in which each companion core 106 can coordinate directly with any other companion core 106, without mediation by whichever companion core 106 is designated the BSP core.
It is further noted that although the pins P are needed in multiprocessor embodiments (such as those of FIG. 11 and FIG. 12), they may be omitted if desired in embodiments having a single multi-core microprocessor 102, although they are beneficial for debugging purposes.
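The peer-collaborative companion loop of FIG. 10, which the system of FIG. 11 reuses, can be sketched as follows. This is a minimal Python illustration, not the microcode itself: the function names are hypothetical, the condition tested at decision block 414 is assumed to be "the value is less than 2 or the core is not a manager core" (block 414 belongs to FIG. 4, outside this passage), and each companion is assumed to answer with the minimum of the value it received and its own mixed C-state.

```python
def companion_reply(own_mixed_c_state, received):
    """Assumed behavior of an interrupted companion: return the minimum of
    its own mixed C-state and the value it was sent."""
    return min(own_mixed_c_state, received)


def sync_with_companions(initial_c, companion_c_states, is_manager=True):
    """Sketch of the loop over blocks 414, 1018, 1022, 1024, and 1026.

    initial_c          -- the "C" value computed at block 412
    companion_c_states -- mixed C-state held by each companion, in visit order
    Returns the final "C" value and the indices of the companions visited.
    """
    c = initial_c
    visited = []
    for index, companion_c in enumerate(companion_c_states):
        # Decision block 414 (assumed condition): stop early when further
        # coordination cannot matter.
        if c < 2 or not is_manager:
            break
        d = companion_reply(companion_c, c)  # blocks 1022 and 1024
        c = min(c, d)                        # block 1026
        visited.append(index)                # bookkeeping for block 1018
    return c, visited


if __name__ == "__main__":
    print(sync_with_companions(4, [3, 5, 2]))
    print(sync_with_companions(4, [1, 5, 3]))
```

Because every manager core can run this loop against every other manager core, no package manager or BSP has to relay values, which is the property the peer-collaborative attribute group is meant to provide.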
Referring now to the block diagram of FIG. 12, an alternate embodiment of a computer system 1200 that performs decentralized power management distributed among the multiple processing cores 106 of two multi-core microprocessors 1202 according to the present invention is shown. The system 1200 is similar to the system 1100 of FIG. 11, and the multi-core microprocessors 1202 are similar to the multi-core microprocessors 102 of FIG. 11. However, the eight cores of the system 1200 are organized, and physically connected by sideband wiring, according to a deeper hierarchical coordination system. Each chip 104 has only three contact pads 108 (OUT, IN 1, and IN 2) for coupling to the inter-chip communication wiring 118; each package 1202 has only two pins, denoted P1 and P2 on the first multi-core microprocessor 1202 and P3 and P4 on the second multi-core microprocessor 1202; and the inter-chip communication wiring 118 and the inter-package communication wiring 1133 that connect the two multi-core microprocessors 1202 of FIG. 12 have a different configuration from their counterparts in FIG. 11.
In the system 1200 of FIG. 12, core 0 and core 4 are designated the "package manager", or "p manager", of their respective multi-core microprocessors 1202. Furthermore, unless otherwise indicated, the term "friend" is used herein to denote manager cores 106 on different packages 1202 that communicate with one another; thus, in the embodiment of FIG. 12, core 0 and core 4 are friends. The inter-chip communication wiring 118 of the multi-core microprocessors 1202 is configured as follows.
Within the first package 1202, the OUT contact pad of chip 0 and the IN 1 contact pad of chip 1 are coupled to pin P1 via a single wiring net; the OUT contact pad of chip 1 and the IN 1 contact pad of chip 0 are coupled via a single wiring net; and the IN 2 contact pad of chip 0 is coupled to pin P2. Within the second package 1202, the OUT contact pad of chip 2 and the IN 1 contact pad of chip 3 are coupled to pin P3 via a single wiring net; the OUT contact pad of chip 3 and the IN 1 contact pad of chip 2 are coupled via a single wiring net; and the IN 2 contact pad of chip 2 is coupled to pin P4. Further, via the inter-package communication wiring 1133 of the motherboard of the system 1200, pin P1 is coupled to pin P4, such that the OUT contact pad of chip 0, the IN 1 contact pad of chip 1, and the IN 2 contact pad of chip 2 are all coupled together via a single wiring net; and pin P2 is coupled to pin P3, such that the OUT contact pad of chip 2, the IN 1 contact pad of chip 3, and the IN 2 contact pad of chip 0 are all coupled together via a single wiring net.
Therefore, unlike the system 900 of FIG. 9 and the system 1100 of FIG. 11, in which each manager core 106 can communicate with every other manager core 106, in the system 1200 of FIG. 12 only manager core 0 and manager core 4 can communicate with each other between packages (i.e., via the sideband wiring described herein). An advantage of the embodiment of FIG. 12 over that of FIG. 11 is that, with respect to each multi-core microprocessor 1202, the number of contact pads 108 on each chip 104 (three) is smaller than the number of chips 104, and the number of pins P on each package 1202 (two) is smaller than the number of chips 104, which is a relatively small number. In addition, the number of C-state exchanges between the cores 106 may be smaller.
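The three-level grouping of FIG. 12 (partners on a chip, companions within a package, friends across packages) can be modeled compactly. The Python sketch below is illustrative only. It assumes the core numbering given for FIG. 11 (two cores per chip, two chips per package), assumes that the manager core of each chip is its lowest-numbered core and that the package manager is the lowest-numbered manager core of the package, and uses hypothetical helper names that do not appear in the source.

```python
# Illustrative topology model; constants and helper names are assumptions.
CORES_PER_CHIP = 2
CHIPS_PER_PACKAGE = 2
NUM_CORES = 8
CORES_PER_PACKAGE = CORES_PER_CHIP * CHIPS_PER_PACKAGE


def is_manager(core):
    """Assumed: the lowest-numbered core of each chip is its manager core."""
    return core % CORES_PER_CHIP == 0


def is_package_manager(core):
    """Assumed: the lowest-numbered core of each package is its package manager."""
    return core % CORES_PER_PACKAGE == 0


def relatives(core):
    """Return the partner, companion, and friend of a core (None if it has none)."""
    chip = core // CORES_PER_CHIP
    package = core // CORES_PER_PACKAGE
    # Partner: the other core on the same chip.
    partner = chip * CORES_PER_CHIP + (1 - core % CORES_PER_CHIP)
    companion = None
    if is_manager(core):
        # Companion: the manager core of the other chip in the same package.
        other_chip = package * CHIPS_PER_PACKAGE + (1 - chip % CHIPS_PER_PACKAGE)
        companion = other_chip * CORES_PER_CHIP
    friend = None
    if is_package_manager(core):
        # Friend: the package manager of the other package.
        friend = (1 - package) * CORES_PER_PACKAGE
    return {"partner": partner, "companion": companion, "friend": friend}


def composite_c_state(target_c_states):
    """The composite C-state is the minimum of all cores' target C-states."""
    return min(target_c_states)


if __name__ == "__main__":
    for core in range(NUM_CORES):
        print(core, relatives(core))
```

In this model only core 0 and core 4 have a friend, which reflects the statement that they are the only manager cores able to communicate between packages; every other core reaches the rest of the system through its manager or package manager.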
For debugging purposes, the first multi-core microprocessor 1202 also includes a third pin coupled to the OUT contact pad 108 of chip 1, and the second multi-core microprocessor 1202 also includes a third pin coupled to the OUT contact pad 108 of chip 3.
Referring now to the flowchart of FIG. 13, the operation of the system 1200 of FIG. 12 in performing decentralized power management distributed among the multiple processing cores 106 of the dual quad-core microprocessors 1202 (an eight-core system) according to the present invention is shown. More specifically, the flowchart of FIG. 13 shows the operation of the sync_C-state microcode 208 of FIG. 3. It is similar in many respects to the flowcharts of FIG. 4 and FIG. 10, and like-numbered blocks are similar. However, the sync_C-state microcode 208 of the cores 106 described in the flowchart of FIG. 13 accounts for the fact that the configuration of the inter-chip communication wiring 118 and the inter-package communication wiring 1133 of the system 1200 of FIG. 12 differs from that of the system 1100 of FIG. 11. In particular, some of the manager cores (namely core 2 and core 6) are not configured to communicate directly with all of the other manager cores of the system 1200; instead, the friends (core 0 and core 4) coordinate in a hierarchical fashion down to their companions (core 2 and core 6, respectively), which in turn coordinate down to their partners. These differences are now described.
Flow begins at block 402 of FIG. 13 and proceeds through block 424 as described with respect to FIG. 4. However, FIG. 13 does not include block 426 or 428. Instead, flow proceeds from block 424 to block 1326. In addition, at decision block 432, if the core 106 was interrupted by a friend rather than by a partner or companion, flow proceeds to block 1301.
At block 1326, the sync_C-state microcode 208 computes a newly calculated local mixed C-state, denoted "C", as the minimum of the "C" and "D" values. Flow proceeds to decision block 1327.
At decision block 1327, if the "C" value computed at block 1326 is less than 2 or the core 106 is not the package manager core 106, flow proceeds to block 416; otherwise, flow proceeds to block 1329.
At block 1329, the sync_C-state microcode 208 programs the CSR 234 to induce a new instance of the sync_C-state routine on its friend, sending the "C" value computed at block 1326 to the friend and interrupting it. This requests the friend to compute a mixed C-state (which, in a manner similar to that described above with respect to FIG. 4, may constitute the composite C-state of the entire processor) and to return it to this core 106. Flow proceeds to block 1331.
At block 1331, the sync_C-state microcode 208 programs the CSR 234 to detect that the friend has returned a mixed C-state to the core 106, and obtains that mixed C-state, denoted "D". Flow proceeds to block 1333.
At block 1333, the sync_C-state microcode 208 computes a newly calculated mixed C-state, denoted "C", as the minimum of the "C" and "D" values. Note that, assuming "D" is at least 2, once flow reaches block 1333 the C-state of every core 106 in the system 1200 has been taken into account in the "C" value computed at block 1333; that value is therefore referred to herein as the composite C-state of the system 1200. Flow proceeds to block 416.
Flow proceeds from block 434 of FIG. 13 through blocks 444 and 448 as described with respect to FIG. 4. However, FIG. 13 does not include blocks 452, 454, or 456. Instead, flow proceeds from block 448 to block 1352.
At block 1352, the sync_C-state microcode 208 computes a newly calculated local mixed C-state, denoted "G", as the minimum of the "G" and "H" values. Flow proceeds to decision block 1353.
At decision block 1353, if the "G" value computed at block 1352 is less than 2 or the core 106 is not the package manager core 106, flow proceeds to block 442; otherwise, flow proceeds to block 1355.
At block 1355, the sync_C-state microcode 208 programs the CSR 234 to induce a new instance of the sync_C-state routine on its friend, sending the "G" value computed at block 1352 to the friend and interrupting it. This requests the friend to compute a mixed C-state and return it to this core 106. Flow proceeds to block 1357.
At block 1357, the sync_C-state microcode 208 programs the CSR 234 to detect that the friend has returned a mixed C-state to the core 106, and obtains that mixed C-state, denoted "H". Flow proceeds to block 1359.
At block 1359, the sync_C-state microcode 208 computes a newly calculated local mixed C-state, denoted "G", as the minimum of the "G" and "H" values. Note that, assuming "H" is at least 2, once flow reaches block 1359 the C-state of every core 106 in the system 1200 has been taken into account in the "G" value computed at block 1359; that value is therefore referred to herein as the composite C-state of the system 1200. Flow proceeds to block 442.
Flow proceeds from block 466 of FIG. 13 through blocks 476 and 482 as described with respect to FIG. 4. However, FIG. 13 does not include blocks 484, 486, or 488. Instead, flow proceeds from block 482 to block 1381.
At block 1381, the sync_C-state microcode 208 computes a newly calculated local mixed C-state, denoted "L", as the minimum of the "L" and "M" values. Flow proceeds to decision block 1383.
At decision block 1383, if the "L" value computed at block 1381 is less than 2 or the core 106 is not the package manager core 106, flow proceeds to block 474; otherwise, flow proceeds to block 1385.
At block 1385, the sync_C-state microcode 208 programs the CSR 234 to induce a new instance of the sync_C-state routine on its friend, sending the "L" value computed at block 1381 to the friend and interrupting it. This requests the friend to compute a mixed C-state and return it to this core 106. Flow proceeds to block 1387.
At block 1387, the sync_C-state microcode 208 programs the CSR 234 to detect that the friend has returned a mixed C-state to the core 106, and obtains that mixed C-state, denoted "M". Flow proceeds to block 1389.
At block 1389, the sync_C-state microcode 208 computes a newly calculated local mixed C-state, denoted "L", as the minimum of the "L" and "M" values. Note that, assuming "M" is at least 2, once flow reaches block 1389 the C-state of every core 106 in the system 1200 has been taken into account in the "L" value computed at block 1389; that value is therefore referred to herein as the composite C-state of the system 1200. Flow proceeds to block 474.
As discussed above, at decision block 432, if the core 106 was interrupted by a friend rather than by a partner or companion, flow proceeds to block 1301.
At block 1301, the core 106 has been interrupted by its friend, so the microcode 208 programs the CSR 234 to obtain from its friend the friend's composite C-state, denoted "Q" in FIG. 13.
It should be noted that the friend would not have induced this instance of the sync_C-state routine unless it had already determined that the composite C-state of its own package is at least 2. Flow proceeds to block 1303.
At block 1303, the sync_C-state microcode 208 computes a local mixed C-state, denoted "R", as the minimum of its own applicable C-state "Y" value and the "Q" value it received at block 1301. Flow proceeds to decision block 1305.
At decision block 1305, if the "R" value computed at block 1303 is less than 2, flow proceeds to block 1307; otherwise, flow proceeds to block 1311.
At block 1307, in response to the request made by the inter-core interrupt from its friend, the microcode 208 programs the CSR 234 to send the "R" value computed at block 1303 to its friend. Flow proceeds to block 1309.
At block 1309, the routine returns the "R" value computed at block 1303 to its caller. Flow ends at block 1309.
At block 1311, the sync_C-state microcode 208 programs the CSR 236 to induce a new instance of the sync_C-state routine on its partner, sending the "R" value computed at block 1303 to the partner and interrupting it. This requests the partner to compute a mixed C-state and return it to the core 106. Flow proceeds to block 1313.
At block 1313, the sync_C-state microcode 208 programs the CSR 236 to detect that the partner has returned a mixed C-state to the core 106, and obtains the partner's mixed C-state, denoted "S" in FIG. 13. Flow proceeds to block 1315.
At block 1315, the sync_C-state microcode 208 computes a newly calculated local mixed C-state, denoted "R", as the minimum of the "R" and "S" values. Flow proceeds to decision block 1317.
At decision block 1317, if the "R" value computed at block 1315 is less than 2, flow proceeds to block 1307; otherwise, flow proceeds to block 1319.
At block 1319, the sync_C-state microcode 208 programs the CSR 234 to induce a new instance of the sync_C-state routine on its companion, sending the "R" value computed at block 1315 to the companion and interrupting it. This requests the companion to compute a mixed C-state and return it to this core 106. Flow proceeds to block 1321.
At block 1321, the sync_C-state microcode 208 programs the CSR 234 to detect that the companion has returned a mixed C-state to the core 106, and obtains that mixed C-state, denoted "S". Flow proceeds to block 1323.
At block 1323, the sync_C-state microcode 208 computes a newly calculated local mixed C-state, denoted "R", as the minimum of the "R" and "S" values. Note that, assuming "S" is at least 2, once flow reaches block 1323 the C-state of every core 106 in the system 1200 has been taken into account in the "R" value computed at block 1323; thus, "R" constitutes the composite C-state of the system 1200. Flow proceeds to block 1307.
Referring now to the block diagram of FIG. 14, an alternate embodiment of a computer system 1400 that performs decentralized power management distributed among the multiple processing cores 106 of a multi-core microprocessor 1402 according to the present invention is shown. The system 1400 is similar in some respects to the system 900 of FIG. 9 in that it includes a single eight-core microprocessor 1402 having four dual-core chips 104 coupled together via inter-chip communication wiring 118. However, the eight cores of the system 1400 are organized, and physically connected by sideband wiring, according to a deeper, three-level hierarchical coordination system. First, the configuration of the inter-chip communication wiring 118 differs from that of FIG. 9, as follows.
The system 1400 is similar in some respects to the system 1200 of FIG. 12, in that the cores are organized and physically connected according to a three-level hierarchical coordination system. Each of the four chips 104 includes three contact pads 108 for coupling to the inter-chip communication wiring 118, namely an OUT contact pad, an IN 1 contact pad, and an IN 2 contact pad. The multi-core microprocessor 1402 of FIG. 14 includes four pins, denoted "P1", "P2", "P3", and "P4". The inter-chip communication wiring 118 of the multi-core microprocessor 1402 of FIG. 14 is configured as follows: the OUT contact pad of chip 0, the IN 1 contact pad of chip 1, and the IN 2 contact pad of chip 2 are all coupled together via a single wiring net that is coupled to pin P1; the OUT contact pad of chip 1 and the IN 1 contact pad of chip 0 are coupled together via a single wiring net that is coupled to pin P2; the OUT contact pad of chip 2, the IN 1 contact pad of chip 3, and the IN 2 contact pad of chip 0 are all coupled together via a single wiring net that is coupled to pin P3; and the OUT contact pad of chip 3 and the IN 1 contact pad of chip 2 are coupled together via a single wiring net that is coupled to pin P4.
The cores 106 of FIG. 14 are configured to operate according to the description of FIG. 13. Core 0 and core 4 are regarded as friends even though they are located in the same package 1402 (contrary to the meaning of the term "friend" given above with respect to FIG. 12). In the embodiment of FIG. 14, the two friends communicate with each other via the inter-chip communication wiring 118 rather than via the inter-package communication wiring 1133 of FIG. 12. In this case, apart from the physical model of the processor, the cores are configured according to a deeper hierarchical coordination system having three levels of domains.
Referring now to the block diagram of FIG. 15, an alternate embodiment of a computer system 1500 that performs decentralized power management distributed among the multiple processing cores 106 of a multi-core microprocessor 1502 according to the present invention is shown. The system 1500 is similar in some respects to the system 1400 of FIG. 14 in that it includes a single eight-core microprocessor 1502 having eight cores 106, denoted core 0 through core 7. However, the multi-core microprocessor 1502 includes two quad-core chips 1504 coupled together via inter-chip communication wiring 118. Each of the two chips 1504 includes contact pads 108 for coupling to the inter-chip communication wiring 118, namely an OUT contact pad and IN contact pads. The multi-core microprocessor 1502 includes two pins, denoted "P1" and "P2". The inter-chip communication wiring 118 of the multi-core microprocessor 1502 is configured as follows: the OUT contact pad of one chip and an IN contact pad of the other chip are coupled together via a single wiring net that is coupled to pin P2, and the OUT contact pad of the other chip and the IN 1 contact pad of the first chip are coupled together via a single wiring net that is coupled to pin P1. In addition, the inter-core communication wiring 112 within each quad-core chip 1504 couples each core 106 to other cores 106 of its chip 1504 to facilitate the decentralized power management distributed among the multiple processing cores 106 of the multi-core microprocessor 1502.
The cores 106 of FIG. 15 are configured to operate according to the description of FIG. 13, as understood in light of the following. First, the cores of each chip are themselves organized, and physically connected by sideband wiring, according to a two-level hierarchical coordination system. Chip 0 has two partner attribute groups (core 0 and core 1; core 2 and core 3) and one companion attribute group (core 0 and core 2).
Similarly, chip 1 has two partner attribute groups (core 4 and core 5; core 6 and core 7) and one companion attribute group (core 4 and core 6). Note that here the companion cores are regarded as companions even though they are located on the same chip (contrary to the characterization of companions given above with respect to FIG. 1). Furthermore, in the embodiment of FIG. 15 the companions communicate with each other via the inter-core communication wiring 112 rather than via the inter-chip communication wiring 118 of FIG. 12. Second, the package itself defines a third hierarchical domain and a corresponding friend attribute group. Core 0 and core 4 are regarded as friends even though they are located in the same package 1502 (contrary to the meaning of the term "friend" given above with respect to FIG. 12). In addition, in the embodiment of FIG. 15 the friends communicate with each other via the inter-chip communication wiring 118 rather than via the inter-package communication wiring 1133 of FIG. 12.
Referring now to the block diagram of FIG. 16, an alternate embodiment of a computer system 1600 that performs decentralized power management distributed among the multiple processing cores 106 of a multi-core microprocessor 1602 according to the present invention is shown. The system 1600 is similar in some respects to the system 1500 of FIG. 15 in that it includes a single eight-core microprocessor 1602 having eight cores 106, denoted core 0 through core 7. However, each chip 104 includes inter-core communication wiring 112 between each of its cores 106, so that every core 106 can communicate with every other core 106 in the chip 104. Thus, for purposes of describing the operation of the microcode 208 of each core 106 of FIG. 16: (1) core 0, core 1, core 2, and core 3 are regarded as partners, and core 4, core 5, core 6, and core 7 are regarded as partners; and (2) core 0 and core 4 are regarded as companions.
Thus, system 1600 is organized and physically connected as a two-level hierarchical coordination system whose domains consist of pal and buddy kinship groups. Moreover, the inter-core communication wires 112 between each of the cores of a die could facilitate a peer-collaborative coordination model for the kinship group that the pals define. Although the system could operate under such a peer-collaborative model, Figure 17 describes a master-mediated coordination model for the decentralized power management distributed among the cores.

Referring now to Figure 17, a flowchart shows the operation of system 1600 of Figure 16 to perform decentralized power management distributed among the multiple processing cores 106 of the multi-core microprocessor 1602 according to the present invention. More specifically, the flowchart of Figure 17 shows the operation of the sync_C-state microcode 208 of Figure 3 (and Figure 6). It is similar in many respects to the flowchart of Figure 4, and like-numbered blocks are similar. However, the microcode 208 of the cores 106 described in the flowchart of Figure 17 accounts for the presence of eight cores 106 rather than the four cores 106 of the Figure 1 embodiment, specifically four cores 106 on each of two dies 104, and the differences are now described. In particular, each master core 106 of a die 104 has three pal cores 106 rather than one pal core 106.

Flow begins in Figure 17 at block 402, proceeds through decision block 404, and leaves the "NO" branch of decision block 404 for decision block 432, as described in relation to Figure 4. However, Figure 17 does not include blocks 406 through 418. Instead, flow proceeds from the "YES" branch of decision block 404 to block 1706.
At block 1706, the sync_C-state microcode 208 induces a new instance of the sync_C-state routine on its next pal by programming the CSR 236 of Figure 2 to send the "A" value, received at block 402 or generated at block 1712 (discussed below), to that pal and to interrupt the pal. This requests the pal to calculate and return a compound C-state to the core 106. In the loop comprising blocks 1706, 1708, 1712, 1714, and 1717, the microcode 208 keeps track of which pals it has visited to ensure that it visits each of them (unless the condition at decision block 1714 is found to be true). Flow proceeds to block 1708.

At block 1708, the sync_C-state microcode 208 programs the CSR 236 to detect that the next pal has returned a compound C-state to the core 106 and obtains the pal's compound C-state, denoted "B" in Figure 17. Flow proceeds to block 1712.

At block 1712, the sync_C-state microcode 208 computes a newly calculated local compound C-state, denoted "A", as the minimum of the "A" and "B" values. Flow proceeds to decision block 1714.

At decision block 1714, if the "A" value computed at block 1712 is less than 2, or if the core 106 is not the master core 106, flow proceeds to block 1716; otherwise, flow proceeds to decision block 1717.

At block 1716, the sync_C-state microcode 208 returns the "A" value computed at block 1712 to its caller. Flow ends at block 1716.

At decision block 1717, the sync_C-state microcode 208 determines whether all of its pals have been visited, that is, whether the core 106 has exchanged a compound C-state with each of its pals via blocks 1706 and 1708. If so, flow proceeds to block 1719; otherwise, flow returns to block 1706.
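The pal-visiting loop of blocks 1706 through 1717 can be pictured with a short sketch. This is only an illustrative model, not the microcode itself: the function name `visit_pals`, the list `pal_targets`, and the idea of representing each pal by a single target C-state are assumptions made for the example, and the CSR 236 programming and interrupts are reduced to a plain function call.

```python
def visit_pals(a, pal_targets, is_master):
    """Fold pal C-states into "A" as in blocks 1706-1719 of Figure 17.

    a           -- "A" value on entry (received at block 402)
    pal_targets -- hypothetical target C-state of each pal, in visit order
    is_master   -- True if this core is the master core of its die
    Returns (a, returned_early); returned_early mirrors block 1716.
    """
    for target in pal_targets:
        b = min(a, target)            # blocks 1706/1708: pal replies with "B"
        a = min(a, b)                 # block 1712
        if a < 2 or not is_master:    # decision block 1714
            return a, True            # block 1716: hand "A" back to the caller
    return a, False                   # block 1719: "A" is the die composite "C"
```

In this model a master core keeps visiting pals only while the running minimum stays at 2 or above, while a non-master core returns after its first exchange.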
At block 1719, the sync_C-state microcode 208 determines that the "A" value computed at block 1712 is its die composite C-state, denoted "C", and flow proceeds to block 422 and on through block 428, as described above in relation to Figure 4. Flow proceeds from the "NO" branch of decision block 438 to decision block 1739.

At decision block 1739, the sync_C-state microcode 208 determines whether all of its pals have been visited, that is, whether the core 106 has exchanged a compound C-state with each of its pals via blocks 1741 and 1743 (discussed below). If so, flow proceeds to block 446 and on through block 456, as described above in relation to Figure 4; otherwise, flow proceeds to block 1741.

At block 1741, the sync_C-state microcode 208 induces a new instance of the sync_C-state routine on its next pal by programming the CSR 236 of Figure 2 to send the "G" value, computed at block 436 or at block 1745 (discussed below), to that pal and to interrupt the pal. This requests the pal to calculate and return a compound C-state to the core 106. In the loop comprising blocks 438, 1739, 1741, 1743, and 1745, the microcode 208 keeps track of which pals it has visited to ensure that it visits each of them (unless the condition at decision block 438 is found to be true). Flow proceeds to block 1743.

At block 1743, the sync_C-state microcode 208 programs the CSR 236 to detect that the next pal has returned a compound C-state to the core 106 and obtains the pal's compound C-state, denoted "F" in Figure 17. Flow proceeds to block 1745.

At block 1745, the sync_C-state microcode 208 computes a newly calculated local compound C-state, denoted "G", as the minimum of the "F" and "G" values. Flow returns to decision block 438.

Figure 17 does not include blocks 478 through 488.
Instead, flow proceeds from the "NO" branch of decision block 472 to decision block 1777.

At decision block 1777, the sync_C-state microcode 208 determines whether all of its pals have been visited, that is, whether the core 106 has exchanged a compound C-state with each of its pals via blocks 1778 and 1782 (discussed below). If so, flow proceeds to block 474 and on through block 476, as described above in relation to Figure 4; otherwise, flow proceeds to block 1778.

At block 1778, the sync_C-state microcode 208 induces a new instance of the sync_C-state routine on its next pal by programming the CSR 236 of Figure 2 to send the "L" value, computed at block 468 or at block 1784 (discussed below), to that pal and to interrupt the pal. This requests the pal to calculate and return a compound C-state to the core 106. In the loop comprising blocks 472, 1777, 1778, 1782, and 1784, the microcode 208 keeps track of which pals it has visited to ensure that it visits each of them (unless the condition at decision block 472 is found to be true). Flow proceeds to block 1782.

At block 1782, the sync_C-state microcode 208 programs the CSR 236 to detect that the next pal has returned a compound C-state to the core 106 and obtains the pal's compound C-state, denoted "M" in Figure 17. Flow proceeds to block 1784.

At block 1784, the sync_C-state microcode 208 computes a newly calculated local compound C-state, denoted "L", as the minimum of the "L" and "M" values. Flow returns to decision block 472.

As stated earlier, Figure 17 shows a master-mediated hierarchical coordination model applied to a microprocessor 1602 whose sideband wiring would facilitate a peer-collaborative coordination model for at least some of the core kinship groups. This combination provides various advantages.
For one thing, the physical architecture of the microprocessor 1602 provides flexibility in defining and redefining hierarchical domains and in designating and redesignating domain masters, as described in the corresponding section of Application No. 61/426,470, filed December 22, 2010, entitled "Dynamic and Selective Core Disablement in a Multi-Core Processor", and its non-provisional application (CNTR.2536), which are incorporated herein by reference. Moreover, on a microprocessor that provides such inter-core coordination flexibility, a hierarchical coordination system may operate in more than one coordination mode, depending on predetermined circumstances or configuration settings. For example, a hierarchical coordination system could normally use a master-mediated model for a given kinship group but, under certain conditions, use a provisional master for that kinship group or switch to a peer-collaborative model for that kinship group. Examples of possible model-switching conditions include the designated master being disabled, or the designated master restricting interrupts based on their status or urgency.

In the embodiments described above, the composite power state discovery process may operate to account for the power state of each core in the processor before a restricted power state is implemented. However, configurations and rankings more advanced than those described earlier are also contemplated by the present invention. In particular, the present invention contemplates highly advanced hierarchies of domain-specific restricted power states, in which restricted power states of progressively higher levels apply to progressively higher domain levels of the processor.

For example, consider a multi-core microprocessor with multiple multi-core dies, in which the cores of each die share a PLL but all of the cores of the microprocessor share a single voltage regulator module (VRM), as explained in CNTR.2534. For such a microprocessor, a hierarchy of domain-specific restricted power states could be defined to include: a first set of power states applicable to resources internal to a core (and not shared outside it); a second set of power states applicable to resources that are shared by the cores of a die but not shared outside the die (for example, PLLs and caches); and a third set of power states applicable to the entire microprocessor (for example, voltage and the bus clock).

Thus, in one embodiment, each domain has its own composite power state. For each domain there is a single appropriately credentialed core (for example, the master of that domain) that has the authority to implement, or enable implementation of, a restricted power state whose impact, as defined by the corresponding domain-differentiated power state hierarchy, is limited to that domain. Such an advanced configuration is particularly suitable for embodiments, including those shown in CNTR.2534, in which subgroups of the processor's cores share caches, PLLs, and the like.

The present invention also contemplates embodiments in which the decentralized synchronization process is used not only to manage the implementation of restricted power states, but also to selectively implement a wake state or repeal a restricted power state without necessarily waking up all of the cores. Such advanced embodiments contrast with a system like that of Figure 5, in which the chipset's deassertion of STPCLK may fully wake up all of the cores.

Referring now to Figure 23, an embodiment of sync_state logic 2300 is depicted, implemented for example in microcode, that supports the conditional implementation and selective repeal of restricted operating states.
As described below, the sync_state logic 2300 supports the implementation of a domain-differentiated power state hierarchy coordination system. Advantageously, the sync_state logic 2300 is quite scalable, in that it can be extended to hierarchical coordination systems of practically any domain-level depth. Moreover, the logic 2300 may be applied not only in a manner that is global to the microprocessor, but also in a more restricted manner to particular groups of cores of the microprocessor (for example, only to the cores of a die, as explained below in relation to block 2342). In addition, the sync_state logic 2300 may be applied independently to different sets of operating states, each using a different hierarchical coordination system with its own associated definitions, applicable operating states, and domain-level thresholds.

In an implementation similar to the sync_C-state microcode 208 embodiments shown earlier, the sync_state logic 2300 may be invoked locally or externally and is implemented in a routine that passes a probe state value "P". For example, a power state management microcode routine could receive a target operating state passed by an MWAIT instruction or, in connection with CNTR.2534, generate a target operating state for the core (for example, a requested VID or frequency ratio value) using local core logic. The power state management microcode routine could then store the target value as the core's target operating state O_TARGET and invoke the sync_state logic 2300 by passing O_TARGET as the probe state value "P". Alternatively, as in the implementations discussed for the earlier embodiments, the sync_state logic 2300 may be invoked by an externally generated synchronization request. For simplicity, such an instance is referred to as an externally invoked instance of the sync_state logic 2300.
Before proceeding further, it should be noted that, for simplicity, Figure 23 shows the sync_state logic 2300 in a form suited to managing operating states that are defined or ordered such that progressively greater states require coordination among progressively higher domains of cores (for example, as with the C-states). A person of ordinary skill in the art could, with routine modifications, adapt the sync_state logic 2300 to support an operating state hierarchy (for example, VID or frequency ratio states) in which the operating states are defined in the opposite direction. Alternatively, operating states that are defined in one direction by convention or choice may be reordered, by definition, in the opposite direction. Thus, the sync_state logic 2300 can be applied to such operating states (for example, requested VID and frequency ratio states) merely by reordering them and applying reference values of the opposite sense (for example, the negatives of the original values).

It is also noted that Figure 23 shows sync_state logic 2300 designed for a strictly hierarchical coordination system, in which all of the included kinship groups operate under a master-mediated coordination protocol. As evidenced by the previously described synchronization logic embodiments, which can accommodate some degree of peer-to-peer collaboration, the present invention should not be construed as limited to strictly hierarchical coordination systems, except to the extent expressly so stated.

Flow begins at block 2302, where the sync_state logic 2300 receives the probe state value "P". Flow proceeds to block 2304, where the sync_state logic 2300 also obtains the local core's target operating state O_TARGET, the maximum operating state O_MAX implementable by the local core, the maximum domain level D_MAX controlled by the local core, and the maximum available domain-specific state M_D that does not involve or interfere with resources outside a given domain D. The manner or chronology in which the sync_state logic 2300 obtains or calculates the values of block 2304 is not important. Block 2304 appears in the flowchart only to introduce the important variables that apply to the sync_state logic 2300.

In an illustrative but non-limiting embodiment, the domain levels D are defined as follows: 0 for a single core; 1 for a multi-core die; 2 for a multi-die package; and so on. Operating states 0 and 1 are unrestricted (meaning that a core may implement them without coordinating with other cores). Operating states 2 and 3 are restricted with respect to the kin cores of the same die (meaning that they may be implemented on a die after coordination with the other cores of that die, but without requiring coordination with cores on other dies). Operating states 4 and 5 are restricted with respect to the kin cores of the same package (meaning that they may be implemented on a package after coordination with the cores of that package, but without requiring coordination with cores on other packages, if any), and so on. The corresponding maximum available domain-specific states M_D are therefore M_0 = 1, M_1 = 3, and M_2 = 5. Under this example, a non-master core would have a D_MAX of 0 and a corresponding maximum self-implementable operating state O_MAX of 1; a die master core would have a D_MAX of 1 and a corresponding O_MAX of 3; and a package master or BSP core would have a D_MAX of 2 and a corresponding O_MAX of 5.

Flow proceeds to block 2306, where the sync_state logic 2300 calculates an initial compound value "B" equal to the minimum of the probe value "P" and the local core's target operating state O_TARGET. Incidentally, if P is received from a subordinate kin core and its value is less than or equal to the maximum available domain-specific state M_D that the kin core is credentialed to implement, then, under the logic described here, this generally indicates that the subordinate kin core is requesting the repeal of any potentially interfering sleepier state implemented by the local core or a higher-ranking core. This is because, in a typical configuration, the subordinate kin core has already implemented the relatively more wakeful state P to the extent it is able, but it cannot unilaterally, without higher-level coordination, repeal an interfering sleepier state implemented through a domain it does not control.
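As a rough illustration of the running example, the domain thresholds and the block 2306 computation can be written down as a few lines of Python. The names `MAX_DOMAIN_STATE`, `credentials`, `initial_compound`, and `lowest_domain_for` are hypothetical, and the table values (M_0 = 1, M_1 = 3, M_2 = 5) are simply one reading of the example in which states 0 and 1 are unrestricted, 2 and 3 are die-level, and 4 and 5 are package-level.

```python
# Assumed example values: maximum available domain-specific state M_D
# for domain level D (0 = core, 1 = die, 2 = package).
MAX_DOMAIN_STATE = {0: 1, 1: 3, 2: 5}

def credentials(d_max):
    """Return (D_MAX, O_MAX) for a core that controls domains up to d_max."""
    return d_max, MAX_DOMAIN_STATE[d_max]

def initial_compound(probe, o_target):
    """Block 2306: B is the minimum of the probe value P and O_TARGET."""
    return min(probe, o_target)

def lowest_domain_for(state):
    """Lowest domain level whose M_D is large enough to admit the state."""
    return min(d for d, m in MAX_DOMAIN_STATE.items() if state <= m)
```

Under these assumptions a die master has `credentials(1) == (1, 3)`, and a target state of 4 cannot be admitted below the package level.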
Flow proceeds to block 2308, where the domain level variable D is initialized to zero. In the example shown above, a D of 0 denotes a single core.

Flow proceeds to decision block 2310. If D is equal to D_MAX, flow proceeds to block 2340. Otherwise, flow proceeds to decision block 2312. For example, on a non-master core (whose D_MAX is 0), the sync_state logic 2300 always proceeds to block 2340 without executing any of the logic shown in blocks 2312 through 2320, because that logic is provided for a master core to conditionally synchronize its subordinate kin cores. As another example, if a die master core has no other master credentials, its D_MAX is equal to 1. D is initially 0, so a conditional synchronization process may be carried out on the other cores of the die in accordance with blocks 2312 through 2320. But after any such synchronization has completed (assuming it is not conditionally terminated early in accordance with decision block 2312) and D has been incremented by 1 (block 2316), flow proceeds (via decision block 2310) to block 2340.

Turning now to decision block 2312, if B > M_D, flow proceeds to decision block 2314. Otherwise, flow proceeds to block 2340. Stated another way, if the compound value B currently calculated by the local core does not involve or interfere with resources outside the domain defined by the variable D, there is no need to synchronize with any more subordinate kin cores. For example, if the currently calculated compound value B is 1, a value that affects only resources local to a given core, no synchronization with further subordinate kin cores is needed. In another example, assume that the local core is a chum core with sufficient credentials to shut down or otherwise affect resources common to multiple dies.
Assume also that the chum's currently calculated compound value B is 3, a value that affects only resources local to the chum's own die and not the other dies that the chum manages. Assume further that the chum has completed synchronization with each of the cores on its own die in accordance with blocks 2314, 2318, and 2320, so that the variable D has been incremented to 1 (block 2316) and the new M_D = M_1 = 3 is taken into account (block 2312). Under these circumstances, the chum does not need to synchronize any further with subordinate kin cores (for example, buddies) on other dies, because the chum's implementation of a value of 3 or less would not affect the other dies in any case.

Turning now to decision block 2314, the sync_state logic 2300 evaluates whether there are any (more) unsynchronized subordinate kin cores in the domain defined by D+1. If there are any such cores, flow proceeds to block 2318. If not, flow first proceeds to block 2316 (where D is incremented) and then to decision block 2310, where the newly incremented value of D is evaluated as described above.

Turning now to block 2318, because an unsynchronized subordinate kin core has been detected (block 2318), and because that core may be affected by implementation of the currently calculated compound value "B" (block 2312), which would affect resources shared with the subordinate kin core, the local instance of the sync_state logic 2300 invokes a new dependent instance of the sync_state logic 2300 on the unsynchronized subordinate kin core. The local instance passes its currently calculated compound value "B" as the probe value for the dependent instance of the sync_state logic 2300. As follows from the logic of the sync_state logic 2300, the dependent instance will ultimately return a value that is no greater than the original "B" (block 2306) and no less than the subordinate kin core's maximum available domain-specific state M_D (block 2346), which is the largest value that does not interfere with any resource shared between the local core and the subordinate kin core. Accordingly, when flow proceeds to block 2320, the local instance of the sync_state logic 2300 adopts the value returned by the dependent instance as its own "B" value.

Up to this point, the focus has been on the portion of the sync_state logic 2300 that conditionally synchronizes subordinate kin cores. The focus now turns to blocks 2340 through 2348, which describe the logic for implementing a target and/or synchronized state, including conditional coordination with higher-ranking kin cores (that is, higher-level masters).

Turning now to block 2340, the local core implements its current compound value "B" to the extent that it can. In particular, it implements the minimum of B and O_MAX, the maximum state implementable by the local core. It is noted that, for cores that are domain masters, block 2340 configures such a core to implement, or enable implementation of, the minimum of the composite power state for its domain (the "B" of block 2306 or 2320) and the maximum restricted power state applicable to its domain (that is, O_MAX).

Flow proceeds to decision block 2342, where the sync_state logic 2300 evaluates whether the local core is the BSP of the microprocessor. If so, there is no higher-ranking core with which to coordinate, and flow proceeds to block 2348. If not, flow proceeds to decision block 2344. It should be noted that in embodiments in which the sync_state logic 2300 is applied to control operating states in a less than global way with respect to the microprocessor, block 2342 is modified by replacing "BSP" with "the highest applicable domain master" for the given set of operating states. For example, if the sync_state logic 2300 is applied only to the desired frequency clock ratio of a PLL shared by a die, as described in CNTR.2534, then "BSP" would be replaced with "die master".
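At decision block 2344, the sync_state logic 2300 evaluates whether the local instance of sync_state was invoked by a master core. If so, the local core is by definition already synchronized with its master, so flow proceeds to block 2348. If not, flow proceeds to block 2346.

Turning now to block 2346, the sync_state logic 2300 invokes a dependent instance of sync_state on its master core. It passes, as the final probe value P, the maximum of the core's final compound value B and the core's maximum available domain-specific state M_D. Two examples are given here to explain this choice of probe value P.

In the first example, assume that B is higher than the local core's maximum self-implementable operating state O_MAX (block 2340). In other words, without higher-level coordination, the local core cannot unilaterally bring about the full implementation of B. In this case, block 2346 represents a request by the local core to its master core to implement B more fully, if possible. It will be appreciated that, under the logic set forth in Figure 23, the master core will decline the request if it is not consistent with the master core's own target state and with the applicable states of the other potentially affected cores. Otherwise, the master core will implement the request to the extent that it is consistent with those states, up to the master's own maximum self-implementable state O_MAX (block 2340). In accordance with block 2346, the master core will also make a request to its own superior core (if any), using a value that is a compound of, and may be equal to, the original core's B value, and such requests proceed upward through the hierarchy. In this manner, the sync_state logic 2300 fully implements the local core's final compound value B if the applicable conditions are satisfied.

In the second example, assume that B is less than the local core's maximum self-implementable operating state O_MAX (block 2340). If no higher interfering operating state is in effect that affects resources outside the local core's control, then at block 2340 the local core can fully implement B. But if a higher interfering operating state is in effect, the local core cannot unilaterally repeal it. In this case, block 2346 represents a request by the local core to its master core to repeal the existing interfering operating state down to a level that no longer interferes with the full implementation of B (that is, the local core's maximum available domain-specific state M_D). It will be appreciated that, under the logic set forth in Figure 23, the master core will comply with the request by implementing a state that is no greater than, and possibly less than, the local core's M_D. It should be noted that block 2346 could alternatively request the master to implement just B. But if B < M_D, this could cause the master core to implement a more wakeful state than is necessary for the local core to fully implement B. Therefore, using a probe value equal to the maximum of the local core's final compound value B and the local core's maximum available domain-specific state M_D is the preferred choice. It will thus be appreciated that the sync_state logic 2300 supports a minimalist approach to implementing both sleep states and wake states.

Turning now to block 2348, the sync_state logic 2300 returns, to the process that called or invoked it, a value equal to the maximum of the core's final compound value B and the core's maximum available domain-specific state M_D. As with block 2346, it is noted that block 2348 could alternatively return just the value of B. But if B < M_D, this could cause an invoking master core (block 2318) to implement a more wakeful state than is necessary for itself. Therefore, returning the maximum of the core's final compound value B and the core's maximum available domain-specific state M_D is the preferred choice. Again, it will be appreciated that in this manner the sync_state logic 2300 supports a minimalist approach to implementing both sleep states and wake states.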
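To make the flow of blocks 2302 through 2348 easier to follow, here is a small, hedged Python model of the logic as described above. It is a sketch under stated assumptions rather than the patented microcode: the `Core` class, the `build` helper, and the recursion that stands in for sideband interrupts are all hypothetical; O_MAX is taken to equal M at the core's own D_MAX; the M_D used at blocks 2346 and 2348 is read as that same value; and decision block 2312 is assumed to be re-evaluated after every kin core reply.

```python
M = {0: 1, 1: 3, 2: 5}   # assumed M_D per domain level D (running example)

class Core:
    def __init__(self, target, d_max=0, master=None):
        self.target = target      # O_TARGET
        self.d_max = d_max        # D_MAX; O_MAX is taken to be M[d_max]
        self.master = master      # next-higher master core, None for the BSP
        self.kin = {}             # domain level -> subordinate kin cores
        self.applied = None       # state applied at block 2340

def sync_state(core, probe, caller=None):
    b = min(probe, core.target)                       # block 2306
    d, synced = 0, {caller}                           # block 2308
    while d != core.d_max and b > M[d]:               # blocks 2310, 2312
        todo = [k for k in core.kin.get(d + 1, []) if k not in synced]
        if todo:                                      # block 2314
            synced.add(todo[0])
            b = sync_state(todo[0], b, caller=core)   # blocks 2318, 2320
        else:
            d += 1                                    # block 2316
    own = M[core.d_max]                               # O_MAX (assumed = M_D)
    core.applied = min(b, own)                        # block 2340
    if core.master is not None and caller is not core.master:  # 2342, 2344
        sync_state(core.master, max(b, own), caller=core)      # block 2346
    return max(b, own)                                # block 2348

def build(targets):
    """Hypothetical package: two dual-core dies; c0 is BSP, c2 a die master."""
    c0 = Core(targets[0], d_max=2)
    c1 = Core(targets[1], master=c0)
    c2 = Core(targets[2], d_max=1, master=c0)
    c3 = Core(targets[3], master=c2)
    c0.kin = {1: [c1], 2: [c2]}
    c2.kin = {1: [c3]}
    return c0, c1, c2, c3
```

In this model, if the BSP starts a discovery with targets 5, 5, 5, and 3, both dies end up at state 3 and the package-level states 4 and 5 are never applied. If the fourth core later asks for state 0, only its own die master drops to state 1, while the BSP keeps state 3, which is the minimalist repeal behavior the two examples above describe.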
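In another embodiment, one or more additional decision blocks are interposed between blocks 2344 and 2346 to further condition block 2346's invocation of a dependent sync_state instance. For example, under one applicable condition, if B > O_MAX, flow would proceed to block 2346. Under another applicable condition, if an interfering operating state that only a higher-ranking core can repeal is currently being applied to the local core, flow would also proceed to block 2346. If neither of these two alternative conditions applies, flow would proceed to block 2348. In this manner, the sync_state logic would support an even more conditional approach to implementing wake states. However, it should be observed that this alternative embodiment assumes that the local core can detect whether an interfering operating state is being applied. In an embodiment in which the local core cannot necessarily detect the presence of an interfering operating state, the less conditional implementation depicted in Figure 23 is preferred.

It will also be appreciated that, in Figure 23, when it is necessary to implement a targeted deeper operating state (or a shallower version of it), the composite operating state discovery process traverses only the cores (and not necessarily all of the cores) of the highest-level domain (including its nested domains) whose shared resources are affected by the target operating state, using a traversal order that progresses across the cores from lowest to highest (or from nearest to farthest kinship group). Also, when it is only necessary to implement a more wakeful operating state, the composite operating state discovery process need only traverse successively higher masters. Moreover, in the alternative embodiment just described, this traversal extends only as far as is needed to repeal a currently implemented interfering operating state.

Thus, applying the earlier illustrative example to Figure 23, a target restricted power state of 2 or 3 would trigger a composite power state discovery process only among the cores of the applicable die. A target restricted power state of 4 or 5 would trigger a composite power state discovery process only among the cores of the applicable package.

Figure 23 can be further characterized in a domain-specific (in addition to a core-specific) manner. Continuing with the illustrative example above, a die would have applicable domain-specific power states of 2 and 3. For example, if a die master core discovers, as part of a locally or externally initiated composite power state discovery process, that the composite power state of its die is only 1, then because 1 is not an applicable domain-specific power state, the die master core would not implement it. As an alternative example, if the die master core discovers that the composite power state of its die is 5 (or that the compound of the die's composite power state and the probe state value of a nodally connected core is equal to 5), and if the die master core has no higher master credentials, then (provided it has not already done so) the die master core would implement, or enable implementation of, a power state of 3, which is the minimum of 3 (the die's maximum applicable domain-specific power state) and 5 (the die's composite power state or the compound thereof). Again, it is noted that in this example the die master core would proceed to implement or enable the power state of 3 for its die regardless of any actual or partial composite power state (for example, 4 or 5) applicable to a higher domain of which the core is a part.

Continuing with the example in which the die master discovers that the die's composite power state, or the compound thereof, is 5, the die master would undertake a composite power state discovery process with its buddies, which would necessarily include a traversal of the next higher-level domain (for example, the package or the entire processor). This composite power state discovery process is independent of the die master's intermediate implementation (if any) of the power state of 3 for the die. This is because 5 is greater than 3 (the die's maximum applicable domain-specific power state), so the implementation of a higher restricted power state necessarily depends on the power states applicable to one or more higher-level domains. Furthermore, the implementation of a higher restricted power state specific to the next higher-level domain can only be enabled and/or carried out by the master of that domain (for example, the package master of a multi-package processor or the BSP of a single-package processor). It is worth recalling that the die master may also hold the relevant package master or BSP credentials.

Therefore, in the example above, at some point in the discovery process the die master core will exchange its die composite power state (or a compound thereof) with a buddy. Under some conditions, this discovery process will return to the die master core an at least partial composite power state for the higher domain (for example, the package) that is less than 2. Again, this would not result in a repeal of the power state of 3 that the die master core has already implemented for the die. Under other conditions, this discovery process will yield a composite power state for the package or microprocessor (for example, 4 or more) that corresponds to a restricted power state of 4 or more. If so, the master of that domain (for example, the package master) would implement a higher restricted power state that is the minimum of the composite power state of the higher-level domain (for example, 4 or 5) and the maximum restricted power state applicable to the higher-level domain (here, 5). If the applicable discovery process is testing an even higher restricted power state, this conditional, domain-specific power state implementation process would extend to even higher domain levels (if any).

As described above, Figure 23 shows a hierarchical, domain-specific restricted power state management coordination system that is operable to incorporate domain-dependent restricted power states and associated thresholds. Accordingly, it accommodates a fine-tuned, domain-specific, decentralized approach to power state management of individual cores and groups of cores.

It is noted that Figure 23 shows power state coordination logic that provides for transitions to more wakeful states in a decentralized, distributed manner. However, it will be appreciated that some power state embodiments include power states from which a particular core may be unable to wake up without a prior power state restoring action by the chipset or by other cores.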
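For example, in the C-state architecture described above, a C-state of 2 or higher may be associated with removal of the bus clock, which may render a given core unable to respond to an instruction, delivered over the system bus, to transition to a more wakeful state. Other microprocessor configurations in which power or clock sources can be selectively removed from a core or a die are also contemplated. Figure 5 shows one embodiment of wake-up logic that handles these situations by waking up all of the cores in response to the deassertion of STPCLK. However, more selective embodiments of wake-up logic are contemplated. In one example, consider wake-up logic implemented by system software (for example, the operating system or BIOS), in which the system software first issues a wake or resume request to a particular core and, if it does not receive a response within an expected time interval or if the core does not comply, recursively issues wake or resume requests to successively higher masters and possibly to the chipset, as needed, until an expected response is received or appropriate compliance is detected. Such software-implemented wake-up logic would work in coordination with the power state coordination logic of Figure 23 to transition to more wakeful states in a preferentially decentralized manner (in which each targeted core initiates the transition using its own microcode) to the extent that the cores are operable to do so, and in a centrally coordinated manner when the cores are prevented from doing so. This wake-up logic embodiment is merely illustrative and exemplary of several possible embodiments for selectively waking up cores that cannot wake themselves.

VI. Extended Embodiments and Applications

Although embodiments with a particular number of cores 106 have been described, other embodiments with other numbers of cores 106 are contemplated. For example, although the microcode 208 described in Figures 10, 13, and 17 is designed to perform decentralized power management distributed among eight cores, the microcode 208 functions properly in a system with fewer cores 106 by including checks for the presence or absence of cores 106, for example as described in the corresponding section of Application No. 61/426,470, filed December 22, 2010, entitled "Dynamic Multi-Core Microprocessor Configuration", and its concurrently filed non-provisional application (CNTR.2533), whose disclosure is attached hereto. That is, if a core 106 is absent, the microcode 208 does not exchange C-state information with the absent core 106 and effectively assumes that the C-state of the absent core is the highest possible C-state (for example, a C-state of 5). Thus, for manufacturing efficiency, the cores 106 may be manufactured with microcode 208 that is designed to perform decentralized power management distributed among eight cores, even though the cores 106 may be included in systems with fewer cores 106. Furthermore, embodiments are contemplated in which the system includes more than eight cores and the microcode described here is extended to communicate with the additional cores in a manner similar to those already described. By way of the foregoing description, the systems of Figures 9 and 11 may be extended to include 16 cores 106 having eight buddies, and the systems of Figures 12, 14, and 15 may be extended to include 16 cores 106 having four chums, in a manner similar to the way in which the systems of Figures 9 and 11 synchronize C-states among four buddies. The system of Figure 16 may be extended to include 16 cores 106 by having 16 pals (two dies with eight cores per die, or four dies with four cores per die), and the relevant features of the methods of Figures 4, 10, 13, and 17 may likewise be combined.

Embodiments that independently coordinate the implementation of different classes of power states (for example, C-states, P-states, requested VIDs, requested frequency ratios, and so on) are also contemplated. For example, each core may have a different applicable power state for each class of power states (for example, separately applicable VID, frequency ratio, C-state, and P-state values), with restrictions that apply to different specific domains, and with different extrema used to calculate compound states and discover composite states (for example, the minimum for C-states versus the maximum for requested VIDs). Different hierarchical coordination systems (for example, different domain depths, different domain constituencies, different designated domain masters, and/or different kinship group coordination models) may be established for different classes of power states. In addition, some power states may require coordination, at most, only with the other cores of a domain (for example, a die) that includes only a subset of all of the cores on the microprocessor. For such power states, hierarchical coordination systems are contemplated that nodally link only that domain, coordinate only with cores within that domain, and discover composite power states applicable only to or within that domain.

In general, all of the operating states shown in the embodiments are ordered on a progressively ascending or descending basis that is strict and linear. However, other embodiments are also contemplated by the present invention in which the operating states are tiered and can be ordered in ascending or descending fashion along each tier (including embodiments in which the ordering of some tiers is independent of the other tiers). For example, a given set of power states may be characterized in a composite form with different tiers, such as A.B, A.B.C, and so on, in which each tier A, B, C relates to a different characteristic or class of characteristics. For example, a power state may be characterized in the composite form C.P or P.C, where P denotes an ACPI P-state and C denotes an ACPI C-state. Furthermore, one class of restricted power states may be defined by the value of a particular component (for example, A or B or C) of the compound-defined power state, while another class of restricted power states may be defined by the value of another component of the compound-defined power state.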
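Two of the ideas above lend themselves to a brief sketch: treating an absent core as if it held the highest possible C-state, and using a different extremum for each component of a tiered state. The code below is a minimal illustration under assumptions, not a definitive implementation: the names `composite_c_state` and `compound_tiered`, the use of `None` to mark an absent core, and the (C-state, requested VID) tuple layout are all hypothetical.

```python
HIGHEST_C_STATE = 5   # assumed top of the C-state range, as in the example

def composite_c_state(targets):
    """Minimum target C-state over core slots; None marks an absent core.

    An absent core is treated as if it were at the highest possible
    C-state, so it never holds the composite back.
    """
    return min(HIGHEST_C_STATE if t is None else t for t in targets)

def compound_tiered(a, b, extrema=(min, max)):
    """Component-wise compound of two tiered states.

    With the default extrema, a state is a (C-state, requested VID) pair:
    the C-state component uses the minimum, the VID component the maximum.
    """
    return tuple(pick(x, y) for pick, x, y in zip(extrema, a, b))
```

Passing a different `extrema` tuple is one way to express the per-class choice of extremum; negating the values of a component and always using `min` would be an equivalent way to reverse its ordering.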
之紐所定義,而受關電源雜H級可由混合定義電源 h之另、.·且成之數值所定義。此外,在任何給定的受限制電源 狀態之層助,每-層對應於混合定義電源狀態之其中—個組成 之數值(例如C.P),除施加至此層之限制以外,對一既定核心而 言,另-種組成之數值(例如C.P中之P)可能不受限制、或受到 不同等級之限制。舉例而言,一個具有Cp之目標電源狀態之核 心可能受到關於其目標電源狀態之C及P部分之實施時各自的限 制及協調需求,於此P表示其p_狀態,而c表示其需求的c•狀態。 在複合電源狀態實施例中,對計算極值之一既定核心而言,任何 兩個電源狀態之-”極值”可能表示複合f‘源狀態之組成部分之極 值之一複合狀態、或複合電源狀態之少於所有組成部分之極值之 一複合狀態,與以別的方法選擇的或確定的數值(而對其他組成 部分而言)。 又,在一系統中之多重核心100執行分配式分散式電源管理 以明確地執行功率評價(p〇wer credit)功能性之實施例亦被考量 在内,如說明於美國申請案13/157,436 (CNTR.2517)中,申請曰 為2011年6月10曰,其全部於此併入作參考,但是此實施例使 用核心間通訊配線112、晶片間通訊配線118以及封裝體間通訊配 線1133,而非使用如CNTR.2517所說明的一共用的記憶體區域。 這種實施例之優點為其對於系統韌體(例如BIOS)及系統軟體是 113 201245948 =的’且並不需要依·_體或軟體以提供一共用的記憶體 =因為微處理器製造商可能未必具有控制_^ I佈此力,所以其是受歡迎的。 一 * 了彳*雜以外轉送其他值之同步邏輯實施例亦考 量在^。於一實施例中,相關於任何其他同時操作發現過程,- 时常式傳送可㈣地確認發現過歡—錄(其為發現過程之 一部分)。在另—實施财,同步常式傳送-數值,藉由此數值可 識別同步或尚未同步的核心。舉例而言,—種⑽心實施例可能 叙一 8位元值,於此每錄元代“核心處麵之—特定核心, =個位元表示如是否已湖步或是仍為該_舰過程之一 部分。同步常式亦可能傳送確認開始瞬間發現過程之核心之一數 值0 曰促進執彳了如之依料訪同步化魏触的額外實施例亦被 考里在冑例子中,每個核心儲存確認成員之位元遮蔽之同屬 性群組(它係為其之-部分)。舉例而言,在—種彻三個層級深 的階層式_構造之八核喊補巾,每健心儲存三個8位元&quot; 同屬^遮蔽、-”最接近,性遮蔽、—第二層闕性遮蔽以及 頂端層同屬性遮蔽’於此每個遮蔽之位元值確認屬於以遮蔽表 不之同屬性群組中的核心家族(如果有的話)。在另—例子中,每 個;L儲存地圖、一⑽e丨號碼或其之組合,由其可正確地及 唯一地決定核心、之節點階層,包含確認每個域管理者。在又另一 種例子巾,此核心儲存確認共用資源(例如,電_、、時脈源以 114 201245948 及快取),以及它們所屬且共用之特定核心或對應域之資訊。 又’雜歧明書之_主魏錢·態㈣,但吾人將 料上述階細⑽掀各輸贿_肋協調其他型 式之㈣與限制活動’而非只是電源狀態或電源相關的狀態資 訊。舉例㈣,在某些實施财,上述各種階層式協猶統係利 用與硬製在每個核心上之分散邏輯協調簡於動態發現,譬如在 CNTR.2533中之-多核心微處理器配置,例如如上所述。 此外,吾人應注意到除非有特別聲明,否則本發明並不需要 使用上述任何-鑛層式協⑽、統吨行預定的限.動。事實 上除非另有某種程度之特別規定,否則本發明適合於在核心間 的純粹對等協m統。然而,如本說明書可明顯看出,一種階層 式協調系統之使用可提供數個優點,尤其是在依賴旁路通訊時, 因為於此架構下’微處理器之旁路通訊線之構造並不允許一完全 相等的對等協調系統。 如可迠從上文觀察到,相較於例如上述包含集中化非核心硬 體協調邏輯(HCL)之Naveh續決錄,將鶴f理功能同等 分配在於此所說明的核心106間的分散實施例,好處是不需要額 外非核心邏輯。雖然非核心邏輯可被包含在一晶片1〇4裡,但於 所說明的實施例中’所需要的為實施分散分配式電源管理機制 疋·硬體及微碼係與多核心_每晶片(multi_c〇re_per_die)實施例中 之核心間通訊配線Π2、多晶片實施例中之晶片間通訊配線i丨8以 及多封裝體實施例中之封裝體間通訊配線1133在一起地、完全地 115 201245948 實體上及邏輯地在它們本身之核心1〇6之内。因為於此所說明之 執订分配在多重處理核心1〇6間的電源管理之分散實施例之結 果,核心106可能位於各別晶片或各別封裝體上。這潛在地降低 β曰片尺寸並改善良率,提供更多配置彈性,以及提供一高層級之 系統中核心數之可調(尺寸之)能力。 在又其他實施例中,核心1〇6在各種實施樣態方面與圖2之 代表實施例不同’並提供—種取代細加之高度平行的構造,例 
如應用於-圖轉理單元(GPU)之構造,祕此職明的為各 種操作(例如電源狀態管理、核心配置發現、以及核心重新規劃) 所使用之協調系統亦可被應用。 雖然於此已說明本發明之各種實施例,但吾人應理解到已經 由舉例而雜制地提出它們。熟習相_電腦技藝者將日月白在不背 離本發明之範_之下,可作出各種在形式及細節方面的改變。舉 例而言,軟體可允許於此所說明之設備及方法之譬如功能、製造、 模擬試驗、模擬、說明及/或峨。這可經由使用一般程式設計語 言(例如C、C++) ’包含Verik)g HDL、VHDL等等之硬體記述語 吕(HDL) ’或其他可糊的程式來達成。這種軟體可被配置在任 何已知的電腦可用媒體中,例如半導體、磁碟或光碟(例如, CD-ROM ' DVD-ROM等)。於此所說明之設備及方法之實施例可 此包含在例如-微處理器核心之半導體智慧財產權核心(例如, 具體化在HDL中),並改變成在積體電路之產品中的硬體。此外, 於此所說明的設備及找可能具體化為硬體及軟體之組合。因 116 201245948 此本發明不應被任何—個於此所說明的例示實施例所限制,但 應該只域町申請翻制及如_魏料被衫。且體 f之,本發啊能在可能使用於_電腦之微處理H裝置之内被 貫現。最後,《本項技#者應明白他們可輕易地使用所揭露的 概念及具體的實施例作為用以設計雜改其他構造之基礎,用以 在不背離㈣町”翻麵解定林㈣ 本發明之相同目的。 F7G成 【圖式簡單說明】 ^圖1係為顯示—電腦系統之—個實施例之方塊圖,電腦系統 :丁刀配在-雙晶片四核心微處理器之多重處理核心之間之 式電源管理。 圖2係為詳細顯示圖1之代表的其中—個核心之方塊圖。 圖3係為顯魏行分配在多核心㈣期^域理核心之 式管理之—线之—電驗態管縣式之—個實施 史错由一核心之操作之流程圖。 圖4係為顯示整合 3之錢之複合電源狀態發現過程之 圖電源狀態同步常式之—個實施例之藉由—核心之操作之流程 波^5 tr—喚起與重新開始常式明應從—休眠狀態將 4醒之-觀-個細m叫作之流程圖。 圏6係為顯示-核心間中斷處理常式以因應接收一核心間中 117 201245948 斷之藉由一核心之操作之流程圖。 圖7係為顯示依據® 3至6之說明之—複合電源狀態發現過 程之操作之一例子之流程圖。 圖8係為顯示依據圖3至6之說明之—複合電源狀態發現過 程之操作之另一個例子之流程圖。 圖9係為顯示—電齡統之另—實施例之方塊圖,電腦系統 執行分配在i人核⑽處理^ (其在單一狀體上具有四個雙 核心晶片)之多重處理核心之間之分散式電源管理。 圖10係為顯示整合至圖9之系統之—複合魏狀態發現過程 之-電源狀態同步常式之-個實施例之料—核心、之操作之流程 圖。 圖11係為顯示一電腦系統之另一實施例之方塊圖,電腦系統 執行分配在一種八核心微處理器之多重處理核心之間之分散式電 源官理,八核心微處理器具有四個雙核心晶片,其使用圖10之電 源狀態同步常式而分配在兩個封裝體上。 圖12係為顯示一電腦系統之另一實施例之方塊圖,電腦系統 執行分配在一種八核心微處理器之多重處理核心之間的分散式電 源笞理,依據—較深的階層式協調系統,八核心微處理器像圖11 具有四個雙核心晶片,但其核心不像圖11而是彼此相互關連的。 圖13係為顯示整合至圖12之系統之一複合電源狀態發現過 程之一電源狀態同步常式之一個實施例之藉由一核心之操作之流 程圖。 118 201245948 圖14係顯示,電腦系統之另一實施例之方塊圖,電腦系統 執行分配在一種八核心微處理器之多重處理核心之間的分散式電 源管理,依據一較深的階層式協調系統,八核心微處理器像圖9 在單-封裝體上具有四個雙核心晶片,但其核心不像圖9而是彼 此相互關連的。 圖15係為顯示一電腦系統之另一實施例之方塊圖,電腦系統 執行分配在-種人核心微處理器(其在單—縣體上具有兩個四 核心晶片)之多重處理核心之間的分散式電源管理。 圖16係為顯示一電腦系統之又另一實施例之方塊圖,電腦系 統執行分配在一種八核心微處理器之多重處理核心之間的分散式 電源管理。 圖17係為顯示整合至圖16之系統之一複合電源狀態發現過 程之-電源狀態同步常式之—個實施例之藉由—核心之操作之流 程圖。 机 圖Μ係為顯示一電腦系統之又另一實施例之方塊圖,電腦系 、錢仃分配在—種雙核^、單-晶片微處理ϋ之核心之間的分散 式電源管理。 刀 έ 圖19係為顯示一電腦系統之又另一實施例之方塊圖,電腦系 订分配在具有兩個單核心晶片之一種雙核心微 之間的分料電丨騎理。 核 圖20係為顯示一電腦系統之又另一實施例之方 說執行分配右目+ 电腩示 仕一有兩個單核心、單一晶片封裝體之一雙核 處 119 201245948 理器之核心之間的分散式電源管理。 
FIG. 21 is a block diagram illustrating yet another embodiment of a computer system that performs decentralized power management distributed among the cores of an octa-core microprocessor having two packages, one of which has three dual-core dies and the other of which has a single dual-core die.
FIG. 22 is a block diagram illustrating yet another embodiment of a computer system that performs decentralized power management distributed among the cores of an octa-core microprocessor similar to that of FIG. 21 but having a deeper hierarchical coordination system.
FIG. 23 is a flowchart illustrating another embodiment of operating state synchronization logic, implemented on a core, that supports a domain-differentiated operating state hierarchy coordination system and is scalable to different domain depths.

[Description of Main Component Symbols]

P, P1-P8: pins
100, 900, 1100, 1200, 1400, 1500, 1600: computer system
102, 902, 1202, 1402, 1502: multi-core microprocessor/package
104: die
106: core
108: pad
112: inter-core communication wires
114: chipset
116: bus
118: communication wires
202: instruction cache
204: instruction decoder
206: microsequencer
207: microcode memory
208: microcode
212: register alias table (RAT)
214: reservation stations
216: execution units
218: retire unit
222: data cache
224: bus interface unit (BIU)
226: phase-locked loop (PLL)
228: BSP indicator
232: master indicator

In such circumstances, the native core cannot unilaterally cause B to be fully implemented. Block 2346 then represents a petition by the native core to its master core, asking it to implement B more fully, if possible. It will be appreciated that, in accordance with the logic set forth in FIG. 23, the master core will decline the petition if it is inconsistent with the master's own target state and with the applicable states of other potentially affected cores. Otherwise, the master core will implement the petition to the extent it is consistent with those states, up to its own maximum self-implementable state (block 2340). In accordance with block 2346, the master core will in turn petition its own superior master core, if any, with a value that is a compound of (and may be equal to) the original core's B value. In this way, if the applicable conditions are met, the sync_state logic 2300 fully implements the native core's last compounded value B.

In a second example, assume that B is lower than the native core's maximum self-implementable operating state (block 2340). Assuming that no higher interfering operating state, affecting resources outside the native core's control, is in effect, the native core has fully implemented B in block 2340.
However, if a higher interfering operating state is in effect, the native core will be unable to repeal it unilaterally. In this case, block 2346 represents a petition by the native core to its master core, asking it to repeal the existing interfering operating state down to a level that no longer interferes with a complete implementation of B (i.e., to the native core's maximum applicable domain-specific state MD). It will be appreciated that, in accordance with the logic set forth in FIG. 23, the master core will comply with the petition by implementing a state that is no greater than, and possibly less than, the native core's MD. It should be noted that block 2346 could alternatively petition the master simply to implement B. But if B < MD, this may cause the master core to implement a more wakeful state than the native core needs in order to fully implement B. It is therefore preferred that the petition use a probe value equal to the maximum of the native core's last compounded value B and the native core's maximum applicable domain-specific state MD. Thus, it will be appreciated that sync_state 2302 supports a minimalist approach to both sleep state and wake state implementation.

Moving now to block 2348, the sync_state logic 2300 returns, to the process that called or invoked it, a value equal to the maximum of the core's last compounded value B and the core's maximum applicable domain-specific state MD. As with block 2346, it is noted that block 2348 could alternatively just return the value of B. But if B < MD, this may cause an invoking master core (block 2318) to implement a more wakeful state than it needs. It is therefore preferred to return the maximum of the core's last compounded value B and the core's maximum applicable domain-specific state MD. Again, it will be appreciated that in this manner sync_state 2302 supports a minimalist approach to both sleep state and wake state implementation.
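The value rule shared by blocks 2346 and 2348 can be sketched in a few lines. This is only an illustration, assuming power states are encoded as integers in which a larger number is a deeper, more power-conserving state; the function and parameter names are hypothetical and do not come from the specification.

```python
def probe_value(b: int, md: int) -> int:
    """Value petitioned upward (block 2346) or returned (block 2348).

    b  -- the core's last compounded value B (assumed integer encoding,
          larger meaning a deeper state)
    md -- the core's maximum applicable domain-specific state MD
    """
    # Using max(B, MD) rather than B alone avoids asking a master to
    # apply a more wakeful state than the native core actually needs.
    return max(b, md)
```

For instance, with B = 1 and MD = 3 the sketch yields 3, so a master is asked to go no shallower than the deepest state the native core's domain can apply.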
In another embodiment, one or more additional decision blocks are interposed between blocks 2344 and 2346 to further condition block 2346's invocation of the dependent sync_state routine. For example, under one applicable condition, if B is greater than the native core's maximum self-implementable operating state, flow proceeds to block 2346. Under another applicable condition, if an interfering operating state that can be repealed only at a higher domain level is currently being applied to the native core, flow also proceeds to block 2346. If neither of these two alternative conditions applies, flow proceeds to block 2348. In this manner, sync_state 2302 supports an even more minimalist approach to wake state implementation. It should be observed, however, that this alternative embodiment assumes that the native core can detect whether an interfering operating state is being applied. In an embodiment in which the native core cannot necessarily detect the presence of an interfering operating state, the less conditional implementation depicted in FIG. 23 is preferred.

It will also be appreciated that in FIG. 23, a composite operating state discovery process, when needed to implement a targeted deeper operating state (or a shallower version of it), traverses only cores (and not necessarily all of the cores) of the highest-level domain, including its nested domains, whose shared resources are affected by the targeted operating state. It does so in a traversal order that progresses through the cores from lowest to highest (or from nearest to farthest kinship group). When needed to implement a shallower operating state, a composite operating state discovery process traverses only successively higher masters. Moreover, in the alternative embodiment described immediately above, this traversal extends only as far as is needed to repeal a currently implemented interfering operating state.

Thus, applying an earlier exemplary illustration to FIG. 23, a target restricted power state of 2 or 3 triggers a composite power state discovery process only among the cores of the applicable die. A target restricted power state of 4 or 5 triggers a composite power state discovery process only among the cores of the applicable package.
FIG. 23 can be further characterized in a domain-specific (in addition to a core-specific) manner. Continuing with the exemplary illustration above, a die would have applicable domain-specific power states of 2 and 3. If, for example, as part of either a natively or externally initiated composite power state discovery process, the die master core discovers a composite power state for its die of only 1, then, because 1 is not an applicable domain-specific power state, the die master core would not implement it. If, as an alternative example, the die master core discovers a composite power state for its die of 5 (or a compound of the die's composite power state and a nodally connected core's probe power state value equal to 5), and if the die master core does not hold any higher master credentials, then (provided it has not already done so) the die master core would implement or enable a power state of 3, which is the minimum of 3 (the die's maximum applicable domain-specific power state) and 5 (the die's composite power state or a compound of it). Again, it is noted that in this example the die master core implements or enables the power state of 3 for its die regardless of any actual or partial composite power state (e.g., 4 or 5) applicable to a higher domain of which the core is a part.

Continuing with the illustration above, in which the die master discovers a die composite power state (or compound of it) of 5, the die master, together with its pal, would undertake a composite power state discovery process that necessarily includes a traversal of the next higher-level domain (e.g., the package or the entire processor). This process is independent of the die master's intermediate implementation, if any, of the power state of 3 for its die. This is because 5 is greater than 3 (the die's maximum applicable domain-specific power state), so implementation of a higher restricted power state necessarily depends on the power states applicable to one or more higher-level domains.
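The die example above (applicable states 2 and 3; composite of 1 not implemented; composite of 5 clamped to 3) can be sketched as follows. This is a minimal sketch under stated assumptions: states are integers, the applicable domain-specific states form a contiguous range, and the function name and signature are hypothetical rather than taken from the specification.

```python
from typing import Iterable, Optional


def domain_state_to_implement(composite: int,
                              applicable: Iterable[int]) -> Optional[int]:
    """Return the state a domain master would implement, or None.

    composite  -- composite power state discovered for the domain
    applicable -- the domain's applicable domain-specific states
                  (assumed contiguous, e.g. {2, 3} for a die)
    """
    states = set(applicable)
    # Clamp the composite to the domain's maximum applicable state.
    candidate = min(max(states), composite)
    # A composite below the domain's range (e.g. 1) is not a
    # domain-specific state, so the master implements nothing.
    return candidate if candidate in states else None
```

Under the same assumptions, a package whose applicable states are 4 and 5 would implement 4 for a composite of 4 and nothing for a composite below 4, which leaves a die-level state of 3 in place.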
Moreover, implementation of a higher restricted power state specific to the next higher-level domain can be enabled and/or carried out only by the master of that domain (e.g., the package master of a multi-package processor or the BSP of a single-package processor). It is worth recalling that the die master may also hold the relevant package master or BSP credential.

Accordingly, in the example above, the die master core would, at some point in the discovery process, exchange its die composite power state (or a compound of it) with a pal. Under some conditions, this discovery process would return to the die master core an at least partial composite power state for the higher domain (e.g., the package) that is less than 2. This would not result in a repeal of the power state of 3 that the die master core has already implemented for the die. Under other conditions, the discovery process would yield a composite power state for the package or microprocessor (e.g., 4 or more) that corresponds to a restricted power state of 4 or more. If so, the master of that domain (e.g., the package master) would implement a higher restricted power state that is the minimum of the higher-level domain's composite power state (e.g., 4 or 5) and the maximum restricted power state applicable to the higher-level domain (here, 5). This conditional, domain-specific power state implementation process extends to yet higher domain levels, if any, if the applicable discovery process is testing an even higher restricted power state.

As seen above, FIG. 23 illustrates a hierarchical, domain-specific restricted power state management coordination system that is operable to incorporate domain-dependent restricted power states and associated thresholds. It is therefore well suited to a fine-tuned, domain-specific, decentralized approach to power state management of individual cores and groups of cores.

It is noted that FIG. 23 illustrates power state coordination logic that provides for transitions to more wakeful states in a decentralized, distributed manner. However, it will be appreciated that some power state embodiments include power states from which a particular core may be unable to wake in the absence of prior power-state-enabling actions by the chipset or other cores. For example, in the C-state architecture described above, a C-state of 2 or higher may be associated with removal of the bus clock, which may render a given core unable to respond to an instruction, delivered over the system bus, to transition to a more wakeful state. Other microprocessor configurations in which power or clock sources may be selectively removed from a core or a die are also contemplated. FIG. 5 depicts an embodiment of wake-up logic that accommodates these circumstances by waking all of the cores in response to a deassertion of STPCLK. More selective embodiments of wake-up logic are contemplated, however. In one example, wake-up logic is implemented by system software (e.g., an operating system or BIOS). The system software first issues a wake or resume request to a particular core. If it receives no response, or the core does not comply, within an expected time interval, the software recursively issues wake or resume requests to successively higher masters, and potentially to the chipset, as needed, until an expected response is received or appropriate compliance is detected. Such software-implemented wake-up logic would work in coordination with the power state coordination logic of FIG. 23. It would transition to more wakeful states in a preferentially decentralized manner (in which each targeted core initiates the transition using its own microcode) to the extent the cores are operable to do so, and in a centrally coordinated manner when the cores are unable to do so. This embodiment of wake-up logic is merely illustrative of several possible embodiments for selectively waking cores that are unable to wake themselves.

VI. Extended Embodiments and Applications

Although embodiments having a particular number of cores 106 have been described, other embodiments with other numbers of cores 106 are contemplated. For example, although the microcode 208 described with respect to FIGS. 10, 13 and 17 is designed to perform distributed power management among eight cores, the microcode 208 functions properly in a system with fewer cores 106 by including checks for the presence or absence of cores 106, as described in the section of application Ser. No. 61/426,470, filed December 22, 2010, entitled "Dynamic Multi-Core Microprocessor Configuration," and its concurrently filed non-provisional application (CNTR.2533), whose disclosure is attached hereto. That is, if a core 106 is absent, the microcode 208 does not exchange C-state information with the absent core 106 and effectively assumes that the C-state of the absent core is the highest possible C-state (e.g., a C-state of 5). Thus, for efficiency in manufacturing, the cores 106 may be manufactured with microcode 208 designed to perform distributed power management among eight cores even though the cores 106 may be included in systems with fewer cores 106. Furthermore, embodiments are contemplated in which the system includes more than eight cores and the microcode described herein is extended to communicate with the additional cores in a manner similar to those already described.
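The treatment of absent cores can be illustrated with a short sketch. It assumes, for illustration only, that C-states are integers from 0 to 5, that the composite C-state is the minimum of the cores' target C-states, and that an absent core is represented by `None`; the constant and function names are hypothetical.

```python
from typing import Optional, Sequence

HIGHEST_C_STATE = 5  # assumed deepest C-state, per the "C-state of 5" example


def composite_c_state(targets: Sequence[Optional[int]]) -> int:
    """Composite C-state over eight core slots, some possibly empty.

    targets -- one entry per core slot; None marks an absent core.
    """
    # An absent core is treated as if it requested the highest C-state,
    # so it never holds the composite at a shallower level.
    return min((HIGHEST_C_STATE if t is None else t for t in targets),
               default=HIGHEST_C_STATE)
```

With this convention the same eight-slot logic works unchanged in a four-core system: the four empty slots simply drop out of the minimum.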
By way of illustration, the systems of FIGS. 9 and 11 may be extended to include sixteen cores 106 having eight pals, and the systems of FIGS. 12, 14 and 15 may be extended to include sixteen cores 106 having four chums, in a manner similar to the way the systems of FIGS. 9 and 11 synchronize C-states among four pals. The system of FIG. 16 may be extended to include sixteen cores 106 by having sixteen buddies (two dies with eight cores each, or four dies with four cores each), thereby combining relevant features of the methods of FIGS. 4, 10, 13 and 17.

Embodiments are also contemplated in which coordination of different classes of power states (e.g., C-states, P-states, requested VIDs, requested frequency ratios, etc.) is carried out independently. For example, each core may have a different applicable power state for each class of power states (e.g., a separate applied VID, frequency ratio, C-state, and P-state), with different domain-specific restrictions applied to each, and with different extremums used to compute compound states and discover composite states (e.g., minimums for C-states versus maximums for requested VIDs). Different hierarchical coordination systems (e.g., different domain depths, different domain constituencies, different designated domain masters, and/or different kinship group coordination models) may be established for different classes of power states. Moreover, some power states may require coordination, at most, only with other cores on a domain (e.g., the die) that includes only a subset of all of the cores on the microprocessor. For such power states, hierarchical coordination systems are contemplated that nodally link, and coordinate with, only the cores in that domain, and that discover a composite power state applicable to or within that domain.
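The idea that each class of power state is compounded with its own extremum can be sketched as a small lookup table. This is an illustrative sketch only: the class names, the integer encodings, and the helper function are assumptions, and only the two pairings named in the text (minimum for C-states, maximum for requested VIDs) are included.

```python
from typing import Callable, Dict, Iterable

# Extremum used to compound values for each class of power state
# (hypothetical class names; pairings follow the example in the text).
COMPOUND_RULE: Dict[str, Callable[[Iterable[int]], int]] = {
    "c_state": min,        # shallowest requested sleep state wins
    "requested_vid": max,  # highest requested voltage ID wins
}


def composite_state(power_class: str, values: Iterable[int]) -> int:
    """Composite of the given per-core values for one power-state class."""
    return COMPOUND_RULE[power_class](values)
```

Keeping the rule per class lets the same discovery traversal be reused for classes that are coordinated independently of one another.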
In general, embodiments have been illustrated in which all of the operating states are strictly and linearly orderable on a progressively ascending or descending basis. Other embodiments are contemplated, however, in which the operating states are tiered and are orderable in ascending or descending fashion along each tier (including embodiments in which the ordering of some tiers is independent of the other tiers). For example, a predefined set of power states may be characterized in a composite form of separate tiers A.B, A.B.C, and so on, where each of tiers A, B, and C relates to a different characteristic or class of characteristics. For example, a power state could be characterized in a composite form C.P or P.C, where P refers to an ACPI P-state and C refers to an ACPI C-state. Furthermore, one class of restricted power states may be defined by the value of a particular component (e.g., A or B or C) of the compositely defined power state, and another class of restricted power states may be defined by the value of another component of the compositely defined power state. Moreover, within any given tier of restricted power states, where a tier corresponds to the value of one of the components of the compositely defined power state (e.g., C.P), the value of another component (e.g., the P in C.P) may, for a given core, be unrestricted or subject to a different class of restrictions than those that apply to the tier. For example, a core with a targeted power state of C.P, where P refers to its P-state and C refers to its requested C-state, may be subject to independent restrictions and coordination requirements with respect to the implementation of the C and P portions of its targeted power state.
In composite power state embodiments, an "extremum" of any two power states may refer, for a given core calculating the extremum, to a composite of the extremums of the component portions of the composite power states, or to a composite of the extremums of fewer than all of the component portions together with otherwise selected or determined values for the other component portions.

Embodiments are also contemplated in which the multiple cores 106 in a system perform distributed, decentralized power management specifically to perform power credit functionality, as described in U.S. application Ser. No. 13/157,436 (CNTR.2517), filed June 10, 2011, which is hereby incorporated by reference in its entirety, but using the inter-core communication wires 112, inter-die communication wires 118, and inter-package communication wires 1133 rather than a shared memory area as described in CNTR.2517. An advantage of such an embodiment is that it is transparent to system firmware (e.g., BIOS) and system software and does not rely on system firmware or software to provide a shared memory area. This is desirable because the microprocessor manufacturer may not always be able to control releases of the system firmware or software.

Embodiments of synchronization logic that pass other values in addition to a probe value are also contemplated. In one embodiment, a synchronization routine passes a value that distinguishes the discovery process of which it is a part from any other simultaneously operating discovery process. In another embodiment, the synchronization routine passes a value by which the synchronized, or not-yet-synchronized, cores may be identified.
For example, an octa-core embodiment may pass an 8-bit value in which each bit represents a particular core of the octa-core processor and indicates whether that core has been synchronized as part of the instant discovery process. The synchronization routine may also pass a value that identifies the core that initiated the instant discovery process.

Additional embodiments that facilitate an ordered traversal of the cores are also contemplated. In one example, each core stores bit masks identifying the members of the kinship groups of which it is a part. For example, in an octa-core embodiment that uses a three-level-deep hierarchical coordination structure, each core stores three 8-bit "kinship" masks: a "closest" kinship mask, a second-tier kinship mask, and a top-tier kinship mask, where the bit values of each mask identify the core's kin, if any, in the kinship group represented by that mask. In another example, each core stores a map, a Gödel number, or a combination of these, from which the nodal hierarchy of the cores can be exactly and uniquely determined, including the identity of each domain master. In yet another example, the core stores information identifying shared resources (e.g., voltage sources, clock sources, and caches) and the particular cores or corresponding domains to which they belong and among which they are shared.

Also, although this specification focuses primarily on power state management, it will be appreciated that the various hierarchical coordination systems described above may be applied to coordinate other types of operations and restricted activities, not just power states or power-related status information. For example, in some embodiments, the hierarchical coordination systems described above are used, in coordination with decentralized logic duplicated on each core, for dynamic discovery of a multi-core microprocessor's configuration, such as described in CNTR.2533, as noted above.
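The 8-bit kinship masks and the 8-bit "synchronized cores" value described above could be represented as plain bit fields. The sketch below is hypothetical throughout: it assumes cores numbered 0 to 7, two cores per die, two dies per package, two packages, and the lowest-numbered core of each die or package acting as its master. None of these layout choices comes from the specification.

```python
from typing import Tuple

ALL_CORES = 0xFF  # one bit per core of an octa-core processor


def kinship_masks(core: int) -> Tuple[int, int, int]:
    """Return (closest, second_tier, top_tier) 8-bit kinship masks."""
    # Closest kin: the other core on the same (assumed dual-core) die.
    closest = 1 << (core ^ 1)
    # Second tier: the other die's master, held only by die masters.
    second_tier = 1 << (core ^ 2) if core % 2 == 0 else 0
    # Top tier: the other package's master, held only by package masters.
    top_tier = 1 << (core ^ 4) if core % 4 == 0 else 0
    return closest, second_tier, top_tier


def mark_synced(synced: int, core: int) -> int:
    """Set the bit for a core that has been synchronized."""
    return synced | (1 << core)
```

A discovery process could then compare the accumulated value with `ALL_CORES` to tell whether every core has taken part.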
Moreover, it should be noted that, unless specifically so claimed, the present invention does not require the use of any of the hierarchical coordination systems described above in order to perform predefined restricted activities. Indeed, unless and to the extent otherwise specifically stated, the present invention is applicable to purely peer-to-peer coordination systems among cores. However, as this specification makes apparent, the use of a hierarchical coordination system can provide several advantages, particularly when relying on sideband communications where the structure of the microprocessor's sideband communication lines does not permit a fully equal peer-to-peer coordination system.

As may be observed from the foregoing, in contrast to a solution such as that of Naveh described above, which includes centralized non-core hardware coordination logic (HCL), the decentralized embodiments described herein, in which the power management function is distributed equally among the cores 106, advantageously require no additional non-core logic. Although non-core logic could be included in a die 104, in the described embodiments all that is required to implement the decentralized, distributed power management scheme is hardware and microcode that reside entirely, both physically and logically, within the cores 106 themselves, together with the inter-core communication wires 112 in multi-core-per-die embodiments, the inter-die communication wires 118 in multi-die embodiments, and the inter-package communication wires 1133 in multi-package embodiments. As a result of the decentralized embodiments described herein, which perform power management distributed among multiple processing cores 106, the cores 106 may be located on separate dies or separate packages. This potentially reduces die size and improves yield, provides more configuration flexibility, and provides a high level of scalability in the number of cores in the system.
In yet other embodiments, the cores 106 differ in various respects from the representative embodiment of FIG. 2 and provide, instead or in addition, a highly parallel structure, such as a structure applicable to a graphics processing unit (GPU), to which the coordination systems described herein for activities such as power state management, core configuration discovery, and core reconfiguration may also be applied.

While various embodiments of the present invention have been described herein, it should be understood that they have been presented by way of example and not limitation. It will be apparent to persons skilled in the relevant computer arts that various changes in form and detail can be made without departing from the scope of the invention. For example, software can enable the function, fabrication, modeling, simulation, description and/or testing of the apparatus and methods described herein. This can be accomplished through the use of general programming languages (e.g., C, C++), hardware description languages (HDL) including Verilog HDL, VHDL, and so on, or other available programs. Such software can be disposed in any known computer-usable medium such as a semiconductor, magnetic disk, or optical disc (e.g., CD-ROM, DVD-ROM, etc.). Embodiments of the apparatus and methods described herein may be included in a semiconductor intellectual property core, such as a microprocessor core (e.g., embodied in HDL), and transformed to hardware in the production of integrated circuits. Additionally, the apparatus and methods described herein may be embodied as a combination of hardware and software. Thus, the present invention should not be limited by any of the exemplary embodiments described herein, but should be defined only in accordance with the following claims and their equivalents. Specifically, the present invention may be implemented within a microprocessor device that may be used in a general-purpose computer. Finally, those skilled in the art should appreciate that they can readily use the disclosed concepts and specific embodiments as a basis for designing or modifying other structures for carrying out the same purposes of the present invention without departing from the scope of the invention as defined by the appended claims.

[Brief Description of the Drawings]

FIG. 1 is a block diagram illustrating one embodiment of a computer system that performs decentralized power management distributed among the multiple processing cores of a dual-die, quad-core microprocessor.
FIG. 2 is a block diagram illustrating in detail a representative one of the cores of FIG. 1.
FIG. 3 is a flowchart illustrating operation, by a core, of one embodiment of a power state management routine of a system that performs decentralized power management distributed among the multiple processing cores of a multi-core microprocessor.
FIG. 4 is a flowchart illustrating operation, by a core, of one embodiment of a power state synchronization routine integral to the composite power state discovery process of the system of FIG. 3.
FIG. 5 is a flowchart illustrating operation, by a core, of one embodiment of a wake-and-resume routine in response to an event that wakes the core from a sleeping state.
FIG. 6 is a flowchart illustrating operation, by a core, of an inter-core interrupt handling routine in response to receiving an inter-core interrupt.
FIG. 7 is a flow diagram illustrating an example of operation of a composite power state discovery process according to the description of FIGS. 3 through 6.
FIG. 8 is a flow diagram illustrating another example of operation of a composite power state discovery process according to the description of FIGS. 3 through 6.
FIG. 9 is a block diagram illustrating another embodiment of a computer system that performs decentralized power management distributed among the multiple processing cores of an octa-core microprocessor having four dual-core dies on a single package.
FIG. 10 is a flowchart illustrating operation, by a core, of one embodiment of a power state synchronization routine integral to a composite power state discovery process of the system of FIG. 9.
FIG. 11 is a block diagram illustrating another embodiment of a computer system that performs decentralized power management distributed among the multiple processing cores of an octa-core microprocessor having four dual-core dies distributed on two packages, using the power state synchronization routine of FIG. 10.
FIG. 12 is a block diagram illustrating another embodiment of a computer system that performs decentralized power management distributed among the multiple processing cores of an octa-core microprocessor that, like FIG. 11, has four dual-core dies but whose cores, unlike FIG. 11, are interrelated according to a deeper hierarchical coordination system.
FIG. 13 is a flowchart illustrating operation, by a core, of one embodiment of a power state synchronization routine integral to a composite power state discovery process of the system of FIG. 12.
FIG. 14 is a block diagram illustrating another embodiment of a computer system that performs decentralized power management distributed among the multiple processing cores of an octa-core microprocessor that, like FIG. 9, has four dual-core dies on a single package but whose cores, unlike FIG. 9, are interrelated according to a deeper hierarchical coordination system.
FIG. 15 is a block diagram illustrating another embodiment of a computer system that performs decentralized power management distributed among the multiple processing cores of an octa-core microprocessor having two quad-core dies on a single package.
FIG. 16 is a block diagram illustrating yet another embodiment of a computer system that performs decentralized power management distributed among the multiple processing cores of an octa-core microprocessor.
FIG. 17 is a flowchart illustrating operation, by a core, of one embodiment of a power state synchronization routine integral to a composite power state discovery process of the system of FIG. 16.
FIG. 18 is a block diagram illustrating yet another embodiment of a computer system that performs decentralized power management distributed among the cores of a dual-core, single-die microprocessor.
Knife Figure 19 is a block diagram showing yet another embodiment of a computer system for dispensing a split electric raft between two dual core micro-chips having two single core chips. The core diagram 20 is shown as another embodiment of a computer system. The implementation of the distribution of the right eye + the electrical display shows that there is two single core, one of the single chip packages, the core of the dual core 119 201245948 Decentralized power management. Figure 21 is a block diagram showing still another embodiment of a computer system. The computer is a distributed power management between the core of the eight-core microprocessor and the eight-core read processing. The device has two packages, one with three dual core wafers and the other with a single dual core wafer. Figure 22 is a block diagram showing yet another embodiment of a computer system that performs distributed power management distributed between the cores of an eight core microprocessor, the eight core microprocessor being similar to Figure 21, but Having a deep hierarchical coordination system 0 FIG. 23 is a flow diagram showing another embodiment of operational state synchronization logic implemented on a core that supports a domain-differentiated operational state hierarchy coordination system And it is measurable for different domain depths. [Description of main component symbols] P, P1-P8: pins 100, 900, 1100, 1200, 1400, 15〇〇, 1600: computer systems 102, 902, 1202, 1402, 1502: multi-core microprocessor/package 104. Wafer 106: Core 108: Contact Pad 112: Intercore Communication Wiring 114: Chipset 120 201245948 116: Busbar 118: Communication Wiring 202: Instruction Cache 204: Command Decoder 206: Micro Sequencer 207: Microcode Memory 208: Microcode 212: Registration Alias Table (RAT) 214: Reservation Station 216: Execution Unit 218: Retirement Unit 222: Data Cache 224: Bus Interface Unit (BIU) 226: Phase Locked Loop (PLL) 228: BSP indicator 232: manager indicator

234, 236: CSR
238: model-specific register (MSR)
242: core clock signal
1102: quad-core microprocessor
1133: inter-package communication wires
1201: second package
1504: die
1602: multi-core microprocessor
1802, 1902, 2002: dual-core microprocessor
2202: eight-core processor
2300: logic
2302: sync_state

Claims (1)

201245948 七、申請專利範圍: 1、一種多核心處理器,包含: 多個啟動的實體處理核心; 〇共用,其巾該共用資 利用其能$運作之電 -可配置的資源,由兩偏上的該等核 源之組態影響共享該資源之該核心 源、速度或效率,· 對每個核心而言,内部核心電源狀態管理邏輯 二树與在該核心之間被實現之-種分散式 狀t發現麵,而錢針式_心邏輯之協助:’、 其中’該畴核心電源管理顧麵崎個核心中. 其中,如果該核心為了設定該共用資源的組態之目的而被 為-官理者核心,且該複合目標電源狀態經由該分 間電源狀態發現過程而被發現,職電源管理邏= 設定該核心的組態以驅使設定該共用資源的組態之人 目標電源狀態之實現; 设5 其中’對該共用資源而言’該複合目標電源狀態係為—種最節 能型的電源狀態,其將不會干涉共享該資源之每個核心之^ 何對應的目標電源狀態。 如申凊專利範圍第1項所述之多核心處理器,更包含多條旁路 通訊配線’其不連接該些核心至組之—系統匯流 排’並在雜核妓間,且其衫個分散式核源狀態ς 現過程係透過該些旁路通魅線,而攜帶有找些核心間被交 123 201245948 換之多個電源狀態值。 士申°月專利範圍第1項所述之多核心處理器,其中. 該些目標電源狀態係為C-狀態; 該共用資源係由所有核心所共用競接至—晶片 匯流排;以及 ’ 系统 該管理者核心獲得與該晶片組協調之獨占權,以實行影 流排操作之一 C-狀態。 θ ^匯 4 、如申請專利範圍第1項所述之多核心處理器,其中: 該共用貪源係由該些核心之至少某些所共用之—電壓源; 目標電源狀態係以1求的·改變 共子蝴社㈣心之—_準的 5 如申請專利範圍第1項所述之多核心處理器,宜中. 些核心之至少某些所共用之,; 目=源狀編1求的時脈比率信號表示;以及 该官理者核心獨占多個時脈比率需求信號,其被送至 :導致藉由該時脈源提供給共享該;心:: 脈頻率的一改變。 —h U之時 6、如申請專利娜項所述之多核心處理器,並中. 共:f之只有單--個係為了驅使該複合目 電源狀紅貫現的目的而被指定為該管理者核心;以及 124 201245948 該些核心包含可程式化邏輯,其經 能 ^ 〜,先軟體允許其被設定組 =匕用:乂為了設定該共用_'_之目的增加或移除—管理 在孑日疋0 7 如申請專利範圍第6項所述之多核心處理器,其㈣内⑽心 :源官理邏論爾料物_雜,糾定 示為管理者。 秘叙實_目的而被標 8、 如申請專利範圍第w所述之多核心處理器 :態係多個預定電源狀態之其中-個,該電源二管= 仰 貝現其中该多個預定電源狀態包含於 二二之内部影響—未共用資源之至少—不受限制的電源狀 =母娜㈣嘛,相輪獅域用以在 調的情況下實施—本地核心目標電源狀態,如果 八係為-不魏制的電源狀態的話。 9、 :=Γ範圍第1項所述之多核心處理器,其中該電源狀態 ==轉_频㈣細錢如心間電源狀 ^以因應接收轉變成為將設定該共用資源的組態之 一目標電源狀態之—需求。 10、 ㈣請專利朗^項所述之多核心處 =理:輯設定每個核心的組態秦種外部= 式核心間電源狀態發現_。 申月專利|&amp;圍第1項所述之多核心處理器,其中-分散式 125 201245948 心間電源狀態發現微 電源狀癌之過程而在 核心間m狀態魏過程包含相同的核 碼之多侧錢實例,其參與交換多個 各該核心被執行。 12、如㈣專利範圍第11項所述之多核心處理n,射一既定核 心與f絲的核叫電雜祕現微碼之每個實例與另-個 核〜乂換-電源狀態;且其中該本地實例按照下述值之至少 兩個計算-混合電源狀態:該本地核心目標電源狀態,如果 有的話’·由-實施過程所接收之—探測電源狀態數值;以及 由該核心間電源狀態發現微碼之—從屬實麵實例傳回之一 電源狀態值。 13如申明專利範圍第n項所述之多核心處理器,其中該分散式 核心間電職態發現触包含浦㈣電源 ^财例,其在㈣-缸相聽核心之_式路彳== “用:貝源之其他核^上,遞歸地實施該核心間電源狀態發現 微碼之從屬實例且與其同步。 14、-種供-多核心處理㈣之管理電源狀態之分散方法,該多 核心處理器具有多倾動之實體核㈣及由該些核心之至少 某些所共用之一資源,該方法包含: —核心接收影響在本身及至少一其他核心之間所共用的一可 配置的資源之-本地如目標電源狀態,其愧本地核心 目標電源狀態定義將f彡響共享該資狀該些核^利用其能 夠運作之電源、速度或效率之該資源之一組態; 126 201245948 、&gt;與核心間電被態發現過程,其包含不透過任何集 十式非核,輯而與共享該資源之至少_其他核心之電= 狀悲之—交換;以及 如=核心係為了設㈣共用資源的組態之目的而被指定為 一官理者核心’且該複合目標電源狀態係經由該分散式核心 間電源狀树現過飾被發現,則雜㈣了設定該共用資 
源的組相目的驅使—複合目標電源狀態之實現; 其▲中對妨㈣源^言,該複合目標電源狀祕為—種最節 月b里的電源狀態,其將不會干涉共享該資源之每個核心之任 何對應的目標電源狀態。 L·如申1利範圍第14項所述之方法,更包含經由在該些核心 之門的夕條方路通舰線參與該核心間電源狀態發現過程, 其中遠些旁路通訊配線係與賴些核心連接至m且之一 系,匯流排不同’且其中,在該些核心之間交換之多個電源 狀恶係透過該些旁路通訊配線而交換。 16、 如申請專利範_ Η項所述之方法,其中: 該些目標電源狀態係為C-狀態; &quot;、用貝源係、由所有核心、所共用並連接至H组之一系統 匯流排;以及 »亥g理者核〜獲付與該晶片組協調之獨占權以實行影響該匯 U非之操作之一 C-狀態。 17、 如中請專利範圍第14項所述之方法,其中: 127 201245948 該共用I原係由該些核心之至少某些所共用之—電壓源; 一目標電馳態細-需求的電變絲錢表示;以及 該管理者核心齡多個電壓改變需求信號,其被送至該電麼 源,其要求被提供至共享該·源之該些核心之一麵位 準的一改變。 18、如申請專利範圍第14項所述之方法,其中: 該共用資源係為由該些核心之至少某些所共用之—時脈源,· 一目標電源狀態係以-需求㈣脈比率信號表示;以及 該管理者核心獨占多辦脈轉絲錢,其被送至該時脈 源’其將導致藉由該時脈源提供給共享該時脈源之該些核 心之時脈頻率的一改變。 19、如申請專利範圍第Η項所述之方法,更包含: 經由參與由另-個核心所實施的—核心間電源狀態發現過 私,=官理者核心發現並驅使用以設定該共用資源的組態 之一複合目標電源狀態之實現。 夕/處理益之實體核心之一電腦可讀取的儲存媒體 中被編碼之_ ’該m包含㈣執行下述碼. 接收設定由兩個以上的該些核心所共用的一可配置資源的組 而東其中&lt;&gt; 亥共用資源之多個組態影響共享該資源 之該些核心利用其能夠運作之該電源、速度或效率; 參與在該些核心之·實現之1分散式核心間電源狀態發 現過程,而無須集中式非核心邏輯之協助;以及 128 201245948 如果°玄核’為了設定該共用資源的組態之目的而被指定為 一官理者核心,且該複合目標電源狀態係經由該分散式核 心間電源狀態發現過程而被發現,舰定該核^的組態以 為了叹疋》亥共用資源的組態驅使—複合目標電源狀態之實 現, 其:’對於該共时源而言,該複合目標電源狀態係為—種 取即能型的電源狀態,其將不會干涉共享該資源之每個核 心之任何對應的目標電源狀態。 21、 一種多核心處理器,包含: 多個貫體處理核心;以及 核心間狀態發赌碼,在各該_心巾啟_如,用以經 由不透過任何集中式非核心邏輯、而從其他核心接收或傳 达至其他核心的健來參與分散式核心間狀態發現過程。 22、 如中請專利範圍第21項所述之多核心處理器,其中: 。玄核心間狀態發現微碼,經由獨立於將該多核心、處理器連接 至-晶片組的-系統匯流排之多條旁路通訊配線來與其他 核心交換信號;以及 該核心間狀態發現微碼,無任何集中式非核心邏輯的協助下 判斷-可用的狀態值,其係—功能,至少是另一核心的一 狀態。 2:3、如中請專利範圍第21項所述之多核心處理器,其中: 該核心間狀態發現微瑪包括同步邏輯,提供至每個核心其 129 201245948 具有的同步貫例為了一核心間狀態發現過程之多個目的係 可操作的以在多核心上實施;以及 其中每個本地實_可__在其健心上實施該同步邏 輯的多個新實例,及響應實施於該本地實例的另一核心上 該同步邏輯的任何先前實例。 汝申》月專利範圍第23項所述之多核心處理器,其中: 每個核心具有一目標操作狀態; 。亥處理器包含一領域,其包括該微處理器的核心的至少其中 之—· I&quot;* , 處理器提供-資源至該領域,其資源係由該領域之該等核 心所共用; 该同步邏輯係組態成用以發現是否該領域係準備於實現一受 限電源節能操作狀態供該資源將限制共享該資源之該些核 心利用其能夠運作之電源、速度或效率;以及 其中該領域係準備於實職受限電源節能操作狀態若且為若 在該領域共享該資源的每個啟動核心具有至少限制性的作 為該受限操作狀態的一目標操作狀態。 25、 如申凊專利範圍第24項所述之多核心處理器,其中: °亥共旱資源係連接至一晶片組之一系統匯流排; D亥領域包含該多核心處理器的全部的啟動核心;以及 該受限操作狀態係一 C-狀態,其係禁能該系統匯流拆之一匯 流排時脈。 130 201245948 26、 如申請專利範圍第24項所述之多核心處理器,其中: 該共享資源係在該微處理器的一客 . 
h心晶片上的一鎖相迴 路; 該領域包含全部的啟動核心,其時脈信號由該鎖相迴路供 應;以及 〃 該受限操作狀H係可共享該鎖相迴路㈣等核心所使用的一 低於最大效能頻率比。 27、 如申請專利範圍第24項所述之多核心處理器,其中: 該共享資源係一電壓資源; 該領域包含全部並限於共⑽電壓#_該微處理器的啟動 核心;以及 該受限操作狀態係可共享該餅資源的料核心所使用的一 低於最大效能電壓位準。 28、 如申請專利範圍第24項所述之多核心處理器,其中: 同步邏輯之每個實例係組態為’除非由—终止條件早先地終 止,用以遞歸地在其他核心上實施該同步邏輯之多個實 例,直到該同步邏輯之同步實例已經實施在該處理器的一 可用的領域的全部核心;以及 其中該同步邏輯係組態為隨一終止條件用以停止在其他尚未 同步核心上同步邏輯的實例的實施,如果其發現一核心具 有的一目標操作狀態是較低限制性於該受限電源節能操作 狀態; 131 201245948 其中该同步邏輯係㈣為協調—最低限度足賊目的其他核 心用以發現是賴可㈣領域係準備於實現_受限電源節 能操作狀態。 29、如”侧顧苐23項所述之多核心處理器,其中: 每個核心具有一目標操作狀態; 4處理器包含’域,其包括該微處理器的核心的至少其尹 之二; 。亥處理器提供f源至該領域,其資源係由該領域之該等核 心所共用; 該同步邏輯係組態成用以: 發現疋否該領域共享該資源的其中—個該啟動核心、具有一目 標操作狀態較低限概於—目前實現電源節能猶狀態; H如果其係授細侧其魏,以撤銷對該資源 的一電源節能操作狀態,若該同步邏輯已經發現該領域的 一啟動核心、具有-目標操作狀態較低限制性於—目前實現 電源節能操作狀態。 3〇、”請專利範圍第23項所述之多核心處理器,其中該同步遇 2的每個實例係組態成根據在一階層式方式組織核心賴 “階層式協調系統用以在該多糾處理器的其他核心上, 施該同步邏輯的從屬實例。 31、如申請專利範圍第23項所述之多核心處理器,其中該階^ 協調系統係根據在該等領域内該等核心所共享的資源將該$ 132 201245948 核心聚集至該等領域,其中對每個領域而言,為了該等資源 的一協5周組態的目的,一個單一核心係被指定為該域之管理 者0 32、如申請專利範圍第23項所述之多核心處理器,其中: §玄階層式協調系統係將該等核心聚集至多個領域層級,至少 包含: 一最高地位的首要層級領域,具有全部的該等核心;以及 二或二個以上對等次一地位的第二層級領域,最緊接於該最 尚地位,其係該首要層級領域的組成者並成巢於内,每個 第二層級領域群組分別包括該等核心的獨伯副群組; 對每個多核心領域雜,—解—核㈣觀旨定為該領域的 一管理者; 該最低層級多核心領域料的每個多核^領域係定義一同屬 !生群組’其係由最緊隨町地位的組成者領域的管理者核 心所組成; 〆 &gt;最低層、.及夕核心領域定義一同屬性群組,其係由其全部 的核心所組成,· 每個核心屬於至少一同屬性群組;以及 該^邏輯的每财地實_受限於賴时轉的新實例 &amp;至非屬於—本地核心同屬性群組的多個核心。 申明專梅_ 23項所叙乡私處 處理器的多個椤心的豆由 一 亥夕核心 的其令一個係指定為對該階層式協調系統 133 201245948 的每個多核心領域的一管理者。 34如:了專利貌圍第23項所述之多核心處理器,其中每個核心 係組4為其分散式核心間狀紐職碼來發現是否 該多核心處理器的其他核心為禁能。 35如申:專利範圍第23項所述之多核心處理器,其中每個核心 係組癌為帛崎祕分散^核㈣狀紐職碼來發現該多 核心處理器具有多少個啟動核心。 36如申μ專利㈣㈣項所述之多核心處理器,其中每個核心 係組怨為用以制其分散式核心間狀態發現_來發現該多 核心處理器的—階層式協調系統。 37種多核心處理器的發現狀態的分散式微碼實現方法,該多 核心處理器包括多個實體處理核心,該方法包括: 至少二核心經由不透過任何集中式非核心邏輯、而由核心交 換的信號來倾—分散式核^間狀態發現過程。 % '如申請專利範圍第37項所述之方法,其中該方法施行於發現 下述狀態的至少其中之一: 對垓處理器的一複合電源狀態; 對該處理H的-領域的—複合電源狀態,該職包括多個核 〜的一群組,其係根據多個組態的其中之一為了電源節能 的目的而共旱可操作地被組態的一可組態資源; 另一核心的一目標電源狀態; 共子可組悲資源的多個核心的一群組任一者的一最低限制 134 201245948 性目標電源狀態; 一最高限雛目標電源狀態,其係由不妨礙其他核心的對應 目標操作狀態的一核心所實現; 一核心是否啟動或禁能; 該多核心處理器具有多少個核心為啟動; 共享資源及多個核心的領域的一識別,在其中各樣的可組態 資源係被共享; 
5亥等核心的一階層式協調系統,用於經營共享資源; 在多核心處理器内多條旁路通訊配線以協調核心的—利用 率,其旁路通訊配線係獨立於將該多核心處理器連接至一 晶片組的一系統匯流排;以及 該等核心的-階層式協調系統,施行於旁路通訊配線上的核 心間通訊,旁路通訊配線係獨立於將該多核心處理器連接 至一晶片組的一系統匯流排。 39、 如中請專利範圍帛38項所述之方法,其中每個參與的核心使 用旁路通訊配線與另一參與的核心交換狀態相關信號,旁路 通訊配線係獨立於將該多核心處理器連接至—晶片組的一系 統匯流排。 40、 如申請專利範圍第38項所述之方法,更包括參與該分散式核 心間狀態發現過程來發現另一核心的一目標電源狀態。 41、 如申請專利範圍第38項所述之方法,更包括參與該分散式核 心間狀態發現過程來發現核心的群組的一複合電源狀態。 135 201245948 42、 如中料利辄圍第38項所述之方法,更包括有關—限制其係 該資源的組㈣影響該魏、速度、或效報共享資源能夠 操作的-核^ ’參與該分散式核心間狀態發現過絲限制操 作狀態的實現供組態-共享資源至一操作狀態,其係不再限 制於共旱該資_任何核叫該最低限制目標操作狀態。 43、 如申請專利範圍第38項所述之方法,更包括: 每個核心接收一目標操作狀態; 每個核心,回應於接收該目標操作狀態,實施同步邏輯的一 本地貫例具體化於s亥核心的微碼,用以發現一可用的狀 態; 其中該可㈣狀態係不大於雜辦擁有的目標操作狀態的 一最呵限制性操作狀態,其係由不妨礙其他核心的對應目 標操作狀態的該核心所實現; 同步邏輯的該本地實例在另一核心實施該同步邏輯讀至少一 新的從屬實例,及遞送該本地核心的目標操作狀態至該其 他核心;以及 該從屬實例計算-混合操作狀態為至少是目標操作狀態可用 於自身及從其他本地核心接收的該目標操作狀態的一功 能’及傳回該混合操作狀態至該本地核心。 44、如申請專利範圍第43項所述之方法,更包括: 同步邏輯的每個實例,除非由—終止條件早先轉止,遞歸 地在其他仍未同步的核心上實施該同步邏輯之多個實例, 136 201245948 直到該同步邏輯之同步實例已經實施在該處理器的一可用 的領域的全部核心。 45、如申請專利範圍第44項所述之方法,更包括: 同步邏輯的每個實例條件性地防止同步邏輯的從屬實例更在 其他尚未同步的核心上實施,如果其實例發現一核心具有 的目4 #作狀態是非較多限制性於該資源的最低受限操 作狀態; 其中該同步邏輯係組態為協調—最低限度足夠數目的其他核 心_發現衫-受限操作狀態能施行機共享資源。 6、i微碼常式被編碼在-多核心處麵的—實體處理核心的 、電腦可讀取的儲存舰巾,賴碼常式包括碼供經由不透 ^任何集中式非核心邏輯、而由核心交換的信號來使用一分 散式核心間狀態發現_,用以發現該多核心處理器的一可 用的狀態; 其中該可用的狀態為下述狀態的其中之一: 對该處理器的一複合電源狀態; 對錢理n的—領_—複合電職態,該躺包括多健 I的-群組’其係根據多個组態的其中之一為了電源節能 的目的而共享可操作地被組態的-可組態資源; 另一核心的一目標電源狀態; 、予可組態貪源的多個核心的一群組任一者的一最低限制 性目標電源狀態; 137 5? 201245948 一最高限制性目標電源狀態,其係由不妨礙其他核心的對應 目標操作狀態的一核心所實現; 一核心是否啟動或禁能; 該多核心處理器具有多少個核心為啟動; 共旱 &gt; 源及多個核心的領域的一識別,在其中各樣的可組態 資源係被共享; 該等核心的—階層式協調系統,用於經營共享資源; 在多核心處理器内多條旁路通訊配線以協調核心的一利用 率,其旁路通訊配線係獨立於將該多核心處理器連接至一 晶片組的一系統匯流排;以及 a等核^的階層式協㈣統’施行於旁路通訊配線上的核 心間通訊’旁路通訊配線係獨立於將該多如處理器連接 至一晶片組的一系統匯流排。 138201245948 VII. Scope of application for patents: 1. 
A multi-core processor, comprising: a plurality of activated entity processing cores; 〇 shared, the shared resources of the towel can utilize the electricity-configurable resources that can be operated by the two The configuration of such core sources affects the core source, speed or efficiency of sharing the resource, and for each core, the internal core power state management logic two tree is implemented between the cores Shape t found face, and money needle _ heart logic assistance: ', where 'the core power management of the domain is in the core. Among them, if the core is set for the purpose of setting the configuration of the shared resource - The core of the administrator, and the composite target power state is discovered through the inter-divided power state discovery process, the job power management logic = setting the core configuration to drive the implementation of the target power state of the configuration setting the shared resource; Let 5 set the 'complex target power state' as the most energy-efficient power state for the shared resource, which will not interfere with each core sharing the resource. Ho ^ corresponding target power state. For example, the multi-core processor described in claim 1 of the patent scope further includes a plurality of bypass communication wirings that do not connect the core to the group-system bus bar and are in the heterogeneous core, and the shirts thereof The decentralized nuclear source state process passes through these bypass pass-through lines, and carries a number of power state values that are exchanged for the cores to be exchanged for 2012. The multi-core processor of claim 1, wherein the target power states are C-states; the shared resources are shared by all cores to the wafer bus; and the system The manager core obtains exclusive rights in coordination with the chipset to perform one of the C-states of the streamer operation. 
The multi-core processor of claim 1, wherein: the shared source is a voltage source shared by at least some of the cores; the target power state is determined by 1 ·Change the commonality of the family (four) heart - _ quasi 5 of the multi-core processor as described in claim 1 of the patent scope, Yizhong. At least some of the cores are shared,; The clock ratio signal representation; and the official core exclusive of the plurality of clock ratio demand signals, which are sent to: cause a change in the pulse frequency provided by the clock source; -h U at time 6, as claimed in the multi-core processor described in the patent Na, and in the total: f only single - one is designed to drive the purpose of the composite power supply is specified as The core of the manager; and 124 201245948 These cores contain programmable logic, which can be set by the first software to be used: 增加 to increase or remove the purpose of setting the shared _'_孑日疋0 7 If you apply for the multi-core processor described in item 6 of the patent scope, (4) within (10): the source of the official logic is arbitrarily, and the correction is shown as the manager. The multi-core processor as described in the patent application scope w: the state is one of a plurality of predetermined power states, the power supply tube = the top of the predetermined power supply The state is contained in the internal influence of the second and second - at least the unshared resources - the unrestricted power supply = mother na (four) Well, the phase wheel lion domain is used in the case of adjustment - the local core target power state, if the eight systems are - If it is not the power state of the system. 9, := Γ range of the multi-core processor described in item 1, wherein the power state == turn_frequency (four) fine money such as the power supply of the heart ^ to respond to the transition into one of the configurations that will set the shared resource Target power state - demand. 
10, (4) Please refer to the multi-cores described in the patent lang ^ item = rational: set each core configuration Qin type external = type core power state discovery _. Shenyue Patent|&A multi-core processor as described in item 1, wherein - decentralized 125 201245948 inter-cardiac power state discovers the process of micro-powered cancer and contains the same core code in the core m-state process An instance of side money, which participates in the exchange of multiple cores is executed. 12. The multi-core processing n as described in item 11 of the (4) patent scope, each instance of the microcode of a given core and f-wire, and another core-to-core-power state; Wherein the local instance is calculated according to at least two of the following values - the hybrid power state: the local core target power state, if any, - received by the implementation process - the detected power state value; and the inter-core power supply The state discovery microcode - the slave real instance returns a power state value. 13 A multi-core processor as claimed in item n of the patent scope, wherein the decentralized inter-core electrical status discovery touches a Pu (4) power supply, and the _-type path in the (four)-cylinder listening core == "Using: other cores of Beiyuan, recursively implement and synchronize the subordinate instances of the microcode between the core power states. 14. Distributing methods for managing power states of the multi-core processing (4), the multicore The processor has a multi-tilted physical core (four) and a resource shared by at least some of the cores, the method comprising: - the core receiving affects a configurable resource shared between itself and at least one other core - Local, such as the target power state, its local core target power state definition will be shared with the resource. 
The cores are configured with one of the resources of the power, speed or efficiency in which they can operate; 126 201245948 , &gt The process of discovering the core with the core, including the non-nuclear non-nuclear, not sharing the resources of at least _ other cores of electricity = sorrow - exchange; and = core system for the purpose of (four) sharing The purpose of the configuration of the source is designated as an official core 'and the composite target power state is found through the decentralized core power supply tree, then the fourth (four) set the purpose of the shared resource Driven - the realization of the composite target power state; its ▲ medium (4) source, the composite target power supply is the most power state in the month b, it will not interfere with each core sharing the resource Any corresponding target power state. The method of claim 14, wherein the method of participating in the inter-core power state discovery process is carried out via the eve of the core gates of the core gates, wherein The bypass communication wirings are connected to the cores of the cores, and the busbars are different, and wherein a plurality of power supply systems exchanged between the cores are exchanged through the bypass communication wirings. For example, the method described in the patent application is as follows: wherein: the target power states are C-states; &quot;, using the source system, all cores, sharing and connecting to one of the H groups of system busses ; and »hai g The core is deducted from the exclusive right of the chipset to perform a C-state that affects one of the operations of the sink. 17. 
The method of claim 14, wherein: 127 201245948 the share I Originally represented by at least some of the cores - a voltage source; a target electrical state - a demanding electrical variable; and a plurality of voltage change demand signals of the manager's core age, which are sent to the A method of claiming a change to a level of one of the cores of the source. 18. The method of claim 14, wherein: the shared resource is At least some of the cores are shared by the clock source, a target power state is represented by a - demand (four) pulse ratio signal; and the manager core monopolizes more money, which is sent to the clock source' It will result in a change in the clock frequency provided by the clock source to the cores sharing the clock source. 19. The method of claim 2, further comprising: discovering the private state by participating in the power state between the cores implemented by the other core, and verifying that the core is found and driven to set the shared resource. The configuration of one of the composite target power states is implemented. One of the physical cores of the processing core is encoded in the computer readable storage medium _ 'The m contains (4) executes the following code. 
Receives a set of configurable resources shared by more than two of these cores And multiple configurations of the East &lt;&gt; shared resources affect the power, speed, or efficiency with which the cores sharing the resource can operate; participate in the distributed core power supply of the core implementations State discovery process without the assistance of centralized non-core logic; and 128 201245948 If the "Xuan core" is designated as an official core for the purpose of setting the configuration of the shared resource, and the composite target power state is via The decentralized inter-core power state discovery process was discovered, and the configuration of the core was designed to drive the configuration of the composite target power state for the sigh of the configuration of the shared resource, which: 'for the co-time source In other words, the composite target power state is a power state of the ready-to-use type that will not interfere with any corresponding target power state of each core sharing the resource. 21. A multi-core processor, comprising: a plurality of processing cores; and a core-to-core status gambling code, in each of the _ heart towels, for not traversing any centralized non-core logic, and The core receives or communicates to other cores to participate in the distributed inter-core state discovery process. 22. The multi-core processor described in item 21 of the patent scope, wherein: The inter-core state discovers the microcode, exchanging signals with other cores via multiple bypass communication wires independent of the multi-core, processor-to-wafer-system bus; and the inter-core state discovery microcode Judging with the assistance of any centralized non-core logic - the available state value, its system-function, at least one state of the other core. 2:3. 
The multi-core processor as described in claim 21 of the patent scope, wherein: the inter-core state discovery micro-matrix includes synchronization logic provided to each core thereof 129 201245948 has a synchronization example for a core room A plurality of purposes of the state discovery process are operable to be implemented on multiple cores; and wherein each local real___ implements a plurality of new instances of the synchronization logic on its heartbeat, and the response is implemented on the local instance Any previous instance of the synchronization logic on another core. The multi-core processor of claim 23, wherein: each core has a target operational state; The processor includes a field that includes at least one of the cores of the microprocessor - I&quot;*, the processor provides - resources to the field, the resources of which are shared by the cores of the field; the synchronization logic Is configured to discover whether the field is ready to implement a limited power-saving operational state for the resource to limit the power, speed, or efficiency with which the cores sharing the resource can operate; and wherein the field is prepared In the case of a real-life limited power save operation state, if it is for each start core that shares the resource in the field, there is at least a restrictive target operating state as the restricted operational state. 25. The multi-core processor of claim 24, wherein: the _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ The core; and the restricted operational state is a C-state that disables the system from sinking one of the bus clocks. 130 201245948 26. 
The multi-core processor of claim 24, wherein: the shared resource is a phase-locked loop on a guest of the microprocessor; the field includes all startups The core, the clock signal is supplied by the phase locked loop; and 受限 the limited operation mode H can share a lower than maximum performance frequency ratio used by the core such as the phase locked loop (4). 27. The multi-core processor of claim 24, wherein: the shared resource is a voltage resource; the field includes all and is limited to a total (10) voltage #_ the startup core of the microprocessor; and the limited The operational state is a level below the maximum performance voltage level that can be used by the core of the pie resource. 28. The multi-core processor of claim 24, wherein: each instance of the synchronization logic is configured to 'unless terminated by a termination condition to recursively implement the synchronization on other cores Multiple instances of logic until a synchronized instance of the synchronization logic has been implemented on all cores of an available domain of the processor; and wherein the synchronization logic is configured to stop on other unsynchronized cores with a termination condition Implementation of an instance of synchronization logic if it finds that a core has a target operational state that is less restrictive to the limited power-saving operational state; 131 201245948 where the synchronization logic (4) is coordinated - the other core of the minimum thief It is used to discover that the Lai (4) field is ready to implement the _ limited power saving operation state. 29. 
A multi-core processor as described in paragraph 23, wherein: each core has a target operational state; 4 processor includes a 'domain, which includes at least the second of the core of the microprocessor; The processor provides the f source to the field, and its resources are shared by the cores in the field; the synchronization logic is configured to: discover whether the domain shares the boot core of the resource, Having a target operating state is lower than the current state of the power saving state; if it is the fine side of the system, to cancel the power saving operation state of the resource, if the synchronization logic has found one of the fields The startup core, the -target operation state is less restrictive than the current implementation of the power-saving operation state. 3〇, "Please refer to the multi-core processor described in the 23rd patent, wherein each instance of the synchronization encounter 2 The state is organized according to a hierarchical way. The hierarchical coordination system is used to apply the synchronization logic to other cores of the multi-correction processor. The multi-core processor of claim 23, wherein the order coordination system aggregates the $132 201245948 core into the fields based on resources shared by the cores in the fields, wherein each field For the purpose of a five-week configuration of the resources, a single core system is designated as the administrator of the domain. 32. 
The multi-core processor as described in claim 23, wherein: The hierarchical hierarchical coordination system aggregates the cores into a plurality of domain levels, including at least: a top-level domain of the highest status, having all of the cores; and a second-level domain of two or more peer-to-peer positions , most immediately adjacent to the most prominent position, which is composed of the members of the primary level domain, each of which includes a sub-group of the cores; The field miscellaneous, the solution-nuclear (four) concept is defined as a manager in the field; each multi-core field of the lowest-level multi-core field material defines a genus! The group is the most closely related to the town. The core of the manager of the adult domain; 〆&gt; The lowest layer, and the core domain define the same attribute group, which is composed of all its cores, · each core belongs to at least one attribute group; ^The logic of each financial field _ is limited by the new instance of Lai Shi &amp; to non-belonging to the core of the local core of the same attribute group. Declaring the special plum _ 23 items of the rural private processor The heart of the bean is designated by the core of a sea eve as a manager of each multi-core field of the hierarchical coordination system 133 201245948. 34 such as the multi-core processing described in the 23rd article of the patent appearance Each core group 4 has its decentralized core-line code to find out if the other cores of the multi-core processor are disabled. 35, Shen: The multi-core processor described in Item 23 of the patent scope, wherein each core group cancer is a 帛 秘 分散 分散 ^ 四 四 四 四 四 四 四 四 四 四 四 四 发现 发现 发现 发现 发现 发现 发现 发现 发现 发现 发现 发现 发现 发现 发现 发现 发现 发现36. The multi-core processor described in claim (4) (4) of the claim, wherein each core group complains of the hierarchical inter-core state discovery to discover the multi-core processor-hierarchical coordination system. 
A method for implementing a distributed microcode of a discovery state of 37 multi-core processors, the multi-core processor comprising a plurality of entity processing cores, the method comprising: at least two cores exchanged by the core without passing through any centralized non-core logic The signal comes to the process of decentralized nuclear state discovery. The method of claim 37, wherein the method is performed by at least one of the following states: a composite power state for the processor; a composite power source for the H-domain of the process State, the job includes a group of multiple cores, which is a configurable resource that is operatively configured according to one of a plurality of configurations for the purpose of power saving; another core A target power state; a minimum limit of any one of a plurality of cores of the comorable resources; 201245948 Sexual target power state; a maximum target power state, which does not interfere with the correspondence of other cores A core of the target operating state is implemented; whether a core is started or disabled; how many cores of the multi-core processor have startup; a shared resource and an identification of multiple core domains, among which various configurable resources Department is shared; 5 Hi-core core hierarchical coordination system for operating shared resources; multiple bypass communication wiring in multi-core processors to coordinate core-utilization The bypass communication wiring is independent of a system bus that connects the multi-core processor to a chipset; and the core-hierarchical coordination system performs inter-core communication on the bypass communication wiring, The road communication wiring is independent of a system bus that connects the multi-core processor to a chip set. 39. 
The method of claim 38, wherein each of the participating cores uses a bypass communication wiring and another participating core exchange state related signal, and the bypass communication wiring is independent of the multi-core processor. Connected to a system bus of the chipset. 40. The method of claim 38, further comprising participating in the decentralized core state discovery process to discover a target power state of another core. 41. The method of claim 38, further comprising participating in the decentralized core state discovery process to discover a composite power state of the core group. 135 201245948 42. The method described in item 38 of Zhongli Liwei, including the section on the restriction of the resource (IV) affecting the Wei, speed, or effective reporting of shared resources - The decentralized inter-core state finds that the implementation of the wire-limited operating state for configuration-shared resources to an operational state is no longer restricted to co-drying. Any nuclear call is called the minimum restricted target operational state. 43. The method of claim 38, further comprising: each core receiving a target operational state; each core, in response to receiving the target operational state, implementing a local instance of synchronization logic embodied in s The core code of the core is used to discover a usable state; wherein the (4) state is not greater than a most restrictive operational state of the target operating state of the miscellaneous, which is not hindered by the corresponding target operating state of other cores. 
The core instance of the synchronization logic implements the synchronization logic to read at least one new slave instance at another core, and deliver the target operational state of the local core to the other core; and the slave instance compute-mix operation The state is at least a function that the target operational state can use for itself and the target operational state received from other local cores' and return the hybrid operational state to the local core. 44. The method of claim 43, further comprising: each instance of the synchronization logic, wherein the synchronization logic is recursively implemented on other cores that are still unsynchronized unless previously terminated by the termination condition Example, 136 201245948 Until the synchronized instance of the synchronization logic has been implemented in all cores of an available field of the processor. 45. The method of claim 44, further comprising: each instance of the synchronization logic conditionally preventing a dependent instance of the synchronization logic from being implemented on other unsynchronized cores, if its instance finds a core having The status is not limited to the minimum restricted operational state of the resource; wherein the synchronization logic is configured to coordinate - a minimum sufficient number of other cores - discovery shirt - restricted operational state can implement shared resources . 
46. A microcode routine encoded in a computer-readable storage medium of a physical processing core of a multi-core processor, the microcode routine comprising code for participating, without the assistance of any centralized non-core logic, in a decentralized inter-core state discovery process that uses signals exchanged between cores to discover an available state of the multi-core processor; wherein the available state is one of:
a composite power state of the processor;
a composite power state for a domain of cores, the domain comprising a group of cores that operatively share a configurable resource that is configurable according to one of a plurality of configurations for power-saving purposes;
a target power state of another core;
a least restrictive target power state of any one of a plurality of cores sharing a configurable resource;
a most restrictive target power state that can be implemented by a core without interfering with the corresponding target operating states of other cores;
whether a core is enabled or disabled;
how many cores of the multi-core processor are enabled;
an identification of the configurable resources and of the domains of cores among which the various configurable resources are shared;
a hierarchical coordination system of the cores for managing shared resources;
the availability of sideband communication wires within the multi-core processor for coordinating the cores, the sideband communication wires being independent of a system bus that connects the multi-core processor to a chipset; and
a hierarchical coordination system of the cores for inter-core communication over sideband communication wires, the sideband communication wires being independent of a system bus that connects the processor to a chipset.
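The recursive discovery described in claims 43 to 45 can be illustrated with a minimal sketch. It is not the patented microcode: it assumes that power states are encoded as integers where a larger number is more restrictive, that 0 is the least restrictive state, and that signalling over the sideband wires can be modeled as plain function calls. The names `sync`, `discover`, `targets`, and `links` are hypothetical and chosen only for illustration.

```python
LEAST_RESTRICTIVE = 0  # assumed encoding: 0 = fully active, larger = deeper power saving


def sync(core, incoming_state, targets, links, visited):
    """One instance of the synchronization logic running on `core`.

    Returns the composite state: the most restrictive state that does not
    exceed the target of any core visited so far.
    """
    visited.add(core)
    # Combine this core's own target with the state handed over by the caller.
    composite = min(targets[core], incoming_state)
    for other in links[core]:
        # Termination condition: once some core needs the least restrictive
        # state, no further core can change the result, so stop recursing.
        if composite == LEAST_RESTRICTIVE:
            break
        if other not in visited:
            # Invoke a dependent instance on a not-yet-synchronized core.
            composite = sync(other, composite, targets, links, visited)
    return composite


def discover(native, targets, links):
    """Start discovery on the core that received a new target state."""
    visited = set()
    composite = sync(native, targets[native], targets, links, visited)
    return composite, visited


if __name__ == "__main__":
    # Four cores; each entry lists the cores reachable over sideband wires.
    links = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
    print(discover(0, {0: 4, 1: 3, 2: 5, 3: 2}, links))
    print(discover(0, {0: 4, 1: 0, 2: 5, 3: 5}, links))
```

In the second call, core 1 requests the least restrictive state, so the recursion stops after visiting only cores 0 and 1. This mirrors the idea in claim 45 of coordinating with a minimum sufficient number of other cores.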
TW100148084A 2010-12-22 2011-12-22 Decentralized power management distributed among multiple processor cores TWI450084B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201061426470P 2010-12-22 2010-12-22
US13/299,059 US8782451B2 (en) 2010-12-22 2011-11-17 Power state synchronization in a multi-core processor
US13/299,122 US8635476B2 (en) 2010-12-22 2011-11-17 Decentralized power management distributed among multiple processor cores

Publications (2)

Publication Number Publication Date
TW201245948A TW201245948A (en) 2012-11-16
TWI450084B TWI450084B (en) 2014-08-21

Family

ID=51332550

Family Applications (2)

Application Number Title Priority Date Filing Date
TW103115432A TWI531896B (en) 2010-12-22 2011-12-22 Power state synchronization in a multi-core processor
TW100148084A TWI450084B (en) 2010-12-22 2011-12-22 Decentralized power management distributed among multiple processor cores

Family Applications Before (1)

Application Number Title Priority Date Filing Date
TW103115432A TWI531896B (en) 2010-12-22 2011-12-22 Power state synchronization in a multi-core processor

Country Status (2)

Country Link
CN (2) CN104156055B (en)
TW (2) TWI531896B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI531896B (en) * 2010-12-22 2016-05-01 威盛電子股份有限公司 Power state synchronization in a multi-core processor
US10234932B2 (en) * 2015-07-22 2019-03-19 Futurewei Technologies, Inc. Method and apparatus for a multiple-processor system
CN106844258B (en) * 2015-12-03 2019-09-20 华为技术有限公司 Heat addition CPU enables the method and server system of x2APIC
US20170308153A1 (en) * 2016-04-25 2017-10-26 Mediatek Inc. Power delivery system for multicore processor chip
CN114270308A (en) * 2019-08-22 2022-04-01 谷歌有限责任公司 Compilation of synchronous processors
CN110716756B (en) * 2019-10-15 2023-03-14 上海兆芯集成电路有限公司 Multi-grain multi-core computer platform and starting method thereof
CN111506154B (en) * 2020-04-14 2021-05-25 深圳比特微电子科技有限公司 Method and system for increasing computing power and reducing computing power ratio of computer

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6665802B1 (en) * 2000-02-29 2003-12-16 Infineon Technologies North America Corp. Power management and control for a microcontroller
US6968467B2 (en) * 2000-10-26 2005-11-22 Matsushita Electric Industrial Co., Ltd. Decentralized power management system for integrated circuit using local power management units that generate control signals based on common data
US7337334B2 (en) * 2003-02-14 2008-02-26 International Business Machines Corporation Network processor power management
US7966511B2 (en) * 2004-07-27 2011-06-21 Intel Corporation Power management coordination in multi-core processors
CN1752893A (en) * 2004-09-24 2006-03-29 乐金电子(惠州)有限公司 Power source management method of mobile communication terminal machine
US7257679B2 (en) * 2004-10-01 2007-08-14 Advanced Micro Devices, Inc. Sharing monitored cache lines across multiple cores
US20060159170A1 (en) * 2005-01-19 2006-07-20 Ren-Wei Chiang Method and system for hierarchical search with cache
KR100663864B1 (en) * 2005-06-16 2007-01-03 엘지전자 주식회사 Apparatus and method for controlling processor mode in a multi-core processor
US7506184B2 (en) * 2006-05-09 2009-03-17 Intel Corporation Current detection for microelectronic devices using source-switched sensors
US7685441B2 (en) * 2006-05-12 2010-03-23 Intel Corporation Power control unit with digitally supplied system parameters
US8458498B2 (en) * 2008-12-23 2013-06-04 Intel Corporation Method and apparatus of power management of processor
TWI531896B (en) * 2010-12-22 2016-05-01 威盛電子股份有限公司 Power state synchronization in a multi-core processor

Also Published As

Publication number Publication date
CN104156055A (en) 2014-11-19
CN103955265A (en) 2014-07-30
CN103955265B (en) 2017-04-12
TWI531896B (en) 2016-05-01
TWI450084B (en) 2014-08-21
TW201430553A (en) 2014-08-01
CN104156055B (en) 2017-10-13
