TW201133232A - Microprocessor and debugging method thereof - Google Patents

Microprocessor and debugging method thereof

Publication number
TW201133232A
TW201133232A TW100106953A TW100106953A TW201133232A TW 201133232 A TW201133232 A TW 201133232A TW 100106953 A TW100106953 A TW 100106953A TW 100106953 A TW100106953 A TW 100106953A TW 201133232 A TW201133232 A TW 201133232A
Authority
TW
Taiwan
Prior art keywords
core
microprocessor
instructions
heartbeat
instruction
Prior art date
Application number
TW100106953A
Other languages
Chinese (zh)
Other versions
TWI470421B (en
Inventor
Darius D Gaskins
Rodney E Hooker
Jason Chen
Original Assignee
Via Tech Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US12/964,949 (US8762779B2)
Application filed by Via Tech Inc
Publication of TW201133232A
Application granted
Publication of TWI470421B

Abstract

A method for debugging a multi-core microprocessor includes causing the microprocessor to perform an actual execution of instructions and obtaining from the microprocessor heartbeat information that specifies an actual execution sequence of the instructions by the plurality of cores relative to one another, commanding a corresponding plurality of instances of a software functional model of the cores to execute the instructions according to the actual execution sequence specified by the heartbeat information to generate simulated results of the execution of the instructions, and comparing the simulated results with actual results of the execution of the instructions to determine whether they match. Each core outputs an instruction execution indicator indicating the number of instructions executed by the core each core clock. A heartbeat generator generates a heartbeat indicator for each core on an external bus that indicates the number of instructions executed by each core during each external bus clock cycle.

Description

VI. Description of the Invention

[Technical Field of the Invention]

[0001] The present invention relates to multi-core microprocessors, and more particularly to monitoring instruction execution in a multi-core microprocessor and a debugging method thereof.

[Prior Art]

[0002] Modern microprocessors are very complex, and debugging them is very difficult. Microprocessor developers commonly use a software functional model, which simulates the architectural behavior of the microprocessor, as a debugging tool. Compared with other software models such as a Verilog simulator, a software functional model is more useful because it can simulate the execution of a large number of instructions much more quickly. A software functional model executes one instruction at a time as defined by the architecture, so it can be used effectively to debug a single-core processor.

[0003] A software functional model can also be used to debug a multi-core processor. A separate instance of the model simulates instruction execution for each core, and this works well as long as the cores do not affect one another. However, multi-core processors often exhibit bugs that appear only when memory accesses by multiple cores interact, or when the cores share the same memory address, for example when they share a software semaphore.
The cores access the shared memory address at different times; for example, the first core reads a semaphore and waits for the second core to write it. Unless the two instances of the software functional model execute instructions in an order that closely approximates the order in which the actual processor executed them when the bug occurred, the software functional model cannot effectively debug the multi-core processor. What is needed, therefore, is a way to control the order in which the simulated cores execute instructions relative to one another so that it approximates the order on the post-silicon multi-core processor.

[Summary of the Invention]

[0004] The present invention discloses a method for debugging a microprocessor that has a plurality of cores. The method includes causing the microprocessor to perform an actual execution of instructions and obtaining from the microprocessor heartbeat information that specifies the actual execution sequence of the instructions by the cores relative to one another. The method also includes commanding a corresponding plurality of instances of a software functional model to execute the instructions according to the actual execution sequence, in order to generate simulated results of the execution of the instructions. The method further includes comparing the simulated results with the actual results of the execution of the instructions to determine whether they match.

[0005] The present invention also discloses a microprocessor that includes a plurality of cores, each of which outputs an instruction execution indicator indicating the number of instructions the core executed during each clock cycle. The microprocessor further includes a heartbeat generator that receives the instruction execution indicator from each core.
In response to the instruction execution indicators, the heartbeat generator generates on an external bus a heartbeat indicator for each core; the heartbeat indicator indicates the number of instructions each core executed during each clock cycle of the external bus.

[0006] The present invention further discloses a microprocessor that includes a plurality of cores, each of which generates an instruction execution indicator indicating the number of instructions the core executed during each clock cycle. The microprocessor also includes a memory array that stores the instruction execution indicators generated by the cores over a span of clock cycles. The microprocessor further includes a bus interface unit coupled to a bus external to the microprocessor. The bus interface unit writes the instruction execution indicators stored in the memory array to a memory external to the microprocessor.

[Embodiments]

[0007] The multi-core processor of the embodiments generates a heartbeat signal that indicates the rate at which the cores execute instructions relative to one another. Developers can capture the heartbeat signal while the processor runs and use the captured heartbeat information to dynamically control the rate at which the software functional model instance for each core executes instructions. The heartbeat signal thus gives the software functional model the visibility it needs into the internal operation of the multi-core processor, so that the order in which the simulated cores execute instructions relative to one another approximates the order in which the actual multi-core processor executed them when the bug occurred.
In some embodiments the processor provides the heartbeat information on the architectural processor bus, but doing so can alter the timing of the program running on the multi-core processor, so that some bugs are missed when the heartbeat is enabled. Therefore, in the embodiments described here, the processor provides the heartbeat signal non-intrusively on an external sideband bus rather than on the architectural processor bus.

[0008] Referring to FIG. 1, a functional block diagram of a computing system 100 is shown. The computing system 100 includes a dual-core processor 102 that generates a heartbeat signal 106. The dual-core processor 102 has two cores, a first core 104A and a second core 104B, referred to collectively as cores 104. In this embodiment, each core 104 of the dual-core processor 102 is a microprocessor core conforming to the VIA Nano™ architecture designed by VIA Technologies, Inc. Although this embodiment uses a dual-core processor as the example, other processors that use the heartbeat signal 106 to provide information for multiple cores also fall within the scope of the invention.

[0009] The dual-core processor 102 further includes a heartbeat generator 103 coupled to each core 104. Specifically, the first core 104A generates an instruction execution indicator 105A indicating the number of instructions it executed in a clock cycle, and the second core 104B generates an instruction execution indicator 105B indicating the number of instructions it executed in a clock cycle. In response to the instruction execution indicators 105, the heartbeat generator 103 generates the heartbeat signal 106, which indicates the instructions the cores 104 have executed.
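As a rough illustration of the heartbeat generator's role, the following Python sketch folds per-core-clock instruction counts into per-bus-clock heartbeat values. It is only a software picture of the idea, not the hardware described here; the function name `generate_heartbeat` and the parameter `core_clocks_per_bus_clock` are hypothetical names introduced for this example.

```python
from typing import List, Sequence


def generate_heartbeat(indicators: Sequence[Sequence[int]],
                       core_clocks_per_bus_clock: int = 1) -> List[List[int]]:
    """Fold per-core-clock instruction counts into per-bus-clock heartbeat values.

    indicators[c][t] is the number of instructions core c completed in core
    clock t (the role played by indicators 105A and 105B).  Each returned
    entry covers one bus clock and holds one count per core (the role played
    by heartbeat signal 106).  The clock ratio is an assumed parameter.
    """
    if core_clocks_per_bus_clock < 1:
        raise ValueError("core_clocks_per_bus_clock must be at least 1")
    n_clocks = min(len(per_core) for per_core in indicators)
    heartbeat = []
    for start in range(0, n_clocks, core_clocks_per_bus_clock):
        stop = min(start + core_clocks_per_bus_clock, n_clocks)
        # Sum each core's counts over the core clocks in this bus clock.
        heartbeat.append([sum(per_core[start:stop]) for per_core in indicators])
    return heartbeat


core_a = [1, 0, 1, 1]   # hypothetical counts for the first core
core_b = [0, 1, 1, 0]   # hypothetical counts for the second core
print(generate_heartbeat([core_a, core_b]))      # [[1, 0], [0, 1], [1, 1], [1, 0]]
print(generate_heartbeat([core_a, core_b], 2))   # [[1, 1], [2, 1]]
```

With a ratio of 1 the heartbeat simply mirrors the indicators; with a larger ratio several core clocks are summed into one bus-clock value.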
In this embodiment, the cores 104 execute instructions speculatively, and the instruction execution indicators 105 tell the heartbeat generator 103 that instructions have been completed, that is, that they have updated the architectural state of the core 104 rather than merely having been executed speculatively.

[0010] The computing system 100 further includes a memory 112 coupled to the dual-core processor 102. Each core 104 of the dual-core processor 102 can be programmed to periodically stop executing user program instructions, dump its current state to a predetermined address in the memory 112, and flush the contents of its cache to the memory 112; this is referred to here as a checkpoint. The state of a core 104, which includes the state of its internal registers, is referred to here as its checkpoint state. More specifically, a developer can program each core 104 to execute a predetermined number of instructions (for example 100,000), then stop executing instructions, dump its checkpoint state, flush its cache, and resume executing instructions until the predetermined number has again been executed, and so on.

[0011] The computing system 100 also includes a logic analyzer 108; in this embodiment, the logic analyzer 108 comprises one of the cores 104 of the dual-core processor 102. The logic analyzer 108 monitors the processor bus 114 and captures the transactions on it, including the transactions that write the checkpoint state to the memory 112 and flush the caches. The logic analyzer 108 also monitors and captures the heartbeat signal 106. It stores the captured information in a folder 116, for example on a disk drive; the folder 116 contains the captured processor bus transaction information 118 and the heartbeat information 122.
In this embodiment, the heartbeat signal 106 is provided on a sideband bus of the dual-core processor 102, for example a JTAG bus, which may be used by a separate service processor inside the dual-core processor 102 chip.

[0012] The computing system 100 also includes a software functional model simulation environment 124, which comprises one or more simulation computer systems distinct from the computing system that contains the microprocessor 102. The simulation environment 124 uses the processor bus transaction information 118 and the heartbeat information 122 captured and stored in the folder 116 to simulate the operation of the dual-core processor 102, as described in detail below.

[0013] Referring to FIG. 2, a functional block diagram of the software functional model simulation environment 124 is shown. The simulation environment 124 includes a simulated initial state generator 202, a rate controller 204, a first-core software functional model instance 206A, a second-core software functional model instance 206B, an actual result generator 208, and a comparison unit 226. Although these elements are implemented in software, some or all of them may instead be implemented in hardware to increase execution speed.

[0014] The simulated initial state generator 202 receives the captured processor bus transaction information 118 and uses it to generate a simulated initial memory image 212, a first-core simulated initial state 214A, and a second-core simulated initial state 214B.
The simulated initial memory image 212 is then copied to a simulated result memory image 232, the first-core simulated initial state 214A is copied to a first-core simulated result state 234A, and the second-core simulated initial state 214B is copied to a second-core simulated result state 234B. For purposes of explanation, assume that each core 104 has dumped a first checkpoint state (including the internal register state described above), flushed its cache, resumed and executed the predetermined number of instructions, and then dumped a second checkpoint state and flushed its cache again. Assume further that the processor bus transaction information 118 includes the bus transactions of the first and second checkpoints and all transactions between them, which were generated by executing the predetermined number of instructions. See U.S. Provisional Patent Application No. 61/297,505, filed January 22, 2010, which describes a method of synchronizing checkpoints between the two cores 104.

[0015] According to an embodiment of the invention, the simulated initial state generator 202 generates the simulated initial memory image 212 as follows:
(1) It detects each transaction, between the first checkpoint and the second checkpoint of the dual-core processor 102, that reads an address in the memory 112.
(2) It determines whether that read transaction is the first read of the address between the first and second checkpoints.
(3) If so, it generates a memory location record for the transaction containing the memory address and the data value read.
In this way the simulated initial state generator 202 generates a sparse simulated initial memory image 212.
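The three steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the captured bus trace is modeled as a list of `BusTransaction` tuples already restricted to the window between the two checkpoints, and both `BusTransaction` and `build_initial_memory_image` are hypothetical names, not part of the patent or of any capture tool.

```python
from typing import Dict, Iterable, NamedTuple


class BusTransaction(NamedTuple):
    """Hypothetical record of one captured processor bus transaction."""
    kind: str      # "read" or "write"
    address: int
    data: int


def build_initial_memory_image(transactions: Iterable[BusTransaction]) -> Dict[int, int]:
    """Keep one record per address: the data returned by its first read."""
    image: Dict[int, int] = {}
    for txn in transactions:
        # Steps (1) and (2): only reads count, and only the first read of an address.
        if txn.kind == "read" and txn.address not in image:
            # Step (3): record the address and the data value that was read.
            image[txn.address] = txn.data
    return image


trace = [
    BusTransaction("read", 0x100, 5),
    BusTransaction("write", 0x100, 7),
    BusTransaction("read", 0x100, 7),   # not the first read of 0x100, so ignored
    BusTransaction("read", 0x200, 1),
]
print(build_initial_memory_image(trace))  # {256: 5, 512: 1}
```

The resulting dictionary is sparse in the sense used above: it holds only the addresses that were actually read in the window.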
A sparse memory image nevertheless suffices for the software functional model instances 206, because between the first and second checkpoints the instances 206 should only need to read the memory addresses for which records were generated; otherwise, an error has occurred on the actual dual-core processor 102.

[0016] The simulated initial state generator 202 generates the first-core simulated initial state 214A directly from the first checkpoint state captured in the processor bus transaction information 118. In this embodiment, as described above, at each checkpoint each core 104 writes its state information in a predetermined format to a predetermined address in the memory 112, which enables the simulated initial state generator 202 to find the first checkpoint state of the first core 104A within the processor bus transaction information 118. Likewise, the simulated initial state generator 202 generates the second-core simulated initial state 214B directly from the first checkpoint state captured in the processor bus transaction information 118.

[0017] The actual result generator 208 receives the processor bus transaction information 118 of FIG. 1 and uses it to generate a first-core actual result state 224A, a second-core actual result state 224B, and an actual result memory image 222. The actual result generator 208 generates the first-core actual result state 224A directly from the second checkpoint state captured in the processor bus transaction information 118.
In this embodiment, as described above, at each checkpoint each core 104 writes its checkpoint state in a predetermined format to a predetermined address in the memory 112, which enables the actual result generator 208 to find the second checkpoint state of the first core 104A within the processor bus transaction information 118. Likewise, the actual result generator 208 generates the second-core actual result state 224B directly from the second checkpoint state captured in the processor bus transaction information 118. How the comparison unit 226 compares the first-core actual result state 224A with a first-core simulated result state 234A, and the second-core actual result state 224B with a second-core simulated result state 234B, is discussed further below.

[0018] In this embodiment, the actual result generator 208 generates the actual result memory image 222 as follows:
(1) It detects each transaction, between the first checkpoint and the second checkpoint of the dual-core processor 102, that writes an address in the memory 112, including the transactions in which each core 104 writes its cache contents to the memory 112 at the second checkpoint.
(2) It determines whether that write transaction is the last write of the address between the first and second checkpoints.
(3) If so, it generates a memory location record for the transaction containing the memory address and the data value written.
In this way the actual result generator 208 generates a sparse actual result memory image 222.
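The last-write rule above can be sketched as follows. This is a minimal sketch, assuming the captured trace is a time-ordered list of `(kind, address, data)` tuples limited to the window between the two checkpoints (cache flushes at the second checkpoint included); the tuple layout and the name `build_actual_result_image` are assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical trace entry: (kind, address, data), with kind "read" or "write".
Transaction = Tuple[str, int, int]


def build_actual_result_image(transactions: Iterable[Transaction]) -> Dict[int, int]:
    """Keep one record per address: the data of its last write in the window."""
    image: Dict[int, int] = {}
    for kind, address, data in transactions:
        if kind == "write":
            # A later write to the same address replaces the earlier record,
            # so after the loop only the last write of each address remains.
            image[address] = data
    return image


trace = [
    ("write", 0x100, 7),
    ("read", 0x100, 7),
    ("write", 0x100, 9),   # last write of 0x100
    ("write", 0x300, 4),
]
print(build_actual_result_image(trace))  # {256: 9, 768: 4}
```

Walking the trace in time order and overwriting is one simple way to satisfy step (2) without looking ahead in the trace.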
A sparse memory image nevertheless suffices for the software functional model instances 206, because between the first and second checkpoints the instances 206 should only need to write the memory addresses for which records were generated; otherwise, an error has occurred on the actual dual-core processor 102. How the comparison unit 226 compares the actual result memory image 222 with a simulated result memory image 232 is discussed further below.

[0019] The rate controller 204 receives the captured heartbeat information 122 of FIG. 1 and uses it to generate commands 218A to the first-core software functional model instance 206A and commands 218B to the second-core software functional model instance 206B. The commands 218 dynamically control the rate at which the software functional model instances 206 execute instructions relative to one another. In one embodiment, each command 218 directs a software functional model instance to execute N instructions, where N is specified in the command. In another embodiment, the software functional model instances 206 are multi-threaded and communicate with one another through, for example, semaphores. In one embodiment, a command directs the software functional model instance 206 for one core 104 to execute X instructions and then stop until the software functional model instance 206 for the other core 104 has executed Y instructions. The figures that follow describe in detail how the rate controller 204 uses the heartbeat information 122 to issue the commands 218 that dynamically control the rate at which the software functional model instances 206 execute instructions relative to one another.

[0020] Each software functional model instance 206 simulates the architectural behavior of a core 104.
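To make the command scheme of paragraph [0019] concrete, here is a toy Python sketch in which a replay loop plays the role of the rate controller and issues "execute N instructions" commands to two model instances that share one memory image. Everything in it is an assumption made for illustration: the `ModelInstance` class, the `store` and `load` helpers, and the heartbeat layout (one list of per-core instruction counts per heartbeat clock) are hypothetical and far simpler than a real functional model.

```python
from typing import Callable, Dict, List, Sequence

Memory = Dict[int, int]
Instruction = Callable[[Dict[str, int], Memory], None]


class ModelInstance:
    """Toy stand-in for one software functional model instance."""

    def __init__(self, program: Sequence[Instruction], memory: Memory) -> None:
        self.program = list(program)
        self.memory = memory               # shared simulated result memory image
        self.state = {"pc": 0, "r0": 0}    # this core's simulated result state

    def execute(self, count: int) -> None:
        """Carry out an 'execute N instructions' command, one instruction at a time."""
        for _ in range(count):
            if self.state["pc"] >= len(self.program):
                return
            self.program[self.state["pc"]](self.state, self.memory)
            self.state["pc"] += 1


def store(addr: int, value: int) -> Instruction:
    def op(state: Dict[str, int], memory: Memory) -> None:
        memory[addr] = value
    return op


def load(addr: int) -> Instruction:
    def op(state: Dict[str, int], memory: Memory) -> None:
        state["r0"] = memory.get(addr, 0)
    return op


def replay(heartbeat: Sequence[Sequence[int]], instances: List[ModelInstance]) -> None:
    """Issue commands in heartbeat order; heartbeat[t][c] is core c's count in clock t."""
    for record in heartbeat:
        # Simplification: within a single clock, lower-numbered cores go first.
        for core, count in enumerate(record):
            if count:
                instances[core].execute(count)


def run(heartbeat: Sequence[Sequence[int]]) -> int:
    memory: Memory = {}
    cores = [ModelInstance([store(0x10, 1)], memory),
             ModelInstance([load(0x10)], memory)]
    replay(heartbeat, cores)
    return cores[1].state["r0"]


print(run([[1, 0], [0, 1]]))  # core 0 stores before core 1 loads: prints 1
print(run([[0, 1], [1, 0]]))  # core 1 loads before core 0 stores: prints 0
```

The two runs execute the same two programs and differ only in the heartbeat order, which is exactly the kind of cross-core ordering dependence that the heartbeat information is meant to reproduce.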
The first-core software functional model instance 206A reads and writes the first-core simulated result state 234A, and the second-core software functional model instance 206B reads and writes the second-core simulated result state 234B. In addition, when it executes memory access instructions under the commands of the rate controller 204, each software functional model instance 206 reads and/or writes the simulated result memory image 232. In particular, data written to the simulated result memory image 232 by the first-core instance 206A is visible to the second-core instance 206B, and vice versa, and can thereby affect the simulated result state 234 of each instance 206. After each software functional model instance 206 has executed the predetermined number of instructions (for example 100,000), the first-core simulated result state 234A, which began as a copy of the first-core simulated initial state 214A, has been updated into the true first-core simulated result state 234A, and the second-core simulated result state 234B, which began as a copy of the second-core simulated initial state 214B, has likewise been updated into the true second-core simulated result state 234B. The comparison unit 226 compares the first-core simulated result state 234A with the first-core actual result state 224A, and the second-core simulated result state 234B with the second-core actual result state 224B, to determine whether an error occurred on the actual dual-core processor 102 between the first and second checkpoints. The result of the comparison is indicated by a pass/fail indicator 228.
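The state comparison performed by the comparison unit might be pictured as follows. This is a minimal sketch, assuming each core's state is a dictionary from register name to value; the function name `compare_results` and the register names are hypothetical, and the real checkpoint state format is not specified here.

```python
from typing import Dict, List, Sequence, Tuple

State = Dict[str, int]
Mismatch = Tuple[int, str, object, object]   # (core, register, simulated, actual)


def compare_results(simulated: Sequence[State],
                    actual: Sequence[State]) -> Tuple[bool, List[Mismatch]]:
    """Compare per-core simulated and actual result states.

    Returns (passed, mismatches); passed plays the role of a pass/fail
    indicator, and mismatches lists every register that differs.
    """
    mismatches: List[Mismatch] = []
    for core, (sim, act) in enumerate(zip(simulated, actual)):
        for reg in sorted(set(sim) | set(act)):
            if sim.get(reg) != act.get(reg):
                mismatches.append((core, reg, sim.get(reg), act.get(reg)))
    return (not mismatches, mismatches)


sim_states = [{"r0": 1, "r1": 2}, {"r0": 5}]
act_states = [{"r0": 1, "r1": 2}, {"r0": 6}]
passed, details = compare_results(sim_states, act_states)
print(passed)    # False
print(details)   # [(1, 'r0', 5, 6)]
```

Reporting the differing registers along with the pass/fail result is a design choice of this sketch; it makes a failing checkpoint interval easier to investigate.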
In addition, after each software functional model instance 206 has executed the predetermined number of instructions (for example 100,000), the simulated result memory image 232, which began as a copy of the simulated initial memory image 212, has been updated into the true simulated result memory image 232. The comparison unit 226 compares the simulated result memory image 232 with the actual result memory image 222 to determine whether an error occurred on the actual dual-core processor 102 between the first and second checkpoints, and the result of the comparison is indicated by the pass/fail indicator.

[0021] Thus, with the rate controller 204 acting as intermediary, the heartbeat information 122 is used to dynamically control the rate at which each software functional model instance 206 executes instructions. That is, the rate controller 204 controls the order in which the software functional model instances 206 execute instructions relative to one another, so that the instructions execute in the proper order with respect to the memory accesses of the cores 104. The operation of the actual dual-core processor 102 can therefore be simulated accurately, starting from the actual initial state of the cores 104 and the memory 112, which is what allows the comparison unit 226 to compare the behavior of the actual dual-core processor 102 with its simulated behavior.

[0022] Referring to FIG. 3, a flowchart of the operation of the simulation environment 124 of FIG. 2 is shown. As shown in FIG. 3, the software functional model simulation environment 124 operates according to step S1406 of FIG. 14 to generate the simulated result memory image 232 and the simulated result states 234, which are compared with the actual result memory image 222 and the actual result states 224, respectively (step S1408). The flow begins at step S302.
[0023] In step S302, the rate controller 204 receives the heartbeat information 122 from the folder 116. The flow then proceeds to step S304.

In step S304, the rate controller 204 examines the value of each core's heartbeat signal 106 for the next clock cycle of the heartbeat signal 106, as indicated by the heartbeat information 122. The values of the heartbeat signal 106 are described further in the embodiments and figures below. The flow proceeds to step S306.

In step S306, the rate controller 204 determines whether core N (the first core 104A or the second core 104B) generated a heartbeat. If so, step S308 is performed; otherwise, the flow returns to step S304 to examine the next clock cycle.

In step S308, the rate controller 204 issues a command 218 that directs the software functional model instance of core N to execute one or more instructions, as determined from the heartbeat information and detailed later. The flow then proceeds to step S312.

In step S312, the software functional model instance 206 of core N executes the instruction or instructions specified by the command and updates the simulated result memory image 232 and the simulated result state 234 accordingly. If a memory read instruction is executed, the core N software functional model instance 206 reads the simulated result memory image 232. If a memory write instruction is executed, the core N software functional model instance 206 updates the simulated result memory image 232. The flow then returns to step S304 to examine the next clock cycle.

The instruction execution indicators 105, the heartbeat generator 103, the heartbeat signals 106, and the various embodiments used by the rate controller 204 are described below. Please refer to FIG. 4, which is a functional block diagram of one embodiment of the dual-core processor.
In FIGS. 4 and 5, the clock rate of the cores 104 is the same as that of the heartbeat signals 106. In addition, each core 104 can complete one instruction per core clock cycle. As shown, the instruction execution indicator 105 of each core 104 is a single bit: the bit is true if the core 104 completed an instruction during the core clock cycle, and false otherwise. Likewise, the heartbeat generator 103 generates a one-bit heartbeat signal 106A that is true if the first core 104A completed an instruction during the core clock cycle and false otherwise, and a one-bit heartbeat signal 106B that is true if the second core 104B completed an instruction during the core clock cycle and false otherwise. Note, however, that there may be a delay between the generation of an instruction execution indicator 105 and its associated heartbeat signal 106. In one embodiment, because the cores 104 and the heartbeat signals 106 operate from different clock sources, the heartbeat generator 103 includes synchronization logic that synchronizes the instruction execution indicators 105 with the heartbeat signals 106.

Please refer to FIG. 5, which is a table illustrating an example of the operation of the rate controller 204 for the embodiment of FIG. 4.
The table covers six clock cycles, labeled 0 through 5, which correspond to six clocks of the heartbeat information 122 received by the rate controller 204 in step S302 of FIG. 3. The table shows the heartbeat signal 106A of the first core 104A and the heartbeat signal 106B of the second core 104B that the rate controller 204 receives from the heartbeat information 122. In addition, for each clock cycle, the table shows, per the determination made in step S306 of FIG. 3, whether the rate controller 204 directs the first-core software functional model instance 206A to execute an instruction during the corresponding simulated clock cycle, and whether it directs the second-core software functional model instance 206B to execute an instruction.

In this example, the heartbeat signal 106A of the first core 104A is true in clocks 1 and 3-5, so the rate controller 204 commands the first-core software functional model instance 206A to execute an instruction during simulated clocks 1 and 3-5. The heartbeat signal 106B of the second core 104B, on the other hand, is true in clocks 0-2 and 4, so the rate controller 204 commands the second-core software functional model instance 206B to execute an instruction during simulated clocks 0-2 and 4.
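The pacing loop of FIG. 3, applied to the FIG. 5 example, can be sketched roughly as follows. This is a minimal illustration rather than the patented implementation: `ModelInstance`, `replay`, and `FIG5_HEARTBEATS` are hypothetical names, the instance is a stub that only records the commands it receives, and visiting the first core before the second within a clock is an arbitrary assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ModelInstance:
    """Hypothetical stand-in for one software functional model instance 206."""
    name: str
    executed: int = 0                           # total instructions executed
    log: list = field(default_factory=list)     # (clock, count) per command

    def execute(self, clock: int, count: int) -> None:
        # A real model would run `count` instructions against the shared
        # simulated result memory image 232; this stub only records the command.
        self.executed += count
        self.log.append((clock, count))


def replay(heartbeats, instances):
    """Pace the instances from recorded heartbeat values (steps S304 to S312)."""
    for clock, per_core in enumerate(heartbeats):       # S304: next clock
        for core, count in enumerate(per_core):
            if count:                                   # S306: heartbeat seen?
                instances[core].execute(clock, count)   # S308 and S312


# (106A, 106B) per clock, transcribed from the FIG. 5 description.
FIG5_HEARTBEATS = [(0, 1), (1, 1), (0, 1), (1, 0), (1, 1), (1, 0)]

instances = [ModelInstance("206A"), ModelInstance("206B")]
replay(FIG5_HEARTBEATS, instances)
clocks_a = [clock for clock, _ in instances[0].log]
clocks_b = [clock for clock, _ in instances[1].log]
print(clocks_a, clocks_b)
```

Because `replay` treats each heartbeat value as an instruction count, the same loop would also serve for multi-bit heartbeat values once they are decoded into counts.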

Please refer to FIG. 6, which is a functional block diagram of another embodiment of the dual-core processor. FIGS. 6 and 7 are similar to FIGS. 4 and 5 in that the clock rate of the cores 104 is the same as that of the heartbeat signals 106. In this embodiment, however, each core 104 can complete multiple instructions per core clock cycle, for example three; other numbers are also possible. As shown, the instruction execution indicator 105 of each core 104 is two bits wide and indicates the number of instructions the core 104 completed during a core clock cycle. Likewise, the heartbeat generator 103 generates a two-bit heartbeat signal 106A that indicates the number of instructions completed by the first core 104A during a core clock cycle, and a two-bit heartbeat signal 106B that indicates the number of instructions completed by the second core 104B during a core clock cycle.

Please refer to FIG. 7, which is a table illustrating an example of the operation of the rate controller for the embodiment of FIG. 6. The table is similar to that of FIG. 5. As shown, however, the value of the heartbeat signal 106A of the first core 104A that the rate controller 204 receives from the heartbeat information 122 in each clock cycle ranges from 0 to 3, rather than being 0 (false) or 1 (true). Likewise, for each clock cycle, the table shows, per the determination made in step S306 of FIG. 3, whether the rate controller 204, per step S308 of FIG. 3, directs the first-core software functional model instance 206A to execute instructions during the corresponding simulated clock cycle (and if so, how many), and whether it directs the second-core software functional model instance 206B to execute instructions (and if so, how many).

In this example, the heartbeat signal 106A of the first core 104A is 0 in clocks 0 and 2, 1 in clock 4, 2 in clock 3, and 3 in clocks 1 and 5. The rate controller 204 therefore commands the first-core software functional model instance 206A to execute 0 instructions during simulated clocks 0 and 2, 1 instruction during simulated clock 4, 2 instructions during simulated clock 3, and 3 instructions during simulated clocks 1 and 5. The heartbeat signal 106B of the second core 104B, on the other hand, is 0 in clocks 3 and 5, 1 in clocks 0 and 4, and 2 in clocks 1 and 2, and is never 3. The rate controller 204 therefore commands the second-core software functional model instance 206B to execute 0 instructions during simulated clocks 3 and 5, 1 instruction during simulated clocks 0 and 4, and 2 instructions during simulated clocks 1 and 2, and never 3 instructions in any clock.

Please refer to FIG. 8, which is a functional block diagram of yet another embodiment of the dual-core processor. FIGS. 8 and 9 are similar to FIGS. 6 and 7 in that each core 104 can complete multiple instructions per core clock cycle. In this embodiment, however, the clock rate of the cores 104 is a multiple of the clock rate of the heartbeat signals 106; the particular multiple is not limited to the one disclosed.
In addition, the instruction execution indicator 105 of each core 104 is a single bit that is true when the core 104 has completed a predetermined number of instructions and false otherwise. In one embodiment the predetermined number is 32, although other values may be used. Specifically, the predetermined number must be at least as large as the product of the clock ratio and the maximum number of instructions that a core 104 can complete in one clock cycle. In one embodiment, the retire unit of the core 104 includes a counter that counts the instructions completed in each clock cycle, and the instruction execution indicator 105 is effectively bit M of the counter (M = log2 N), where N is the product of the clock ratio and the maximum number of instructions that the core 104 can complete in one clock cycle. Likewise, the heartbeat generator 103 generates a one-bit heartbeat signal 106A based on the instruction execution indicator 105A, and a one-bit heartbeat signal 106B based on the instruction execution indicator 105B.
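One way to picture the counter-bit scheme is the sketch below. It rests on several assumptions that are not in the text: `heartbeat_bits` is a hypothetical helper, a heartbeat is modeled as a change in bit M of a free-running retirement counter, `n` is taken to be the predetermined number (32), and the clock ratio of 8 and the limit of 3 instructions per core clock are illustrative values only.

```python
def heartbeat_bits(retired_per_core_clock, n=32, ratio=8, max_per_clock=3):
    """Return one boolean per heartbeat clock: True when bit M of the
    retirement counter changed during that heartbeat clock."""
    # n must be a power of two so that M = log2(n) is an integer, and large
    # enough that at most one change of bit M can occur per heartbeat clock.
    if n & (n - 1) or n < ratio * max_per_clock:
        raise ValueError("n must be a power of two and >= ratio * max_per_clock")
    m = n.bit_length() - 1                  # M = log2(N)
    counter = 0                             # free-running retirement counter
    beats = []
    for start in range(0, len(retired_per_core_clock), ratio):
        window = retired_per_core_clock[start:start + ratio]
        if any(r > max_per_clock for r in window):
            raise ValueError("more instructions retired than the core allows")
        before = (counter >> m) & 1         # bit M before this heartbeat clock
        counter += sum(window)
        beats.append(((counter >> m) & 1) != before)
    return beats


# Three heartbeat clocks of eight core clocks each: two busy, one idle.
example = heartbeat_bits([3] * 16 + [0] * 8)
print(example)
```

With these values the first heartbeat clock retires 24 instructions without reaching 32, the second crosses 32 and produces a beat, and the third is idle. The size check shows why the predetermined number has to cover the clock ratio times the per-clock maximum: otherwise bit M could change twice within one heartbeat clock and a beat would be lost.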

Please refer to FIG. 9, which is a table illustrating an example of the operation of the rate controller for the embodiment of FIG. 8. The table is similar to that of FIG. 5. Note, however, that in FIGS. 5 and 7 the clock rate of the cores 104 is the same as that of the heartbeat signals 106, so the heartbeat information 122 for each clock cycle indicates that one or more instructions were completed during the corresponding core clock cycle; the clock cycles shown in those figures (clocks 0 through 5) therefore correspond to both the core clock and the heartbeat signal 106 clock. In FIG. 9, by contrast, the heartbeat information 122 for each clock cycle indicates the number of instructions completed over multiple core clock cycles, so the clock cycles shown correspond only to the heartbeat signals 106.
In addition, for each clock cycle, the table shows, per the determination made in step S306 of FIG. 3, whether the rate controller 204, as described for step S308 of FIG. 3, directs the first-core software functional model instance 206A via command 218A to execute 32 instructions during the corresponding simulated clock cycles, and whether it directs the second-core software functional model instance 206B via command 218B to execute 32 instructions. In this example, the heartbeat signal 106A of the first core 104A is true in clocks 1 and 5. Accordingly, 32 clocks are simulated for each of clocks 1 and 5 of the heartbeat signal 106, and during those 32 simulated clocks the rate controller 204 directs the first-core software functional model instance 206A via command 218A to execute 32 instructions. The heartbeat signal 106B of the second core 104B, on the other hand, is true in clocks 0, 2, and 4. Accordingly, 32 clocks are simulated for each of clocks 0, 2, and 4 of the heartbeat signal 106, and during those 32 simulated clocks the rate controller 204 directs the second-core software functional model instance 206B via command 218B to execute 32 instructions.

Please refer to FIG. 10, which is a functional block diagram of still another embodiment of the dual-core processor. FIGS. 10 and 11 are similar to FIGS. 8 and 9 in that each core 104 can complete multiple instructions per core clock cycle, and the one-bit heartbeat signal 106 is true when the core 104 has completed a predetermined number of instructions (e.g., 32) and false otherwise. As in FIG. 6, however, each instruction execution indicator 105 in this embodiment is two bits wide and indicates the number of instructions the core 104 completed in each clock cycle.
In this embodiment, once a core 104 has completed the predetermined number of instructions, the heartbeat generator 103 generates a true value on the heartbeat signal 106. In one embodiment, the heartbeat generator 103 includes a counter that counts the number of instructions completed in each clock cycle, and the heartbeat signal 106 is effectively bit M of the counter (M = log2 N).

Please refer to FIG. 11, which is a table illustrating an example of the operation of the rate controller for the embodiment of FIG. 10. The table is similar to that of FIG. 9; the received heartbeat information 122 and the meaning of the entries are the same and are not repeated here.

Please refer to FIG. 12, which is a functional block diagram of a further embodiment of the dual-core processor. FIGS. 12 and 13 are similar to FIGS. 10 and 11 in that each core 104 can complete multiple instructions per core clock cycle and the clock rate of the cores 104 is a multiple of the clock rate of the heartbeat signals 106. Each instruction execution indicator 105 is two bits wide and indicates the number of instructions the core 104 completed in each clock cycle. This embodiment, however, further includes a debug memory array 1212, to which the heartbeat generator 103 writes the heartbeat information 122 based on the instruction execution indicators 105. In one embodiment, the heartbeat generator 103 writes the instruction execution indicators 105 received in each clock cycle into the debug memory array 1212. The heartbeat generator 103 subsequently reads the heartbeat information 122 back from the debug memory array 1212 and writes it to the system memory 112 over the processor bus 114.
When the heartbeat information 122 is written to the system memory 112, the logic analyzer 108 captures it. The heartbeat generator 103 writes the heartbeat information 122 to a predetermined address in the system memory 112, which enables the logic analyzer 108 to store it to the folder 116; the stored heartbeat information 122 is later used by the rate controller 204 as shown in FIG. 3. The heartbeat information 122 of this embodiment is similar to that of FIGS. 6 and 7: the heartbeat information 122 for each clock indicates the number of instructions the core 104 completed in that clock.

The heartbeat generator 103 issues requests to a bus interface unit 1216 of the dual-core processor 102, which interfaces the dual-core processor 102 to the processor bus 114. In one embodiment, the requests issued by the heartbeat generator 103 to the bus interface unit 1216 have the lowest priority, so the bus interface unit 1216 attempts to generate transactions on the processor bus 114 that write the heartbeat information 122 from the debug memory array 1212 to the system memory 112 only when the processor bus 114 is idle. This may reduce the likelihood that writing the heartbeat information 122 invasively on the processor bus 114 (as opposed to writing it on the non-invasive sideband bus of FIGS. 4 to 11 and FIGS. 15 to 16) alters the timing of the dual-core processor 102 in such a way that the error no longer appears once the heartbeat feature is enabled. The heartbeat generator 103 monitors how full the debug memory array 1212 is.
When the heartbeat generator 103 detects that the debug memory array 1212 is nearly full, it communicates with the bus interface unit 1216 to raise the priority of its requests so that the heartbeat information 122 is written out as soon as possible. In one embodiment, the debug memory array 1212 is a memory array similar to those of the L2 cache memory 1214 of the dual-core processor 102 and is shared by the cores 104. Preferably, the dual-core processor 102 is designed so that the debug memory array 1212 is an additional copy of the memory arrays of the L2 cache memory 1214, which allows the two to share common control logic and circuit layout. In one embodiment, the capacity of the debug memory array 1212 is large enough, and the interval between two checkpoints small enough, that the heartbeat generator 103 does not need to write the heartbeat information 122 out before a checkpoint is reached, which has the benefit that the heartbeat information is written in a non-invasive manner.

Please refer to FIG. 13, which is a table illustrating an example of the operation of the rate controller for the embodiment of FIG. 12. The table is similar to that of FIG. 7; the received heartbeat information 122 and the meaning of the entries are the same and are not repeated here. Note, however, that the embodiment of FIG. 12 has no heartbeat signal 106. The heartbeat information 122 for each clock cycle therefore indicates the number of instructions completed in that clock, and the clock cycles 0 through 5 shown in the table correspond to core clock cycles.

The relative advantages and disadvantages of the embodiments described above are discussed below.
An advantage of the embodiments of FIGS. 4 to 5 and FIGS. 8 to 13 is that they require only a small number of external pins on the multi-core processor package, namely one pin per core 104. These embodiments are therefore more scalable than the embodiment of FIGS. 6 to 7, which requires multiple pins per core 104, a problem that becomes more pronounced as the number of cores 104 grows. However, although most current microprocessors are superscalar and can complete multiple instructions per clock cycle, the embodiment of FIGS. 4 to 5 restricts each core 104 to completing a single instruction per core clock cycle. In contrast, the embodiments of FIGS. 6 to 13 have the advantage of supporting cores 104 that complete multiple instructions per clock cycle.

In addition, the embodiments of FIGS. 4 to 7 require the core clock rate to equal the clock rate of the heartbeat signal 106 bus, yet the clock rates of many current microprocessors are so high that operating an external bus at the core clock rate is impractical. In contrast, the embodiments of FIGS. 8 to 13 and FIGS. 15 to 16 (described below) have the advantage of supporting architectures in which the core clock frequency is a multiple of the heartbeat signal 106 bus frequency.

As noted above, a disadvantage of the embodiment of FIGS. 12 to 13 is that it is invasive and may affect the timing of program execution on the multi-core processor, so that some errors may be missed when the heartbeat is enabled. An advantage of the embodiment of FIGS. 12 to 13, however, is that it needs no additional external pins, which is a requirement in some applications.
Please refer to FIG. 14, which is a flowchart of a method of operating the simulation environment of FIG. 2. The software functional model simulation environment 124 is used to determine whether an error occurred between a first checkpoint and a second checkpoint. It may also be used, after the system 100 has run for some time and generated many sets of checkpoints, to determine whether an error occurred between any of the multiple sets of first and second checkpoints stored in the folder 116 of FIG. 1. The flow begins at step S1402.

In step S1402, the actual result generator 208 uses the processor bus transaction information 118 to generate the actual result memory image 222 and the actual result state 224 produced by the dual-core processor 102 executing the predetermined number of instructions between the two checkpoints, as shown in FIG. 2. The flow then proceeds to step S1404.

In step S1404, the simulated initial state generator 202 uses the processor bus transaction information 118 to generate the simulated initial memory image 212 and the simulated initial state 214, as shown in FIG. 2. The flow then proceeds to step S1406.

In step S1406, the simulated initial memory image 212 is copied to the simulated result memory image 232, the first-core simulated initial state 214A is copied to the first-core simulated result state 234A, and the second-core simulated initial state 214B is copied to the second-core simulated result state 234B. The rate controller 204 and the software functional model instances 206 then operate on the copies to update the simulated result memory image 232 and the simulated result state 234, as shown in FIG. 3.
Note that during the operation of FIG. 3 it may happen that, within the instruction streams of the cores, one core 104 executes a memory write instruction and the other core 104 executes a memory read instruction, as in the semaphore write and read described above. If the number of instructions between such a write and read is smaller than the granularity of the heartbeat information 122, then several different orders of the actual memory accesses could have occurred during actual execution on the dual-core processor 102. If the software functional model simulation environment 124 detects this situation, it therefore assumes one possible order of the memory accesses, performs step S1406 according to that order, and records which order it used. Note that the granularity of the heartbeat information 122 produced by the embodiments of FIGS. 8 to 11 is coarser than that of FIGS. 6 to 7 and FIGS. 12 to 13, and the granularity produced by the embodiment of FIGS. 12 to 13 is in turn coarser than that of FIGS. 4 to 5. A coarser granularity means that the software functional model simulation environment 124 needs more time to complete the procedure of FIG. 14, because the number of possible memory access orders may be large. The flow then proceeds to step S1408.

In step S1408, the comparison unit 226 compares the simulated results generated in step S1406 with the actual results generated in step S1402. The flow then proceeds to step S1412.

In step S1412, it is determined whether the compared results match. If so, step S1414 is performed; otherwise, step S1416 is performed.

In step S1414, the comparison unit 226 generates a pass value on the pass/fail indicator 228. The flow ends at step S1414.
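The overall loop of FIG. 14, including the retry over alternative memory access orders at steps S1416 and S1418, can be sketched as follows. This is only an illustration under stated assumptions: `check` and `simulate` are hypothetical names, the toy `simulate` models a single semaphore write by one core and a single read by the other, and enumerating every permutation is a simplification, since a real environment would only consider orders that respect each core's own program order.

```python
from itertools import permutations


def check(actual, simulate, ambiguous_accesses):
    """Return True (pass) if some candidate access order reproduces `actual`."""
    for order in permutations(ambiguous_accesses):  # S1416: next untried order
        if simulate(order) == actual:               # S1406, then S1408/S1412
            return True                             # S1414: pass value
    return False                                    # S1418: fail value


def simulate(order):
    """Toy model: core A writes a semaphore, core B reads it into a register."""
    memory = {"sem": 0}
    registers = {}
    for access in order:
        if access == "A:write":
            memory["sem"] = 1
        else:  # "B:read"
            registers["B"] = memory["sem"]
    return memory["sem"], registers["B"]


accesses = ["A:write", "B:read"]
passed = check((1, 0), simulate, accesses)   # B read the semaphore before A wrote
failed = check((1, 2), simulate, accesses)   # no order can produce this result
print(passed, failed)
```

The first call succeeds only on the second candidate order, which mirrors returning to step S1406 with a different memory access order after a mismatch. The cost of the loop grows with the number of ambiguous accesses, which is why coarser heartbeat granularity makes the procedure slower.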
In step S1416, the software functional model simulation environment 124 determines whether any other possible memory access order has not yet been tried. If so, the flow returns to step S1406 and runs with a different memory access order; otherwise, step S1418 is performed.

In step S1418, the comparison unit 226 generates a fail value on the pass/fail indicator 228. The flow ends at step S1418.

Please refer to Fig. 15, which is a functional block diagram of another embodiment of the dual-core processor 102 of Fig. 1. Figs. 15 and 16 are similar to Figs. 10 and 11: each core 104 can complete a number of instructions in each core clock cycle, the clock rate of the core 104 is a multiple of the clock rate of the heartbeat signal, and each instruction execution indicator 105 is a two-bit value that indicates the number of instructions completed by the core 104 in each clock cycle. In the embodiment of Fig. 15, however, in each bus clock cycle the heartbeat generator 103 generates two-bit heartbeat signals 106A and 106B, which indicate whether the number of instructions executed by the first core 104A and the second core 104B, respectively, is 0, 8, 16, or 32.

Please refer to Fig. 16, which is an operational example table of the rate controller 204 of Fig. 2 according to the embodiment of Fig. 15. The table is similar to those of the earlier embodiments, such as Fig. 7. In each clock, the value of the heartbeat signal 106 that the rate controller 204 receives from the heartbeat information 122 ranges from 0 to 3.
Similarly, for each clock cycle, the table shows whether, following the determination of step S306 of Fig. 3, the rate controller 204 in step S308 of Fig. 3 uses the command 218A to make the software functional model instance 206A of the first core execute instructions during the corresponding simulated clock cycle (and, if so, how many), and whether it uses the command 218B to make the software functional model instance 206B of the second core execute instructions (and, if so, how many). In this example, the heartbeat signal of the first core 104A is 0 at clocks 0 and 2, 1 at clock 4, 2 at clock 3, and 3 at clocks 1 and 5. Through the command 218A, the rate controller 204 therefore instructs the software functional model instance 206A of the first core to execute 0 instructions during simulated clocks 0 and 2, 8 instructions during simulated clock 4, 16 instructions during simulated clock 3, and 32 instructions during simulated clocks 1 and 5. The heartbeat signal of the second core 104B, on the other hand, is 0 at clocks 3 and 5, 1 at clocks 0 and 4, and 2 at clocks 1 and 2, and it is 3 at no clock. Through the command 218B, the rate controller 204 therefore instructs the software functional model instance 206B of the second core to execute 0 instructions during simulated clocks 3 and 5, 8 instructions during simulated clocks 0 and 4, and 16 instructions during simulated clocks 1 and 2; in no clock does it execute 32 instructions. An advantage of the embodiment of Fig. 15 is that, in the operation of Fig. 14, it provides a better interval than the embodiments of Figs. 8 to 10.
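The table lookup that the rate controller performs in this example can be sketched in Python. The mapping of heartbeat values 0 to 3 onto 0, 8, 16, and 32 instructions follows the description of Fig. 15, and the per-clock heartbeat values follow the Fig. 16 example as best it can be read from the text; the names `HEARTBEAT_TO_COUNT` and `rate_control` are illustrative assumptions rather than names from the patent.

```python
# Instruction count represented by each 2-bit heartbeat value (per Fig. 15).
HEARTBEAT_TO_COUNT = {0: 0, 1: 8, 2: 16, 3: 32}


def rate_control(heartbeats_a, heartbeats_b):
    """Return (clock, count_a, count_b) for each simulated clock.

    count_a and count_b are the numbers of instructions that the first
    and second core model instances are commanded to execute.
    """
    schedule = []
    for clock, (hb_a, hb_b) in enumerate(zip(heartbeats_a, heartbeats_b)):
        schedule.append(
            (clock, HEARTBEAT_TO_COUNT[hb_a], HEARTBEAT_TO_COUNT[hb_b])
        )
    return schedule


# Heartbeat values for clocks 0 through 5, as read from the example.
core_a = [0, 3, 0, 2, 1, 3]   # first core 104A
core_b = [1, 2, 2, 0, 1, 0]   # second core 104B

schedule = rate_control(core_a, core_b)
for clock, count_a, count_b in schedule:
    print(f"clock {clock}: core A executes {count_a}, core B executes {count_b}")
```

Keeping the mapping in a small dictionary makes it straightforward to swap in the encodings of the other embodiments, where a heartbeat value stands for a different number of instructions.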
In addition, the heartbeat signals 106 of the embodiments of the present invention can be used with a finer interval in cases where reproducing the error does not require every core 104 to operate. For example, suppose the sideband bus carries eight heartbeat signals 106 and the multi-core processor includes four cores 104, but only two cores 104 need to operate to reproduce the error. In this case, the heartbeat generator 103 can be programmed to use four of the eight heartbeat signal 106 bits for each of the two cores 104 to indicate the number of instructions it completed (0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, or 32).

Although each core 104 executes a single thread in the above embodiments, a core may also be designed to execute multiple threads at the same time, in which case the heartbeat information indicates the instructions completed by each thread. Likewise, although the two cores 104 have the same core clock cycle in the above embodiments, they may have different core clock cycles; the heartbeat information 122 then indicates the rates of both cores, and the rate controller 204 takes them into account when generating the commands 218.

[0024] Another approach is to use a Verilog simulator to emulate the actual processor, which lets a debugger access any net of the processor at any time, including the signals that indicate the number of instructions executed by each core and its memory accesses. These signals let the debugger provide information to the software functional model so that it executes instructions and accesses memory in step with the actual processor (or at least with the Verilog simulation of the processor). However, this approach has three drawbacks. First, depending on the number of clock cycles and instructions to be simulated, the Verilog simulator consumes a very large amount of system resources and time, which makes Verilog simulation a less practical approach that cannot resolve certain types of errors. Second, the behavior of the Verilog simulator is necessarily different from that of the actual processor. Third, the Verilog simulator approach requires the processor to be designed with perfect state-per-clock replay capability, which is difficult to implement. In general, a microprocessor with ideal state-per-clock replay can be loaded with an input state that defines the state of the entire processor; that is, there is no processor state whose initialization cannot be obtained by loading the input state. The embodiments of the present invention, by contrast, do not suffer from the above drawbacks of the Verilog simulator approach.

[0025] Modifications that do not depart from the spirit of the invention should still fall within the scope of the claims that follow. For example, software can implement the functions, architecture, modeling, and/or testing of the devices described above, using general programming languages (such as C and C++), hardware description languages (HDL) including Verilog HDL and the like, or other available programs.

Such software can be stored on any computer-usable storage medium, such as semiconductor memory, magnetic tape, magnetic disk, or optical disc (for example CD-ROM, DVD-ROM, etc.), or carried over wired, wireless, or other communication media. Embodiments of the apparatus and method of the present invention may be included in a semiconductor intellectual property core, such as a microprocessor core (for example, embodied in a hardware description language), and transformed into hardware in the production of integrated circuits. In addition, the apparatus and method described herein may also be a combination of hardware and software, and are not limited to what is disclosed.
[0026] The above description covers only the preferred embodiments of the present invention and is not intended to limit the scope of its claims. Any other equivalent change or modification made without departing from the spirit disclosed by the invention, such as applying the invention to a processor device of a general-purpose computer, should be included in the scope of the claims below.

[Brief Description of the Drawings]

[0027]
Fig. 1 is a functional block diagram of a computer system having a dual-core processor according to the present invention.
Fig. 2 is a functional block diagram of the software functional model simulation environment according to the present invention.
Fig. 3 is a flow chart of a method for operating the simulation environment of Fig. 2.
Fig. 4 is a functional block diagram of one embodiment of the dual-core processor according to the present invention.
Fig. 5 is an operational example table of the rate controller according to the embodiment of Fig. 4.
Fig. 6 is a functional block diagram of another embodiment of the dual-core processor according to the present invention.
Fig. 7 is an operational example table of the rate controller according to the embodiment of Fig. 6.
Fig. 8 is a functional block diagram of yet another embodiment of the dual-core processor according to the present invention.
Fig. 9 is an operational example table of the rate controller according to the embodiment of Fig. 8.
Fig. 10 is a functional block diagram of still another embodiment of the dual-core processor according to the present invention.
Fig. 11 is an operational example table of the rate controller according to the embodiment of Fig. 10.
Fig. 12 is a functional block diagram of a further embodiment of the dual-core processor according to the present invention.
Fig. 13 is an operational example table of the rate controller according to the embodiment of Fig. 12.
Fig. 14 is a flow chart of a method for operating the simulation environment of Fig. 2.
Fig. 15 is a functional block diagram of a further embodiment of the dual-core processor according to the present invention.
Fig. 16 is an operational example table of the rate controller according to the embodiment of Fig. 15.

[Description of Main Reference Numerals]

[0028]
100 Computer system
102 Dual-core processor
103 Heartbeat generator
104A First core
104B Second core
105, 105A, 105B Instruction execution indicator
106, 106A, 106B Heartbeat signal
108 Logic analyzer
112 Memory
114 Processor bus
116 Folder
118 Processor bus transfer information
122 Heartbeat information
124 Software functional model simulation environment
202 Simulated initial state generator
204 Rate controller
206A Software functional model instance of the first core
206B Software functional model instance of the second core
208 Actual result generator
226 Comparison unit
212 Simulated initial memory image
214A Simulated initial state of the first core
214B Simulated initial state of the second core
222 Actual result memory image
224A Actual result state of the first core
224B Actual result state of the second core
232 Simulation result memory image
234A Simulation result state of the first core
234B Simulation result state of the second core
218A, 218B Command
228 Pass/fail indicator
1212 Debug memory array
1214 L2 cache memory
1216 Bus interface unit
S302-S312 Steps
S1402-S1418 Steps

Claims (1)

1. A method for debugging a microprocessor having a plurality of cores, comprising:
causing the microprocessor to perform an actual execution of instructions;
obtaining from the microprocessor heartbeat information that specifies an actual execution sequence of the instructions by the cores relative to one another;
commanding a corresponding plurality of instances of a software functional model to execute the instructions according to the actual execution sequence, to generate a simulated result of the execution of the instructions; and
comparing the simulated result with an actual result of the execution of the instructions to determine whether the two match.

2. The method of claim 1, wherein the actual result and the simulated result include a state of the cores after the execution.

3. The method of claim 2, wherein the actual result and the simulated result further include a memory state after the execution.

4. The method of claim 1, further comprising:
indicating an error if the simulated result does not match the actual result.

5. The method of claim 1, wherein, when the number of instructions executed between a memory write instruction and a memory read instruction to the same memory address is less than the granularity of the heartbeat information, there exist a plurality of possible execution sequences that may affect the memory accesses, and wherein the step of commanding the plurality of instances of the software functional model to execute the instructions comprises commanding the instances to execute the instructions according to the possible execution sequences until the simulated result matches the actual result.

6. The method of claim 5, further comprising:
indicating an error if the simulated results of all of the execution sequences fail to match the actual result.

7. The method of claim 1, wherein the heartbeat information comprises a plurality of records corresponding to a plurality of heartbeats during the actual execution of the instructions, wherein each of the records indicates the number of instructions actually executed by the cores, and wherein the step of commanding the plurality of instances of the software functional model to execute the instructions comprises:
for each of the records, commanding each of the instances of the software functional model to execute the number of instructions recorded in the record.

8. The method of claim 1, wherein the step of obtaining the heartbeat information from the microprocessor comprises capturing heartbeat signals generated by the microprocessor on an external bus during the actual execution of the instructions.

9. A microprocessor, comprising:
a plurality of cores, each of which outputs an instruction execution indicator that indicates the number of instructions executed by the core in each clock cycle; and
a heartbeat generator that receives the instruction execution indicator from each of the cores and generates, on an external bus, a heartbeat indicator for each of the cores, wherein the heartbeat indicator indicates the number of instructions executed by each of the cores during each clock of the external bus.

10. The microprocessor of claim 9, wherein the clock cycle rate of the cores is the same as the clock cycle rate of the external bus.

11. The microprocessor of claim 9, wherein the clock cycle rate of the cores is greater than the clock cycle rate of the external bus, and wherein the number of executed instructions indicated by each heartbeat indicator is greater than the maximum number of completed instructions that each instruction execution indicator can indicate.

12. The microprocessor of claim 11, wherein the ratio of the clock cycle rate of the cores to the clock cycle rate of the external bus is J, the maximum number of completed instructions that each instruction execution indicator can indicate is K, and the number of executed instructions indicated by each heartbeat indicator is L, wherein L is greater than or equal to the product of J and K.

13. The microprocessor of claim 11, wherein each heartbeat indicator comprises a single bit.

14. The microprocessor of claim 9, wherein each of the cores includes a counter for counting the number of instructions executed in each clock cycle, and wherein the instruction execution indicator is an output bit of the count value of the counter.

15. The microprocessor of claim 14, wherein the output bit of the count value of the counter is bit M, with $M = \log_2 N$, where N is the product of the ratio of the clock of the core to the clock of the external bus and the maximum number of instructions that the core can complete in each clock cycle.

16. The microprocessor of claim 9, wherein the heartbeat generator includes a counter associated with each of the cores for counting the number of instructions executed in each clock cycle, and wherein the instruction execution indicator is an output bit of the count value of the counter.

17. The microprocessor of claim 16, wherein the output bit of the count value of the counter is bit M, with $M = \log_2 N$, where N is the product of the ratio of the clock of the core to the clock of the external bus and the maximum number of instructions that the core can complete in each clock cycle.

18. The microprocessor of claim 17, wherein each instruction execution indicator is one bit and each heartbeat indicator comprises a single bit.

19. The microprocessor of claim 9, wherein the external bus comprises a sideband bus coupled to the microprocessor, the sideband bus being distinct from a main processor bus coupled to the microprocessor.

20. The microprocessor of claim 19, wherein at least a portion of the sideband bus is a JTAG bus.

21. The microprocessor of claim 19, wherein the sideband bus is a service processor bus coupled to a service processor inside the microprocessor.

22. A microprocessor, comprising:
a plurality of cores, each of which generates an instruction execution indicator that indicates the number of instructions executed by the core during each clock;
a memory array that stores the instruction execution indicators generated by the cores during a period of clocks; and
a bus interface unit coupled to an external bus of the microprocessor, wherein the bus interface unit writes the instruction execution indicators stored in the memory array to a memory external to the microprocessor.

23. The microprocessor of claim 22, wherein the bus interface unit writes the instruction execution indicators to the external memory at the lowest priority relative to the other transactions it handles on the external bus.

24. The microprocessor of claim 19, further comprising:
a heartbeat generator, coupled to the memory array and the bus interface unit, that receives the instruction execution indicator from each of the cores, wherein the heartbeat generator writes the instruction execution indicators to the memory array during the period of clocks and reads the instruction execution indicators from the memory array so that the bus interface unit writes them to the external memory.

25. The microprocessor of claim 24, wherein the heartbeat generator waits until the period of clocks has ended to read the instruction execution indicators from the memory array so that the bus interface unit writes them to the external memory.

26. The microprocessor of claim 24, wherein the heartbeat generator periodically reads the instruction execution indicators from the memory array so that the bus interface unit writes them to the external memory.
TW100106953A 2010-03-16 2011-03-02 Microprocessor and debugging method thereof TWI470421B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US31425310P 2010-03-16 2010-03-16
US12/964,949 US8762779B2 (en) 2010-01-22 2010-12-10 Multi-core processor with external instruction execution rate heartbeat

Publications (2)

Publication Number Publication Date
TW201133232A true TW201133232A (en) 2011-10-01
TWI470421B TWI470421B (en) 2015-01-21

Family

ID=46751117

Family Applications (1)

Application Number Title Priority Date Filing Date
TW100106953A TWI470421B (en) 2010-03-16 2011-03-02 Microprocessor and debugging method thereof

Country Status (1)

Country Link
TW (1) TWI470421B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI344595B (en) * 2003-02-14 2011-07-01 Advantest Corp Method and structure to develop a test program for semiconductor integrated circuits
GB0420442D0 (en) * 2004-09-14 2004-10-20 Ignios Ltd Debug in a multicore architecture
JP2008176453A (en) * 2007-01-17 2008-07-31 Nec Electronics Corp Simulation device
US20100008464A1 (en) * 2008-07-11 2010-01-14 Infineon Technologies Ag System profiling

Also Published As

Publication number Publication date
TWI470421B (en) 2015-01-21
