TWI470421B - Microprocessor and debugging method thereof - Google Patents


Publication number
TWI470421B
Authority
TW
Taiwan
Prior art keywords
core
microprocessor
instructions
heartbeat
instruction
Prior art date
Application number
TW100106953A
Other languages
Chinese (zh)
Other versions
TW201133232A (en
Inventor
Darius D Gaskins
E Hooker Rodney
Jason Chen
Original Assignee
Via Tech Inc
Priority date
Filing date
Publication date
Priority claimed from US12/964,949 external-priority patent/US8762779B2/en
Application filed by Via Tech Inc filed Critical Via Tech Inc
Publication of TW201133232A publication Critical patent/TW201133232A/en
Application granted granted Critical
Publication of TWI470421B publication Critical patent/TWI470421B/en

Description

Microprocessor and debugging method thereof

The present invention relates to multi-core microprocessors, and more particularly to monitoring the instruction execution of a multi-core microprocessor and to a method of debugging it.

Modern microprocessors are very complex, and debugging them is a very difficult task. Microprocessor developers commonly use a software functional model, which simulates the architectural behavior of the microprocessor, as a debugging tool. A software functional model is more useful than other software models, such as a Verilog simulator, because it can simulate the execution of a large number of instructions much more quickly. The software functional model executes one instruction at a time as defined by the architecture, so it can debug a single-core processor effectively.

A software functional model can also be used to debug a multi-core processor: a separate instance of the software functional model simulates instruction execution on each core, and this works well as long as the cores do not interact with one another. However, multi-core processors often exhibit bugs that appear only when the cores interact through memory accesses, that is, when the cores share the same memory location, such as a software semaphore. The cores access the shared memory location at different times; for example, a first core reads a semaphore and waits for a second core to write it. Unless the two instances of the software functional model execute instructions in an order that closely approximates the order in which the actual processor executed them when the bug occurred, the software functional model cannot debug the multi-core processor effectively. What is needed, therefore, is a way to control the order in which the simulated cores execute instructions relative to one another so that it approximates the order on the post-silicon multi-core processor.

The present invention discloses a method of debugging a microprocessor that has a plurality of cores. The method includes causing the microprocessor to perform an actual execution of instructions and obtaining from the microprocessor heartbeat information that specifies the actual execution sequence of the instructions by the cores relative to one another. The method also includes commanding a plurality of corresponding instances of a software functional model to execute the instructions according to the actual execution sequence, to produce simulated results of the instruction execution. The method further includes comparing the simulated results with the actual results of the instruction execution to determine whether they match.

The present invention further discloses a microprocessor that includes a plurality of cores, each of which outputs an instruction execution indicator that indicates the number of instructions the core executed during each clock cycle. The microprocessor also includes a heartbeat generator that receives the instruction execution indicator from each core. In response to the instruction execution indicators, the heartbeat generator generates a heartbeat indicator for each core on an external bus; the heartbeat indicator indicates the number of instructions each core executed during each clock cycle of the external bus.

The present invention also discloses a microprocessor that includes a plurality of cores, each of which generates an instruction execution indicator that indicates the number of instructions the core executed during each clock cycle. The microprocessor also includes a memory array that stores the instruction execution indicators generated by the cores over a sequence of clock cycles. The microprocessor further includes a bus interface unit coupled to a bus external to the microprocessor. The bus interface unit writes the instruction execution indicators stored in the memory array to a memory external to the microprocessor.

The multi-core processor described in the embodiments generates a heartbeat signal that indicates the rate at which the cores execute instructions relative to one another. Processor developers can capture the heartbeat signal while the processor runs and use the captured heartbeat information to dynamically control the rate at which the software functional model instance for each core executes instructions. The heartbeat signal thus gives the software functional model a window into the internal operation of the multi-core processor, so that the order in which the simulated cores execute instructions relative to one another approximates the order on the actual multi-core processor when the bug occurred. In some embodiments the processor provides the heartbeat information on the architectural processor bus, but this can perturb the timing of programs running on the multi-core processor, so that some bugs are missed when the heartbeat is enabled. Therefore, in embodiments of the present invention, the processor provides the heartbeat signal non-invasively on an external sideband bus rather than on the architectural processor bus.

Refer to FIG. 1, a functional block diagram of a computing system 100 that has a dual-core processor 102 according to the present invention; the dual-core processor 102 generates a heartbeat signal 106. The computing system 100 includes the dual-core processor 102, which has two cores, shown as a first core 104A and a second core 104B and referred to collectively as cores 104. In one embodiment, each core 104 of the dual-core processor 102 is a microprocessor core that conforms to the VIA Nano™ architecture designed by VIA Technologies, Inc. Although this embodiment uses a dual-core processor as an example, other processors that use the heartbeat signal 106 to provide information for multiple cores are also within the scope of the invention.

The dual-core processor 102 further includes a heartbeat generator 103 coupled to each core 104. Specifically, the first core 104A generates an instruction execution indicator 105A that indicates the number of instructions it executed in a clock cycle, and the second core 104B generates an instruction execution indicator 105B that indicates the number of instructions it executed in a clock cycle. In response to the instruction execution indicators 105, the heartbeat generator 103 generates the heartbeat signal 106 to indicate the instructions the cores 104 have executed. In one embodiment, the cores 104 execute instructions speculatively, and the instruction execution indicator 105 tells the heartbeat generator 103 that instructions have been retired, that is, that they have updated the architectural state of the core 104 rather than merely having been executed speculatively.

The computing system 100 further includes a memory 112 coupled to the dual-core processor 102. Each core 104 of the dual-core processor 102 can be programmed to periodically stop executing user program instructions, dump its current state to a predetermined location in the memory 112, and flush its cache contents to the memory 112; this event is referred to herein as a checkpoint. The state of a core 104 includes the state of its internal registers and is referred to herein as the checkpoint state. More specifically, a developer can program each core 104 to execute a predetermined number of instructions (for example, 100,000), then stop executing, dump the checkpoint state, flush its caches, and resume executing until the predetermined number of instructions has again been executed, and so on.

The computing system 100 also includes a logic analyzer 108. In one embodiment, the logic analyzer 108 includes one of the cores 104 of the dual-core processor 102. The logic analyzer 108 monitors the processor bus 114 and captures the transactions on it, including the writes of the checkpoint state to the memory 112 and the cache flushes. The logic analyzer 108 also monitors and captures the heartbeat signal 106. It stores the captured information in a file 116, for example on a disk drive; the file 116 holds the captured processor bus transaction information 118 and the heartbeat information 122. In one embodiment, the heartbeat signal 106 is provided on a sideband bus of the dual-core processor 102, for example a JTAG bus, which may be used by a separate service processor inside the dual-core processor 102 chip.

The computing system 100 also includes a software functional model simulation environment 124, which includes one or more simulation computer systems that are distinct from the computing system 100 containing the microprocessor 102. The simulation environment 124 uses the processor bus transaction information 118 and the heartbeat information 122 captured in the file 116 to simulate the operation of the dual-core processor 102, as described in detail below.

Refer now to FIG. 2, a functional block diagram of the software functional model simulation environment 124. The simulation environment 124 includes a simulated initial state generator 202, a rate controller 204, a first-core software functional model instance 206A, a second-core software functional model instance 206B, an actual result generator 208, and a comparison function 226. Although these elements are implemented in software, some or all of them may instead be implemented in hardware to increase execution speed.

The simulated initial state generator 202 receives the captured processor bus transaction information 118 and uses it to generate a simulated initial memory image 212, a first-core simulated initial state 214A, and a second-core simulated initial state 214B. The simulated initial memory image 212 is then copied to a simulated result memory image 232, the first-core simulated initial state 214A is copied to a first-core simulated result state 234A, and the second-core simulated initial state 214B is copied to a second-core simulated result state 234B. For purposes of illustration, assume that each core 104 has dumped a first checkpoint state (including the internal register state described above), flushed its caches, resumed and executed the predetermined number of instructions, and then dumped a second checkpoint state and flushed its caches again. Assume further that the processor bus transaction information 118 includes the bus transactions of the first and second checkpoints and all transactions between them, which were generated by executing the predetermined number of instructions. See U.S. Provisional Patent Application No. 61/297,505, filed January 22, 2010, which describes a method of synchronizing checkpoints between the two cores 104.

According to one embodiment, the simulated initial state generator 202 generates the simulated initial memory image 212 as follows:
(1) Detect each transaction on the dual-core processor 102, between the first checkpoint and the second checkpoint, that reads a location in the memory 112.
(2) Determine whether the read transaction is the first read of that location between the first and second checkpoints.
(3) If so, generate a memory location record for the transaction that contains the memory address and the data value read.
In this way the simulated initial state generator 202 produces a sparse simulated initial memory image 212. A sparse image is sufficient for the software functional model instances 206 because, between the first and second checkpoints, the instances 206 should read only memory locations for which a record was generated; if they read any other location, an error occurred on the actual dual-core processor 102.
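The three steps above amount to keeping the value returned by the first read of each address. A minimal Python sketch follows, assuming the captured bus transactions are available as (kind, address, data) tuples in bus order; the tuple layout and the function name are illustrative assumptions, not part of the disclosure.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical trace entry: (kind, address, data), where kind is "read" or "write".
Transaction = Tuple[str, int, int]

def build_initial_memory_image(trace: Iterable[Transaction]) -> Dict[int, int]:
    """Keep the data returned by the first read of each address."""
    image: Dict[int, int] = {}
    for kind, address, data in trace:
        if kind == "read" and address not in image:
            image[address] = data  # first read of this address between the checkpoints
    return image

trace = [
    ("read", 0x100, 7),
    ("write", 0x100, 9),
    ("read", 0x100, 9),   # not the first read of 0x100, so it adds no record
    ("read", 0x200, 1),
]
initial_image = build_initial_memory_image(trace)
```

A dictionary keyed by address keeps the image sparse: only locations that were actually read appear in it.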

The simulated initial state generator 202 generates the first-core simulated initial state 214A directly from the first checkpoint state captured in the processor bus transaction information 118. As described above, in one embodiment each core 104 writes its state at each checkpoint, in a predetermined format, to a predetermined location in the memory 112, which enables the simulated initial state generator 202 to locate the first checkpoint state of the first core 104A within the processor bus transaction information 118. Likewise, the simulated initial state generator 202 generates the second-core simulated initial state 214B directly from the first checkpoint state of the second core captured in the processor bus transaction information 118.

The actual result generator 208 receives the processor bus transaction information 118 of FIG. 1 and uses it to generate a first-core actual result state 224A, a second-core actual result state 224B, and an actual result memory image 222. The actual result generator 208 generates the first-core actual result state 224A directly from the second checkpoint state captured in the processor bus transaction information 118. As described above, in one embodiment each core 104 writes its checkpoint state at each checkpoint, in a predetermined format, to a predetermined location in the memory 112, which enables the actual result generator 208 to locate the second checkpoint state of the first core 104A within the processor bus transaction information 118. Likewise, the actual result generator 208 generates the second-core actual result state 224B directly from the second checkpoint state of the second core captured in the processor bus transaction information 118. The comparison function 226 compares the first-core actual result state 224A with the first-core simulated result state 234A, and the second-core actual result state 224B with the second-core simulated result state 234B, as discussed further below.

In one embodiment, the actual result generator 208 generates the actual result memory image 222 as follows:
(1) Detect each transaction on the dual-core processor 102, between the first checkpoint and the second checkpoint, that writes a location in the memory 112, including the writes each core 104 performs at the second checkpoint to flush its cache contents to the memory 112.
(2) Determine whether the write transaction is the last write of that location between the first and second checkpoints.
(3) If so, generate a memory location record for the transaction that contains the memory address and the data value written.
In this way the actual result generator 208 produces a sparse actual result memory image 222. A sparse image is sufficient for the software functional model instances 206 because, between the first and second checkpoints, the instances 206 should write only memory locations for which a record was generated; if they write any other location, an error occurred on the actual dual-core processor 102. The comparison function 226 compares the actual result memory image 222 with the simulated result memory image 232, as discussed further below.
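The write-side procedure mirrors the read side, except that the last write to each address wins. A minimal Python sketch, again assuming a hypothetical (kind, address, data) tuple per captured bus transaction, with cache-flush writes at the second checkpoint appearing in the trace as ordinary writes:

```python
from typing import Dict, Iterable, Tuple

# Hypothetical trace entry: (kind, address, data), where kind is "read" or "write".
Transaction = Tuple[str, int, int]

def build_actual_result_image(trace: Iterable[Transaction]) -> Dict[int, int]:
    """Keep the data of the last write to each address."""
    image: Dict[int, int] = {}
    for kind, address, data in trace:
        if kind == "write":
            image[address] = data  # a later write to the same address overwrites this
    return image

trace = [
    ("write", 0x100, 9),
    ("read", 0x100, 9),
    ("write", 0x100, 12),  # last write to 0x100 between the checkpoints
    ("write", 0x300, 4),
]
actual_image = build_actual_result_image(trace)
```

Scanning in bus order and overwriting avoids a separate pass to decide which write is the last one.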

The rate controller 204 receives the captured heartbeat information 122 of FIG. 1 and uses it to generate commands 218A to the first-core software functional model instance 206A and commands 218B to the second-core software functional model instance 206B. The commands 218 dynamically control the rate at which the software functional model instances 206 execute instructions relative to one another. In one embodiment, each command 218 directs a software functional model instance to execute N instructions, where N is specified in the command. In another embodiment, the software functional model instances 206 are multi-threaded and communicate with one another through a mechanism such as a semaphore. In one embodiment, a command directs the instance 206 for one core 104 to execute X instructions, stopping only after the instance 206 for the other core 104 has executed Y instructions. The figures that follow describe in detail how the rate controller 204 uses the heartbeat information 122 to issue the commands 218 that dynamically control the rate at which the instances 206 execute instructions relative to one another.

Each software functional model instance 206 simulates the architectural behavior of a core 104. The first-core instance 206A reads and writes the first-core simulated result state 234A, and the second-core instance 206B reads and writes the second-core simulated result state 234B. In addition, when it executes memory access instructions under the commands of the rate controller 204, each instance 206 reads and/or writes the simulated result memory image 232. In particular, data that the first-core instance 206A writes to the simulated result memory image 232 is visible to the second-core instance 206B, and vice versa, so each instance can affect the simulated result state 234 of the other.
After each software functional model instance 206 has executed the predetermined number of instructions (for example, 100,000), the first-core simulated result state 234A, which began as a copy of the first-core simulated initial state 214A, has been updated into the true first-core simulated result state 234A, and the second-core simulated result state 234B, which began as a copy of the second-core simulated initial state 214B, has been updated into the true second-core simulated result state 234B. The comparison function 226 compares the first-core simulated result state 234A with the first-core actual result state 224A, and the second-core simulated result state 234B with the second-core actual result state 224B, to determine whether the actual dual-core processor 102 malfunctioned between the first and second checkpoints; the result is reported by a pass/fail indicator 228. Similarly, after each instance 206 has executed the predetermined number of instructions, the simulated result memory image 232, which began as a copy of the simulated initial memory image 212, has been updated into the true simulated result memory image 232. The comparison function 226 compares the simulated result memory image 232 with the actual result memory image 222 to determine whether the actual dual-core processor 102 malfunctioned between the first and second checkpoints; this result is also reported by the pass/fail indicator.
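The comparison step can be sketched in a few lines of Python. The text does not say how the two sparse memory images are compared, so this sketch assumes that only the addresses recorded in the actual result image are checked; the register-state dictionaries and the function name are likewise hypothetical.

```python
from typing import Dict, List

def checkpoint_passes(
    sim_states: List[Dict[str, int]],
    actual_states: List[Dict[str, int]],
    sim_memory: Dict[int, int],
    actual_memory: Dict[int, int],
) -> bool:
    """Return True (pass) if the simulated results match the actual results."""
    # Per-core register state must match exactly.
    states_match = sim_states == actual_states
    # Assumption: compare memory only at addresses the actual processor wrote.
    memory_match = all(
        sim_memory.get(address) == value
        for address, value in actual_memory.items()
    )
    return states_match and memory_match

# Two cores, one register each; address 0x200 was only read, never written.
sim_states = [{"eax": 1}, {"eax": 2}]
actual_states = [{"eax": 1}, {"eax": 2}]
sim_memory = {0x100: 12, 0x200: 1}
actual_memory = {0x100: 12}
result = checkpoint_passes(sim_states, actual_states, sim_memory, actual_memory)
```

A failing comparison narrows the malfunction to the interval between the two checkpoints, which is what makes the periodic checkpoints useful.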

Thus, with the rate controller 204 as an intermediary, the heartbeat information 122 can be used to dynamically control the rate at which each software functional model instance 206 executes instructions. That is, the rate controller 204 controls the order in which the instances 206 execute instructions relative to one another, so that the instructions execute in the order in which the cores 104 actually accessed memory. The operation of the actual dual-core processor 102 can therefore be simulated accurately from the actual initial state of the cores 104 and the memory 112, which is what allows the comparison function 226 to compare the behavior of the actual dual-core processor 102 with its simulated behavior.

Refer to FIG. 3, a flowchart of the operation of the simulation environment 124 of FIG. 2. As shown in FIG. 3, the software functional model simulation environment 124 generates the simulated result memory image 232 and the simulated result states 234 according to step S1406 of FIG. 14; these are then compared with the actual result memory image 222 and the actual result states 224, respectively (step S1408). Flow begins at step S302.

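In step S302, the rate controller 204 receives the heartbeat information 122 from the file 116. Flow proceeds to step S304.
In step S304, the rate controller 204 examines the value of the heartbeat signal 106 of each core 104 for the next clock cycle of the heartbeat signal 106 recorded in the heartbeat information 122. The possible values of the heartbeat signal 106 are described in the embodiments and figures below. Flow proceeds to step S306.
In step S306, the rate controller 204 determines whether core N (the first core 104A or the second core 104B) produced a heartbeat. If so, flow proceeds to step S308; otherwise, flow returns to step S304 to examine the next clock cycle.
In step S308, the rate controller 204 issues a command 218 that causes the software functional model instance 206 for core N to execute one or more instructions according to the heartbeat information examined, as described in more detail below. Flow proceeds to step S312.
In step S312, the software functional model instance 206 for core N executes the next instruction or instructions against the simulated result memory image 232 and the simulated result state 234. If the instruction is a memory read, the instance 206 for core N reads the simulated result memory image 232. If the instruction is a memory write, the instance 206 for core N updates the simulated result memory image 232. Flow returns to step S304 to examine the next clock cycle.
Various embodiments of the instruction execution indicators 105, the heartbeat generator 103, the heartbeat signal 106, and their use by the rate controller 204 are described below.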
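The loop of steps S302 through S312 can be sketched in Python as follows. This is a toy illustration under stated assumptions: each "instruction" is a Python callable acting on a shared dictionary that stands in for the simulated result memory image 232, the heartbeat trace is a list of per-clock instruction counts (one count per core), and within a clock the cores are served in index order. None of these choices come from the disclosure.

```python
from typing import Callable, Dict, List, Sequence

Instruction = Callable[[Dict[str, int]], None]

class ModelInstance:
    """Toy stand-in for one software functional model instance (hypothetical)."""
    def __init__(self, program: Sequence[Instruction]) -> None:
        self.program = list(program)
        self.next_index = 0

    def execute(self, count: int, memory: Dict[str, int]) -> None:
        # S312: execute the next `count` instructions against the shared memory image.
        for _ in range(count):
            if self.next_index >= len(self.program):
                return
            self.program[self.next_index](memory)
            self.next_index += 1

def replay(heartbeat_trace: Sequence[Sequence[int]],
           programs: Sequence[Sequence[Instruction]]) -> Dict[str, int]:
    memory: Dict[str, int] = {}
    instances: List[ModelInstance] = [ModelInstance(p) for p in programs]
    for counts in heartbeat_trace:                       # S304: next heartbeat clock
        for core, count in enumerate(counts):
            if count > 0:                                # S306: did this core beat?
                instances[core].execute(count, memory)   # S308: command the instance
    return memory

def write_semaphore(memory: Dict[str, int]) -> None:
    memory["sem"] = 1

def read_semaphore(memory: Dict[str, int]) -> None:
    memory["seen"] = memory.get("sem", 0)

programs = [[write_semaphore], [read_semaphore]]
reader_first = replay([(0, 1), (1, 0)], programs)   # core 1 reads before core 0 writes
writer_first = replay([(1, 0), (0, 1)], programs)   # core 0 writes before core 1 reads
```

The two traces run the same programs but leave different values in "seen", which is the kind of ordering-dependent outcome the heartbeat information is meant to reproduce.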
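Refer to FIG. 4, a functional block diagram of one embodiment of the dual-core processor. In FIGS. 4 and 5, the cores 104 and the heartbeat signal 106 run at the same clock rate, and each core 104 can complete one instruction per core clock cycle. As shown, the instruction execution indicator 105 of each core 104 is a single bit that is true if the core 104 completed an instruction during the core clock cycle and false otherwise. Likewise, the heartbeat generator 103 generates a one-bit heartbeat signal 106A that is true if the first core 104A completed an instruction during the core clock cycle and false otherwise, and a one-bit heartbeat signal 106B that is true if the second core 104B completed an instruction during the core clock cycle and false otherwise. Note, however, that there may be a delay between an instruction execution indicator 105 and its corresponding heartbeat signal 106. In one embodiment, because the cores 104 and the heartbeat signal 106 run from different clock sources, the heartbeat generator 103 includes synchronization logic to synchronize the instruction execution indicators 105 and the heartbeat signal 106.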
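Refer to FIG. 5, a table that illustrates the operation of the rate controller 204 in the embodiment of FIG. 4. The table covers six clock cycles, labeled 0 through 5, which correspond to six clocks in the heartbeat information 122 that the rate controller 204 receives in step S302 of FIG. 3. The table shows the heartbeat signal 106A of the first core 104A and the heartbeat signal 106B of the second core 104B that the rate controller 204 reads from the heartbeat information 122. For each clock cycle, the table also shows whether, per the decision at step S306 of FIG. 3, the rate controller 204 commands the first-core software functional model instance 206A, and whether it commands the second-core software functional model instance 206B, to execute an instruction during the corresponding simulated clock cycle. In this example, the heartbeat signal 106A of the first core 104A is true in clocks 1 and 3-5, so the rate controller 204 commands the first-core instance 206A to execute an instruction during simulated clocks 1 and 3-5. The heartbeat signal 106B of the second core 104B is true in clocks 0-2 and 4, so the rate controller 204 commands the second-core instance 206B to execute an instruction during simulated clocks 0-2 and 4.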
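The FIG. 5 example can be written out as data. The short Python sketch below encodes the two one-bit heartbeat traces given in the text and lists the simulated clocks in which each instance is commanded to execute one instruction; the variable and function names are illustrative assumptions.

```python
from typing import List

# One-bit heartbeats for clocks 0..5, taken from the FIG. 5 example in the text.
heartbeat_a = [0, 1, 0, 1, 1, 1]  # first core 104A: true in clocks 1 and 3-5
heartbeat_b = [1, 1, 1, 0, 1, 0]  # second core 104B: true in clocks 0-2 and 4

def command_clocks(heartbeat: List[int]) -> List[int]:
    """Return the simulated clocks in which one instruction is commanded."""
    return [clock for clock, beat in enumerate(heartbeat) if beat]

clocks_a = command_clocks(heartbeat_a)
clocks_b = command_clocks(heartbeat_b)
```

Both cores complete four instructions over the six clocks, but in different clocks, and that difference is the ordering information the rate controller replays.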
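Refer now to FIG. 6, a functional block diagram of another embodiment of the dual-core processor. FIGS. 6 and 7 resemble FIGS. 4 and 5 in that the cores 104 and the heartbeat signal 106 run at the same clock rate. In this embodiment, however, each core 104 can complete multiple instructions per core clock cycle, for example three, although other numbers are possible. As shown, the instruction execution indicator 105 of each core 104 is two bits that indicate the number of instructions the core 104 completed in a core clock cycle. Likewise, the heartbeat generator 103 generates a two-bit heartbeat signal 106A that indicates the number of instructions the first core 104A completed in a core clock cycle, and a two-bit heartbeat signal 106B that indicates the number of instructions the second core 104B completed in a core clock cycle.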
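Refer to FIG. 7, a table that illustrates the operation of the rate controller in the embodiment of FIG. 6; it resembles the table of FIG. 5. As shown, however, in each clock cycle the value of the heartbeat signal 106A of the first core 104A that the rate controller 204 reads from the heartbeat information 122 ranges from 0 to 3 rather than being 0 (false) or 1 (true). For each clock cycle, the table shows whether, per the decision at step S306 of FIG. 3, the rate controller 204 commands the first-core software functional model instance 206A to execute instructions during the corresponding simulated clock cycle according to step S308 of FIG. 3 (and if so, how many), and whether it commands the second-core software functional model instance 206B to execute instructions (and if so, how many). In this example, the heartbeat signal 106A of the first core 104A is 0 in clocks 0 and 2, 1 in clock 4, 2 in clock 3, and 3 in clocks 1 and 5. The rate controller 204 therefore commands the first-core instance 206A to execute 0 instructions during simulated clocks 0 and 2, 1 instruction during simulated clock 4, 2 instructions during simulated clock 3, and 3 instructions during simulated clocks 1 and 5. The heartbeat signal 106B of the second core 104B is 0 in clocks 3 and 5, 1 in clocks 0 and 4, 2 in clocks 1 and 2, and is never 3. The rate controller 204 therefore commands the second-core instance 206B to execute 0 instructions during simulated clocks 3 and 5, 1 instruction during simulated clocks 0 and 4, and 2 instructions during simulated clocks 1 and 2, and never 3 instructions.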
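With two-bit heartbeats, a running total per core shows how far each simulated core has advanced after every clock. The Python sketch below uses the FIG. 7 values given in the text; the running-total view itself is an illustrative addition, not something the disclosure describes.

```python
from itertools import accumulate

# Two-bit heartbeat values for clocks 0..5, taken from the FIG. 7 example in the text.
counts_a = [0, 3, 0, 2, 1, 3]  # first core 104A
counts_b = [1, 2, 2, 0, 1, 0]  # second core 104B

# Instructions completed by each simulated core at the end of each clock.
progress_a = list(accumulate(counts_a))
progress_b = list(accumulate(counts_b))
```

After clock 2, for instance, the second core has completed five instructions while the first core has completed three, so any shared-memory access among those instructions is replayed in that relative order.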
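Refer to FIG. 8, a functional block diagram of yet another embodiment of the dual-core processor. FIGS. 8 and 9 resemble FIGS. 6 and 7 in that each core 104 can complete multiple instructions per core clock cycle. In this embodiment, however, the clock rate of the cores 104 is a multiple of the clock rate of the heartbeat signal 106, for example ten times, although other ratios are possible. In addition, the instruction execution indicator 105 of each core 104 is a single bit that is true when the core 104 has completed a predetermined number of instructions and false otherwise. In one embodiment the predetermined number is 32, although other values are possible. Specifically, the predetermined number must be at least as large as the product of the clock ratio and the maximum number of instructions each core 104 can complete in one clock cycle. In one embodiment, the retire unit of the core 104 includes a counter that counts the instructions completed each clock cycle, and the instruction execution indicator 105 is effectively bit M of the counter, where M = log2(N) and N is the product of the clock ratio and the maximum number of instructions the core 104 can complete in one clock cycle. Likewise, the heartbeat generator 103 generates a one-bit heartbeat signal 106A from the instruction execution indicator 105A and a one-bit heartbeat signal 106B from the instruction execution indicator 105B.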
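As a worked example of the bound on the predetermined number: with a clock ratio of 10 and at most 3 instructions completed per core clock, the product is 30, and the smallest power of two that is at least 30 is 32, giving M = 5. The Python sketch below computes this; rounding up to a power of two is an assumption made here so that M is an integer bit index, and the function name is hypothetical.

```python
from typing import Tuple

def heartbeat_quantum(clock_ratio: int, max_retired_per_clock: int) -> Tuple[int, int]:
    """Return (quantum, bit_index) for the given ratio and per-clock retire limit."""
    product = clock_ratio * max_retired_per_clock
    # Assumption: round up to a power of two so the quantum maps to one counter bit.
    quantum = 1 << (product - 1).bit_length()
    bit_index = quantum.bit_length() - 1  # M = log2(quantum)
    return quantum, bit_index

quantum, bit_index = heartbeat_quantum(clock_ratio=10, max_retired_per_clock=3)
```

Keeping the quantum at or above the product means a core cannot complete more than one quantum of instructions within a single heartbeat clock, so one bit per heartbeat clock is enough.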
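Refer to FIG. 9, a table that illustrates the operation of the rate controller in the embodiment of FIG. 8. The table resembles that of FIG. 5. Note, however, that in FIGS. 5 and 7 the clock rate of the cores 104 equals the clock rate of the heartbeat signal 106, so the heartbeat information 122 for each clock cycle indicates the one or more instructions completed in the corresponding core clock cycle, and the clock cycles shown (clocks 0-5) are both core clock cycles and heartbeat signal 106 clock cycles. In FIG. 9, by contrast, the heartbeat information 122 for each clock cycle indicates the instructions completed over multiple core clock cycles, so the clock cycles shown correspond only to the heartbeat signal 106. For each clock cycle, the table shows whether, per the decision at step S306 of FIG. 3, the rate controller 204 uses command 218A, as described in step S308 of FIG. 3, to direct the first-core software functional model instance 206A to execute 32 instructions during the corresponding simulated clock cycle, and whether it uses command 218B to direct the second-core software functional model instance 206B to execute 32 instructions. In this example, the heartbeat signal 106A of the first core 104A is true in clocks 1 and 5, so 32 clocks are simulated for each of heartbeat clocks 1 and 5, and during those 32 simulated clocks the rate controller 204 uses command 218A to direct the first-core instance 206A to execute 32 instructions. The heartbeat signal 106B of the second core 104B is true in clocks 0, 2, and 4, so 32 clocks are simulated for each of heartbeat clocks 0, 2, and 4, and during those 32 simulated clocks the rate controller 204 uses command 218B to direct the second-core instance 206B to execute 32 instructions.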
請參考第十圖,係為本發明所揭示之雙核心處理器之再一具體實施例之功能方塊圖。第十、十一圖相似於第八、九圖,核心104在每一核心時脈週期可完成多個指令,且當核心104完成一預訂數量指令(如32個),則一位元的心跳訊號106為真,否則指令執行指示105為假(false)。然而,本實施例中與第六圖相似,每個指令執行指示105是兩位元,用來指出在每時脈週期中,核心104完成的指令數量。本實施例中,一旦核心104完成預定數量的指令,心跳產生器103便在心跳訊號106上產生一真值。在實施例中,心跳產生器103包括一計數器(counter),用來計數每時脈週期中已完成的指令數量,而心跳訊號106是計數器的有效位元M(M = log2N)。
請參考第十一圖,係為本發明所揭示之依據第十圖實施例之速率控制器之操作例示表。本實施例圖表與第九圖相似,所接收到的心跳訊號資訊122、圖示意思也相同,在此就不予贅述。
請參考第十二圖,係為本發明所揭示之雙核心處理器之更一具體實施例之功能方塊圖。第十二、十三圖相似於第十、十一圖,核心104在每一核心時脈週期可完成多個指令,且核心104之時脈速率是心跳訊號106之時脈速率的好幾倍。每個指令執行指示105是兩位元,用來指出在每時脈週期中,核心104所完成的指令數量。然而,本實施例更包括一除錯記憶體陣列(debug memory array)1212,心跳產生器103根據指令執行指示105來寫入心跳訊號資訊122至除錯記憶體陣列1212。一在實施例中,心跳產生器103是將每個時脈週期所收到的指令執行指示105寫進除錯記憶體陣列1212。心跳產生器103隨後再從除錯記憶體陣列1212讀出心跳訊號資訊122,並在透過處理器匯流排114上將其寫入系統記憶體112中。當心跳訊號資訊122寫進系統記憶體112時,邏輯分析器108便擷取了心跳訊號資訊122。心跳產生器103係將心跳訊號資訊122寫入系統記憶體112的一預定位址,如此便致能一來邏輯分析器108便能將其儲存至資料夾116,而寫入的心跳訊號資訊122在之後會被速率控制器204使用,一如第三圖之所示。本實施例的心跳訊號資訊122係與第六、七圖的相似,亦即每時脈的心跳訊號資訊122係用以指出核心104在該時脈所完成的指令數量。心跳產生器103會對雙核心處理器102的一匯流排介面單元(bus interface unit)1216產生需求(requests),匯流排介面單元1216係使雙核心處理器102連接至處理器匯流排114的介面。根據一在實施例中,心跳產生器103產生的需求是最低優先權的需求,其可傳至匯流排介面單元1216,並當處理器匯流排114閒置(idle)時,匯流排介面單元1216才試著產生在處理器匯流排114上的傳輸,以將心跳訊號資訊122從除錯記憶體陣列1212寫進系統記憶體112。如此可能減少因侵入性地在處理器匯流排114寫入心跳訊號資訊122(相對於第四至十一圖以及第十五至十六圖中,在非侵入式的側波帶匯流排(noninvasive sideband bus)上寫入心跳訊號資訊122),因而影響雙核心處理器102操作時序之可能性,是以當心跳特徵被啟動後錯誤不再出現。心跳產生器103監控整個除錯記憶體陣列1212,當發現除錯記憶體陣列1212快被寫滿時,便與匯流排介面單元1216通訊來提高需求的優先權,以盡快寫入心跳訊號資訊122。在實施例中,除錯記憶體陣列1212係類似於雙核心處理器102的L2快取記憶體(L2 cache memory)1214的記憶體陣列,都被核心104共享。雙核心處理器102可更佳地設計成將除錯記憶體陣列1212設計成為L2快取記憶體1214的附屬物備份,藉此可共享通用的控制邏輯與電路佈局。一在實施例中,除錯記憶體陣列1212的容量是夠大的,以至於在兩個檢查點之間隔便相對夠小,所以在檢查點到達之前心跳產生器103都不需要再寫入心跳訊號資訊122,這個好處是基於利用非侵入性的方式來寫入心跳資訊而來。
請參考第十三圖,係為本發明所揭示之依據第十二圖實施例之速率控制器之操作例示表。本實施例圖表與第七圖相似,所接收到的心跳訊號資訊122、圖示意思也相同,在此就不予贅述。但須注意相對於第十二圖而言本實施例沒有心跳訊號106,因此每時脈週期的心跳訊號資訊122指出每時脈所完成的指令數量,其中圖中所示的時脈週期0-5係與核心時脈週期相對應。
上述實施例的相對優缺點將在以下詳述。第四、五圖以及第八至十三圖的優點是,僅需在多核心處理器封裝上安置較小數量的外部接腳(pin),亦即每核心104一個接腳。因此,這些實施例會比第六、七圖在晶片大小上更具優勢(scalable),因為第六、七圖的其每個核心104需要多個接腳,若核心104數量更多,這個問題更明顯。但有鑒於目前時下的微處理器都是超純量架構(superscalar),且能於每時脈週期執行多個指令,第四、五圖之實施例卻限制每核心104只能在每核心時脈週期中完成單一指令。相反地,第六至十三圖之實施例的優點在於,可支援核心104於每時脈週期中完成多個指令。此外,第四至七圖的實施例限制核心週期與心跳訊號106匯流排之週期速率相同,但有鑒於許多時下的微處理器之時脈速率都很高,使得實作上無法隨核心時脈速率來運作一外部匯流排。相反地,第八至十三圖以及第十五至十六圖(描述在後)的優點在於,可支援核心週期頻率為心跳訊號106的匯流排頻率的倍數之架構。如上所述,第十二、十三圖的實施例之缺點是,其為具侵入性而可能會影響多核心處理器之程式執行的時序,如此當致能心跳時,可能會導致錯過一些錯誤。然而,第十二、十三圖的實施例之優點在於,不需要額外的外部接腳,而這個優點在某些應用上卻是必須的。
請參考第十四圖,係為本發明所揭示之第二圖之操作模擬環境之方法流程圖。軟體功能模組模擬環境124被用來判斷在第一及第二檢查點之間是否有發生錯誤,亦可被運用在系統100操作了一段時間且產生許多組檢查點之後,用以判斷儲存在第一圖的資料夾116中的多組第一及第二檢查點之間是否有發生錯誤。流程起始於步驟S1402。
首先,步驟S1402中,實際結果產生器208使用處理器匯流排傳輸資訊118來產生由雙核心處理器102在兩檢查點之間,執行預定數量指令的實際結果記憶體映像222以及實際結果狀態224,一如第二圖之所示。流程接著前往步驟S1404。
接著,步驟S1404中,模擬初始狀態產生器202使用處理器匯流排傳輸資訊118來產生模擬初始記憶體映像212以及模擬初始狀態214,一如第二圖之所示。流程接著前往步驟S1406。
步驟S1406中,複製模擬初始記憶體映像212到模擬結果記憶體映像232,且複製第一核心之模擬初始狀態214A到第一核心之模擬結果狀態234A,以及複製第二核心之模擬初始狀態214B到第二核心之模擬結果狀態234B。隨後,速率控制器204以及軟體功能模組範例206使用複製的映像來更新模擬結果記憶體映像232以及模擬結果狀態234,一如第三圖之所示。須注意在第三圖的操作中,於每個核心的指令執行裡,可能發生其中一個核心104執行一記憶體寫入指令、而另一個核心104執行一記憶體讀取指令,諸如上述的信號寫入與讀取(semaphore write and read)的情形。如果在這類的讀寫動作之間的指令數量小於心跳訊號資訊122的間隔(granularity),則在雙核心處理器102的實際執行期間,會存在多種可能影響實際記憶體存取的順序。因此,軟體功能模組模擬環境124若偵測到這個情形,將假設記憶體存取的可能順序,再依照該可能順序來執行步驟S1406,並記錄操作情形。須注意第八至十一圖之實施例所產生的心跳訊號資訊122,其間隔大於第六至七圖以及第十二至十三圖;而第十二至十三圖之實施例所產生的心跳訊號資訊122之間隔又大於第四至五圖。較大的間隔會使軟體功能模組模擬環境124需要更多的時間來完成第十四圖的操作程序,因為記憶體存取順序之數量可能會很多。流程接著前往步驟S1408。
步驟S1408中,比較單元226將步驟S1406產生的模擬結果與步驟S1402產生的實際結果進行比較,流程接著前往步驟S1412。
在步驟S1412中,將判斷比較結果是否相符合,如果是,則進行步驟S1414,否則進行步驟S1416。
步驟S1414中,比較單元226在取否指示器228上產生一取值(pass value),流程中止於步驟S1414。
在步驟S1416中,軟體功能模組模擬環境124判斷是否有其他可能的記憶體存取順序未被執行,如果有,則回到步驟S1406,並使用不同的記憶體存取順序來操作;否則,則進行步驟S1418。
在步驟S1418中,比較單元226在取否指示器228上產生一否值(fail value)。流程中止於步驟S1418。
請參考第十五圖,係為本發明第一圖所揭示雙核心處理器102之另一實施例之功能方塊圖。第十五、十六圖相似於第十、十一圖,每個核心104在每一核心時脈週期可完成多個指令,且核心104之時脈速率是心跳訊號106之時脈速率的好幾倍,每個指令執行指示105是兩位元,用來指出每時脈週期中,核心104所完成的指令數量。然而,本在第十五圖的實施例裡,的在每個匯流排時脈週期中,心跳產生器103產生兩位元的心跳訊號106A, 106B,其分別指出第一核心104A及第二核心104B執行所完成的指令數量為0,8,16,或32。
請參考第十六圖,係為本發明第二圖所揭示之速率控制器204,在第十五圖實施例下之操作例示表。本實施例圖表部份與第七、九、十一圖相似,如圖所示,每核心時脈中,速率控制器204從心跳訊號資訊122中接收的心跳訊號106之值從0至3。同樣地,在每個時脈週期中,根據第三圖中的步驟S306之判斷,圖表標示了速率控制器204根據第三圖中的步驟S308,在所對應的模擬時脈週期期間,是否透過命令218A以控制第一核心之軟體功能模組範例206A去執行指令(如果是,會標示有多少指令被執行),以及是否透過命令218B來控制第二核心之軟體功能模組範例206B去執行指令(如果是,會標示有多少指令被執行)。在本例示中,第一核心104A在時脈0, 2的心跳訊號106A是0,在時脈4時是1,在時脈3時是2,以及在時脈1, 5時是3。因此,速率控制器204透過命令218A指示第一核心之軟體功能模組範例206A在模擬的時脈0, 2期間執行0個指令;在模擬的時脈4期間執行8個指令;在模擬的時脈3期間執行16個指令;以及在模擬的時脈1, 5期間執行32個指令。另一方面,第二核心104B在時脈3, 5的心跳訊號106B是0,在時脈0, 4時是1,在時脈1, 2時是2,而沒有任一時脈的心跳訊號為3。因此,速率控制器204會透過命令218B控制第二核心之軟體功能模組範例206B在模擬的時脈3, 5期間執行0個指令;在模擬的時脈0, 4期間執行8個指令;在模擬的時脈1, 2期間執行16個指令;而不會有任一時脈執行32個指令。
第十五、十六圖之實施例的優點,是在第十四圖的操作下,可提供的間隔比第八至十一圖之實施例更好。此外,本發明實施例中的心跳訊號106,亦可用在具有更細間隔的例子中,而在這種情形下的錯誤模式並不需要讓每個核心104皆重新產生錯誤。例如,假設在側波帶匯流排上有八個心跳訊號106,多核心處理器包括四個核心104,但只必須運作兩個核心104來重新產生錯誤。這個情況下,心跳產生器103編寫成只使用八個心跳訊號106位元中的四個,用來分別指出每個核心104完成的指令數量(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 或32個)。
雖然上述實施例中,每核心104係執行單一執行緒,但亦可設計成同時執行多個執行緒,並由心跳資訊指出執行緒所完成的指令。
此外,雖然上述實施例中,兩核心104具有相同的核心時脈週期,但亦可具有不同的核心時脈週期,而由心跳訊號資訊122指出兩核心速率,而速率控制器204在產生命令218時便可以列入考量。
In step S302, the rate controller 204 receives the heartbeat signal information 122 from the folder 116. The flow then proceeds to step S304.
In step S304, the rate controller 204 checks the value of the heartbeat signal 106 of each core 104 in the next clock cycle of the heartbeat signal 106 indicated by the heartbeat signal information 122. The value of the heartbeat signal 106 will be further illustrated in the following embodiments and figures. The flow then proceeds to step S306.
In step S306, the rate controller 204 determines whether core N (the first core 104A or the second core 104B) generated a heartbeat. If so, the flow proceeds to step S308; otherwise, it returns to step S304 to examine the next clock cycle.
In step S308, the rate controller 204 issues a command 218 to drive the software function module instance 206 of the core N to execute one or more instructions based on the determined heartbeat information, as will be described in more detail later. The flow then proceeds to step S312.
Next, in step S312, the software function module instance 206 of core N executes the next instruction or instructions against the simulation result memory map 232 and the simulation result state 234. If a memory read instruction is executed, the software function module instance 206 of core N reads from the simulation result memory map 232. If a memory write instruction is executed, the software function module instance 206 of core N updates the simulation result memory map 232. The flow then returns to step S304 to process the next clock cycle.
Various embodiments of the instruction execution indication 105, the heartbeat generator 103, and the heartbeat signal 106, and of their use by the rate controller 204, are described below.
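The loop of steps S302 to S312 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the function name `replay`, the tuple format of the recorded heartbeats, and the tiny per-core "programs" are all hypothetical, and a plain dictionary stands in for the simulation result memory map 232.

```python
def replay(heartbeat_info, programs, memory):
    """Replay per-clock heartbeat counts against one shared memory map.

    heartbeat_info: list of per-clock tuples, one instruction count per core.
    programs: per-core lists of ("read", addr) or ("write", addr, value).
    memory: dict standing in for the simulation result memory map 232.
    """
    pcs = [0] * len(programs)            # next instruction index per core
    reads = [[] for _ in programs]       # values observed by each core
    for per_core in heartbeat_info:      # S304: examine the next clock cycle
        for core, count in enumerate(per_core):
            for _ in range(count):       # S306/S308: run only if a heartbeat was seen
                op = programs[core][pcs[core]]
                pcs[core] += 1
                if op[0] == "write":     # S312: update the shared memory map
                    memory[op[1]] = op[2]
                else:                    # S312: read from the shared memory map
                    reads[core].append(memory.get(op[1], 0))
    return reads


# Hypothetical two-core example: core 0 writes a flag, core 1 reads it twice.
programs = [[("write", 0x10, 1)], [("read", 0x10), ("read", 0x10)]]
heartbeats = [(0, 1), (1, 0), (0, 1)]    # made-up recording, not from the figures
print(replay(heartbeats, programs, {}))  # core 1 reads before and after the write
```

Because the recorded heartbeats decide which core advances in each simulated clock, the second core's first read sees the old value and its second read sees the value written by the first core.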
Please refer to the fourth figure, which is a functional block diagram of one embodiment of the dual core processor disclosed in the present invention. In the fourth and fifth figures, the core 104 has the same clock rate as the heartbeat signal 106. In addition, the core 104 can complete one instruction per core clock cycle. As shown, the instruction execution indication 105 of each core 104 is one bit: if the core 104 completes an instruction within the core clock cycle, the bit is true; otherwise it is false. Similarly, the heartbeat generator 103 generates a one-bit heartbeat signal 106A, which is true if the first core 104A completed an instruction within the core clock cycle and false otherwise, and a one-bit heartbeat signal 106B, which is true if the second core 104B completed an instruction within the core clock cycle and false otherwise. It should be noted, however, that there may be a delay between the generation of the instruction execution indication 105 and the generation of its associated heartbeat signal 106. In one embodiment, because the core 104 and the heartbeat signal 106 operate from different clock sources, the heartbeat generator 103 includes synchronization logic for synchronizing the instruction execution indication 105 with the heartbeat signal 106.
Please refer to the fifth figure, which is a table illustrating an example of the operation of the rate controller 204 according to the embodiment of the fourth figure. The table covers six clock cycles, labeled 0-5, which correspond to six clocks of the heartbeat signal information 122 received by the rate controller 204 in step S302 of the third figure. For each clock cycle, the table shows the values of the heartbeat signal 106A of the first core 104A and the heartbeat signal 106B of the second core 104B that the rate controller 204 receives from the heartbeat signal information 122. In addition, for each clock cycle, the table indicates whether, based on the determination of step S306 of the third figure, the rate controller 204 directs the software function module instance 206A of the first core to execute an instruction during the corresponding simulated clock cycle, and whether it directs the software function module instance 206B of the second core to execute an instruction. In this example, the heartbeat signal 106A of the first core 104A is true at clocks 1 and 3-5, so the rate controller 204 commands the software function module instance 206A of the first core to execute an instruction during simulated clocks 1 and 3-5. The heartbeat signal 106B of the second core 104B is true at clocks 0-2 and 4, so the rate controller 204 commands the software function module instance 206B of the second core to execute an instruction during simulated clocks 0-2 and 4.
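The example of the fifth figure can be written out as a small schedule. The heartbeat bits below are taken from the description above; the function name `schedule` and the use of the reference numerals "206A" and "206B" as labels are illustrative choices only.

```python
# Heartbeat bits as described for the fifth figure (clocks 0-5).
HEARTBEAT_A = [0, 1, 0, 1, 1, 1]   # first core: true at clocks 1, 3, 4, 5
HEARTBEAT_B = [1, 1, 1, 0, 1, 0]   # second core: true at clocks 0, 1, 2, 4


def schedule(bits_a, bits_b):
    """Return, per simulated clock, which module instances are told to execute."""
    plan = []
    for clock, (a, b) in enumerate(zip(bits_a, bits_b)):
        active = [name for name, bit in (("206A", a), ("206B", b)) if bit]
        plan.append((clock, active))
    return plan


for clock, active in schedule(HEARTBEAT_A, HEARTBEAT_B):
    print(clock, active or "idle")
```

With one bit per core per clock, each instance executes at most one instruction in any simulated clock, which matches the single-instruction-per-cycle limit of this embodiment.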
Please refer to the sixth figure, which is a functional block diagram of another embodiment of the dual core processor disclosed in the present invention. The sixth and seventh figures are similar to the fourth and fifth figures in that the core 104 has the same clock rate as the heartbeat signal 106. However, in this embodiment, the core 104 can complete multiple instructions in each core clock cycle, for example three, although other numbers are possible and the disclosure is not limited in this respect. As shown, the instruction execution indication 105 of each core 104 is two bits and indicates the number of instructions the core 104 completed in one core clock cycle. Similarly, the heartbeat generator 103 generates a two-bit heartbeat signal 106A indicating the number of instructions the first core 104A completed in one core clock cycle, and a two-bit heartbeat signal 106B indicating the number of instructions the second core 104B completed in one core clock cycle.
Please refer to the seventh figure, which is a table illustrating an example of the operation of the rate controller according to the embodiment of the sixth figure; it is similar to the table of the fifth figure. However, as shown, in each clock cycle the value of the heartbeat signal 106A of the first core 104A that the rate controller 204 receives from the heartbeat signal information 122 ranges from 0 to 3, rather than being 0 (false) or 1 (true). Similarly, for each clock cycle, the table indicates whether, based on the determination of step S306 of the third figure, the rate controller 204, in accordance with step S308 of the third figure, directs the software function module instance 206A of the first core to execute instructions during the corresponding simulated clock cycle (and if so, how many), and whether it directs the software function module instance 206B of the second core to execute instructions (and if so, how many). In this example, the heartbeat signal 106A of the first core 104A is 0 at clocks 0 and 2, 1 at clock 4, 2 at clock 3, and 3 at clocks 1 and 5. Therefore, the rate controller 204 commands the software function module instance 206A of the first core to execute 0 instructions during simulated clocks 0 and 2, 1 instruction during simulated clock 4, 2 instructions during simulated clock 3, and 3 instructions during simulated clocks 1 and 5. The heartbeat signal 106B of the second core 104B is 0 at clocks 3 and 5, 1 at clocks 0 and 4, and 2 at clocks 1 and 2, and is never 3. Therefore, the rate controller 204 commands the software function module instance 206B of the second core to execute 0 instructions during simulated clocks 3 and 5, 1 instruction during simulated clocks 0 and 4, and 2 instructions during simulated clocks 1 and 2; in no clock does it execute 3 instructions.
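One way to picture the two-bit heartbeat signals is as two fields of a single recorded sample. The sketch below uses the values described for the seventh figure. The bit layout (second core in the upper two bits, first core in the lower two) and the names `pack` and `unpack` are assumptions made for illustration; the text does not specify how the logic analyzer stores the samples.

```python
COUNTS_A = [0, 3, 0, 2, 1, 3]   # heartbeat signal 106A, clocks 0-5
COUNTS_B = [1, 2, 2, 0, 1, 0]   # heartbeat signal 106B, clocks 0-5


def pack(count_a, count_b):
    """Pack two 2-bit counts into one 4-bit sample (assumed layout: B high, A low)."""
    if not (0 <= count_a <= 3 and 0 <= count_b <= 3):
        raise ValueError("a 2-bit field holds 0-3")
    return (count_b << 2) | count_a


def unpack(sample):
    """Recover (count_a, count_b) from a 4-bit sample."""
    return sample & 0b11, (sample >> 2) & 0b11


samples = [pack(a, b) for a, b in zip(COUNTS_A, COUNTS_B)]
decoded = [unpack(s) for s in samples]
totals = tuple(sum(col) for col in zip(*decoded))
print(samples, totals)
```

Summing the decoded fields gives the number of instructions each instance is told to execute over the six clocks: nine for the first core and six for the second.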
Please refer to the eighth figure, which is a functional block diagram of yet another embodiment of the dual core processor disclosed in the present invention. The eighth and ninth figures are similar to the sixth and seventh figures in that each core 104 can complete multiple instructions per core clock cycle. However, in this embodiment, the clock rate of the core 104 is several times the clock rate of the heartbeat signal 106, for example ten times, although other ratios are possible and the disclosure is not limited in this respect. In addition, the instruction execution indication 105 of each core 104 is one bit, which is true when the core 104 has completed a predetermined number of instructions and false otherwise. In one embodiment, the predetermined number is 32, but the disclosure is not limited to this value. Specifically, the predetermined number must be at least as large as the product of the clock ratio and the maximum number of instructions each core 104 can complete in one clock cycle. In one embodiment, the retire unit of the core 104 includes a counter that counts the instructions completed in each clock cycle, and the instruction execution indication 105 is effectively bit M of the counter, where M = log2(N) and N is the product of the clock ratio and the maximum number of instructions the core 104 can complete in one clock cycle. Similarly, the heartbeat generator 103 generates a one-bit heartbeat signal 106A according to the instruction execution indication 105A and a one-bit heartbeat signal 106B according to the instruction execution indication 105B.
Please refer to the ninth figure, which is a table illustrating an example of the operation of the rate controller according to the embodiment of the eighth figure. The table is similar to that of the fifth figure. Note, however, that in the fifth and seventh figures the clock rate of the core 104 is the same as the clock rate of the heartbeat signal 106, so the heartbeat signal information 122 for each clock cycle indicates that one or more instructions were completed in the corresponding core clock cycle; the clock cycles of the core and of the heartbeat signal 106 in those figures therefore correspond to each other (clocks 0-5). In the ninth figure, by contrast, the heartbeat signal information 122 for each clock cycle indicates the number of instructions completed over multiple core clock cycles, so the clock cycles shown correspond only to the heartbeat signal 106. In addition, for each clock cycle, the table indicates whether, based on the determination of step S306 of the third figure, the rate controller 204 instructs the software function module instance 206A of the first core via command 218A to execute 32 instructions during the corresponding simulated clock cycle, as described in step S308 of the third figure, and whether it instructs the software function module instance 206B of the second core via command 218B to execute 32 instructions. In this example, the heartbeat signal 106A of the first core 104A is true at clocks 1 and 5. Therefore, 32 clocks are simulated at clocks 1 and 5 of the heartbeat signal 106, and during those 32 simulated clocks the rate controller 204 instructs the software function module instance 206A of the first core via command 218A to execute 32 instructions. Likewise, the heartbeat signal 106B of the second core 104B is true at clocks 0, 2, and 4. Therefore, 32 clocks are simulated at clocks 0, 2, and 4 of the heartbeat signal 106, and during those 32 simulated clocks the rate controller 204 instructs the software function module instance 206B of the second core via command 218B to execute 32 instructions.
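The relationship between the clock ratio, the per-cycle completion limit, and the predetermined number can be checked with a little arithmetic. The helper below is a sketch under one added assumption: the predetermined number is rounded up to a power of two so that a single counter bit can serve as the indication. The text states only that the number must be at least the product and gives 32 as an example; the function name `heartbeat_threshold` is hypothetical.

```python
def heartbeat_threshold(clock_ratio, max_completed_per_cycle):
    """Smallest power-of-two count N with N >= ratio * max-per-cycle.

    Returns (N, M), where M = log2(N) is the counter bit that changes
    once every N completed instructions.
    """
    product = clock_ratio * max_completed_per_cycle
    if product < 1:
        raise ValueError("both factors must be positive")
    m = (product - 1).bit_length()   # ceil(log2(product)) for product >= 1
    return 1 << m, m


# Example figures from the text: ratio 10, up to 3 instructions per core clock.
print(heartbeat_threshold(10, 3))   # product 30 -> N = 32, M = 5
```

A plausible reading of the lower bound is that it keeps a core from reaching the predetermined number more than once within a single heartbeat clock, so that one bit per heartbeat clock is enough.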
Please refer to the tenth figure, which is a functional block diagram of still another embodiment of the dual core processor disclosed in the present invention. The tenth and eleventh figures are similar to the eighth and ninth figures: the core 104 can complete multiple instructions in each core clock cycle, and the one-bit heartbeat signal 106 is true when the core 104 has completed a predetermined number of instructions (such as 32) and false otherwise. However, as in the sixth figure, each instruction execution indication 105 in this embodiment is two bits and indicates the number of instructions the core 104 completed in each clock cycle. In this embodiment, once the core 104 has completed the predetermined number of instructions, the heartbeat generator 103 generates a true value on the heartbeat signal 106. In one embodiment, the heartbeat generator 103 includes a counter that counts the instructions completed in each clock cycle, and the heartbeat signal 106 is effectively bit M of the counter, where M = log2(N).
Please refer to the eleventh figure, which is a table illustrating an example of the operation of the rate controller according to the embodiment of the tenth figure. The table is similar to that of the ninth figure, and the received heartbeat signal information 122 and the meaning of the table are the same, so they are not described again here.
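A heartbeat generator that accumulates two-bit completion counts and raises a one-bit heartbeat every 32 instructions might behave like the sketch below. The function name `heartbeat_stream`, the default clock ratio of ten, and the sample completion counts are assumptions for illustration; the real generator is hardware and its exact timing is not described here.

```python
def heartbeat_stream(completed_per_core_clock, clock_ratio=10, threshold=32):
    """Turn per-core-clock completion counts into one bit per heartbeat clock.

    A running counter accumulates the two-bit counts; the bit for a heartbeat
    clock is 1 when the counter crossed a multiple of `threshold` during it.
    """
    bits, counter = [], 0
    for start in range(0, len(completed_per_core_clock), clock_ratio):
        window = completed_per_core_clock[start:start + clock_ratio]
        before = counter // threshold
        counter += sum(window)
        bits.append(1 if counter // threshold > before else 0)
    return bits


# Made-up completion counts: 3 per core clock for 20 clocks, then idle for 10.
counts = [3] * 20 + [0] * 10
print(heartbeat_stream(counts))
```

Instructions left over after a crossing stay in the counter and count toward the next heartbeat, so no completed instruction is lost between heartbeat clocks.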
Please refer to the twelfth figure, which is a functional block diagram of a further embodiment of the dual core processor disclosed in the present invention. The twelfth and thirteenth figures are similar to the tenth and eleventh figures: the core 104 can complete multiple instructions in each core clock cycle, and the clock rate of the core 104 is several times the clock rate of the heartbeat signal 106. Each instruction execution indication 105 is two bits and indicates the number of instructions the core 104 completed in each clock cycle. However, this embodiment further includes a debug memory array 1212, and the heartbeat generator 103 writes the heartbeat signal information 122 to the debug memory array 1212 according to the instruction execution indication 105. In one embodiment, the heartbeat generator 103 writes the instruction execution indication 105 received in each clock cycle into the debug memory array 1212. The heartbeat generator 103 subsequently reads the heartbeat signal information 122 from the debug memory array 1212 and writes it to the system memory 112 over the processor bus 114. The logic analyzer 108 captures the heartbeat signal information 122 as it is written to the system memory 112. The heartbeat generator 103 writes the heartbeat signal information 122 to a predetermined address in the system memory 112, which enables the logic analyzer 108 to store it in the folder 116, where it is later used by the rate controller 204 as shown in the third figure. The heartbeat signal information 122 of this embodiment is similar to that of the sixth and seventh figures; that is, the heartbeat signal information 122 for each clock indicates the number of instructions the core 104 completed in that clock. The heartbeat generator 103 issues requests to a bus interface unit 1216 of the dual core processor 102, which interfaces the dual core processor 102 to the processor bus 114. In one embodiment, the requests issued by the heartbeat generator 103 to the bus interface unit 1216 have the lowest priority, so the bus interface unit 1216 attempts a transfer on the processor bus 114 to write the heartbeat signal information 122 from the debug memory array 1212 to the system memory 112 only when the processor bus 114 is idle. This may reduce the likelihood that writing the heartbeat signal information 122 invasively on the processor bus 114 (as opposed to writing it on the non-invasive sideband bus of the fourth to eleventh and fifteenth to sixteenth figures) alters the operation timing of the dual core processor 102 to the point that the error no longer appears once the heartbeat feature is enabled. The heartbeat generator 103 monitors the debug memory array 1212 and, when it finds that the debug memory array 1212 is nearly full, communicates with the bus interface unit 1216 to raise the priority of its requests so that the heartbeat signal information 122 is written as soon as possible. In one embodiment, the debug memory array 1212 is similar to the memory array of the L2 cache memory 1214 of the dual core processor 102 and, like it, is shared by the cores 104. The dual core processor 102 is preferably designed so that the debug memory array 1212 is an additional replica of the array of the L2 cache memory 1214, so that common control logic and circuit layout can be shared. In one embodiment, the capacity of the debug memory array 1212 is large enough, and the interval between two checkpoints small enough, that the heartbeat generator 103 does not need to write the heartbeat signal information 122 out before the checkpoint is reached, which provides the benefit of recording the heartbeat information in a non-invasive manner.
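The buffering and priority behavior described for the debug memory array 1212 can be modeled with a small queue. Everything specific in this sketch is an assumption: the class name `DebugArray`, the capacity of eight entries, the "nearly full" mark of six, and the bus activity pattern are invented for illustration only.

```python
from collections import deque


class DebugArray:
    """Toy model of the debug memory array 1212 and its drain policy."""

    def __init__(self, capacity=8, high_water=6):
        self.buf = deque()
        self.capacity = capacity
        self.high_water = high_water      # assumed "nearly full" mark

    def record(self, count):
        """Store one per-clock instruction count from the indication 105."""
        if len(self.buf) >= self.capacity:
            raise OverflowError("heartbeat information would be lost")
        self.buf.append(count)

    def priority(self):
        """Lowest priority normally; raised when the array is nearly full."""
        return "high" if len(self.buf) >= self.high_water else "low"

    def drain(self, bus_idle, system_memory):
        """Write one entry to system memory if the bus allows it."""
        if self.buf and (bus_idle or self.priority() == "high"):
            system_memory.append(self.buf.popleft())
            return True
        return False


array, memory = DebugArray(), []
bus_idle_pattern = [False] * 6 + [True] * 4   # made-up bus activity
for clock, idle in enumerate(bus_idle_pattern):
    array.record(clock % 4)
    array.drain(idle, memory)
print(memory, list(array.buf))
```

While the bus is busy the entries simply accumulate; the first write to system memory happens only once the array reaches the high-water mark, which mirrors the idea of staying out of the way of normal bus traffic for as long as possible.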
Please refer to the thirteenth figure, which is a table illustrating an example of the operation of the rate controller according to the embodiment of the twelfth figure. The table is similar to that of the seventh figure, and the received heartbeat signal information 122 and the meaning of the table are the same, so they are not described again here. Note, however, that the embodiment of the twelfth figure has no heartbeat signal 106; the heartbeat signal information 122 for each clock cycle therefore indicates the number of instructions completed in each clock, and clock cycles 0-5 shown in the table correspond to core clock cycles.
The relative advantages and disadvantages of the above embodiments are described below. An advantage of the embodiments of the fourth and fifth figures and of the eighth to thirteenth figures is that only a small number of external pins need to be provided on the multi-core processor package, namely one pin per core 104. These embodiments therefore scale better than the embodiment of the sixth and seventh figures, in which each core 104 requires multiple pins, a problem that grows with the number of cores 104. However, whereas current microprocessors are superscalar and can complete multiple instructions per clock cycle, the embodiment of the fourth and fifth figures limits each core 104 to completing a single instruction per core clock cycle. In contrast, an advantage of the embodiments of the sixth to thirteenth figures is that they support cores 104 that complete multiple instructions per clock cycle. In addition, the embodiments of the fourth to seventh figures require the core clock rate to be the same as the clock rate of the heartbeat signal 106 bus, but the clock rates of many current microprocessors are so high that it is impractical to run an external bus at the core clock rate. In contrast, an advantage of the embodiments of the eighth to thirteenth figures and of the fifteenth and sixteenth figures (described later) is that they support architectures in which the core clock frequency is a multiple of the bus frequency of the heartbeat signal 106. As noted above, a disadvantage of the embodiment of the twelfth and thirteenth figures is that it is invasive and may affect the timing of program execution on the multi-core processor, so that some errors may be missed when the heartbeat is enabled. However, an advantage of the embodiment of the twelfth and thirteenth figures is that it requires no additional external pins, which is essential in some applications.
Please refer to the fourteenth figure, which is a flowchart of the operation of the simulation environment of the second figure. The software function module simulation environment 124 is used to determine whether an error occurred between a first checkpoint and a second checkpoint. It can also be used, after the system 100 has operated for a period of time and produced many sets of checkpoints, to determine whether an error occurred between the multiple pairs of first and second checkpoints stored in the folder 116 of the first figure. The flow begins at step S1402.
First, in step S1402, the actual result generator 208 uses the processor bus transfer information 118 to generate the actual result memory map 222 and the actual result state 224 produced by the dual core processor 102 executing a predetermined number of instructions between the two checkpoints, as shown in the second figure. The flow then proceeds to step S1404.
Next, in step S1404, the simulated initial state generator 202 uses the processor bus transfer information 118 to generate the simulated initial memory map 212 and the simulated initial state 214, as shown in the second figure. The flow then proceeds to step S1406.
In step S1406, the simulated initial memory map 212 is copied to the simulation result memory map 232, the simulated initial state 214A of the first core is copied to the simulation result state 234A of the first core, and the simulated initial state 214B of the second core is copied to the simulation result state 234B of the second core. The rate controller 204 and the software function module instances 206 then use these copies to update the simulation result memory map 232 and the simulation result state 234, as shown in the third figure. Note that during the operation of the third figure, in the course of each core's instruction execution, one core 104 may execute a memory write instruction while the other core 104 executes a memory read instruction, as in the semaphore write and read described above. If the number of instructions between such a write and read is smaller than the granularity of the heartbeat signal information 122, several orders of the actual memory accesses are possible during the actual execution of the dual core processor 102. Therefore, if the software function module simulation environment 124 detects this situation, it assumes one possible order of the memory accesses, performs step S1406 according to that order, and records which order it used. Note that the granularity of the heartbeat signal information 122 produced by the embodiments of the eighth to eleventh figures is coarser than that of the sixth to seventh and twelfth to thirteenth figures, and that the granularity of the heartbeat signal information 122 produced by the embodiment of the twelfth to thirteenth figures is in turn coarser than that of the fourth to fifth figures. A coarser granularity causes the software function module simulation environment 124 to need more time to complete the operation of the fourteenth figure, because the number of possible memory access orders may be large. The flow then proceeds to step S1408.
In step S1408, the comparison unit 226 compares the simulation result generated in step S1406 with the actual result generated in step S1402, and the flow proceeds to step S1412.
In step S1412, it is determined whether the compared results match. If so, the flow proceeds to step S1414; otherwise, it proceeds to step S1416.
In step S1414, the comparison unit 226 generates a pass value on the pass/fail indicator 228. The flow ends at step S1414.
In step S1416, the software function module simulation environment 124 determines whether any other possible memory access order has not yet been tried. If so, the flow returns to step S1406 and operates using a different memory access order; otherwise, the flow proceeds to step S1418.
In step S1418, the comparison unit 226 generates a fail value on the pass/fail indicator 228. The flow ends at step S1418.
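The retry loop of steps S1406 to S1418 can be sketched as follows. This is a simplified illustration rather than the patented procedure: the function names, the tuple format of the memory operations, and the "actual" result are all made up, and only the memory accesses inside one ambiguous heartbeat window are modeled.

```python
def interleavings(ops_a, ops_b):
    """Yield every merge of two sequences that keeps each core's own order."""
    if not ops_a:
        yield list(ops_b)
        return
    if not ops_b:
        yield list(ops_a)
        return
    for rest in interleavings(ops_a[1:], ops_b):
        yield [ops_a[0]] + rest
    for rest in interleavings(ops_a, ops_b[1:]):
        yield [ops_b[0]] + rest


def simulate(order, initial_memory):
    """Apply one access order to a copy of the initial memory map (S1406)."""
    memory, observed = dict(initial_memory), []
    for core, kind, addr, value in order:
        if kind == "write":
            memory[addr] = value
        else:
            observed.append((core, memory.get(addr, 0)))
    return memory, observed


def check_interval(actual, initial_memory, ops_a, ops_b):
    """Return ("pass", tries) if some order reproduces `actual`, else ("fail", tries)."""
    tries = 0
    for order in interleavings(ops_a, ops_b):            # S1416: next untried order
        tries += 1
        if simulate(order, initial_memory) == actual:    # S1408/S1412: compare
            return "pass", tries                         # S1414
    return "fail", tries                                 # S1418


# Hypothetical window: core A writes a semaphore, core B reads it twice.
core_a = [("A", "write", 0x10, 1)]
core_b = [("B", "read", 0x10, None), ("B", "read", 0x10, None)]
actual = ({0x10: 1}, [("B", 0), ("B", 1)])   # made-up "actual" result
print(check_interval(actual, {}, core_a, core_b))
```

Two sequences of m and n accesses can be merged in C(m+n, m) ways, which is why a coarser heartbeat granularity, with more accesses per window, can make the search considerably longer.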
Please refer to the fifteenth figure, which is a functional block diagram of another embodiment of the dual core processor 102 disclosed in the first figure of the present invention. The fifteenth and sixteenth diagrams are similar to the tenth and eleventh diagrams. Each core 104 can complete multiple instructions in each core clock cycle, and the clock rate of the core 104 is the clock rate of the heartbeat signal 106. In times, each instruction execution instruction 105 is a two-bit element that indicates the number of instructions that the core 104 has completed per clock cycle. However, in the embodiment of the fifteenth embodiment, in each bus cycle, the heartbeat generator 103 generates two-bit heartbeat signals 106A, 106B indicating the first core 104A and the second core, respectively. The number of instructions executed by 104B is 0, 8, 16, or 32.
Please refer to the sixteenth figure, which is an operation example table of the rate controller 204 disclosed in the second figure of the present invention, in the fifteenth embodiment. The chart portion of this embodiment is similar to the seventh, ninth, and eleventh figures. As shown, the value of the heartbeat signal 106 received by the rate controller 204 from the heartbeat signal information 122 is from 0 to 3 per core clock. Similarly, in each clock cycle, according to the judgment of step S306 in the third figure, the graph indicates whether the rate controller 204 passes through the corresponding analog clock cycle according to step S308 in the third figure. The command 218A controls the first core software function module example 206A to execute the instruction (if yes, how many instructions are executed), and whether the second core software function module example 206B is used to execute the instruction through the command 218B. (If yes, it will indicate how many instructions are executed). In this illustration, the first core 104A has a heartbeat signal 106A of 0 in clocks 0, 2 in clocks 4, 2 in clocks 3, and 3 in clocks 1, 5. Thus, rate controller 204 instructs first core software function module instance 206A to execute 0 instructions during the simulated clock 0, 2 via command 218A; 8 instructions during the simulated clock 4; during the simulation 16 instructions are executed during pulse 3; and 32 instructions are executed during the simulated clock 1,5. On the other hand, the second core 104B has a heartbeat signal 106B of 0 in clocks 3, 5, 1 in clocks 0, 4, and 2 in clocks 1, 2, and no heartbeat signal in any of the clocks. 3. 
Therefore, the rate controller 204 controls the second core software function module example 206B via the command 218B to execute 0 instructions during the simulated clocks 3, 5; during the simulated clocks 0, 4, 8 instructions are executed; During the simulated clock 1, 2 instructions are executed; there is no clock to execute 32 instructions.
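The mapping used in the sixteenth figure can be expressed as a small lookup table. The code values and instruction counts below come from the description above; the names `CODE_TO_COUNT` and `decode` are illustrative only.

```python
CODE_TO_COUNT = {0: 0, 1: 8, 2: 16, 3: 32}   # two-bit code -> instructions

CODES_A = [0, 3, 0, 2, 1, 3]   # heartbeat signal 106A, clocks 0-5
CODES_B = [1, 2, 2, 0, 1, 0]   # heartbeat signal 106B, clocks 0-5


def decode(codes):
    """Translate recorded two-bit codes into per-clock instruction counts."""
    return [CODE_TO_COUNT[c] for c in codes]


print(decode(CODES_A), sum(decode(CODES_A)))
print(decode(CODES_B), sum(decode(CODES_B)))
```

Compared with the one-bit scheme of the ninth figure, where every heartbeat stands for 32 instructions, the two-bit code lets the rate controller place instructions in steps of 8 or 16, which narrows the window in which the order of memory accesses is ambiguous.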
An advantage of the fifteenth and sixteenth embodiments is that under the operation of the fourteenth embodiment, the interval that can be provided is better than that of the eighth to eleventh embodiment. In addition, the heartbeat signal 106 in the embodiment of the present invention can also be used in the example with finer spacing, and the error mode in this case does not need to cause each core 104 to regenerate an error. For example, assume that there are eight heartbeat signals 106 on the sideband bus, and the multicore processor includes four cores 104, but only two cores 104 must be operated to regenerate the error. In this case, the heartbeat generator 103 is programmed to use only four of the eight heartbeat signals 106 bits to indicate the number of instructions completed by each core 104 (0, 2, 4, 6, 8, 10, 12, respectively). , 14, 16, 18, 20, 22, 24, 26, 28, or 32).
Although each core 104 in the above embodiments executes a single thread, a core may also be designed to execute multiple threads simultaneously, in which case the heartbeat information indicates the instructions completed by each thread.
Moreover, although the two cores 104 in the above embodiments have the same core clock rate, they may have different core clock rates; in that case the heartbeat signal information 122 indicates the rate of each core, and the rate controller 204 can take this into account when generating the commands 218.

Another approach is to simulate the actual processor with a Verilog simulator, which lets a debugger access any net of the processor at any time, including the signals that indicate how many times each core executes instructions and accesses memory. These signals allow the debugger to supply information to the software functional model so that it executes instructions and accesses memory at the same times as the actual processor (or at least as the Verilog simulation of the actual processor). This approach, however, has three drawbacks. First, depending on the number of clock cycles or instructions in the simulation, the Verilog simulator consumes a very large amount of system resources and time, which makes Verilog simulation impractical and leaves certain types of errors unresolved. Second, the behavior produced by the Verilog simulator is bound to differ from that of the actual processor. Third, the Verilog-based solution requires the processor to be designed with perfect state-per-clock replay capability, which is difficult to implement. In general, a microprocessor with perfect state-per-clock replay can be loaded with an input state that defines the state of the entire processor; that is, there is no processor state that cannot be initialized by loading the input state. The embodiments of the present invention do not suffer from these drawbacks of the Verilog simulator.

Although the present invention has been described by way of various embodiments, it is not limited to what is disclosed. Anyone skilled in the relevant computer arts may modify the disclosed embodiments as needed, and all changes that do not depart from the spirit of the invention are still covered by the claims that follow. For example, software can implement the functions, architecture, modules, simulation, and/or the apparatus and methods described above. Such software may be written in general programming languages (such as C or C++), in hardware description languages (HDL), including Verilog HDL, or in other available programs. It can be stored on any computer-usable storage medium, such as magnetic tape, semiconductor memory, magnetic disk, or optical disc (for example CD-ROM or DVD-ROM), or carried over a network, a wired or wireless link, or another communication medium. The apparatus and method embodiments described herein may be included in a semiconductor intellectual property core, such as a microprocessor core (for example, embodied in a hardware description language), and transformed into hardware in the production of integrated circuits. In addition, the apparatus and methods described herein may be a combination of hardware and software, and are not limited to what is disclosed.

The above describes only preferred embodiments of the present invention and is not intended to limit the scope of its claims. Any other equivalent change or modification made without departing from the spirit disclosed by the invention, such as applying the invention to a processor device of a general-purpose computer, shall fall within the scope of the claims below.

100‧‧‧Computer system
102‧‧‧Dual-core processor
103‧‧‧Heartbeat generator
104A‧‧‧First core
104B‧‧‧Second core
105, 105A, 105B‧‧‧Instruction execution indication
106, 106A, 106B‧‧‧Heartbeat signal
108‧‧‧Logic analyzer
112‧‧‧Memory
114‧‧‧Processor bus
116‧‧‧Folder
118‧‧‧Processor bus transfer information
122‧‧‧Heartbeat signal information
124‧‧‧Software functional model simulation environment
202‧‧‧Simulation initial state generator
204‧‧‧Rate controller
206A‧‧‧First core software functional model instance
206B‧‧‧Second core software functional model instance
208‧‧‧Actual result generator
226‧‧‧Comparison unit
212‧‧‧Simulation initial memory image
214A‧‧‧Simulation initial state of the first core
214B‧‧‧Simulation initial state of the second core
222‧‧‧Actual result memory image
224A‧‧‧Actual result state of the first core
224B‧‧‧Actual result state of the second core
232‧‧‧Simulation result memory image
234A‧‧‧Simulation result state of the first core
234B‧‧‧Simulation result state of the second core
218A, 218B‧‧‧Command
228‧‧‧Pass/fail indicator
1212‧‧‧Debug memory array
1214‧‧‧L2 cache memory
1216‧‧‧Bus interface unit
S302-S312‧‧‧Steps
S1402-S1418‧‧‧Steps

The first figure is a functional block diagram of a computer system having a dual-core processor as disclosed in the present invention.
The second figure is a functional block diagram of the software functional model simulation environment disclosed in the present invention.
The third figure is a flowchart of a method of operating the simulation environment of the second figure.
The fourth figure is a functional block diagram of one embodiment of the dual-core processor disclosed in the present invention.
The fifth figure is an operation example table of the rate controller according to the embodiment of the fourth figure.
The sixth figure is a functional block diagram of another embodiment of the dual-core processor disclosed in the present invention.
The seventh figure is an operation example table of the rate controller according to the embodiment of the sixth figure.
The eighth figure is a functional block diagram of still another embodiment of the dual-core processor disclosed in the present invention.
The ninth figure is an operation example table of the rate controller according to the embodiment of the eighth figure.
The tenth figure is a functional block diagram of yet another embodiment of the dual-core processor disclosed in the present invention.
The eleventh figure is an operation example table of the rate controller according to the embodiment of the tenth figure.
The twelfth figure is a functional block diagram of a further embodiment of the dual-core processor disclosed in the present invention.
The thirteenth figure is an operation example table of the rate controller according to the embodiment of the twelfth figure.
The fourteenth figure is a flowchart of a method of operating the simulation environment of the second figure.
The fifteenth figure is a functional block diagram of a further embodiment of the dual-core processor disclosed in the present invention.
The sixteenth figure is an operation example table of the rate controller according to the embodiment of the fifteenth figure.

Claims (26)

A method for debugging a microprocessor having a plurality of cores, the method comprising:
causing the microprocessor to perform an actual execution of instructions;
obtaining, from the microprocessor, heartbeat information that indicates an actual execution sequence in which the cores executed the instructions relative to one another;
commanding a plurality of corresponding instances of a software functional model to execute the instructions according to the actual execution sequence, to generate a simulated result of the execution of the instructions; and
comparing the simulated result with an actual result of the execution of the instructions to determine whether the two match.

The method of claim 1, wherein the actual result and the simulated result each comprise a state of the cores after the execution.

The method of claim 2, wherein the actual result and the simulated result each further comprise a memory state after the execution.

The method of claim 1, further comprising:
flagging an error if the simulated result does not match the actual result.

The method of claim 1, wherein when the number of instructions executed between a memory write instruction and a memory read instruction to the same memory address is less than the granularity of the heartbeat information, there exist a plurality of possible execution sequences that may affect the memory access, and wherein the step of commanding the plurality of corresponding instances of the software functional model to execute the instructions comprises commanding the instances according to the possible execution sequences until the simulated result matches the actual result.

The method of claim 5, further comprising:
flagging an error if the simulated results of all of the possible execution sequences fail to match the actual result.

The method of claim 1, wherein the heartbeat information comprises a plurality of records corresponding to a plurality of heartbeats during the actual execution of the instructions, each record indicating the number of instructions actually executed by the cores, and wherein the step of commanding the plurality of corresponding instances of the software functional model to execute the instructions comprises:
for each of the records, commanding each corresponding instance of the software functional model to execute the number of instructions indicated in the record.

The method of claim 1, wherein the step of obtaining the heartbeat information from the microprocessor comprises capturing heartbeat signals that the microprocessor generates on an external bus during the actual execution of the instructions.

A microprocessor, comprising:
a plurality of cores, each of which outputs an instruction execution indicator that indicates the number of instructions the core executed in each clock cycle; and
a heartbeat generator that receives the instruction execution indicator from each of the cores and generates, on an external bus, a heartbeat indicator for each of the cores, wherein the heartbeat indicator indicates the number of instructions each of the cores executed during each clock of the external bus.

The microprocessor of claim 9, wherein the clock cycle rate of the cores is the same as the clock cycle rate of the external bus.

The microprocessor of claim 9, wherein the clock cycle rate of the cores is greater than the clock cycle rate of the external bus, and wherein the number of executed instructions indicated by each heartbeat indicator is greater than the maximum number of completed instructions that each instruction execution indicator can indicate.

The microprocessor of claim 11, wherein the ratio of the clock cycle rate of the cores to the clock cycle rate of the external bus is J, the maximum number of completed instructions that each instruction execution indicator can indicate is K, and the number of executed instructions indicated by each heartbeat indicator is L, where L is greater than or equal to the product of J and K.

The microprocessor of claim 11, wherein each heartbeat indicator comprises a single bit.

The microprocessor of claim 9, wherein each of the cores comprises a counter that counts the number of instructions executed in each clock cycle, and wherein the instruction execution indicator is an output bit of the count value of the counter.

The microprocessor of claim 14, wherein the output bit of the count value of the counter is bit M, where M = log2(N) and N is the product of the ratio of the core clock to the external bus clock and the maximum number of instructions the core can complete in each clock cycle.

The microprocessor of claim 9, wherein the heartbeat generator comprises a counter associated with each of the cores that counts the number of instructions executed in each clock cycle, and wherein the instruction execution indicator is an output bit of the count value of the counter.

The microprocessor of claim 16, wherein the output bit of the count value of the counter is bit M, where M = log2(N) and N is the product of the ratio of the core clock to the external bus clock and the maximum number of instructions the core can complete in each clock cycle.

The microprocessor of claim 17, wherein each instruction execution indicator is a single bit and each heartbeat indicator comprises a single bit.

The microprocessor of claim 9, wherein the external bus comprises a sideband bus coupled to the microprocessor, the sideband bus being distinct from the main processor bus coupled to the microprocessor.

The microprocessor of claim 19, wherein at least a portion of the sideband bus is a JTAG bus.

The microprocessor of claim 19, wherein the sideband bus is a service processor bus coupled to a service processor internal to the microprocessor.

A microprocessor, comprising:
a plurality of cores, each of which generates an instruction execution indicator that indicates the number of instructions the core executed during each clock;
a memory array that stores the instruction execution indicators generated by the cores during a period of clocks; and
a bus interface unit coupled to an external bus of the microprocessor, wherein the bus interface unit writes the instruction execution indicators stored in the memory array to a memory external to the microprocessor.

The microprocessor of claim 22, wherein the bus interface unit writes the instruction execution indicators to the external memory at the lowest priority relative to the other transfers it handles on the external bus.

The microprocessor of claim 19, further comprising:
a heartbeat generator, coupled to the memory array and the bus interface unit, that receives the instruction execution indicator from each of the cores, wherein the heartbeat generator writes the instruction execution indicators to the memory array during the period of clocks and reads them out of the memory array so that the bus interface unit writes them to the external memory.

The microprocessor of claim 24, wherein the heartbeat generator waits until the period of clocks has ended before reading the instruction execution indicators out of the memory array and causing the bus interface unit to write them to the external memory.

The microprocessor of claim 24, wherein the heartbeat generator periodically reads the instruction execution indicators out of the memory array so that the bus interface unit writes them to the external memory.
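The debugging method of the claims above (replay the instructions according to the heartbeat records, compare with the actual result, and try alternative orders when the heartbeat granularity leaves the order ambiguous) might be sketched as follows. This is a toy illustration rather than the patented implementation: instructions are modeled as Python callables, the helpers `run_model`, `debug`, `store`, and `load` are hypothetical names, and the search only permutes the order in which cores are stepped within a heartbeat interval, which is a coarse simplification of the possible execution sequences.

```python
from itertools import permutations


def run_model(programs, records, order):
    """Replay `programs` under the given heartbeat records.

    programs: one list of instructions per core; each instruction is a
              callable taking (memory, state), a toy stand-in for a
              software functional model instance.
    records:  one tuple per heartbeat giving how many instructions each
              core executed in that interval.
    order:    the order in which cores are stepped within one interval.
    """
    memory = {}
    states = [dict() for _ in programs]
    pcs = [0] * len(programs)
    for record in records:
        for core in order:
            for _ in range(record[core]):
                programs[core][pcs[core]](memory, states[core])
                pcs[core] += 1
    return memory, states


def debug(programs, records, actual):
    """Return the first core order whose simulated result matches `actual`,
    or None if no order matches (which would be flagged as an error)."""
    for order in permutations(range(len(programs))):
        if run_model(programs, records, order) == actual:
            return order
    return None


def store(addr, value):
    def instr(memory, state):
        memory[addr] = value
    return instr


def load(addr, reg):
    def instr(memory, state):
        state[reg] = memory.get(addr, 0)
    return instr


# Core 0 writes address 0x10 and core 1 reads it within the same heartbeat.
programs = [[store(0x10, 7)], [load(0x10, "r1")]]
records = [(1, 1)]
# Assumed "actual" result: core 1 read the location before core 0 wrote it.
actual = ({0x10: 7}, [{}, {"r1": 0}])
print(debug(programs, records, actual))  # prints (1, 0)
```

In this example the default order (core 0 first) yields `r1 == 7`, which does not match, so the search falls through to the order in which core 1 runs first.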
TW100106953A 2010-03-16 2011-03-02 Microprocessor and debugging method thereof TWI470421B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US31425310P 2010-03-16 2010-03-16
US12/964,949 US8762779B2 (en) 2010-01-22 2010-12-10 Multi-core processor with external instruction execution rate heartbeat

Publications (2)

Publication Number Publication Date
TW201133232A TW201133232A (en) 2011-10-01
TWI470421B true TWI470421B (en) 2015-01-21

Family

ID=46751117

Family Applications (1)

Application Number Title Priority Date Filing Date
TW100106953A TWI470421B (en) 2010-03-16 2011-03-02 Microprocessor and debugging method thereof

Country Status (1)

Country Link
TW (1) TWI470421B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200508855A (en) * 2003-02-14 2005-03-01 Advantest Corp Method and structure to develop a test program for semiconductor integrated circuits
CN101084488A (en) * 2004-09-14 2007-12-05 科威尔公司 Debug in a multicore architecture
US20080177527A1 (en) * 2007-01-17 2008-07-24 Nec Electronics Corporation Simulation system, simulation method and simulation program
US20100008464A1 (en) * 2008-07-11 2010-01-14 Infineon Technologies Ag System profiling

Also Published As

Publication number Publication date
TW201133232A (en) 2011-10-01
