TWI426379B - System and method for detecting system error of a computer - Google Patents

Publication number
TWI426379B
Authority
TW
Taiwan
Prior art keywords
computer system
cpu
error
level signal
gpio
Prior art date
Application number
TW99146730A
Other languages
Chinese (zh)
Other versions
TW201227265A (en)
Inventor
Yu-Gang Zhang
Original Assignee
Hon Hai Prec Ind Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hon Hai Prec Ind Co Ltd
Priority to TW99146730A
Publication of TW201227265A
Application granted
Publication of TWI426379B

Landscapes

  • Debugging And Monitoring (AREA)

Description

Computer system error detection system and method

The present invention relates to a system and method for testing computer systems, and in particular to a system and method for detecting computer system errors.

At present, high-end computer systems such as servers use central processing units (CPUs) that integrate the north bridge function, moving memory control and connection from the north bridge into the CPU itself. When an error occurs in this type of computer system, for example a CPU internal error (Internal Error, IERR) or a memory multi-bit error (Multi-Bit Error), the CPU stops working. However, the CPU reports both types of system error to external devices over the same single pin. External devices therefore cannot tell the two CPU error signals apart, which weakens the server's self-monitoring capability, prevents the user from identifying the root cause of the error, and makes maintenance more difficult.

In view of the above, it is necessary to provide a computer system error detection system and method that use a baseboard management controller to automatically detect whether an error occurring in a computer system is an internal error of the CPU or a multi-bit error of the memory.

The computer system error detection system is installed in and runs on a baseboard management controller. The computer system includes a CPU and a memory, and the baseboard management controller is connected to the computer system through a GPIO port of the CPU. The computer system error detection system includes: a parameter setting module, for setting an interrupt flag value and initializing the interrupt flag value to zero; a signal monitoring module, for monitoring in real time, while the computer system is running, the GPIO signal output by the CPU through the GPIO port, and for detecting within a delay time whether the GPIO signal output by the CPU changes from a high-level signal to a low-level signal; an interrupt service module, for starting an interrupt service routine that triggers one interrupt service and increments the interrupt flag value by one when the GPIO signal changes from a high-level signal to a low-level signal, and for detecting after the delay time whether the interrupt flag value is greater than or equal to one; and an error processing module, for determining that the system error occurring in the computer system is a CPU internal error when the interrupt flag value equals one, and that it is a memory multi-bit error when the interrupt flag value is greater than one.

In the computer system error detection method, the computer system includes a CPU and a memory and is connected to a baseboard management controller through a GPIO port of the CPU. The method includes the steps of: setting an interrupt flag value and initializing the interrupt flag value to zero; monitoring in real time, while the computer system is running, the GPIO signal output by the CPU through the GPIO port; determining whether the GPIO signal output by the CPU is a high-level signal; when the GPIO signal output by the CPU is a low-level signal, starting an interrupt service routine that triggers one interrupt service and increments the interrupt flag value by one; detecting within a delay time whether the GPIO signal output by the CPU changes from a low-level signal to a high-level signal; when the GPIO signal changes from a low-level signal to a high-level signal, starting the interrupt service routine to trigger one interrupt service and increment the interrupt flag value by one; determining after the delay time whether the interrupt flag value is greater than or equal to one; if the interrupt flag value equals one, determining that the system error occurring in the computer system is a CPU internal error; and if the interrupt flag value is greater than one, determining that the system error occurring in the computer system is a memory multi-bit error.

Compared with the prior art, the computer system error detection system and method of the present invention use the baseboard management controller to automatically detect whether an error occurring in the computer system is an internal error of the CPU or a multi-bit error of the memory, and report the type of error to the user, making it easier for the user to handle the error.

1‧‧‧Baseboard management controller

10‧‧‧Computer system error detection system

101‧‧‧Parameter setting module

102‧‧‧Signal monitoring module

103‧‧‧Interrupt service module

104‧‧‧Error processing module

11‧‧‧Microprocessor

12‧‧‧Storage unit

2‧‧‧Computer system

20‧‧‧CPU

21‧‧‧Memory

22‧‧‧Display

FIG. 1 is an architecture diagram of a preferred embodiment of the computer system error detection system of the present invention.

FIG. 2 is a flowchart of a preferred embodiment of the computer system error detection method of the present invention.

FIG. 3 is a schematic diagram of the GPIO signal output by the CPU of the computer system.

FIG. 1 is an architecture diagram of a preferred embodiment of the computer system error detection system 10 of the present invention. In this embodiment, the computer system error detection system 10 is installed in and runs on a baseboard management controller (BMC) 1. When a system error occurs in the computer system 2, for example an internal error (Internal Error, IERR) in the central processing unit (CPU) 20 of the computer system 2 or a multi-bit error (Multi-Bit Error) in the memory 21 of the computer system 2, the system 10 detects whether the system error is an internal error of the CPU 20 or a multi-bit error of the memory 21. Internal errors of the CPU 20 include, but are not limited to, data operation errors and logic control errors that occur while the CPU 20 processes data. Multi-bit errors of the memory 21 include, but are not limited to, data corruption errors in the memory 21 and bit misalignment errors that occur while data is read or written.

The baseboard management controller 1 is connected to the computer system 2 through a general purpose input/output (GPIO) port of the CPU 20. The computer system 2 may be a desktop computer, a notebook computer, a server, or a workstation. In this embodiment, the baseboard management controller 1 includes, but is not limited to, a microprocessor 11 and a storage unit 12. The computer system 2 includes, but is not limited to, the CPU 20, the memory 21, and a display 22.

The computer system error detection system 10 includes a parameter setting module 101, a signal monitoring module 102, an interrupt service module 103, and an error processing module 104. A module, as the term is used in the present invention, is a series of computer program segments that can be executed by the microprocessor 11 to perform a fixed function and that is stored in the storage unit 12 of the baseboard management controller 1.

The parameter setting module 101 is used to set an interrupt flag value (denoted, for example, as tag) and to initialize the interrupt flag value tag to zero, that is, tag=0. The interrupt flag value tag records the number of times the microprocessor 11 starts the interrupt service routine to trigger an interrupt service. Each time the microprocessor 11 starts an interrupt service, the interrupt flag value tag is incremented by one, that is, tag=tag+1.

The signal monitoring module 102 is used to monitor in real time, while the computer system 2 is running, the GPIO signal output by the CPU 20 through the GPIO port, and to determine whether the GPIO signal is a high-level signal. In this embodiment, when the computer system 2 runs normally, the GPIO signal output by the CPU 20 is always a high-level signal, represented by the binary digit “1”, as shown in FIG. 3(a). When an internal error occurs in the CPU 20 while the computer system 2 is running, the GPIO signal output by the CPU 20 changes from the high-level signal “1” to a low-level signal, represented by the binary digit “0”, and thereafter remains at the low-level signal “0”, as shown in FIG. 3(b). When a multi-bit error occurs in the memory 21 while the computer system 2 is running, the GPIO signal output by the CPU 20 fluctuates between the high and low levels: it changes from the high-level signal “1” to the low-level signal “0”, then from the low-level signal “0” back to the high-level signal “1”, and finally remains at the low-level signal “0”, as shown in FIG. 3(c). When a multi-bit error occurs in the memory 21, the GPIO signal typically fluctuates between the high and low levels for 18 CPU clock cycles and then remains at the low-level signal “0”.
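The three signal shapes described for FIG. 3 can be told apart by counting high-to-low transitions. The following is a minimal sketch in Python, not the patent's firmware. The sampled traces, their lengths, and the helper name count_falling_edges are assumptions chosen only for illustration.

```python
# Illustrative sampled GPIO levels (1 = high, 0 = low). The sample counts and
# exact shapes are assumptions, not values taken from FIG. 3.
NORMAL_TRACE = [1, 1, 1, 1, 1, 1, 1, 1]        # FIG. 3(a): always high
CPU_ERROR_TRACE = [1, 1, 1, 0, 0, 0, 0, 0]     # FIG. 3(b): falls once, stays low
MEMORY_ERROR_TRACE = [1, 1, 0, 1, 0, 1, 0, 0]  # FIG. 3(c): toggles, then stays low


def count_falling_edges(samples):
    """Count 1 -> 0 transitions between consecutive samples."""
    return sum(
        1 for prev, cur in zip(samples, samples[1:]) if prev == 1 and cur == 0
    )


if __name__ == "__main__":
    for name, trace in [
        ("normal", NORMAL_TRACE),
        ("cpu internal error", CPU_ERROR_TRACE),
        ("memory multi-bit error", MEMORY_ERROR_TRACE),
    ]:
        print(name, count_falling_edges(trace))
```

Under these assumed traces, a single falling edge corresponds to the CPU internal error case and more than one falling edge corresponds to the memory multi-bit error case.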

The signal monitoring module 102 is also used to determine, within a delay time, whether the GPIO signal output by the CPU 20 changes from a high-level signal to a low-level signal. In this embodiment, the length of the delay time depends on the operating frequency of the CPU 20. For example, if the operating frequency of the CPU 20 is 1000 Hz, the delay time is 3 seconds.

The interrupt service module 103 is used to start the interrupt service routine, which triggers one interrupt service and increments the interrupt flag value tag by one (tag=tag+1), whenever the GPIO signal changes from the high-level signal “1” to the low-level signal “0” within the delay time. The interrupt service module 103 is also used to read the interrupt flag value tag after the delay time and determine whether it equals one.

The error processing module 104 is used to determine that the system error occurring in the computer system 2 is an internal error of the CPU 20 when the interrupt flag value tag equals one, and that it is a multi-bit error of the memory 21 when the interrupt flag value tag is greater than one, for example a 2-, 4-, 8-, ..., 2^n-bit error in the memory 21, where n is a natural number greater than 1. The error processing module 104 is also used to determine whether the computer system 2 has crashed. When the signal monitoring module 102 detects no GPIO signal output by the CPU 20, the computer system 2 has crashed; the error processing module 104 then resets the interrupt flag value tag to zero and shuts down or restarts the computer system 2 to handle the system error.
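The decision rule applied by the error processing module 104 can be sketched as a small function. This is an illustrative Python sketch only. The function name classify_error and the returned strings are hypothetical, and the handling of tag equal to zero (no error observed) is an assumption, since the description only covers tag equal to one and tag greater than one.

```python
def classify_error(tag):
    """Map the interrupt flag value to an error type.

    tag == 1 -> CPU internal error; tag > 1 -> memory multi-bit error.
    The tag == 0 branch is an assumption added for completeness.
    """
    if tag < 0:
        raise ValueError("interrupt flag value cannot be negative")
    if tag == 0:
        return "no error detected"
    if tag == 1:
        return "CPU internal error (IERR)"
    return "memory multi-bit error"


if __name__ == "__main__":
    for value in (0, 1, 2, 5):
        print(value, "->", classify_error(value))
```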

In this embodiment, when a system error occurs in the computer system 2, the error processing module 104 displays the system error (for example, an internal error of the CPU 20 or a multi-bit error of the memory 21) on the display 22 of the computer system 2 to inform the user of the type of system error, making it easier for the user to handle the error.

FIG. 2 is a flowchart of a preferred embodiment of the computer system error detection method of the present invention. In this embodiment, when a system error occurs in the computer system 2, the method uses the baseboard management controller 1 to detect whether the system error is an internal error of the CPU 20 of the computer system 2 or a multi-bit error of the memory 21.

In step S20, the parameter setting module 101 sets an interrupt flag value (denoted, for example, as tag) and initializes the interrupt flag value tag to zero, that is, tag=0.

In step S21, the signal monitoring module 102 monitors in real time, while the computer system 2 is running, the GPIO signal output by the CPU 20 through the GPIO port. When the computer system 2 runs normally, the GPIO signal is always the high-level signal “1”, as shown in FIG. 3(a). When an internal error occurs in the CPU 20 while the computer system 2 is running, the GPIO signal changes from the high-level signal “1” to the low-level signal “0” and thereafter remains at the low-level signal “0”, as shown in FIG. 3(b). When a multi-bit error occurs in the memory 21 while the computer system 2 is running, the GPIO signal changes from the high-level signal “1” to the low-level signal “0” and then from the low-level signal “0” back to the high-level signal “1”. As shown in FIG. 3(c), when a multi-bit error occurs in the memory 21, the GPIO signal typically fluctuates between the high and low levels for 18 CPU clock cycles and then remains at the low-level signal “0”.

In step S22, the signal monitoring module 102 determines whether the GPIO signal output by the CPU 20 is the high-level signal “1”. If the GPIO signal is the high-level signal “1”, the flow returns to step S21. If the GPIO signal is the low-level signal “0”, step S23 is executed.

In step S23, the interrupt service module 103 starts the interrupt service routine, which triggers one interrupt service and increments the interrupt flag value tag by one, that is, tag=tag+1.

In step S24, the signal monitoring module 102 determines, within a delay time, whether the GPIO signal changes from the low-level signal “0” to the high-level signal “1”. In this embodiment, the length of the delay time depends on the operating frequency of the CPU 20. For example, if the operating frequency of the CPU 20 is 1000 Hz, the delay time is 3 seconds.

If the GPIO signal changes from the low-level signal “0” to the high-level signal “1” within the delay time, the flow returns to step S23. If the GPIO signal does not change from the low-level signal “0” to the high-level signal “1” within the delay time, then in step S25 the interrupt service module 103 determines, after the delay time, whether the interrupt flag value tag equals one.

If the interrupt flag value tag equals one, then in step S26 the error processing module 104 determines that the system error occurring in the computer system 2 is an internal error of the CPU 20. If the interrupt flag value tag is greater than one, then in step S27 the error processing module 104 determines that the system error occurring in the computer system 2 is a multi-bit error of the memory 21.

In step S28, the error processing module 104 determines whether the computer system 2 has crashed. In this embodiment, when the signal monitoring module 102 detects no GPIO signal output by the CPU 20, the error processing module 104 determines that the computer system 2 has crashed. If the computer system 2 has not crashed, the flow returns to step S21 and the signal monitoring module 102 continues to monitor the GPIO signal output by the CPU 20. If the computer system 2 has crashed, then in step S29 the error processing module 104 resets the interrupt flag value tag to zero and shuts down or restarts the computer system 2 to handle the system error.

When a system error occurs in the computer system 2, the error processing module 104 displays the system error (for example, an internal error of the CPU 20 or a multi-bit error of the memory 21) on the display 22 of the computer system 2 to inform the user of the type of system error, making it easier for the user to handle the error.
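Steps S20 through S27 can be sketched as a simulation over a sampled GPIO trace. This is a minimal Python sketch under stated assumptions, not the patented implementation: the function name detect_error, the list-of-samples input, the fixed observation window measured in samples (delay_samples), and the returned strings are all hypothetical. A real baseboard management controller would react to hardware interrupts rather than iterate over a list, and the crash check of steps S28 and S29 is omitted.

```python
def detect_error(samples, delay_samples):
    """Simulate steps S20-S27 on sampled GPIO levels (1 = high, 0 = low).

    Returns None when the signal never goes low within the trace.
    """
    tag = 0  # S20: initialize the interrupt flag value

    # S21/S22: keep monitoring while the signal is high.
    start = 0
    while start < len(samples) and samples[start] == 1:
        start += 1
    if start == len(samples):
        return None  # no low level observed, so no error detected

    tag += 1  # S23: first interrupt service when the signal is low

    # S24: within the delay window, each low -> high change returns to S23.
    # Assumption: one fixed window starting at the first low sample.
    window = samples[start:start + delay_samples + 1]
    for prev, cur in zip(window, window[1:]):
        if prev == 0 and cur == 1:
            tag += 1  # S23 again

    # S25-S27: classify by the interrupt flag value.
    if tag == 1:
        return "CPU internal error"
    return "memory multi-bit error"


if __name__ == "__main__":
    print(detect_error([1, 1, 1, 1], delay_samples=3))
    print(detect_error([1, 1, 0, 0, 0, 0], delay_samples=3))
    print(detect_error([1, 1, 0, 1, 0, 0], delay_samples=3))
```

The fixed window keeps the sketch short. The flowchart instead re-enters step S24 after each return to step S23, so an implementation that restarts the delay on every rising edge would follow the text more closely.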

The above describes only preferred embodiments of the present invention, which can be applied widely. All equivalent changes or modifications made without departing from the spirit disclosed by the present invention shall fall within the scope of the following claims.

Claims (10)

1. A computer system error detection system, installed in and running on a baseboard management controller, the computer system including a CPU and a memory, the baseboard management controller being connected to the computer system through a GPIO port of the CPU, the computer system error detection system comprising: a parameter setting module, for setting an interrupt flag value and initializing the interrupt flag value to zero; a signal monitoring module, for monitoring in real time, while the computer system is running, the GPIO signal output by the CPU through the GPIO port, and for detecting within a delay time whether the GPIO signal output by the CPU changes from a high-level signal to a low-level signal; an interrupt service module, for starting an interrupt service routine that triggers one interrupt service and increments the interrupt flag value by one when the GPIO signal changes from a high-level signal to a low-level signal, and for detecting after the delay time whether the interrupt flag value is greater than or equal to one; and an error processing module, for determining that the system error occurring in the computer system is a CPU internal error when the interrupt flag value equals one, and that the system error occurring in the computer system is a memory multi-bit error when the interrupt flag value is greater than one.

2. The computer system error detection system of claim 1, wherein the error processing module is further used to display the system error on a display of the computer system when a system error occurs in the computer system.

3. The computer system error detection system of claim 1, wherein the error processing module is further used to determine whether the computer system has crashed and, when the computer system has crashed, to reset the interrupt flag value to zero and shut down or restart the computer system to handle the system error.

4. The computer system error detection system of claim 1, wherein the GPIO signal output by the CPU is always a high-level signal when the computer system runs normally; the GPIO signal output by the CPU changes from a high-level signal to a low-level signal and thereafter remains a low-level signal when an internal error occurs in the CPU; and the GPIO signal output by the CPU fluctuates between a high-level signal and a low-level signal when a multi-bit error occurs in the memory.

5. The computer system error detection system of claim 1, wherein the length of the delay time depends on the operating frequency of the CPU.

6. A computer system error detection method, the computer system including a CPU and a memory and being connected to a baseboard management controller through a GPIO port of the CPU, the method comprising the steps of: setting an interrupt flag value and initializing the interrupt flag value to zero; monitoring in real time, while the computer system is running, the GPIO signal output by the CPU through the GPIO port; determining whether the GPIO signal output by the CPU is a high-level signal; when the GPIO signal output by the CPU is a low-level signal, starting an interrupt service routine that triggers one interrupt service and increments the interrupt flag value by one; detecting within a delay time whether the GPIO signal output by the CPU changes from a low-level signal to a high-level signal; when the GPIO signal changes from a low-level signal to a high-level signal, starting the interrupt service routine to trigger one interrupt service and increment the interrupt flag value by one; determining after the delay time whether the interrupt flag value is greater than or equal to one; if the interrupt flag value equals one, determining that the system error occurring in the computer system is a CPU internal error; and if the interrupt flag value is greater than one, determining that the system error occurring in the computer system is a memory multi-bit error.

7. The computer system error detection method of claim 6, further comprising the step of: when a system error occurs in the computer system, displaying the system error on a display of the computer system.

8. The computer system error detection method of claim 6, further comprising the steps of: determining whether the computer system has crashed; if the computer system has not crashed, continuing to monitor the GPIO signal output by the CPU; and if the computer system has crashed, resetting the interrupt flag value to zero and shutting down or restarting the computer system to handle the system error.

9. The computer system error detection method of claim 6, wherein the GPIO signal output by the CPU is always a high-level signal when the computer system runs normally; the GPIO signal output by the CPU changes from a high-level signal to a low-level signal and thereafter remains a low-level signal when an internal error occurs in the CPU; and the GPIO signal output by the CPU fluctuates between a high-level signal and a low-level signal when a multi-bit error occurs in the memory.

10. The computer system error detection method of claim 6, wherein the length of the delay time depends on the operating frequency of the CPU.
TW99146730A 2010-12-29 2010-12-29 System and method for detecting system error of a computer TWI426379B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW99146730A TWI426379B (en) 2010-12-29 2010-12-29 System and method for detecting system error of a computer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW99146730A TWI426379B (en) 2010-12-29 2010-12-29 System and method for detecting system error of a computer

Publications (2)

Publication Number Publication Date
TW201227265A TW201227265A (en) 2012-07-01
TWI426379B true TWI426379B (en) 2014-02-11

Family

ID=46933161

Family Applications (1)

Application Number Title Priority Date Filing Date
TW99146730A TWI426379B (en) 2010-12-29 2010-12-29 System and method for detecting system error of a computer

Country Status (1)

Country Link
TW (1) TWI426379B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107656855B (en) * 2016-07-26 2020-06-30 佛山市顺德区顺达电脑厂有限公司 System and method for reminding user of misplacing CPU

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI220705B (en) * 2002-03-07 2004-09-01 Inventec Corp Method and system for error detecting
CN101201764A (en) * 2006-12-15 2008-06-18 鸿富锦精密工业(深圳)有限公司 Method for restoring embedded system
US20090276666A1 (en) * 2008-04-30 2009-11-05 Egenera, Inc. System, method, and adapter for creating fault-tolerant communication busses from standard components

Also Published As

Publication number Publication date
TW201227265A (en) 2012-07-01

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees