TWI654518B - Method for storing error status information and server using the same - Google Patents

Method for storing error status information and server using the same

Info

Publication number
TWI654518B
Authority
TW
Taiwan
Prior art keywords
error
data
peripheral
memory
peripheral device
Prior art date
Application number
TW105111184A
Other languages
Chinese (zh)
Other versions
TW201737086A (en)
Inventor
黃翔瑞
Original Assignee
神雲科技股份有限公司
Application filed by 神雲科技股份有限公司
Priority to TW105111184A
Publication of TW201737086A
Application granted
Publication of TWI654518B

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

An error state storage method is performed by a server that includes a processing unit, a basic input/output system (BIOS), at least one peripheral device, and a chipset that includes at least a non-volatile memory. In the method, the processing unit generates a processing interrupt request in response to a peripheral error trigger, which indicates that at least one of the peripheral devices is in an error state. In response to the processing interrupt request, the BIOS records the status data of each of the peripheral devices and carries out a procedure that stores the status data in the non-volatile memory of the chipset. The status data includes a device identifier and a status type corresponding to that device identifier. Because the status data of every peripheral device is preserved, a user can more readily find the cause of the error in the faulty peripheral device and debug more quickly.

Description

Error state storage method and server

The present invention relates to a storage method and apparatus, and more particularly to an error state storage method and a server.

While an existing server is running, errors may occur, for example an error in a Peripheral Component Interconnect (PCI) device such as a network card, a graphics card, or a video accelerator card. When such a PCI device fails, it records an error state in its status register. To improve system reliability, the basic input/output system (BIOS) of the server is usually configured so that, when such an error occurs, it executes a system management interrupt (SMI) routine that reads the status register of the PCI device and records the error state.

However, when a user needs to service the server and trace the fault, the error state that the BIOS recorded for the failing PCI device is often not enough to reveal the cause at a glance. The user must rely on repair experience, or spend time testing every PCI device one by one, to locate the fault. Moreover, the SMI routine of the BIOS is usually designed to clear the status data of all status registers as soon as the BIOS has recorded the error state, which makes it difficult to debug the failing PCI device and diagnose the cause.

Accordingly, an object of the present invention is to provide an error state storage method.

The error state storage method of the present invention is performed by a server. The server includes a processing unit, at least one peripheral device, a basic input/output system (BIOS), a control chipset, a chipset electrically connected to the control chipset, and a memory electrically connected to the chipset. The chipset includes at least a non-volatile memory and a baseboard management controller (BMC). The error state storage method comprises a step (A), a step (B), a step (C), and a step (D).

In step (A), the processing unit generates a processing interrupt request in response to a peripheral error trigger. The peripheral error trigger indicates that at least one of the at least one peripheral device is in an error state.

In step (B), in response to the processing interrupt request from the processing unit, the BIOS stores the status data of each of the at least one peripheral device in the memory. The status data includes a device identifier and a status type corresponding to that device identifier.

In step (C), the BIOS sends a data-ready command to the baseboard management controller (BMC) of the chipset via the control chipset. The data-ready command indicates that the status data of each of the at least one peripheral device has been written to the memory.

In step (D), in response to the data-ready command, the BMC reads the status data of each of the at least one peripheral device from the memory and writes it to the non-volatile memory of the chipset.
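The flow of steps (A) through (D) can be sketched roughly as follows. This is only an illustrative model, not firmware: the `Server` class, its field names, and the functions `handle_peripheral_error` and `bmc_on_data_ready` are hypothetical, and the data-ready command of step (C) is modeled as a plain function call.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    # Hypothetical model: device identifier -> status type
    # ("normal" or the name of an error state).
    devices: dict
    shared_memory: list = field(default_factory=list)    # memory shared by BIOS and BMC
    bmc_nonvolatile: list = field(default_factory=list)  # non-volatile memory of the chipset


def bmc_on_data_ready(server):
    # Step (D): the BMC reads the status data from the memory
    # and writes it to the non-volatile memory.
    server.bmc_nonvolatile.extend(server.shared_memory)


def handle_peripheral_error(server):
    # Step (A): proceed only if at least one device is in an error state.
    if all(kind == "normal" for kind in server.devices.values()):
        return False
    # Step (B): store the status data of every device, not only the faulty one.
    server.shared_memory = list(server.devices.items())
    # Step (C): notify the BMC that the data is ready (modeled as a direct call).
    bmc_on_data_ready(server)
    return True
```

The point the sketch tries to capture is that the healthy devices are recorded alongside the faulty one, so that the whole picture at the time of the error survives.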

Another object of the present invention is to provide a server that can carry out the error state storage method.

Accordingly, the server of the present invention includes a processing unit, at least one peripheral device, a basic input/output system (BIOS), a control chipset electrically connected to the BIOS, a chipset electrically connected to the control chipset, and a memory electrically connected to the chipset.

The chipset includes at least a non-volatile memory and a baseboard management controller (BMC).

The processing unit generates a processing interrupt request in response to a peripheral error trigger. The peripheral error trigger indicates that at least one of the at least one peripheral device is in an error state.

In response to the processing interrupt request from the processing unit, the BIOS stores the status data of each of the at least one peripheral device in the memory. The status data includes a device identifier and a status type corresponding to that device identifier.

The BIOS sends a data-ready command to the baseboard management controller (BMC) of the chipset via the control chipset. The data-ready command indicates that the status data of each of the at least one peripheral device has been written to the memory.

In response to the data-ready command, the BMC reads the status data of each of the at least one peripheral device from the memory and writes it to the non-volatile memory of the chipset.

The effect of the present invention is as follows. When at least one of the at least one peripheral device is found to be in an error state, the BIOS is triggered to store the status data of each of the at least one peripheral device in the memory, and the baseboard management controller (BMC) reads that status data from the memory and stores it in the non-volatile memory. The status data of every peripheral device is thereby preserved, which helps the user find the cause of the error in the faulty peripheral device and debug more quickly.

Referring to FIG. 1, an embodiment of the server of the present invention includes a peripheral connection bus 1, a peripheral device 2, a control chipset 3, a chipset 4, a basic input/output system (BIOS) 5, a memory 6, and a processing unit 7.

The peripheral connection bus 1 is, for example, a Peripheral Component Interconnect (PCI) bus or a PCI Express (PCIe) bus.

The peripheral device 2 is a PCI device electrically connected to the peripheral connection bus 1, for example a network card, a graphics processing chip, or a video acceleration chip. For ease of explanation, FIG. 1 shows only one peripheral device 2, but there may be two, three, or more; the number is not limited to what is shown. The peripheral device 2 has a status register (not shown). When operating normally, the peripheral device 2 records a normal operating state in the status register; when an error occurs, it records an error state in the status register. In this embodiment, the error state is one of a parity error (PERR), a system error (SERR), a correctable PCI error, an uncorrectable PCI error, and a fatal PCI error.

The chipset 4 includes a peripheral device 41, a baseboard management controller (BMC) 42, and a non-volatile memory 44. In this embodiment, the non-volatile memory 44 is located inside the BMC 42 of the chipset 4, although this is not a limitation. The peripheral device 41 is connected to the peripheral connection bus 1 and is, for example, an on-board video graphics array (VGA) chip with a status register (not shown). When operating normally, the peripheral device 41 records the normal operating state in its status register; when an error occurs, it records the error state in its status register. The BMC 42 monitors operating conditions of the server such as fan speed, supply voltage, and system temperature. The non-volatile memory 44 of the BMC 42 includes a system event log 43 and a system log 45. The system event log 43 records a device identifier (device ID) and the error state corresponding to that device identifier. The system log 45 records the status data of, for example, the peripheral device 2 and the peripheral device 41. The status data includes a device identifier and the status type corresponding to that device identifier, and the status type is either the normal operating state or the error state.
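The difference between the two logs can be illustrated with a small model. The class and method names below are hypothetical and the storage is just an in-memory list; the sketch only shows that the system event log accepts error entries, while the system log accepts the status of every device, normal or not.

```python
from enum import Enum


class StatusKind(Enum):
    # Status types named in this embodiment.
    NORMAL = "normal"
    PERR = "parity error"
    SERR = "system error"
    CORRECTABLE = "correctable PCI error"
    UNCORRECTABLE = "uncorrectable PCI error"
    FATAL = "fatal PCI error"


class NonVolatileMemory:
    """Hypothetical stand-in for the non-volatile memory 44."""

    def __init__(self):
        self.system_event_log = []  # log 43: error entries only
        self.system_log = []        # log 45: status data of every device

    def log_error(self, device_id, kind):
        if kind is StatusKind.NORMAL:
            raise ValueError("the system event log records error states only")
        self.system_event_log.append((device_id, kind))

    def log_status(self, device_id, kind):
        self.system_log.append((device_id, kind))
```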

The control chipset 3 is a Platform Controller Hub (PCH), for example a south bridge chip, and is electrically connected to the peripheral connection bus 1, the chipset 4, and the BIOS 5. When the control chipset 3 reads the status register of the peripheral device 2 and/or the peripheral device 41 via the peripheral connection bus 1 and learns that the peripheral device 2 and/or the peripheral device 41 is in the error state, it can generate a peripheral error trigger indicating that one of the peripheral devices connected to the peripheral connection bus 1 is in the error state.

The memory 6 includes a status storage block 61, which is the region of the memory 6 that is allocated to the peripheral device 41 of the chipset 4 (that is, the VGA chip) for storing data and that the baseboard management controller (BMC) 42 can access. The memory 6 is, for example, a dynamic random access memory (DRAM), a DDR2 memory, or a DDR3 memory in which the peripheral device 41 (that is, the VGA chip) stores data.

The processing unit 7 is the central processing unit of the server. It is electrically connected to the peripheral connection bus 1 and includes a north bridge chip 71, which handles high-speed signals such as PCIe interface signals and is also responsible for communication with the control chipset 3. The north bridge chip 71 has a function similar to that of the control chipset 3: when it reads the status register of the peripheral device 2 and/or the peripheral device 41 via the peripheral connection bus 1 and learns that the peripheral device 2 and/or the peripheral device 41 is in the error state, it can generate a peripheral error trigger indicating that one of the peripheral devices connected to the peripheral connection bus 1 is in the error state.

The BIOS 5 is electrically connected to the control chipset 3 and has a system management interrupt (SMI) handler 51. When the processing unit 7 receives a peripheral error trigger from the control chipset 3 or from the north bridge chip 71, it enters System Management Mode (SMM) and transfers control of the system to the SMI handler 51 of the BIOS 5, which determines the cause of the interrupt. The SMI handler 51 may be, for example, a program.

Referring to FIG. 2, an embodiment of the error state storage method of the present invention is carried out in the server shown in FIG. 1 and includes the following steps.

First, in step (A), the processing unit 7 generates a processing interrupt request in response to a peripheral error trigger. As described above, the peripheral error trigger is generated by the control chipset 3 or by the north bridge chip 71 when the peripheral device 2 is in the error state. The processing interrupt request is a system management interrupt (SMI). At this point, the processing unit 7 enters System Management Mode (SMM) and transfers control of the system to the SMI handler 51 of the BIOS 5.

Next, in step (E), the SMI handler 51 of the BIOS 5 responds to the processing interrupt request from the processing unit 7 by recording peripheral error data. The peripheral error data is the status data of the peripheral device that is in the error state, and it includes a device identifier and the error state corresponding to that device identifier. For example, when the peripheral device 2 operates abnormally, the peripheral error data includes the device identifier of the peripheral device 2 and the corresponding error state, which is one of the parity error, the system error, the correctable PCI error, the uncorrectable PCI error, and the fatal PCI error described above. In this embodiment, the peripheral error data is about 12 bytes in size.
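The text gives only the approximate size of the peripheral error data, not its layout. As a purely hypothetical illustration of how a device identifier and an error state could fit in 12 bytes, the sketch below packs four assumed fields (device identifier, error code, raw status register value, timestamp); none of these field choices comes from the patent.

```python
import struct

# Hypothetical little-endian layout with no padding:
# device_id (2 bytes) + error_code (2) + raw_status (4) + timestamp (4) = 12 bytes.
RECORD = struct.Struct("<HHII")


def pack_error_record(device_id, error_code, raw_status, timestamp):
    """Serialize one peripheral error entry into a fixed-size record."""
    return RECORD.pack(device_id, error_code, raw_status, timestamp)


def unpack_error_record(blob):
    """Inverse of pack_error_record; returns a 4-tuple of integers."""
    return RECORD.unpack(blob)
```

A fixed-size record of this kind keeps the message sent to the BMC in step (F) small, which is one plausible reason the error entry is handled separately from the larger per-device status dump.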

Next, in step (F), the SMI handler 51 of the BIOS 5 sends the peripheral error data to the baseboard management controller (BMC) 42 of the chipset 4 via the control chipset 3.

Next, in step (G), the BMC 42 writes the peripheral error data to the system event log 43 in the non-volatile memory 44.

Next, in step (B), the SMI handler 51 of the BIOS 5 stores the status data of each peripheral device in the memory 6 in response to the processing interrupt request. Step (B) includes the following sub-steps.

In sub-step (B0), the SMI handler 51 of the BIOS 5 records the status data of each peripheral device. Specifically, the BIOS 5 scans all peripheral devices connected to the peripheral connection bus 1, reads the status data of each one (in this example, the peripheral device 2 and the peripheral device 41), and buffers the data in, for example, an external dynamic random access memory (DRAM) (not shown) of the processing unit 7. For example, if the peripheral device 2 is operating abnormally and the peripheral device 41 is operating normally, the status data recorded by the BIOS 5 includes the device identifier of the peripheral device 2 with its error state and the device identifier of the peripheral device 41 with its normal operating state.
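The scan in sub-step (B0) might look like the sketch below. It assumes that each device exposes a 16-bit status register with the conventional PCI bit positions (bit 15 for a detected parity error, bit 14 for a signaled system error); the patent does not specify how the register is decoded, and the function names and the dictionary input are hypothetical.

```python
# Assumed bit positions, following the conventional PCI Status register.
DETECTED_PARITY_ERROR = 1 << 15  # reported here as "PERR"
SIGNALED_SYSTEM_ERROR = 1 << 14  # reported here as "SERR"


def classify(status_register):
    """Map a raw status register value to a status type."""
    if status_register & SIGNALED_SYSTEM_ERROR:
        return "SERR"
    if status_register & DETECTED_PARITY_ERROR:
        return "PERR"
    return "normal"


def scan_bus(status_registers):
    """status_registers: dict of device identifier -> raw register value.

    Returns one (device identifier, status type) pair per device,
    including devices that are operating normally.
    """
    return [(dev, classify(value)) for dev, value in sorted(status_registers.items())]
```

For instance, `scan_bus({0x41: 0x0000, 0x02: 0x4000})` yields an entry for both devices, mirroring the example above in which one device is faulty and the other is healthy.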

In sub-step (B1), the SMI handler 51 of the BIOS 5 obtains memory address data relating to the address of the status storage block 61 in the memory 6. Specifically, the memory address data is stored in a memory-mapped I/O (MMIO) register (not shown) of the peripheral device 41. It is the address that the BIOS 5 assigns to the peripheral device 41 at boot time using memory address mapping, and it indicates the start address and the access range of the status storage block 61 of the memory 6.

In sub-step (B2), the SMI handler 51 of the BIOS 5 uses the memory address data to copy the status data of each peripheral device (in this example, the peripheral device 2 and the peripheral device 41) buffered in the external DRAM (not shown) of the processing unit 7 to the status storage block 61 of the memory 6. Preferably, the BIOS 5 checks whether the size of the status data falls within the access range of the status storage block 61 indicated by the memory address data. If it does, the status data is written to the status storage block 61 in a single pass. If it exceeds the access range, the status data is written to the status storage block 61 in several passes, each pass storing an amount of data that fits the access range.
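The single-pass versus multi-pass copy of sub-step (B2) can be sketched as a simple chunking loop. The function name `copy_in_chunks` and the `write_window` callback are hypothetical; the callback stands in for whatever mechanism actually writes one window of data into the status storage block.

```python
def copy_in_chunks(data, window_size, write_window):
    """Write `data` through a window of at most `window_size` bytes.

    Returns the number of passes used. If the data fits in the access
    range, exactly one pass is made; otherwise it is split into pieces
    that each fit the access range.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    passes = 0
    for offset in range(0, len(data), window_size):
        write_window(data[offset:offset + window_size])
        passes += 1
    return passes
```

With a 4-byte window, 10 bytes of status data would take three passes (4, 4, and 2 bytes), while 3 bytes would take one.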

Next, in step (C), the SMI handler 51 of the BIOS 5 sends a data-ready command to the baseboard management controller (BMC) 42 via the control chipset 3. The data-ready command indicates that the status data of each peripheral device has been written to the status storage block 61 of the memory 6.

Next, in step (D), in response to the data-ready command, the baseboard management controller (BMC) 42 of the chipset 4 reads the status data of each peripheral device (in this example, the peripheral device 2 and the peripheral device 41) from the status storage block 61 of the memory 6 and writes it to the system log 45 in the non-volatile memory 44. Preferably, the BIOS 5 uses the size of the status data to determine whether the status data of every peripheral device (in this example, the peripheral device 2 and the peripheral device 41) has been completely copied to the status storage block 61 of the memory 6. If not, the flow returns to sub-step (B2) to store the remaining status data in the status storage block 61. If so, the status data in the status registers of the peripheral device 2 and the peripheral device 41 is cleared.
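The completeness check that guards the clearing of the status registers can be sketched as follows. The function name, its return values, and the use of byte counts are assumptions made for illustration; the source only says that the decision is based on the size of the status data.

```python
def finish_transfer(total_size, bytes_copied, status_registers):
    """Clear the status registers only after everything has been copied.

    status_registers: dict of device identifier -> raw register value,
    modified in place when the transfer is complete.
    """
    if bytes_copied < total_size:
        # Some status data is still missing: go back to sub-step (B2).
        return "resume_copy"
    for dev in status_registers:
        status_registers[dev] = 0
    return "cleared"
```

Ordering the check before the clear is what keeps the register contents available until a durable copy exists.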

As described above, when an error occurs in at least one peripheral device, the error state storage method of the present invention has the SMI handler 51 of the BIOS 5 store the peripheral error data in the system event log 43 of the baseboard management controller (BMC) 42 and store the status data of each peripheral device in the memory 6, and has the BMC 42 store the status data of each peripheral device in the system log 45 of the non-volatile memory 44. The peripheral error data and the status data of every peripheral device are thus preserved before the SMI handler 51 clears the status registers of the peripheral devices. When an error occurs, this helps the user determine whether it was triggered by a problem in another peripheral device, so that debugging can proceed more quickly. The object of the present invention is therefore achieved.

The foregoing is merely a preferred embodiment of the present invention and does not limit the scope in which the invention may be practiced. Simple equivalent changes and modifications made in accordance with the claims and the specification of the present invention remain within the scope covered by this patent.

1‧‧‧peripheral connection bus
2‧‧‧peripheral device
3‧‧‧control chipset
4‧‧‧chipset
41‧‧‧peripheral device
42‧‧‧baseboard management controller
43‧‧‧system event log
44‧‧‧non-volatile memory
45‧‧‧system log
5‧‧‧basic input/output system
51‧‧‧system management interrupt handler
6‧‧‧memory
61‧‧‧status storage block
7‧‧‧processing unit
71‧‧‧north bridge chip
A~G‧‧‧steps
B0~B2‧‧‧sub-steps

Other features and effects of the present invention will be clearly presented in the embodiments described with reference to the drawings, in which: FIG. 1 is a block diagram illustrating an embodiment of the server of the present invention; and FIG. 2 is a flowchart illustrating an embodiment of the error state storage method of the present invention.

Claims (9)

一種錯誤狀態儲存方法,由一伺服器執行,該伺服器包括一處理單元、至少一個周邊裝置、一基本輸入輸出系統、一電連接該基本輸入輸出系統的控制晶片組、一電連接該控制晶片組的晶片組,及一電連接該晶片組的記憶體,該晶片組至少包括一非揮發性記憶體及一基板管理控制器,該錯誤狀態儲存方法包含:(A)該處理單元根據一周邊錯誤觸發,產生一處理中斷請求,該周邊錯誤觸發是相關於該至少一個周邊裝置的其中至少一者處於錯誤狀態;(B)該基本輸入輸出系統根據該處理單元的處理中斷請求,將該至少一個周邊裝置的每一者的狀態資料儲存到該記憶體,該狀態資料包括一裝置識別碼及一對應該裝置識別碼的狀態種類;(C)該基本輸入輸出系統發送一資料已準備命令經由該控制晶片組至該晶片組的基板管理控制器,該資料已準備命令指示該至少一個周邊裝置的每一者的狀態資料已寫入該記憶體;及(D)該基板管理控制器根據該資料已準備命令,以從該記憶體中存取該至少一個周邊裝置的每一者的狀態資料,並寫入該晶片組的非揮發性記憶體;(E)該基本輸入輸出系統根據該處理單元的處理中斷請求,還記錄一周邊錯誤資料,該周邊錯誤資料包括一裝置識別碼及一對應該裝置識別碼的錯誤狀態;及 (F)該基本輸入輸出系統發送該周邊錯誤資料,經由該控制晶片組至該晶片組的基板管理控制器。 An error state storage method is performed by a server, the server includes a processing unit, at least one peripheral device, a basic input/output system, a control chip set electrically connected to the basic input/output system, and an electrical connection of the control chip a set of chips, and a memory electrically connected to the chip set, the chip set including at least one non-volatile memory and a substrate management controller, the error state storage method comprising: (A) the processing unit according to a periphery The error triggering generates a processing interrupt request, wherein the peripheral error triggering is related to at least one of the at least one peripheral device being in an error state; (B) the basic input output system is configured to interrupt the request according to the processing unit The status data of each of the peripheral devices is stored in the memory, the status data includes a device identification code and a status category of the pair of device identification codes; (C) the basic input/output system sends a data ready command via Controlling the chipset to the substrate management controller of the chipset, the data has prepared a command to indicate the at least The status data of each of the peripheral devices has been written to the memory; and (D) the substrate management controller has prepared a command based on the data to access each of the at least one peripheral device 
from the memory and writes that status data to the non-volatile memory of the chipset; (E) the basic input/output system records peripheral error data according to the processing interrupt request of the processing unit, the peripheral error data including a device identification code and an error status corresponding to the device identification code; and (F) the basic input/output system sends the peripheral error data, via the control chipset, to the baseboard management controller of the chipset.

The error state storage method of claim 1, further comprising: (G) the baseboard management controller writes the peripheral error data received from the basic input/output system into a system event log of the non-volatile memory of the chipset.

The error state storage method of claim 1, wherein, in step (D), the baseboard management controller writes the status data of each of the at least one peripheral device into a system log of the non-volatile memory of the chipset.

The error state storage method of claim 1, wherein the chipset of the server further includes a peripheral device, and the memory of the server has a state storage block that is allocated to the peripheral device of the chipset for storing data and that can be accessed by the baseboard management controller, and wherein step (B) includes: (B0) the basic input/output system records the status data of each of the at least one peripheral device; (B1) the basic input/output system obtains memory address data related to the state storage block of the memory; and (B2) the basic input/output system copies, according to the memory address data, the status data of each of the at least one peripheral device into the state storage block of the memory.

The error state storage method of claim 1, wherein the processing unit includes a north bridge chip, and, in step (A), the peripheral error trigger is generated by one of the north bridge chip of the processing unit and the control chipset.

The error state storage method of claim 1, wherein, in step (A), the processing interrupt request generated by the processing unit is a system management interrupt (SMI), and the processing unit enters a system management mode (SMM).

A server, comprising: at least one peripheral device; a basic input/output system; a control chipset electrically connected to the basic input/output system; a chipset electrically connected to the control chipset and including at least a non-volatile memory and a baseboard management controller; a memory electrically connected to the chipset; and a processing unit that generates a processing interrupt request according to a peripheral error trigger, the peripheral error trigger relating to at least one of the at least one peripheral device being in an error state; wherein the basic input/output system, according to the processing interrupt request of the processing unit, stores status data of each of the at least one peripheral device in the memory, the status data including a device identification code and a status type corresponding to the device identification code; wherein the basic input/output system sends a data-ready command, via the control chipset, to the baseboard management controller of the chipset, the data-ready command indicating that the status data of each of the at least one peripheral device has been written to the memory; wherein the baseboard management controller, according to the data-ready command, accesses the status data of each of the at least one peripheral device from the memory and writes it to the non-volatile memory of the chipset; wherein the basic input/output system, according to the processing interrupt request of the processing unit, further records peripheral error data, the peripheral error data including a device identification code and an error status corresponding to the device identification code; and wherein the basic input/output system sends the peripheral error data, via the control chipset, to the baseboard management controller of the chipset.

The server of claim 7, wherein the non-volatile memory of the chipset of the server has a system event log, and the baseboard management controller writes the peripheral error data received from the basic input/output system into the system event log of the non-volatile memory of the chipset.

The server of claim 7, wherein the chipset of the server further includes a peripheral device, the memory of the server has a state storage block that is allocated to the peripheral device of the chipset for storing data and that can be accessed by the baseboard management controller, and the non-volatile memory of the chipset of the server has a system log; the basic input/output system records the status data of each of the at least one peripheral device, obtains memory address data related to the state storage block, and accordingly copies the status data of each of the at least one peripheral device into the state storage block of the memory; and the baseboard management controller writes the status data of each of the at least one peripheral device into the system log of the non-volatile memory of the chipset.
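As an illustration only, the flow recited in the claims above can be sketched as a small Python simulation. This is not firmware and not the patentee's implementation: the names `StatusRecord`, `Bmc`, `handle_smi`, the status strings such as `"ok"`, and the use of a plain list as the state storage block are all assumptions introduced for the sketch. It shows the two paths the claims distinguish: status data for every peripheral goes through memory into a system log, while error data for the failing peripheral is sent to the baseboard management controller and lands in the system event log.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StatusRecord:
    device_id: int    # device identification code
    status_type: str  # status type (or error status) for that device; values are hypothetical


@dataclass
class Bmc:
    """Stand-in for the baseboard management controller plus its non-volatile memory."""
    system_log: list = field(default_factory=list)        # status data of every peripheral
    system_event_log: list = field(default_factory=list)  # peripheral error data only

    def on_data_ready(self, state_block):
        # Step (D): read the status records from the shared block and persist them.
        self.system_log.extend(state_block)

    def on_peripheral_error(self, error):
        # Step (G): persist the error record in the system event log.
        self.system_event_log.append(error)


def handle_smi(peripherals, state_block, bmc):
    """Hypothetical BIOS handler run in response to the processing interrupt request."""
    # Steps (B0)-(B2): record the status of every peripheral, failed or not,
    # and copy the records into the state storage block in memory.
    records = [StatusRecord(dev_id, status) for dev_id, status in peripherals.items()]
    state_block[:] = records
    # Data-ready command: tell the BMC the block has been filled.
    bmc.on_data_ready(state_block)
    # Steps (E)-(F): record error data for devices in an error state and send it on.
    for record in records:
        if record.status_type != "ok":
            bmc.on_peripheral_error(record)
```

For example, calling `handle_smi({0x10: "ok", 0x20: "uncorrectable_error"}, [], Bmc())` would leave both devices in the simulated system log and only device `0x20` in the simulated system event log, which mirrors the stated purpose of keeping the state of all peripherals available for debugging.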

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW105111184A TWI654518B (en) 2016-04-11 2016-04-11 Method for storing error status information and server using the same

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW105111184A TWI654518B (en) 2016-04-11 2016-04-11 Method for storing error status information and server using the same

Publications (2)

Publication Number Publication Date
TW201737086A TW201737086A (en) 2017-10-16
TWI654518B TWI654518B (en) 2019-03-21

Family

ID=61021825

Family Applications (1)

Application Number Title Priority Date Filing Date
TW105111184A TWI654518B (en) 2016-04-11 2016-04-11 Method for storing error status information and server using the same

Country Status (1)

Country Link
TW (1) TWI654518B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090049221A1 (en) 2007-08-14 2009-02-19 Dell Products, Lp System and method of obtaining error data within an information handling system
TWI337707B (en) 2005-10-14 2011-02-21 Dell Products Lp System and method for logging recoverable errors

Also Published As

Publication number Publication date
TW201737086A (en) 2017-10-16

Similar Documents

Publication Publication Date Title
US10824499B2 (en) Memory system architectures using a separate system control path or channel for processing error information
TWI605459B (en) Dynamic application of ecc based on error type
CN105589762B (en) Memory device, memory module and method for error correction
KR102350538B1 (en) DDR memory error recovery
US6615374B1 (en) First and next error identification for integrated circuit devices
TWI553650B (en) Method, apparatus and system for handling data error events with a memory controller
KR102378466B1 (en) Memory devices and modules
CN109155146A (en) Prosthetic device after integral type encapsulation
US11282584B2 (en) Multi-chip package and method of testing the same
US11797369B2 (en) Error reporting for non-volatile memory modules
CN113568777B (en) Fault processing method, device, network chip, equipment and storage medium
US10911259B1 (en) Server with master-slave architecture and method for reading and writing information thereof
TWI654518B (en) Method for storing error status information and server using the same
CN107451028A (en) Error condition storage method and server
JP4299634B2 (en) Information processing apparatus and clock abnormality detection program for information processing apparatus
TWI733964B (en) System for testing whole memory and method thereof
CN108231134B (en) RAM yield remediation method and device
WO2019169615A1 (en) Method for accessing code sram, and electronic device
JP2003022222A (en) Information processor and its maintenance method
JP7400015B2 (en) Data storage device with data verification circuit
JP2002100979A (en) Information processor and error information holding method for information processor
CN116414619A (en) Computer system and method executed in computer system
CN117581211A (en) In-system mitigation of uncorrectable errors based on confidence factors, based on fault perception analysis
CN117707884A (en) Method, system, equipment and medium for monitoring power management chip
TW202324096A (en) Storage device

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees