TWI654518B - Method for storing error status information and server using the same - Google Patents
Method for storing error status information and server using the same
- Publication number
- TWI654518B
- Authority
- TW
- Taiwan
- Prior art keywords
- error
- data
- peripheral
- memory
- peripheral device
- Prior art date
Landscapes
- Debugging And Monitoring (AREA)
Abstract
An error state storage method is performed by a server that includes a processing unit, a basic input/output system (BIOS), at least one peripheral device, and a chipset that includes at least a non-volatile memory. In the method, the processing unit generates a processing interrupt request in response to a peripheral error trigger, which indicates that at least one of the peripheral devices is in an error state. In response to the processing interrupt request, the BIOS records status data for each of the peripheral devices and carries out a procedure that stores the data in the non-volatile memory of the chipset. The status data includes a device identifier and a status type corresponding to that device identifier. With this method, the status data of every peripheral device is preserved, which helps the user find the cause of the error in the failing peripheral device and debug more quickly.
Description
The present invention relates to a storage method and apparatus, and in particular to an error state storage method and a server.
While an existing server is running, errors may occur, for example an error in a Peripheral Component Interconnect device (PCI device) such as a network card, a graphics processing card, or a video acceleration card. When an error occurs, the PCI device records an error state in its status register. To improve system reliability, the server's basic input/output system (BIOS) is usually configured to execute a System Management Interrupt (SMI) routine when such an error occurs; the routine reads the status register of the PCI device and records the error state.
However, when a user needs to troubleshoot, the error state that the BIOS recorded for the failing PCI device may not by itself reveal the cause of the error. The user must instead rely on repair experience, or spend time testing every PCI device one by one, to locate the fault. In addition, the SMI routine of the BIOS is usually designed to clear the status data of all status registers as soon as the BIOS has recorded the error state, which makes it difficult for the user to debug the failing PCI device and diagnose the cause.
Accordingly, an object of the present invention is to provide an error state storage method.
Accordingly, the error state storage method of the present invention is performed by a server that includes a processing unit, at least one peripheral device, a BIOS, a control chipset, a chipset electrically connected to the control chipset, and a memory electrically connected to the chipset. The chipset includes at least a non-volatile memory and a baseboard management controller. The error state storage method comprises a step (A), a step (B), a step (C), and a step (D).
In step (A), the processing unit generates a processing interrupt request in response to a peripheral error trigger. The peripheral error trigger indicates that at least one of the at least one peripheral device is in an error state.
In step (B), the BIOS, in response to the processing interrupt request from the processing unit, stores the status data of each of the at least one peripheral device in the memory. The status data includes a device identifier and a status type corresponding to that device identifier.
In step (C), the BIOS sends a data-ready command through the control chipset to the baseboard management controller of the chipset. The data-ready command indicates that the status data of each of the at least one peripheral device has been written to the memory.
In step (D), the baseboard management controller, in response to the data-ready command, reads the status data of each of the at least one peripheral device from the memory and writes it to the non-volatile memory of the chipset.
Another object of the present invention is to provide a server that can implement the error state storage method.
Accordingly, the server of the present invention comprises a processing unit, at least one peripheral device, a BIOS, a control chipset electrically connected to the BIOS, a chipset electrically connected to the control chipset, and a memory electrically connected to the chipset.
The chipset includes at least a non-volatile memory and a baseboard management controller.
The processing unit generates a processing interrupt request in response to a peripheral error trigger. The peripheral error trigger indicates that at least one of the at least one peripheral device is in an error state.
The BIOS, in response to the processing interrupt request from the processing unit, stores the status data of each of the at least one peripheral device in the memory. The status data includes a device identifier and a status type corresponding to that device identifier.
The BIOS sends a data-ready command through the control chipset to the baseboard management controller of the chipset. The data-ready command indicates that the status data of each of the at least one peripheral device has been written to the memory.
The baseboard management controller, in response to the data-ready command, reads the status data of each of the at least one peripheral device from the memory and writes it to the non-volatile memory of the chipset.
The effect of the present invention is as follows. When at least one of the at least one peripheral device is found to be in an error state, the BIOS is triggered to store the status data of each peripheral device in the memory, and the baseboard management controller reads that status data from the memory and stores it in the non-volatile memory. The status data of every peripheral device is thereby preserved, which helps the user find the cause of the error in the failing peripheral device and debug more quickly.
Referring to FIG. 1, an embodiment of the server of the present invention includes a peripheral connection bus 1, a peripheral device 2, a control chipset 3, a chipset 4, a basic input/output system (BIOS) 5, a memory 6, and a processing unit 7.
The peripheral connection bus 1 is, for example, a Peripheral Component Interconnect (PCI) bus or a Peripheral Component Interconnect Express (PCIe) bus.
The peripheral device 2 is a PCI device electrically connected to the peripheral connection bus 1, for example a network card, a graphics processing chip, or a video acceleration chip. For ease of explanation, FIG. 1 shows only one peripheral device 2, but there may be two, three, or more; the number is not limited to what is drawn. The peripheral device 2 has a status register (not shown). When operating normally, the peripheral device 2 records a normal operating state in the status register; when an error occurs, it records an error state there. The error state referred to in this embodiment is one of a parity error (PERR), a system error (SERR), a correctable PCI error, an uncorrectable PCI error, and a fatal PCI error.
The chipset 4 includes a peripheral device 41, a baseboard management controller (BMC) 42, and a non-volatile memory 44. In this embodiment the non-volatile memory 44 is located inside the baseboard management controller 42 of the chipset 4, although the invention is not limited to this arrangement. The peripheral device 41 is connected to the peripheral connection bus 1 and is, for example, an on-board video graphics array (VGA) chip with a status register (not shown). When operating normally, the peripheral device 41 records the normal operating state in its status register; when an error occurs, it records the error state there. The baseboard management controller 42 monitors operating conditions of the server such as fan speed, supply voltage, and system temperature. The non-volatile memory 44 of the baseboard management controller 42 includes a system event log 43 and a system log 45. The system event log 43 records a device identifier (device ID) and the error state corresponding to that device identifier. The system log 45 records the status data of, for example, the peripheral device 2 and the peripheral device 41. The status data includes a device identifier and a status type corresponding to that device identifier, and the status type is either the normal operating state or the error state.
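The difference between the two logs can be illustrated with a minimal Python sketch. The names `StatusKind`, `StatusRecord`, and `split_logs` are hypothetical and do not appear in the patent; the sketch only assumes, as described above, that the system event log holds error records while the system log holds the status of every device.

```python
from dataclasses import dataclass
from enum import Enum


class StatusKind(Enum):
    """Status types named in the embodiment (NORMAL plus five error states)."""
    NORMAL = "normal"
    PERR = "parity error"
    SERR = "system error"
    CORRECTABLE = "correctable PCI error"
    UNCORRECTABLE = "uncorrectable PCI error"
    FATAL = "fatal PCI error"


@dataclass(frozen=True)
class StatusRecord:
    """One piece of status data: a device identifier and its status type."""
    device_id: int
    kind: StatusKind

    @property
    def is_error(self) -> bool:
        return self.kind is not StatusKind.NORMAL


def split_logs(records):
    """Return (system_event_log, system_log) entries.

    The system event log keeps only records in an error state; the system
    log keeps the status of every device, whether normal or not.
    """
    system_event_log = [r for r in records if r.is_error]
    system_log = list(records)
    return system_event_log, system_log
```

With one failing and one healthy device, the system event log receives a single entry and the system log receives both, which is what lets a user compare the failing device against its neighbours.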
The control chipset 3 is a Platform Controller Hub (PCH), for example a south bridge chip, and is electrically connected to the peripheral connection bus 1, the chipset 4, and the BIOS 5. When the control chipset 3 reads the status register of the peripheral device 2 and/or the status register of the peripheral device 41 over the peripheral connection bus 1 and learns that the peripheral device 2 and/or the peripheral device 41 is in the error state, it can generate a peripheral error trigger indicating that some peripheral device connected to the peripheral connection bus 1 is in the error state.
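As a rough illustration of the trigger condition, the following Python sketch raises a trigger when any polled status register reports something other than normal operation. The function name and the string encoding of states are assumptions made for the example; the patent does not specify how register contents are encoded.

```python
def peripheral_error_trigger(status_registers):
    """Return True when any polled status register holds an error state.

    `status_registers` maps a device identifier to a status string, where
    "normal" stands for the normal operating state (a hypothetical encoding).
    """
    return any(state != "normal" for state in status_registers.values())
```

The trigger carries no detail about which device failed; it only signals that at least one device on the bus is in the error state, and the later steps collect the details.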
The memory 6 includes a state storage block 61, which is the region of the memory 6 allocated to the peripheral device 41 of the chipset 4 (that is, the VGA chip) for storing data, and which the baseboard management controller 42 can access. The memory 6 is, for example, a dynamic random access memory (DRAM) in which the peripheral device 41 (the VGA chip) stores data, a second-generation double data rate (DDR2) memory, or a third-generation double data rate (DDR3) memory.
The processing unit 7 is the central processing unit of the server. It is electrically connected to the peripheral connection bus 1 and includes a north bridge chip 71. The north bridge chip 71 handles high-speed signals, such as PCIe interface signals, and is also responsible for communication with the control chipset 3. The north bridge chip 71 has a function similar to that of the control chipset 3: when it reads the status registers of the peripheral device 2 and/or the peripheral device 41 over the peripheral connection bus 1 and learns that the peripheral device 2 and/or the peripheral device 41 is in the error state, it can generate a peripheral error trigger indicating that some peripheral device connected to the peripheral connection bus 1 is in the error state.
The BIOS 5 is electrically connected to the control chipset 3 and has a system management interrupt handler 51. When the processing unit 7 receives a peripheral error trigger from the control chipset 3 or from the north bridge chip 71, it enters System Management Mode (SMM) and transfers control of the system to the system management interrupt handler 51 of the BIOS 5, which determines the cause of the interrupt. The system management interrupt handler 51 may be, for example, a program.
Referring to FIG. 2, an embodiment of the error state storage method of the present invention is executed in the server shown in FIG. 1 and includes the following steps.
First, in step (A), the processing unit 7 generates a processing interrupt request in response to a peripheral error trigger. As described above, the peripheral error trigger is generated by the control chipset 3 or by the north bridge chip 71 when the peripheral device 2 is in the error state. The processing interrupt request is a System Management Interrupt (SMI). At this point the processing unit 7 enters System Management Mode (SMM) and transfers control of the system to the system management interrupt handler 51 of the BIOS 5.
Next, in step (E), the system management interrupt handler 51 of the BIOS 5 responds to the processing interrupt request from the processing unit 7 by recording the peripheral error data. The peripheral error data is the status data of the peripheral device that is in the error state, and includes a device identifier and the error state corresponding to that device identifier. For example, when the peripheral device 2 malfunctions, the peripheral error data includes the device identifier of the peripheral device 2 and the corresponding error state, which is one of the parity error, the system error, the correctable PCI error, the uncorrectable PCI error, and the fatal PCI error described above. In this embodiment, the peripheral error data is about 12 bytes in size.
Next, in step (F), the system management interrupt handler 51 of the BIOS 5 sends the peripheral error data through the control chipset 3 to the baseboard management controller 42 of the chipset 4.
Next, in step (G), the baseboard management controller 42 writes the peripheral error data to the system event log 43 of the non-volatile memory 44.
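Steps (E) through (G) can be sketched in Python as packing a small fixed-size record and appending it to a log. The patent states only that the peripheral error data is about 12 bytes; the field layout below (`device_id`, `error_code`, `bus_address`, `sequence`), the numeric error codes, and the function names are all hypothetical and were chosen only so that the record comes to 12 bytes.

```python
import struct

# Hypothetical 12-byte layout, chosen only to match the stated size:
# device_id (uint16), error_code (uint16), bus_address (uint32), sequence (uint32).
RECORD = struct.Struct("<HHII")

# Hypothetical numeric codes for the five error states named in the text.
ERROR_CODES = {"PERR": 1, "SERR": 2, "CORRECTABLE": 3, "UNCORRECTABLE": 4, "FATAL": 5}


def pack_error_record(device_id, error_name, bus_address, sequence):
    """Step (E): build the peripheral error data for one failing device."""
    return RECORD.pack(device_id, ERROR_CODES[error_name], bus_address, sequence)


def append_to_sel(system_event_log, record):
    """Steps (F) and (G): hand the record to the BMC, which appends it to its log."""
    if len(record) != RECORD.size:
        raise ValueError("unexpected record size")
    system_event_log.append(record)
```

A record this small is cheap to send to the baseboard management controller directly, which is presumably why only the full status snapshot of later steps goes through shared memory.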
Next, in step (B), the system management interrupt handler 51 of the BIOS 5, in response to the processing interrupt request, stores the status data of every peripheral device in the memory 6. Step (B) includes the following sub-steps.
In sub-step (B0), the system management interrupt handler 51 of the BIOS 5 records the status data of every peripheral device. Specifically, the BIOS 5 scans all peripheral devices connected to the peripheral connection bus 1, reads the status data of each one (in this example, the peripheral device 2 and the peripheral device 41), and temporarily stores it in, for example, an external dynamic random access memory (DRAM) (not shown) of the processing unit 7. If, for example, the peripheral device 2 is malfunctioning and the peripheral device 41 is operating normally, the status data recorded by the BIOS 5 includes the device identifier of the peripheral device 2 with the corresponding error state, and the device identifier of the peripheral device 41 with the corresponding normal operating state.
In sub-step (B1), the system management interrupt handler 51 of the BIOS 5 obtains memory address data that relates to the address of the state storage block 61 in the memory 6. Specifically, the memory address data is stored in a memory-mapped I/O (MMIO) register (not shown) of the peripheral device 41. It is the address that the BIOS assigns to the peripheral device 41 at boot time using memory address mapping, and it indicates the start address and the access range of the state storage block 61 of the memory 6.
In sub-step (B2), the system management interrupt handler 51 of the BIOS 5 uses the memory address data to copy the status data of every peripheral device (in this example, the peripheral device 2 and the peripheral device 41), which is temporarily stored in the external DRAM (not shown) of the processing unit 7, to the state storage block 61 of the memory 6. Preferably, the BIOS 5 determines whether the size of the status data falls within the access range of the state storage block 61 indicated by the memory address data. If it does, the status data is written to the state storage block 61 in a single pass. If it exceeds the access range, the status data is written to the state storage block 61 in several passes, each of which writes an amount of data that fits the access range.
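The size check and multi-pass copy of sub-step (B2) amount to writing a buffer through a fixed-size window. The following Python sketch shows that idea under stated assumptions: `copy_in_chunks`, `window_size`, and the `write_window` callback are hypothetical names, with `window_size` standing in for the access range of the state storage block.

```python
def copy_in_chunks(data: bytes, window_size: int, write_window) -> int:
    """Write `data` through a fixed-size window, one chunk per call.

    `window_size` stands in for the access range of the state storage block;
    `write_window` is a callback that receives each chunk. Returns the number
    of writes performed (1 when the data fits in a single pass).
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    writes = 0
    for offset in range(0, len(data), window_size):
        write_window(data[offset:offset + window_size])
        writes += 1
    return writes
```

For instance, 10 bytes written through a 4-byte window take three passes of 4, 4, and 2 bytes, while data that already fits the window is written once.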
Next, in step (C), the system management interrupt handler 51 of the BIOS 5 sends a data-ready command through the control chipset 3 to the baseboard management controller 42. The data-ready command indicates that the status data of every peripheral device has been written to the state storage block 61 of the memory 6.
Next, in step (D), the baseboard management controller 42 of the chipset 4, in response to the data-ready command, reads the status data of every peripheral device (in this example, the peripheral device 2 and the peripheral device 41) from the state storage block 61 of the memory 6 and writes it to the system log 45 of the non-volatile memory 44. Preferably, the BIOS 5 determines from the size of the status data whether the status data of every peripheral device has been completely copied to the state storage block 61 of the memory 6. If not, the method returns to sub-step (B2) to store the remaining status data in the state storage block 61. If so, the status data in the status registers of the peripheral device 2 and the peripheral device 41 is cleared.
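The ordering that matters here is that the status registers are cleared only after the complete snapshot has been saved. The Python sketch below illustrates that ordering under several assumptions: all names are hypothetical, the 4-byte-per-device encoding is invented for the example, and the completeness check is placed before the hand-off to the BMC, which is only one possible reading of the text.

```python
def encode_snapshot(registers):
    """Serialize {device_id: status_value} as 4-byte little-endian pairs."""
    return b"".join(
        dev.to_bytes(2, "little") + val.to_bytes(2, "little")
        for dev, val in sorted(registers.items())
    )


def save_then_clear(registers, bmc_system_log, copy_to_block=bytes):
    """Stage the snapshot, let the BMC persist it, then clear the registers.

    `copy_to_block` models the copy into the state storage block and returns
    what actually arrived there. If the copy is incomplete, nothing is logged
    or cleared and False is returned so the caller can retry the copy.
    """
    snapshot = encode_snapshot(registers)
    staged = copy_to_block(snapshot)        # step (B): stage in shared memory
    if len(staged) != len(snapshot):        # size check: copy not complete
        return False
    bmc_system_log.append(bytes(staged))    # steps (C) and (D): BMC persists it
    for dev in registers:                   # only now clear the status registers
        registers[dev] = 0
    return True
```

Passing a truncating `copy_to_block` leaves both the registers and the log untouched, which mirrors the return to sub-step (B2) described above.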
As the above description shows, when an error occurs in at least one peripheral device, the error state storage method of the present invention has the system management interrupt handler 51 of the BIOS 5 store the peripheral error data in the system event log 43 of the baseboard management controller 42 and store the status data of every peripheral device in the memory 6, and has the baseboard management controller 42 store the status data of every peripheral device in the system log 45 of the non-volatile memory 44. The peripheral error data and the status data of every peripheral device are therefore preserved before the system management interrupt handler 51 clears the status registers of the peripheral devices. When an error occurs, this helps the user determine whether it was triggered by a problem in another peripheral device, so that debugging can proceed more quickly. The object of the present invention is thus achieved.
The foregoing describes only preferred embodiments of the present invention and does not limit the scope in which the invention may be practiced. Simple equivalent changes and modifications made in accordance with the claims and the specification of the present invention remain within the scope covered by this patent.
1‧‧‧peripheral connection bus
2‧‧‧peripheral device
3‧‧‧control chipset
4‧‧‧chipset
41‧‧‧peripheral device
42‧‧‧baseboard management controller
43‧‧‧system event log
44‧‧‧non-volatile memory
45‧‧‧system log
5‧‧‧basic input/output system
51‧‧‧system management interrupt handler
6‧‧‧memory
61‧‧‧state storage block
7‧‧‧processing unit
71‧‧‧north bridge chip
A~G‧‧‧steps
B0~B2‧‧‧sub-steps
Other features and effects of the present invention will be clearly presented in the embodiments described with reference to the drawings, in which: FIG. 1 is a block diagram illustrating an embodiment of the server of the present invention; and FIG. 2 is a flowchart illustrating an embodiment of the error state storage method of the present invention.
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW105111184A TWI654518B (en) | 2016-04-11 | 2016-04-11 | Method for storing error status information and server using the same |
Publications (2)
Publication Number | Publication Date |
---|---|
TW201737086A (en) | 2017-10-16 |
TWI654518B (en) | 2019-03-21 |
Family
ID=61021825
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW105111184A TWI654518B (en) | 2016-04-11 | 2016-04-11 | Method for storing error status information and server using the same |
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI654518B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090049221A1 (en) | 2007-08-14 | 2009-02-19 | Dell Products, Lp | System and method of obtaining error data within an information handling system |
TWI337707B (en) | 2005-10-14 | 2011-02-21 | Dell Products Lp | System and method for logging recoverable errors |
-
2016
- 2016-04-11 TW TW105111184A patent/TWI654518B/en not_active IP Right Cessation
Also Published As
Publication number | Publication date |
---|---|
TW201737086A (en) | 2017-10-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
MM4A | Annulment or lapse of patent due to non-payment of fees |