CN107451035B - Error state data providing method for computer device - Google Patents

Error state data providing method for computer device

Info

Publication number
CN107451035B
Authority
CN
China
Prior art keywords
status data
error
error status
computer device
control system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610378723.4A
Other languages
Chinese (zh)
Other versions
CN107451035A (en)
Inventor
郭明义
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shencloud Technology Co Ltd
Shunda Computer Factory Co Ltd
Original Assignee
Shencloud Technology Co Ltd
Shunda Computer Factory Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shencloud Technology Co Ltd and Shunda Computer Factory Co Ltd
Priority to CN201610378723.4A
Publication of CN107451035A
Application granted
Publication of CN107451035B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display

Abstract

A method for providing error status data is implemented by a baseboard management control system included in a computer device. The computer device further includes a central processing unit electrically connected to the baseboard management control system. The method includes the following steps: (A) reading and storing the error status data stored in the central processing unit; (B) determining whether the error status data contains at least one of a plurality of specific errors; (C) continuing to execute step (A) when the error status data is determined not to contain the at least one specific error; and (D) when the error status data is determined to contain the at least one specific error, transmitting the error status data previously stored in step (A) to a user terminal after receiving a data request from the user terminal.

Description

Error state data providing method for computer device
Technical Field
The present invention relates to error status data of a computer device, and more particularly, to a method for providing error status data of a computer device.
Background
A computer device used as a server usually includes a baseboard management controller (BMC) system, which provides error status data of the computer device to help an administrator manage the computer device.
Conventionally, when the BMC system receives an error notification, such as a fatal error (CATERR) notification, from the central processing unit (CPU) of the computer device, it reads the error status data stored in an internal register of the CPU. In practice, however, the computer device restarts when an abnormality such as a fatal error (CATERR) occurs, and the restart clears the error status data corresponding to the abnormality from the CPU. Note that the operation of the BMC system is not affected when the computer device restarts. Therefore, if the BMC system receives the error notification from the CPU only after the computer device has restarted, the error status data that it reads from the internal register of the CPU and stores, such as Machine Check Architecture error status (MCA error status) data, corresponds to the state after the restart rather than to the state when the abnormality occurred. As a result, the administrator of the computer device may be unable to correctly analyze the cause of the abnormality from the error status data provided by the BMC system.
Disclosure of Invention
Therefore, an object of the present invention is to provide an error status data providing method.
To achieve the above object, the error status data providing method of the present invention is implemented by a baseboard management control system included in a computer device. The computer device further includes a central processing unit electrically connected to the baseboard management control system. The method includes the following steps:
(A) reading and storing the error status data stored in the central processing unit;
(B) determining whether the error status data contains at least one of a plurality of specific errors;
(C) continuing to execute step (A) when the error status data is determined not to contain the at least one specific error; and
(D) when the error status data is determined to contain the at least one specific error, transmitting the error status data previously stored in step (A) to a user terminal after receiving a data request from the user terminal.
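As an illustration only, steps (A) through (D) can be sketched as a polling loop. The sketch below is not taken from the patent: the function name, the error identifiers, the `max_polls` bound, and the simulated CPU readings are all assumptions chosen to keep the example self-contained.

```python
from typing import Callable, Optional, Set

# Hypothetical identifiers for the specific errors; the source names the
# error types but does not define identifiers for them.
SPECIFIC_ERRORS: Set[str] = {
    "CATERR", "UNCORRECTABLE_PCI", "FATAL_PCI", "SMI_TIMEOUT", "PERR", "SERR",
}


def poll_until_specific_error(
    read_cpu_errors: Callable[[], Set[str]],
    max_polls: int,
) -> Optional[Set[str]]:
    """Repeat steps (A)-(C); return the stored snapshot once step (D) applies."""
    for _ in range(max_polls):
        stored = set(read_cpu_errors())   # step (A): read and store
        if stored & SPECIFIC_ERRORS:      # step (B): any specific error present?
            return stored                 # step (D): hold it for the user terminal
        # step (C): no specific error, so go back to step (A)
    return None                           # bound reached without a specific error


# Simulated CPU (assumption): two harmless reads, then a fatal error appears.
readings = iter([set(), {"CORRECTED_ECC"}, {"CATERR", "CORRECTED_ECC"}])
snapshot = poll_until_specific_error(lambda: next(readings), max_polls=3)
```

The loop is bounded by `max_polls` only so that the example terminates; the method as described keeps polling for as long as no specific error is found.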
Compared with the prior art, in the error status data providing method of the present invention the baseboard management control system reads and stores the error status data present when the at least one specific error occurs in the computer device and, when the stored error status data contains the at least one specific error, transmits that data to the user terminal after receiving the data request from the user terminal. This prevents the error status data present when the at least one specific error occurs from being cleared before the baseboard management control system has stored it, and thus enables an administrator to analyze the cause of the error in the computer device from the error status data received by the user terminal.
Description of the Drawings
Other features and advantages of the present invention will become apparent from the following detailed description of the preferred embodiments with reference to the accompanying drawings, in which:
FIG. 1 is a block diagram illustrating a baseboard management control system that is included in a computer device and executes an embodiment of the error status data providing method of the present invention; the baseboard management control system is electrically connected to a central processing unit included in the computer device and is connected to a user terminal via a communication network.
FIG. 2 is a flow chart illustrating an embodiment of the error status data providing method of the present invention.
FIG. 3 is a flow chart illustrating another embodiment of the error status data providing method of the present invention.
Detailed Description of the Embodiments
Referring to FIG. 1, the embodiment of the error status data providing method of the present invention is implemented by a baseboard management control system 11 (hereinafter, BMC system 11) included in a computer device 1. The BMC system 11 is connected to a user terminal 2 via a communication network 100. The computer device 1 further includes a central processing unit 12 (hereinafter, CPU 12) electrically connected to the BMC system 11. In this embodiment, the computer device 1 is, for example, a server. The BMC system 11 includes, for example, a nonvolatile memory module 111, a communication module 112 connected to the communication network 100, and a processing module 113 electrically connected to the nonvolatile memory module 111 and the communication module 112. The CPU 12 is, for example, a processor manufactured by Intel Corporation.
Referring to FIG. 1 and FIG. 2, an embodiment of the error status data providing method of the present invention includes the following steps.
In step 31, the processing module 113 of the BMC system 11 reads, through a Platform Environment Control Interface (PECI), the error status data stored in an internal register (not shown) of the CPU 12, and stores it. The error status data relates to the computer device 1 and, in this embodiment, includes machine check architecture error status data. The processing module 113 stores the error status data by replacing the error status data previously stored in the nonvolatile memory module 111 with the error status data just read.
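To make the kind of data handled in step 31 more concrete, the following sketch decodes a few fields of a machine check status value and stores snapshots by replacement. It rests on assumptions not stated in the source: that the error status data consists of raw 64-bit IA32_MCi_STATUS values (bit positions as documented in the Intel Software Developer's Manual), and that a simple in-memory class can stand in for the nonvolatile memory module 111. The PECI transport itself is not modeled, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Bit positions of an IA32_MCi_STATUS register (Intel SDM, machine-check architecture).
VAL_BIT = 63   # register holds a valid error
OVER_BIT = 62  # an earlier error was overwritten
UC_BIT = 61    # error was not corrected by hardware
PCC_BIT = 57   # processor context may be corrupted


@dataclass(frozen=True)
class McaStatus:
    valid: bool
    overflow: bool
    uncorrected: bool
    context_corrupt: bool
    mca_error_code: int  # bits 15:0


def decode_mca_status(raw: int) -> McaStatus:
    """Decode a few fields of one raw 64-bit MCi_STATUS value."""
    def bit(n: int) -> bool:
        return bool((raw >> n) & 1)
    return McaStatus(bit(VAL_BIT), bit(OVER_BIT), bit(UC_BIT), bit(PCC_BIT), raw & 0xFFFF)


class SnapshotStore:
    """Hypothetical stand-in for nonvolatile memory module 111: keeps one snapshot."""

    def __init__(self) -> None:
        self.latest: Optional[List[int]] = None

    def update(self, raw_values: List[int]) -> None:
        # Step 31 stores by replacing the previous snapshot, not by appending.
        self.latest = list(raw_values)


store = SnapshotStore()
store.update([0])                                               # earlier, error-free read
raw = (1 << VAL_BIT) | (1 << UC_BIT) | (1 << PCC_BIT) | 0x0005  # made-up value
store.update([raw])                                             # newer read replaces it
status = decode_mca_status(store.latest[0])
```

Keeping a single snapshot mirrors the update-in-place behavior described for step 31; a design that appended every read would need a separate policy for bounding storage.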
In step 32, the processing module 113 of the BMC system 11 determines whether the stored error status data (i.e., the error status data stored in the nonvolatile memory module 111) contains at least one of a plurality of specific errors. The specific errors include at least one of the following error types: a fatal error (CATERR), an uncorrectable Peripheral Component Interconnect (PCI) error, a fatal PCI error, a system management interrupt timeout (SMI timeout), a parity error (PERR), and a system error (SERR). When it is determined that the error status data contains the at least one specific error, the flow proceeds to step 33. Otherwise, the flow proceeds to step 34.
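The branch taken in step 32 can be sketched with a bit-flag representation of the error types. This is only one possible encoding, assumed here for illustration; the source does not say how the BMC system represents the errors internally, and the `CORRECTED` member is a hypothetical example of an error that is not one of the specific errors.

```python
from enum import Flag, auto


class ErrorType(Flag):
    NONE = 0
    CATERR = auto()             # fatal error
    UNCORRECTABLE_PCI = auto()  # uncorrectable PCI error
    FATAL_PCI = auto()          # fatal PCI error
    SMI_TIMEOUT = auto()        # system management interrupt timeout
    PERR = auto()               # parity error
    SERR = auto()               # system error
    CORRECTED = auto()          # hypothetical error outside the specific set


# The set of specific errors checked in step 32.
SPECIFIC = (
    ErrorType.CATERR | ErrorType.UNCORRECTABLE_PCI | ErrorType.FATAL_PCI
    | ErrorType.SMI_TIMEOUT | ErrorType.PERR | ErrorType.SERR
)


def next_step(stored: ErrorType) -> int:
    """Return 33 if the stored data contains a specific error, otherwise 34."""
    return 33 if stored & SPECIFIC else 34
```

For example, `next_step(ErrorType.PERR | ErrorType.CORRECTED)` selects step 33, while `next_step(ErrorType.CORRECTED)` selects step 34.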
In step 33, after the processing module 113 of the BMC system 11 receives a data request from the user terminal 2 via the communication module 112, it transmits the error status data previously stored in step 31 to the user terminal 2 via the communication module 112. In another embodiment of the present invention, the error status data providing method further includes a step 30 (see FIG. 3) before step 31. In step 30, the processing module 113 of the BMC system 11 determines whether the computer device 1 has been restarted within a reference time period before the current time. When it is determined that the computer device 1 has been restarted within the reference time period, the flow proceeds to step 33. Otherwise, the flow proceeds to step 31.
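The check performed in step 30 might look like the sketch below. The source does not state how the BMC system learns the time of the last restart or how long the reference time period is, so the timestamp arguments, the units (seconds), and the function names are assumptions. The apparent intent, which is an interpretation rather than a statement from the source, is that a read made just after a restart would see cleared registers and replace the stored snapshot.

```python
from typing import Optional


def rebooted_recently(
    last_reboot_s: Optional[float],
    now_s: float,
    reference_period_s: float,
) -> bool:
    """True if the last restart falls within the reference period before now."""
    if last_reboot_s is None:  # no restart has been observed
        return False
    return 0 <= now_s - last_reboot_s <= reference_period_s


def first_step(
    last_reboot_s: Optional[float],
    now_s: float,
    reference_period_s: float,
) -> int:
    """Step 30: go to step 33 after a recent restart, otherwise to step 31."""
    return 33 if rebooted_recently(last_reboot_s, now_s, reference_period_s) else 31
```

With a reference period of 5 seconds, a restart 3 seconds ago leads to step 33 and a restart 10 seconds ago leads to step 31.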
In step 34, the processing module 113 of the BMC system 11 waits for a default time period, after which the flow returns to step 31.
Note that, in actual use, when the conventional BMC system 11 detects that the computer device 1 is not operating normally, it stores a System Event Log (SEL) related to the computer device 1 to help the administrator determine the reason for the abnormal operation. In addition to the system event log, however, the administrator must also refer to information such as the machine check architecture error status data to understand that reason. When the error status data contains the at least one specific error, the computer device 1 does not operate normally and the system event log contains abnormality information. When the administrator learns from the system event log that the computer device 1 is not operating normally, the administrator uses the user terminal 2 to send the BMC system 11 the data request for the error status data stored in its nonvolatile memory module 111. The BMC system 11 returns that error status data in response to the data request, and the administrator thereby obtains the error status data that the BMC system 11 stored when the at least one specific error occurred.
The BMC system 11 resumes steps 31-32 (see FIG. 2) or steps 30-32 (see FIG. 3) only after the administrator has used the user terminal 2 to issue the data request and has obtained the error status data that the BMC system 11 stored when the at least one specific error occurred. In other words, until the BMC system 11 receives the data request for the error status data stored in its nonvolatile memory module 111, it neither periodically reads error status data from the register of the CPU 12 nor stores such data in the nonvolatile memory module 111. This prevents the error status data stored in the nonvolatile memory module 111 when the at least one specific error occurred from being overwritten or corrupted.
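The hold-until-requested behavior described above can be modeled as a small latch. The class below is a sketch under assumptions: the class name, method names, and the use of string sets for error data are all hypothetical, and it serves a request even when no specific error is latched, a case the source does not address.

```python
from typing import Iterable, Set


class LatchedStore:
    """Keeps the latest snapshot and freezes it once a specific error is seen."""

    def __init__(self, specific_errors: Iterable[str]) -> None:
        self.specific: Set[str] = set(specific_errors)
        self.snapshot: Set[str] = set()
        self.latched = False

    def poll(self, reading: Iterable[str]) -> bool:
        """Steps 31-32: store the reading unless frozen; return True if stored."""
        if self.latched:
            return False  # polling is suspended, so the snapshot is protected
        self.snapshot = set(reading)
        self.latched = bool(self.snapshot & self.specific)
        return True

    def serve_request(self) -> Set[str]:
        """Step 33: hand the snapshot to the user terminal and resume polling."""
        data = set(self.snapshot)
        self.latched = False
        return data


store = LatchedStore({"CATERR"})
stored_first = store.poll({"CATERR"})  # fatal error captured, store freezes
stored_second = store.poll(set())      # cleared registers after a restart are ignored
served = store.serve_request()         # administrator retrieves the frozen snapshot
stored_third = store.poll(set())       # polling resumes after the request
```

The second poll is the important case: without the latch, the empty post-restart reading would replace the snapshot that the administrator needs.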
When the error status data does not contain the at least one specific error, the BMC system 11 waits for the default time period, e.g., 50 ms, and then executes steps 31-32 (see FIG. 2) or steps 30-32 (see FIG. 3) again. Restarting the computer device 1 in response to the at least one specific error takes more than 50 ms, so the time needed for the restart to clear the error status data containing the at least one specific error also exceeds 50 ms. Because the BMC system 11 automatically reads the error status data stored in the internal register of the CPU 12 every 50 ms, the error status data present when the at least one specific error occurs is not cleared before the BMC system 11 has stored it. Furthermore, the BMC system 11 resumes steps 31-32 (see FIG. 2) or steps 30-32 (see FIG. 3) only after the error status data that it stored when the at least one specific error occurred has been transmitted to the user terminal 2. That is, until that data has been transmitted to the user terminal 2, the processing module 113 of the BMC system 11 does not execute steps 31-32 (see FIG. 2) or steps 30-32 (see FIG. 3), so the error status data stored in the nonvolatile memory module 111 when the at least one specific error occurred cannot be overwritten or damaged by error status data that the BMC system 11 reads subsequently.
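The timing argument can be checked with a short calculation. The sketch assumes an idealized model that is not in the source: polls occur exactly at integer multiples of the period, time is measured in whole milliseconds, and the error data is readable from the moment the error occurs until the restart clears it. The function name and parameters are hypothetical.

```python
def captured(error_at_ms: int, clear_delay_ms: int, period_ms: int = 50) -> bool:
    """True if some poll at k * period_ms lands in [error_at, error_at + clear_delay)."""
    # Round error_at_ms up to the next multiple of period_ms (ceiling division).
    first_poll_at_or_after = -(-error_at_ms // period_ms) * period_ms
    return first_poll_at_or_after < error_at_ms + clear_delay_ms


# If clearing takes longer than the 50 ms period, every error time is covered.
always_captured = all(captured(t, clear_delay_ms=51) for t in range(0, 200))

# If clearing were faster than the period, some errors would be missed.
sometimes_missed = not all(captured(t, clear_delay_ms=30) for t in range(0, 200))
```

In this model the worst case is an error that occurs just after a poll, which waits almost a full period for the next read; that is why the clearing time must exceed the polling period.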
In other embodiments of the present invention, in addition to periodically reading the error status data stored in the register of the CPU 12, the BMC system 11 also reads that error status data when it receives an error notification, such as a fatal error (CATERR) notification, from the CPU 12.
In summary, in the error status data providing method of the present invention, the BMC system 11 periodically reads and stores the error status data in the register of the CPU 12. When the error status data contains the at least one specific error, the BMC system 11 suspends the periodic reading and storing of error status data from the register of the CPU 12 and, after receiving the data request from the user terminal 2, transmits to the user terminal 2 the error status data stored when the at least one specific error occurred. This ensures that the administrator can obtain the error status data present when the at least one specific error occurred, thereby achieving the object of the present invention.
The above embodiments and drawings are only preferred embodiments of the present invention and are not intended to limit its scope; all equivalent changes and modifications made in accordance with the claims of the present invention fall within the scope of the present invention.

Claims (6)

1. A method for providing error status data, implemented by a baseboard management control system included in a computer device, the computer device further including a central processing unit electrically connected to the baseboard management control system, the method comprising:
(F) the baseboard management control system determines whether the computer device has been restarted within a reference time period before a current time;
(G) when it is determined that the computer device has been restarted within the reference time period before the current time, the process proceeds to step (D);
(H) when it is determined that the computer device has not been restarted within the reference time period before the current time, step (A) is performed;
(A) reading and storing the error status data stored in the central processing unit;
(B) determining whether the error status data contains at least one of a plurality of specific errors;
(C) continuing to execute step (A) when the error status data is determined not to contain the at least one specific error; and
(D) when the error status data is determined to contain the at least one specific error, the baseboard management control system transmits the error status data previously stored in the step (A) to a user terminal after receiving a data request from the user terminal, and the baseboard management control system does not repeatedly execute the steps (A) to (B) before transmitting the error status data previously stored in the step (A) to the user terminal.
2. The method of claim 1, wherein in step (C), when the error status data is determined not to contain the at least one specific error, the baseboard management control system waits for a predetermined time period and then executes steps (A) through (B) again.
3. The method of claim 1, wherein the baseboard management control system comprises a nonvolatile memory module for storing the error status data, and wherein in step (A), the baseboard management control system stores the error status data by replacing the error status data previously stored in the nonvolatile memory module with the error status data read in step (A).
4. The method of claim 1, further comprising a step (E), after step (D), of repeating steps (A) through (B).
5. The error status data providing method according to claim 1, wherein:
in step (A), the error status data contains machine check architecture error status data; and
in step (B), the plurality of specific errors includes at least one of the following error types: a fatal error, an uncorrectable PCI error, a fatal PCI error, an SMI timeout, a parity error, and a system error.
6. The method of claim 1, wherein in step (A), the baseboard management control system reads the error status data of the central processing unit through a Platform Environment Control Interface (PECI).
CN201610378723.4A 2016-05-31 2016-05-31 Error state data providing method for computer device Active CN107451035B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610378723.4A CN107451035B (en) 2016-05-31 2016-05-31 Error state data providing method for computer device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610378723.4A CN107451035B (en) 2016-05-31 2016-05-31 Error state data providing method for computer device

Publications (2)

Publication Number Publication Date
CN107451035A (en) 2017-12-08
CN107451035B (en) 2020-11-10

Family

ID=60485919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610378723.4A Active CN107451035B (en) 2016-05-31 2016-05-31 Error state data providing method for computer device

Country Status (1)

Country Link
CN (1) CN107451035B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201351133A (en) * 2012-06-13 2013-12-16 Hon Hai Prec Ind Co Ltd Method and system for reading system event
TW201423390A (en) * 2012-12-06 2014-06-16 Inventec Corp Computer system and operating method thereof
CN104424068A (en) * 2013-08-29 2015-03-18 鸿富锦精密工业(深圳)有限公司 System and method for pressure testing of firmware update
TWI512490B (en) * 2014-10-27 2015-12-11 Quanta Comp Inc System for retrieving console messages and method thereof and non-transitory computer-readable medium

Also Published As

Publication number Publication date
CN107451035A (en) 2017-12-08

Similar Documents

Publication Publication Date Title
US8909978B2 (en) Remote access diagnostic mechanism for communication devices
US7197634B2 (en) System and method for updating device firmware
US8468389B2 (en) Firmware recovery system and method of baseboard management controller of computing device
US10275330B2 (en) Computer readable non-transitory recording medium storing pseudo failure generation program, generation method, and generation apparatus
US10896087B2 (en) System for configurable error handling
US11314866B2 (en) System and method for runtime firmware verification, recovery, and repair in an information handling system
CN108932249B (en) Method and device for managing file system
US11860718B2 (en) Register reading method and apparatus, device, and medium
US11953976B2 (en) Detecting and recovering from fatal storage errors
US20140143597A1 (en) Computer system and operating method thereof
TWI518680B (en) Method for maintaining file system of computer system
US20030154339A1 (en) System and method for interface isolation and operating system notification during bus errors
CN107451035B (en) Error state data providing method for computer device
CN110471814B (en) Control method for error reporting function of server device
US9176806B2 (en) Computer and memory inspection method
TWI602054B (en) Method of providing error status data for computer device
TWI757606B (en) Server device and communication method between baseboard management controller and programmable logic unit thereof
US11797368B2 (en) Attributing errors to input/output peripheral drivers
US20240012651A1 (en) Enhanced service operating system capabilities through embedded controller system health state tracking
US20230055136A1 (en) Systems and methods to flush data in persistent memory region to non-volatile memory using auxiliary processor
CN116302641A (en) Fault memory isolation method and device and electronic equipment
CN117170921A (en) Device correctable error processing method and device, computer device and storage medium
CN112346922A (en) Server device and communication protocol method thereof
CN115827027A (en) Data processing method, data processing device, storage medium and electronic equipment
CN112084049A (en) Method for monitoring resident program of baseboard management controller

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant