CN112256467B - Error type judging system and method thereof - Google Patents

Error type judging system and method thereof

Info

Publication number
CN112256467B
Authority
CN
China
Prior art keywords
error type
error
information
type information
processing unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011146802.5A
Other languages
Chinese (zh)
Other versions
CN112256467A (en)
Inventor
于宝在
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inventec Pudong Technology Corp
Inventec Corp
Original Assignee
Inventec Pudong Technology Corp
Inventec Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inventec Pudong Technology Corp, Inventec Corp
Priority to CN202011146802.5A
Publication of CN112256467A
Application granted
Publication of CN112256467B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0745 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment, in an input/output transactions management context
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides an error type judging system and a method thereof. The error type judging system comprises a servo operation unit, a basic input/output system (BIOS), a logic processing module and a management control module. The servo operation unit sends an error occurrence signal containing first error information when an error occurs while it executes an execution program. When the BIOS receives the error occurrence signal, it looks up the matching error type information in an error type comparison table, defines it as the corresponding error type information, and transmits an error type signal containing the corresponding error type information. When the logic processing module receives the error type signal, it parses the corresponding error type information, stores it, and transmits it to the management control module.

Description

Error type judging system and method thereof
Technical Field
The present invention relates to an error type determination system and method thereof, and more particularly, to an error type determination system and method thereof for determining an error type when an execution error occurs.
Background
With the advance of network technology, servers play an indispensable role in daily life. Generally speaking, while a server system is running, whether it is powering on from a power-off state, returning to a working state from a sleep state, or operating normally, an Uncorrectable Error (UCE) occurs with a certain probability and causes the system to crash.
In the prior art, production line personnel cannot accurately determine the type of an uncorrectable error and therefore cannot tell what caused it. As a result, personnel from the department responsible for the Basic Input/Output System (BIOS), the department responsible for the Baseboard Management Controller (BMC), and the departments responsible for other hardware all need to go to the production line to debug, which wastes everyone's time and lowers production efficiency. The prior art therefore still has room for improvement.
Disclosure of Invention
In the prior art, field personnel cannot determine the error type when a server generates an error during operation, which inconveniences a large number of personnel and reduces production efficiency. To solve these problems, the invention provides an error type determination system and a method thereof for determining the error type when an execution error occurs.
To achieve the above and other related objects, a first aspect of the present invention provides an error type determination system, comprising:
a servo operation unit for sending an error occurrence signal containing first error information when the first error information is generated while executing an execution program;
a Basic Input/Output System (BIOS), comprising:
a first storage unit for storing an error type comparison table containing a plurality of pieces of second error information and a plurality of pieces of error type information respectively corresponding to the second error information; and
a first processing unit, electrically connected to the first storage unit and communicatively connected to the servo operation unit, for, upon receiving the error occurrence signal, finding the second error information corresponding to the first error information according to the error type comparison table, finding the error type information corresponding to that second error information according to the error type comparison table, and defining that error type information as corresponding error type information, so as to transmit an error type signal containing the corresponding error type information through an information transmission protocol;
a logic processing module, comprising:
a second storage unit; and
a second processing unit, electrically connected to the second storage unit and communicatively connected to the first processing unit, for, upon receiving the error type signal, parsing the corresponding error type information, storing the corresponding error type information in the second storage unit, and transmitting the corresponding error type information; and
a management control module, communicatively connected to the second processing unit, for receiving the corresponding error type information.
In an embodiment of the first aspect, the management control module is further configured to transmit an error type checking instruction to the second processing unit to trigger the second processing unit to transmit the corresponding error type information to the management control module.
In an embodiment of the first aspect, the logic processing module is a Complex Programmable Logic Device (CPLD).
In an embodiment of the first aspect, the management control module is a Baseboard Management Controller (BMC).
In an embodiment of the first aspect, the information transmission protocol is a Serial General Purpose Input/Output (SGPIO) protocol.
A second aspect of the present invention provides an error type determination method implemented by the error type determination system of the first aspect of the present invention, and including the steps of:
(a) using the first processing unit of the BIOS to determine whether the error occurrence signal containing the first error information, sent by the servo operation unit when the first error information is generated while executing the execution program, has been received;
(b) when the determination result in step (a) is yes, using the first processing unit of the BIOS to find the second error information corresponding to the first error information according to the error type comparison table, to find the error type information corresponding to that second error information according to the error type comparison table, to define that error type information as the corresponding error type information, and to transmit the error type signal containing the corresponding error type information through the information transmission protocol;
(c) using the second processing unit of the logic processing module to receive the error type signal, parse the corresponding error type information, store the corresponding error type information in the second storage unit, and transmit the corresponding error type information;
(d) using the management control module to receive the corresponding error type information and display the corresponding error type information; and
(e) when the determination result in step (a) is no, repeating step (a).
In an embodiment of the second aspect, the logic processing module is a Complex Programmable Logic Device (CPLD).
In an embodiment of the second aspect, the management control module is a Baseboard Management Controller (BMC).
In an embodiment of the second aspect, the information transmission protocol is a Serial General Purpose Input/Output (SGPIO) protocol.
As described above, with the error type determination system and method provided by the present invention, the error type comparison table is established in the BIOS in advance. When an error occurs during the operation of the server, the BIOS can immediately identify the type of the error and transmit it directly to the logic processing module, which stores the error type and transmits it to the management control module. Field personnel can therefore learn the type of the error through the management control module and quickly find its source, which reduces the inconvenience to other personnel and improves processing efficiency when an error occurs.
Drawings
FIG. 1 is a block diagram of an error type determination system according to a preferred embodiment of the present invention.
FIG. 2 is a flowchart illustrating a method for determining an error type according to a preferred embodiment of the invention.
Description of the element reference numerals
1 error type judging system
11 servo operation unit
12 basic input-output system
121 first storage unit
1211 error type comparison table
122 first processing unit
13 logic processing module
131 second storage unit
1311 corresponding error type information
132 second processing unit
14 management control module
S1 error occurrence signal
S2 error type signal
S3 error notification signal
S4 error type checking instruction
S101 to S104 steps
Detailed Description
The following describes embodiments of the present invention in more detail with reference to the schematic drawings. The advantages and features of the present invention will become apparent from the following description and from the claims. Note that the drawings are in a very simplified form and are not to scale; they are intended only to illustrate the embodiments of the present invention conveniently and clearly.
Referring to FIG. 1, which is a block diagram illustrating an error type determination system according to a preferred embodiment of the invention, the error type determination system 1 of the present invention includes a servo operation unit 11, a Basic Input/Output System (BIOS) 12, a logic processing module 13, and a management control module 14. In the preferred embodiment of the present invention, the error type determination system 1 is applied to a server (not shown). The servo operation unit 11 may be, for example, a Central Processing Unit (CPU), a Microcontroller (MCU) or another processor in the server, or a processing module in the basic input/output system 12 described below, depending on the actual design.
The BIOS 12 includes a first storage unit 121 and a first processing unit 122. The first storage unit 121 may be, for example, a conventional memory capable of storing data, and it stores an error type comparison table 1211 containing a plurality of pieces of second error information and a plurality of pieces of error type information respectively corresponding to the second error information.
For example, the first storage unit 121 of the preferred embodiment of the present invention stores the second error information in bit form, so the second error information is, for example, 0x10000000, 0x20000000 and 0x40000000. The error type information includes, for example, a multi-bit ECC memory error, a Parity Error (PERR) and a System Error (SERR), but is not limited thereto in other embodiments. The correspondence stored in the error type comparison table 1211 according to the preferred embodiment of the present invention can be, for example, as shown in Table 1.
Table 1: Error type comparison table
Error type information        Second error information
Multi-bit ECC memory error    0x10000000
Parity error (PERR)           0x20000000
System error (SERR)           0x40000000
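As a rough illustration, Table 1 can be held as a simple mapping from second error information to error type information. This is only a sketch: the names `ERROR_TYPE_TABLE`, `is_single_bit` and `bit_position` are hypothetical and do not appear in the patent, and the observation that each example code sets exactly one bit is read off the three example values rather than stated by the source.

```python
# Hypothetical in-memory form of Table 1 (names are illustrative assumptions).
ERROR_TYPE_TABLE = {
    0x10000000: "Multi-bit ECC memory error",
    0x20000000: "Parity error (PERR)",
    0x40000000: "System error (SERR)",
}


def is_single_bit(code: int) -> bool:
    """Return True when exactly one bit is set in the code."""
    return code > 0 and code & (code - 1) == 0


def bit_position(code: int) -> int:
    """Return the zero-based index of the highest set bit."""
    return code.bit_length() - 1


if __name__ == "__main__":
    for code, error_type in ERROR_TYPE_TABLE.items():
        print(f"{code:#010x} bit {bit_position(code)} -> {error_type}")
```

Under this reading, each example code occupies its own bit, which would suit a bit-serial link, although the patent only says the values are stored "in bit form".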
The first processing unit 122 can be, for example, a conventional processor that is electrically connected to the first storage unit 121 and communicatively connected to the servo operation unit 11. Alternatively, the first processing unit 122 can be integrated with the servo operation unit 11 as the above-mentioned processing module disposed in the BIOS 12, depending on the actual design. The communication links described in the preferred embodiment of the present invention are all wired; in other embodiments they may be wireless, depending on the actual design.
The Logic processing module 13 can be, for example, a Complex Programmable Logic Device (CPLD). The logic processing module 13 includes a second storage unit 131 and a second processing unit 132, and the second storage unit 131 may be, for example, an existing memory having a function of storing data. The second processing unit 132 can be, for example, a conventional processor with processing function, and is electrically connected to the second storage unit 131 and communicatively connected to the first processing unit 122.
The Management control module 14 is, for example, a Baseboard Management Controller (BMC), and is communicatively connected to the second processing unit 132.
The servo operation unit 11 sends an error occurrence signal S1 containing first error information when the first error information is generated while executing an execution program. The execution program is, for example, a boot program, an operating system program, or another operating program. In the preferred embodiment of the present invention, the first error information is, for example, 0x40000000.
When receiving the error occurrence signal S1, the first processing unit 122 finds the second error information corresponding to the first error information according to the error type comparison table 1211 (in the preferred embodiment, the matching entry 0x40000000). It then finds the error type information corresponding to that second error information according to the error type comparison table 1211 (in the preferred embodiment, the system error that corresponds to 0x40000000) and defines it as corresponding error type information 1311 (that is, the system error becomes the corresponding error type information 1311), so as to transmit an error type signal S2 containing the corresponding error type information 1311 through an information transmission protocol.
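The lookup performed by the first processing unit can be sketched as follows. This is a minimal sketch under assumptions: `ErrorTypeSignal` and `handle_error_occurrence` are hypothetical names, and returning `None` for a code that is missing from the table is a choice made here, since the patent does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical copy of Table 1 (second error information -> error type information).
ERROR_TYPE_TABLE = {
    0x10000000: "Multi-bit ECC memory error",
    0x20000000: "Parity error (PERR)",
    0x40000000: "System error (SERR)",
}


@dataclass(frozen=True)
class ErrorTypeSignal:
    """Stand-in for the error type signal S2 (illustrative structure)."""
    code: int
    error_type: str


def handle_error_occurrence(first_error_info: int) -> Optional[ErrorTypeSignal]:
    """Map the first error information carried by S1 to an S2 payload."""
    # Find the second error information that matches the first error information,
    # then the error type information stored against it.
    error_type = ERROR_TYPE_TABLE.get(first_error_info)
    if error_type is None:
        return None  # assumption: unknown codes produce no signal
    return ErrorTypeSignal(code=first_error_info, error_type=error_type)


if __name__ == "__main__":
    print(handle_error_occurrence(0x40000000))
```

With the example value from the text, `handle_error_occurrence(0x40000000)` yields a signal whose error type is the system error.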
The information transmission protocol is, for example, the Serial General Purpose Input/Output (SGPIO) protocol; that is, the first storage unit stores the second error information in bit form to suit the transmission method of the SGPIO protocol. More specifically, in the preferred embodiment of the present invention, the basic input/output system 12 is connected to the logic processing module 13 through SGPIO pins, and the error type signal S2 is transmitted to the second processing unit 132 of the logic processing module 13 through those pins.
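To make the idea of a bit-oriented transfer concrete, the sketch below shifts a 32-bit code out one bit at a time and reassembles it on the receiving side. It is deliberately simplified and is not an implementation of SGPIO framing: the 32-bit word width, the least-significant-bit-first order, and the names `to_bit_stream` and `from_bit_stream` are all assumptions made for illustration.

```python
from typing import List, Sequence

WORD_BITS = 32  # assumption: the example codes fit in 32 bits


def to_bit_stream(code: int) -> List[int]:
    """Serialize a code into WORD_BITS bits, least significant bit first."""
    if not 0 <= code < (1 << WORD_BITS):
        raise ValueError("code does not fit in the assumed word width")
    return [(code >> i) & 1 for i in range(WORD_BITS)]


def from_bit_stream(bits: Sequence[int]) -> int:
    """Rebuild the code from a stream produced by to_bit_stream."""
    if len(bits) != WORD_BITS:
        raise ValueError("unexpected stream length")
    value = 0
    for i, bit in enumerate(bits):
        value |= (bit & 1) << i
    return value


if __name__ == "__main__":
    stream = to_bit_stream(0x40000000)
    print(stream.index(1), hex(from_bit_stream(stream)))
```

A real link would also need clocking and frame synchronization, which the sketch leaves out.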
When receiving the error type signal S2, the second processing unit 132 parses the corresponding error type information 1311, thereby learning that the current error is a system error, and stores the corresponding error type information 1311 in the second storage unit 131. The second processing unit 132 then transmits the corresponding error type information 1311 to the management control module 14 by sending an error notification signal S3 that contains it.
After receiving the error notification signal S3, the management control module 14 can parse the corresponding error type information 1311 and display it through a display device, so that on-site production line personnel can learn the type of the current error through the management control module 14 in real time.
In addition, in the preferred embodiment of the present invention, the management control module 14 first transmits an error type check instruction S4 to the second processing unit 132, and the second processing unit 132 then transmits the error notification signal S3 containing the corresponding error type information 1311 to the management control module 14. That is, in the preferred embodiment, on-site production line personnel can trigger the management control module 14 to send the error type check instruction S4 after an error occurs. In other embodiments, the second processing unit 132 of the logic processing module 13 may actively send the corresponding error type information 1311 to the management control module 14 whenever an error occurs, depending on the actual design.
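The two delivery modes described above (on request via S4, or actively on every error) can be contrasted in a small software model. This is only a sketch: the class `LogicProcessingModule`, its method names, and the optional `notify` callback are hypothetical, and an actual CPLD would implement this behavior in hardware logic rather than Python.

```python
from typing import Callable, Optional


class LogicProcessingModule:
    """Toy model of the logic processing module (names are assumptions)."""

    def __init__(self, notify: Optional[Callable[[str], None]] = None) -> None:
        self._stored: Optional[str] = None  # stands in for the second storage unit
        self._notify = notify               # set only in the active-push variant

    def on_error_type_signal(self, error_type: str) -> None:
        """Handle S2: store the error type and, in push mode, send S3 at once."""
        self._stored = error_type
        if self._notify is not None:
            self._notify(error_type)

    def on_check_instruction(self) -> Optional[str]:
        """Handle S4: return the stored error type as the S3 payload."""
        return self._stored


if __name__ == "__main__":
    pull_mode = LogicProcessingModule()
    pull_mode.on_error_type_signal("System error (SERR)")
    print(pull_mode.on_check_instruction())
```

In pull mode the management control module asks when an operator wants the answer; in push mode the callback delivers every error type as soon as it is stored.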
Referring to FIG. 2, which is a flowchart illustrating a method for determining an error type according to a preferred embodiment of the invention, the preferred embodiment further provides an error type determination method that is implemented by the error type determination system shown in FIG. 1 and comprises the following steps S101 to S104.
Step S101: the first processing unit 122 of the BIOS 12 determines whether an error occurrence signal S1 containing the first error information, sent by the servo operation unit 11 when the first error information is generated while executing the execution program, has been received.
Step S102: the first processing unit 122 of the BIOS 12 finds the second error information corresponding to the first error information according to the error type comparison table 1211, finds the error type information corresponding to that second error information according to the error type comparison table 1211, and defines that error type information as corresponding error type information 1311, so as to transmit an error type signal S2 containing the corresponding error type information 1311 through the information transmission protocol.
Step S103: the second processing unit 132 of the logic processing module 13 receives the error type signal S2, parses the corresponding error type information 1311, stores the corresponding error type information 1311 in the second storage unit 131, and transmits the corresponding error type information 1311.
Step S104: the management control module 14 receives the corresponding error type information 1311 and displays it.
Other detailed descriptions of each step are already mentioned in the above paragraphs, and thus are not repeated herein.
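Steps S101 to S104 can be traced end to end in a short sketch. The following is an illustration under assumptions rather than the patented implementation: `run_method` and its `poll_results` argument are hypothetical, a `None` entry models a poll in which no error occurrence signal was received, and silently skipping codes that are missing from the table is a choice made here.

```python
from typing import Iterable, List, Optional

# Hypothetical copy of Table 1 (second error information -> error type information).
ERROR_TYPE_TABLE = {
    0x10000000: "Multi-bit ECC memory error",
    0x20000000: "Parity error (PERR)",
    0x40000000: "System error (SERR)",
}


def run_method(poll_results: Iterable[Optional[int]]) -> List[str]:
    """Walk S101 to S104 over a sequence of polls and return what is displayed."""
    displayed: List[str] = []
    for first_error_info in poll_results:
        # S101: was an error occurrence signal received? If not, poll again.
        if first_error_info is None:
            continue
        # S102: BIOS looks up the error type in the comparison table.
        error_type = ERROR_TYPE_TABLE.get(first_error_info)
        if error_type is None:
            continue  # assumption: unknown codes are ignored
        # S103: logic processing module stores the error type, then forwards it.
        stored = error_type
        # S104: management control module receives and displays it.
        displayed.append(stored)
    return displayed


if __name__ == "__main__":
    print(run_method([None, 0x40000000, None, 0x10000000]))
```

The `continue` on a `None` poll corresponds to repeating step S101 until a signal arrives.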
In summary, with the error type determination system and method provided by the present invention, the error type comparison table is established in the BIOS in advance. When an error occurs during the operation of the server, the BIOS can immediately identify the type of the error and transmit it directly to the logic processing module, which stores the error type and transmits it to the management control module. Field personnel can therefore learn the type of the error as soon as it occurs and quickly find its source, which reduces the inconvenience to other personnel and improves processing efficiency when an error occurs.
The above detailed description of the preferred embodiments is intended to more clearly describe the features and spirit of the present invention, and is not intended to limit the scope of the present invention by the preferred embodiments disclosed above. On the contrary, it is intended to cover various modifications and equivalent arrangements included within the scope of the claims.

Claims (9)

1. An error type determination system, comprising:
a servo operation unit for sending an error occurrence signal containing first error information when the first error information is generated while executing an execution program;
a basic input output system, comprising:
a first storage unit for storing an error type comparison table containing a plurality of pieces of second error information and a plurality of pieces of error type information respectively corresponding to the second error information; and
a first processing unit, electrically connected to the first storage unit and communicatively connected to the servo operation unit, for, upon receiving the error occurrence signal, finding the second error information corresponding to the first error information according to the error type comparison table, finding the error type information corresponding to that second error information according to the error type comparison table, and defining that error type information as corresponding error type information, so as to transmit an error type signal containing the corresponding error type information through an information transmission protocol;
a logic processing module, comprising:
a second storage unit; and
a second processing unit, electrically connected to the second storage unit and communicatively connected to the first processing unit, for, upon receiving the error type signal, parsing the corresponding error type information, storing the corresponding error type information in the second storage unit, and transmitting the corresponding error type information; and
a management control module, communicatively connected to the second processing unit, for receiving the corresponding error type information.
2. The error type determination system according to claim 1, characterized in that: the management control module is further configured to transmit an error type check instruction to the second processing unit to trigger the second processing unit to transmit the corresponding error type information to the management control module.
3. The error type determination system according to claim 1, characterized in that: the logic processing module is a complex programmable logic device.
4. The error type determination system according to claim 1, characterized in that: the management control module is a baseboard management controller.
5. The error type determination system according to claim 1, characterized in that: the information transmission protocol is a serial general purpose input/output protocol.
6. An error type determination method implemented by the error type determination system of claim 1, comprising the steps of:
(a) using the first processing unit of the basic input output system to determine whether the error occurrence signal containing the first error information, sent by the servo operation unit when the first error information is generated while executing the execution program, has been received;
(b) when the determination result in step (a) is yes, using the first processing unit of the basic input output system to find the second error information corresponding to the first error information according to the error type comparison table, to find the error type information corresponding to that second error information according to the error type comparison table, to define that error type information as the corresponding error type information, and to transmit the error type signal containing the corresponding error type information through the information transmission protocol;
(c) using the second processing unit of the logic processing module to receive the error type signal, parse the corresponding error type information, store the corresponding error type information in the second storage unit, and transmit the corresponding error type information;
(d) using the management control module to receive the corresponding error type information and display the corresponding error type information; and
(e) when the determination result in step (a) is no, repeating step (a).
7. The error type determination method according to claim 6, wherein: the logic processing module is a complex programmable logic device.
8. The error type determination method according to claim 6, wherein: the management control module is a baseboard management controller.
9. The error type determination method according to claim 6, wherein: the information transmission protocol is a serial general purpose input/output protocol.
CN202011146802.5A 2020-10-23 2020-10-23 Error type judging system and method thereof Active CN112256467B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011146802.5A CN112256467B (en) 2020-10-23 2020-10-23 Error type judging system and method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011146802.5A CN112256467B (en) 2020-10-23 2020-10-23 Error type judging system and method thereof

Publications (2)

Publication Number    Publication Date
CN112256467A (en)     2021-01-22
CN112256467B (en)     2022-08-02

Family

ID=74264897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011146802.5A Active CN112256467B (en) 2020-10-23 2020-10-23 Error type judging system and method thereof

Country Status (1)

Country Link
CN (1) CN112256467B (en)

Also Published As

Publication number Publication date
CN112256467A (en) 2021-01-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant