CN116185678A - Fault log recording method, system method and equipment - Google Patents


Info

Publication number
CN116185678A
Authority
CN
China
Prior art keywords
data
cpu
management controller
fault log
fault
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211581975.9A
Other languages
Chinese (zh)
Inventor
毛阿利
刘劲楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
XFusion Digital Technologies Co Ltd
Original Assignee
XFusion Digital Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by XFusion Digital Technologies Co Ltd
Priority to CN202211581975.9A
Publication of CN116185678A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0721 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment, within a central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0769 Readable error formats, e.g. cross-platform generic formats, human understandable formats

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides a fault log recording method, a system method, and a device, and relates to the technical field of computer devices. In the method, after the computer device fails, the CPU generates a fault log to record fault information of the computer device. After receiving first fragment data of the fault log sent by the CPU, the management controller writes the at least one first byte of the fault log contained in the first fragment data into a log file and actively sends a target interrupt signal to the CPU, instructing the CPU to continue sending the fault log, so that the management controller quickly receives the second fragment data that the CPU sends in response to the target interrupt signal. Because the CPU continues with the second fragment data as soon as it receives the target interrupt signal, the interval between the first fragment data and the second fragment data is shortened. This effectively shortens the time the management controller needs to receive the fault log and lets the CPU quickly enter the subsequent fault processing flow.

Description

Fault log recording method, system method and equipment
Technical Field
The present disclosure relates to the field of computer devices, and in particular, to a fault log recording method, a fault log recording system, and a fault log recording device.
Background
Currently, a management controller of a computer device is typically configured with a log file for recording key information when the computer device fails (e.g., a system is down, crashed, restarted, etc.), and the key information may be used to analyze the root cause of the failure of the computer device.
In the related art, after a computer device fails, the central processing unit (CPU) generates a fault log before executing a fault processing procedure (such as a crash or downtime), so as to record key information at the time of the failure. The CPU then sends the fault log to the management controller, and the management controller writes the fault log into a log file.
However, communication between the CPU and the management controller follows a handshake mechanism. Specifically, after the CPU transmits part of the data of the fault log to the management controller, the CPU sends query information to the management controller to ask whether that part of the data has been received, and then polls for the management controller's reply. Only after the CPU reads the reply and confirms that the management controller has received that part of the data can it continue to transmit the remaining data of the fault log. As a result, transmitting the fault log takes a particularly long time, which seriously delays the CPU from continuing with the fault processing flow.
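The cost of this handshake can be illustrated with a small simulation that counts how many times the CPU has to poll before the whole log is transferred. This is only a sketch: the `SlowController` class, the fixed number of polls needed per reply, and the chunk size are hypothetical and are not taken from the application.

```python
class SlowController:
    """Hypothetical management controller whose reply only becomes
    visible to the CPU after a fixed number of polls (an assumption)."""

    def __init__(self, polls_until_ready: int) -> None:
        self.polls_until_ready = polls_until_ready
        self.received = []
        self._remaining = 0

    def receive(self, chunk: bytes) -> None:
        self.received.append(chunk)
        self._remaining = self.polls_until_ready

    def poll_reply(self) -> bool:
        """Return True once the acknowledgement can be read."""
        self._remaining -= 1
        return self._remaining <= 0


def send_with_polling(log: bytes, chunk_len: int, controller: SlowController) -> int:
    """Send the log chunk by chunk, polling for a reply after each chunk.
    Returns the total number of polls the CPU performed."""
    polls = 0
    for start in range(0, len(log), chunk_len):
        controller.receive(log[start:start + chunk_len])
        while True:
            polls += 1
            if controller.poll_reply():
                break
    return polls


controller = SlowController(polls_until_ready=3)
total_polls = send_with_polling(b"A" * 20, chunk_len=8, controller=controller)
```

In this model every chunk costs several polls in addition to the send itself, so the waiting time grows with the number of chunks.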
Disclosure of Invention
The embodiments of the application provide a fault log recording method, a system method, and a device, which can increase the transmission speed of the fault log, shorten the time the management controller needs to receive the fault log, and help the CPU quickly enter the fault processing flow.
In order to achieve the above purpose, the embodiments of the present application adopt the following technical solutions:
In a first aspect, a fault log recording method is provided for a computer device, where the computer device includes a central processing unit (CPU) and a management controller, and the management controller stores a log file. The method is performed by the management controller and includes: receiving first fragment data of a fault log sent by the CPU, where the fault log records fault information of the computer device and the first fragment data includes at least one first byte of the fault log; writing the at least one first byte in the first fragment data into the log file; sending a target interrupt signal to the CPU, where the target interrupt signal instructs the CPU to continue sending the fault log; and receiving second fragment data of the fault log sent by the CPU, where the second fragment data is sent by the CPU according to the target interrupt signal and includes at least one second byte of the fault log.
In this scheme, after the computer device fails, the CPU generates a fault log to record fault information of the computer device. After receiving the first fragment data of the fault log sent by the CPU, the management controller writes the at least one first byte of the fault log contained in the first fragment data into the log file and actively sends a target interrupt signal to the CPU, instructing the CPU to continue sending the fault log. The management controller can therefore quickly receive the second fragment data, which the CPU sends according to the target interrupt signal and which includes at least one second byte of the fault log. Because the CPU, after sending the first fragment data, continues with the second fragment data as soon as it receives the target interrupt signal, the waiting time spent polling for the management controller's reply is saved. The interval between the first fragment data and the second fragment data is greatly shortened, the transmission speed of the fault log increases, the time the management controller needs to receive the fault log is effectively shortened, and the CPU can quickly enter the subsequent fault processing flow.
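The management controller's side of this exchange can be sketched as follows. The sketch is illustrative only: the class and function names, the callback used to model the interrupt line, and the sample log content are assumptions, not details from the application.

```python
class ManagementController:
    """Hypothetical model of the management controller."""

    def __init__(self) -> None:
        self.log_file = bytearray()   # stands in for the stored log file
        self.interrupts_sent = 0

    def receive_fragment(self, payload: bytes, raise_interrupt) -> None:
        # Step 1: write the fragment's bytes into the log file.
        self.log_file.extend(payload)
        # Step 2: actively signal the CPU to continue sending.
        self.interrupts_sent += 1
        raise_interrupt()


def transfer(fault_log: bytes, frame_len: int, controller: ManagementController) -> None:
    """Model of the CPU: send a fragment only after the previous one
    was acknowledged by an interrupt (no query, no polling)."""
    may_send = True

    def on_interrupt() -> None:
        nonlocal may_send
        may_send = True

    for start in range(0, len(fault_log), frame_len):
        if not may_send:
            raise RuntimeError("no interrupt received for the previous fragment")
        may_send = False
        controller.receive_fragment(fault_log[start:start + frame_len], on_interrupt)


controller = ManagementController()
transfer(b"MCE bank 4 status=0xdeadbeef", frame_len=8, controller=controller)
```

The CPU never asks whether a fragment arrived; the interrupt itself is the acknowledgement, so one interrupt is raised per fragment.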
In one possible implementation, the first slice data includes first data and second data, the first data and the second data including at least one byte; receiving first fragment data of a fault log sent by a CPU, wherein the first fragment data comprises: receiving first data sent by a CPU; sending a first interrupt signal to the CPU; the first interrupt signal is used for indicating the CPU to continue sending the fault log; receiving second data sent by a CPU; the second data is sent by the CPU according to the first interrupt signal.
This implementation provides a specific way of receiving the first fragment data. The first fragment data includes first data and second data, each including at least one byte. After receiving the first data, the management controller sends a first interrupt signal to the CPU, instructing the CPU to continue sending the fault log, and can therefore quickly receive the second data that the CPU sends according to the first interrupt signal. Because the CPU continues with the second data as soon as it receives the first interrupt signal, the waiting time spent polling for the management controller's reply is saved. The interval between the first data and the second data is shortened, the transmission speed of the first fragment data increases, the time the management controller needs to receive the first fragment data is effectively shortened, and the time needed to receive the whole fault log is shortened in turn.
In another possible implementation manner, before receiving the first slice data of the fault log sent by the CPU, the method further includes: and sending target information to the CPU, wherein the target information is used for indicating the CPU to transmit the fault log according to the interrupt signal sent by the management controller.
In this implementation, before receiving the first fragment data of the fault log sent by the CPU, the management controller sends the target information to the CPU, which configures the CPU to transmit the fault log according to interrupt signals sent by the management controller. With this setting in place, after the CPU has transmitted part of the fragment data of the fault log, it can continue to transmit the remaining data of the fault log according to the interrupt signal sent by the management controller.
In another possible implementation manner, the frame lengths of the first slice data and the second slice data are preset values; the management controller and the CPU transmit fault logs based on the target interface, and the preset value is the maximum value of the frame length of data transmitted by the target interface.
In this implementation, the frame lengths of the first fragment data and the second fragment data are set to a preset value, and the preset value is the maximum frame length of data transmitted over the target interface. This reduces the number of fragments of the fault log, increases the transmission speed of the fault log, and further shortens its transmission time.
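The effect of the frame length on the number of fragments is simple arithmetic, sketched below. The per-frame header length and the concrete sizes are hypothetical values chosen only for illustration; the application does not state them.

```python
import math


def fragment_count(log_len: int, frame_len: int, header_len: int = 0) -> int:
    """Number of frames needed to carry log_len bytes when each frame
    holds (frame_len - header_len) payload bytes. header_len is an
    assumed per-frame overhead, not a value from the application."""
    payload_len = frame_len - header_len
    if payload_len <= 0:
        raise ValueError("frame_len must exceed header_len")
    return math.ceil(log_len / payload_len)


small_frames = fragment_count(4096, frame_len=32, header_len=8)
large_frames = fragment_count(4096, frame_len=264, header_len=8)
```

With these example numbers, raising the frame length from 32 to 264 bytes cuts the fragment count from 171 to 16, and each fragment avoided also avoids one interrupt round trip.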
In another possible implementation, the first piece of data further includes an identification of the management controller; writing at least one first byte in the first fragmented data to a log file, comprising: if the identification of the management controller is the same as the actual identification of the management controller, writing at least one first byte in the first fragment data into a log file; the identity of the management controller is identical with the actual identity of the management controller, which indicates that the data format of the first fragment data meets the requirement.
In this implementation, the first fragment data further includes an identifier of the management controller. After receiving the first fragment data, the management controller first determines whether the identifier contained in the first fragment data is the same as its own actual identifier. If so, the data format of the first fragment data meets the requirement, and the management controller writes the at least one first byte into the log file. Writing the bytes only after the format has been verified against the identifier helps the management controller accurately identify the content of the fault log.
In another possible implementation, the first sliced data further includes a first checksum, the first checksum being determined by the CPU based on the content of the first sliced data; writing at least one first byte in the first fragmented data to a log file, comprising: determining a second checksum of the first sliced data based on the content of the first sliced data; if the second checksum is the same as the first checksum, writing at least one first byte in the first fragmented data into the log file; wherein the second checksum is the same as the first checksum for indicating that the contents of the first fragmented data are unchanged.
In this implementation, the first fragment data also includes a first checksum that the CPU generates from the content of the first fragment data. After receiving the first fragment data, the management controller first generates a second checksum from the content of the first fragment data and determines whether the second checksum is the same as the first checksum. If so, the content of the first fragment data is unchanged, and the management controller writes the at least one first byte into the log file. Writing the bytes only after the checksum has been verified prevents the management controller from storing bytes with corrupted content, which would otherwise make it impossible to accurately analyze the cause of the failure from the fault log later.
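The identifier check and the checksum check described in these two implementations can be combined in one acceptance routine, sketched below. The frame layout (identifier byte, payload, trailing checksum byte), the identifier value, and the use of an 8-bit additive checksum are all assumptions made for illustration; the application does not specify the checksum algorithm or the field order.

```python
CONTROLLER_ID = 0x20  # hypothetical identifier of the management controller


def checksum8(data: bytes) -> int:
    """8-bit additive checksum (an assumed algorithm)."""
    return sum(data) & 0xFF


def build_fragment(payload: bytes, controller_id: int = CONTROLLER_ID) -> bytes:
    """CPU side: prepend the identifier and append the first checksum."""
    body = bytes([controller_id]) + payload
    return body + bytes([checksum8(body)])


def accept_fragment(frame: bytes, log_file: bytearray,
                    own_id: int = CONTROLLER_ID) -> bool:
    """Controller side: write the payload only if both checks pass."""
    if len(frame) < 3:                      # id + at least one byte + checksum
        return False
    body, first_checksum = frame[:-1], frame[-1]
    if body[0] != own_id:                   # data format does not meet the requirement
        return False
    if checksum8(body) != first_checksum:   # second checksum differs: content changed
        return False
    log_file.extend(body[1:])
    return True


log_file = bytearray()
good = build_fragment(b"cpu0 fault")
corrupted = bytearray(good)
corrupted[1] ^= 0x01                        # flip one payload bit in transit
results = (
    accept_fragment(good, log_file),
    accept_fragment(bytes(corrupted), log_file),
    accept_fragment(build_fragment(b"x", controller_id=0x21), log_file),
)
```

Only the intact frame addressed to the right controller reaches the log file; the corrupted frame and the frame with the wrong identifier are rejected.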
In another possible implementation, the first slice data further includes a data offset of the at least one first byte, the data offset being used to indicate a recording position of the at least one first byte in the fault log.
In this implementation, the first fragment data further includes the data offset of the at least one first byte. If the order in which fragments are received differs from the order in which they were sent (for example, the CPU sends the first fragment data and then another fragment, but the management controller receives the other fragment first), the management controller can still write the at least one first byte into the log file at the position given by its data offset. This helps ensure that the bytes of the fault log stored by the management controller are in the same order as in the fault log generated by the CPU, and therefore helps ensure the accuracy of the stored fault log.
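Offset-based writing can be sketched as follows. The function name and the zero-filling of not-yet-received regions are illustrative assumptions.

```python
def write_at_offset(log_file: bytearray, offset: int, payload: bytes) -> None:
    """Place payload at its recorded position in the fault log,
    regardless of the order in which fragments arrive."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    end = offset + len(payload)
    if len(log_file) < end:
        # Grow the file; gaps are zero-filled until their fragment arrives.
        log_file.extend(b"\x00" * (end - len(log_file)))
    log_file[offset:end] = payload


log_file = bytearray()
write_at_offset(log_file, 6, b"world")    # later fragment arrives first
write_at_offset(log_file, 0, b"hello ")   # earlier fragment arrives second
```

After both writes the log file reads `hello world`, the same byte order the sender produced.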
In another possible implementation manner, the first slice data further includes operation information; the operation information is used for indicating an operation type to which the first piece of data belongs, and the operation type comprises any one of write preparation, write data or write end.
In this implementation manner, by setting that the first piece of data further includes operation information, and the operation information indicates that the operation type to which the first piece of data belongs is write preparation, write data, or write end, the management controller is facilitated to accurately determine the transmission stage of the fault log according to the operation information.
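One way the management controller could track the transmission stage from the operation information is a small state table, sketched below. The enum values and the set of allowed transitions are assumptions; the application only names the three operation types.

```python
from enum import Enum


class Op(Enum):
    WRITE_PREPARE = 1
    WRITE_DATA = 2
    WRITE_END = 3


# Assumed transition rules: a transfer starts with WRITE_PREPARE,
# carries any number of WRITE_DATA fragments, and ends with WRITE_END.
ALLOWED = {
    None: {Op.WRITE_PREPARE},
    Op.WRITE_PREPARE: {Op.WRITE_DATA, Op.WRITE_END},
    Op.WRITE_DATA: {Op.WRITE_DATA, Op.WRITE_END},
    Op.WRITE_END: {Op.WRITE_PREPARE},
}


def next_stage(current, op: Op) -> Op:
    """Return the new stage, or raise if the fragment is out of sequence."""
    if op not in ALLOWED[current]:
        raise ValueError(f"unexpected {op.name} after {current}")
    return op


stage = None
for op in (Op.WRITE_PREPARE, Op.WRITE_DATA, Op.WRITE_DATA, Op.WRITE_END):
    stage = next_stage(stage, op)
```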
In a second aspect, a fault log recording method is provided, and the fault log recording method is used for a computer device, wherein the computer device comprises a Central Processing Unit (CPU) and a management controller, and the management controller stores log files; the method is executed by a CPU; the method comprises the following steps: generating a fault log; the fault log is used for recording fault information of the computer equipment; sending first fragment data of the fault log to a management controller; the first shard data includes at least one first byte of the fault log; receiving a target interrupt signal sent by a management controller, wherein the target interrupt signal indicates that fault logs are continuously sent; in response to the target interrupt signal, second shard data of the fault log is sent to the management controller, the second shard data including at least one second byte of the fault log.
In this scheme, after the computer device fails, the CPU generates a fault log to record fault information of the computer device. The CPU then sends the first fragment data of the fault log to the management controller. When the CPU receives the target interrupt signal from the management controller, which indicates that the fault log should continue to be sent, it sends the second fragment data of the fault log to the management controller. Because the CPU continues with the second fragment data as soon as it receives the target interrupt signal, the waiting time spent polling for the management controller's reply is saved. The interval between the first fragment data and the second fragment data is greatly shortened, the transmission speed of the fault log increases, the time the CPU needs to send the fault log is effectively shortened, and the CPU can quickly enter the subsequent fault processing flow.
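From the CPU's point of view, the method amounts to "send a fragment, block until the interrupt arrives, send the next one". The sketch below models this with two threads; the queue standing in for the bus, the `threading.Event` standing in for the interrupt line, the end marker, and the timeout are all hypothetical.

```python
import queue
import threading

END_OF_LOG = None  # hypothetical marker telling the receiver to stop


def cpu_send(fault_log: bytes, frame_len: int,
             link: queue.Queue, irq: threading.Event) -> None:
    """Send each fragment, then wait for the target interrupt signal."""
    for start in range(0, len(fault_log), frame_len):
        link.put(fault_log[start:start + frame_len])
        if not irq.wait(timeout=1.0):
            raise TimeoutError("no target interrupt from the management controller")
        irq.clear()
    link.put(END_OF_LOG)


def controller_receive(link: queue.Queue, irq: threading.Event,
                       log_file: bytearray) -> None:
    """Write each fragment to the log file, then raise the interrupt."""
    while True:
        fragment = link.get()
        if fragment is END_OF_LOG:
            return
        log_file.extend(fragment)
        irq.set()


def run_transfer(fault_log: bytes, frame_len: int) -> bytes:
    link: queue.Queue = queue.Queue()
    irq = threading.Event()
    log_file = bytearray()
    receiver = threading.Thread(target=controller_receive,
                                args=(link, irq, log_file), daemon=True)
    receiver.start()
    cpu_send(fault_log, frame_len, link, irq)
    receiver.join()
    return bytes(log_file)


stored = run_transfer(b"panic: machine check on cpu 3", frame_len=8)
```

The sender never issues a query or a status read; it simply sleeps until the interrupt wakes it, which is the saving the scheme describes.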
In one possible implementation, the first slice data includes first data and second data, the first data and the second data including at least one byte; transmitting first shard data of the fault log to the management controller, comprising: transmitting the first data to a management controller; receiving a first interrupt signal sent by a management controller; the first interrupt signal indicates that the fault log continues to be sent; and sending the second data to the management controller according to the first interrupt signal.
This implementation provides a specific way of sending the first fragment data. The first fragment data includes first data and second data, each including at least one byte. After the CPU sends the first data to the management controller and receives the first interrupt signal, which indicates that the fault log should continue to be sent, the CPU sends the second data to the management controller. Because the CPU continues with the second data as soon as it receives the first interrupt signal, the waiting time spent polling for the management controller's reply is saved. The interval between the first data and the second data is shortened, the transmission speed of the first fragment data increases, the time the CPU needs to send the first fragment data is effectively shortened, and the time needed to send the whole fault log is shortened in turn.
In another possible implementation, before generating the fault log, the method further includes: receiving target information sent by a management controller; the target information is used for indicating the CPU to transmit the fault log according to the interrupt signal sent by the management controller.
In this implementation, the management controller sends the target information to the CPU before receiving the first fragment data of the fault log, which configures the CPU to transmit the fault log according to interrupt signals sent by the management controller. With this setting in place, after the CPU has transmitted part of the fragment data of the fault log, it can continue to transmit the remaining data of the fault log according to the interrupt signal sent by the management controller.
In another possible implementation, the CPU and the management controller transmit the fault log over a target bus, and the method further includes: configuring the value of the interrupt register that controls the target bus to a target value, where the target value indicates that the interrupt signal is in an active state.
In this implementation, the CPU and the management controller transmit the fault log over the target bus, and the CPU configures the value of the interrupt register that controls the target bus to the target value, so that interrupt signals transmitted over the target bus are in an active state. This ensures that the CPU can successfully respond to the interrupt signal sent by the management controller.
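Configuring such a register typically means setting an enable bit while leaving the other bits untouched. The sketch below shows that bit manipulation on a plain integer; the bit position and the register contents are hypothetical, since the application does not describe the register layout.

```python
IRQ_ENABLE_MASK = 0x01  # hypothetical position of the interrupt-enable bit


def enable_interrupt(register_value: int) -> int:
    """Return the register value with the interrupt-enable bit set,
    preserving all other bits."""
    return register_value | IRQ_ENABLE_MASK


def interrupt_is_active(register_value: int) -> bool:
    """Check whether the interrupt-enable bit is set."""
    return bool(register_value & IRQ_ENABLE_MASK)


before = 0b1010_0000
after = enable_interrupt(before)
```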
In another possible implementation manner, the frame lengths of the first slice data and the second slice data are preset values; wherein the preset value is greater than or equal to a preset threshold.
In this implementation, the frame lengths of the first fragment data and the second fragment data are set to a preset value that is greater than or equal to a preset threshold. Choosing a suitable preset threshold ensures that the frames of the first fragment data and the second fragment data are large, which reduces the number of fragments of the fault log, increases the transmission speed of the fault log, and further shortens its transmission time.
In another possible implementation, the first shard data includes an identification of the management controller; the identifier of the management controller is used for indicating whether the data format of the first fragment data meets the requirement.
In this implementation, the first fragment data further includes the identifier of the management controller, so that after receiving the first fragment data the management controller can determine, based on the identifier, whether the data format of the first fragment data meets the requirement. This helps prevent the management controller from storing fragment data whose format does not meet the requirement, which would leave it unable to accurately identify the content of the fault log.
In another possible implementation, the first sliced data includes a first checksum, which is determined by the CPU based on the content of the first sliced data; wherein the first checksum is used to indicate whether the content of the first fragmented data is changed.
In this implementation, the first checksum is determined by the CPU based on the content of the first fragment data and can indicate whether that content has changed. After receiving the first fragment data, the management controller can therefore use the first checksum to determine whether the content has changed. This helps prevent the management controller from storing first fragment data with changed content, which would make it impossible to accurately analyze the cause of the failure from the fault log.
In another possible implementation, the first slice data further includes a data offset of the at least one first byte, the data offset being used to indicate a recording position of the at least one first byte in the fault log.
In this implementation, the first fragment data further includes the data offset of the at least one first byte. If the order in which fragments are received differs from the order in which they were sent (for example, the CPU sends the first fragment data and then another fragment, but the management controller receives the other fragment first), the management controller can still write the at least one first byte into the log file at the position given by its data offset. This helps ensure that the bytes of the fault log stored by the management controller are in the same order as in the fault log generated by the CPU, and therefore helps ensure the accuracy of the stored fault log.
In another possible implementation manner, the first slice data further includes operation information; the operation information is used for indicating an operation type to which the first piece of data belongs, and the operation type comprises any one of write preparation, write data or write end.
In this implementation manner, the first piece of data includes operation information, where the operation information indicates that an operation type to which the first piece of data belongs is write preparation, write data, or write end, which helps the management controller accurately determine a transmission stage of the fault log according to the operation information.
In a third aspect, a system method is provided for a computer device, where the computer device includes a central processing unit CPU and a management controller, where the management controller stores a log file; the system method comprises the following steps: the CPU generates a fault log; the fault log is used for recording fault information of the computer equipment; the CPU sends first fragment data of the fault log to the management controller; the first shard data includes at least one first byte of the fault log; the management controller receives first sliced data of the fault log sent by the CPU and writes at least one first byte in the first sliced data into a log file; the management controller sends a target interrupt signal to the CPU; the target interrupt signal is used for indicating to continue sending the fault log; the CPU receives a target interrupt signal sent by the management controller and sends second fragment data of the fault log to the management controller according to the target interrupt signal; the second shard data includes at least one second byte of the fault log; the management controller receives second piece data of the fault log sent by the CPU.
It should be noted that, in the third aspect, the management controller may also perform any one of the possible implementation manners of the first aspect, and the CPU may also perform any one of the possible implementation manners of the second aspect, which are not described herein.
In a fourth aspect, there is provided a fault log recording device, the device comprising: the functional units for executing any of the methods provided in the first aspect, and actions executed by the respective functional units are implemented by hardware or implemented by hardware executing corresponding software. For example, the fault log recording means may include: a receiving unit, a writing unit and a transmitting unit; the receiving unit is used for controlling the management controller to receive the first fragment data of the fault log sent by the CPU; the fault log is used for recording fault information of the computer equipment, and the first fragmentation data comprises at least one first byte of the fault log; a writing unit for controlling the management controller to write at least one first byte in the first slice data into the log file; a transmitting unit for controlling the management controller to transmit a target interrupt signal to the CPU; the target interrupt signal is used for indicating the CPU to continuously send the fault log; the receiving unit is also used for controlling the management controller to receive fault log second fragment data sent by the CPU; wherein the second fragmented data is sent by the CPU according to the target interrupt signal, the second fragmented data comprising at least one second byte of the fault log.
In a fifth aspect, there is provided a fault log recording device, the device comprising: functional units for performing any of the methods provided in the second aspect, the actions performed by the respective functional units are implemented by hardware or by hardware executing corresponding software. For example, the fault log recording means may include: the device comprises a processing unit, a sending unit and a receiving unit; the processing unit is used for controlling the CPU to generate a fault log; the fault log is used for recording fault information of the computer equipment; a sending unit for controlling the CPU to send the first fragment data of the fault log to the management controller; the first shard data includes at least one first byte of the fault log; the receiving unit is used for controlling the CPU to receive a target interrupt signal sent by the management controller, wherein the target interrupt signal indicates that the fault log is continuously sent; and the sending unit is also used for controlling the CPU to send second piece of data of the fault log to the management controller according to the target interrupt signal, wherein the second piece of data comprises at least one second byte of the fault log.
In a sixth aspect, there is provided a computer device comprising a central processing unit CPU and a management controller, the management controller storing a log file. The management controller is used for receiving first fragmented data of the fault log sent by the CPU; the fault log is used for recording fault information of the computer equipment, and the first fragmented data comprises at least one first byte of the fault log. The management controller is further used for writing the at least one first byte in the first fragmented data into the log file. The management controller is also used for sending a target interrupt signal to the CPU; the target interrupt signal is used for instructing the CPU to continue sending the fault log. The management controller is also used for receiving second fragmented data of the fault log sent by the CPU, wherein the second fragmented data is sent by the CPU according to the target interrupt signal and comprises at least one second byte of the fault log.
It should be noted that, in the sixth aspect, the management controller may also perform any one of the possible implementation manners of the first aspect, which is not described herein.
In a seventh aspect, a computer device is provided, the computer device comprising a central processing unit CPU and a management controller, the management controller storing a log file. The CPU is used for generating a fault log; the fault log is used for recording fault information of the computer equipment. The CPU is also used for sending first fragmented data of the fault log to the management controller; the first fragmented data includes at least one first byte of the fault log. The CPU is also used for receiving a target interrupt signal sent by the management controller, wherein the target interrupt signal instructs the CPU to continue sending the fault log. The CPU is further used for sending second fragmented data of the fault log to the management controller according to the target interrupt signal, wherein the second fragmented data comprises at least one second byte of the fault log.
It should be noted that, in the seventh aspect, the CPU may further perform any one of the possible implementation manners of the second aspect, which is not described herein.
In an eighth aspect, there is provided a computer device including a central processing unit CPU and a management controller, the management controller storing a log file, and the CPU being connected to the management controller. The CPU generates a fault log; the fault log is used for recording fault information of the computer equipment. The CPU sends first fragmented data of the fault log to the management controller; the first fragmented data includes at least one first byte of the fault log. The management controller receives the first fragmented data of the fault log sent by the CPU and writes the at least one first byte in the first fragmented data into the log file. The management controller sends a target interrupt signal to the CPU; the target interrupt signal is used for instructing the CPU to continue sending the fault log. The CPU receives the target interrupt signal sent by the management controller and sends second fragmented data of the fault log to the management controller according to the target interrupt signal; the second fragmented data includes at least one second byte of the fault log. The management controller receives the second fragmented data of the fault log sent by the CPU.
It should be noted that, in the eighth aspect, the management controller may further perform any one of the possible implementation manners of the first aspect, and the CPU may further perform any one of the possible implementation manners of the second aspect, which are not described herein.
In one possible implementation, the central processing unit CPU and management controller are connected via an enhanced serial peripheral interface ESPI bus.
In a ninth aspect, there is provided a computer device comprising a processor and a memory, wherein the processor is connected with the memory. The memory is configured to store computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory, thereby implementing any one of the methods provided in the first aspect, or implementing any one of the methods provided in the second aspect.
In a tenth aspect, there is provided a chip comprising: a processor and interface circuit; the interface circuit is used for receiving the code instruction and transmitting the code instruction to the processor; a processor for executing code instructions to perform any of the methods provided in the first aspect above, or to perform any of the methods provided in the second aspect above.
In an eleventh aspect, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed on a computer, cause the computer to perform any one of the methods provided in the first aspect, or to perform any one of the methods provided in the second aspect.
In a twelfth aspect, there is provided a computer program product comprising computer-executable instructions which, when run on a computer, cause the computer to perform any one of the methods provided in the first aspect or to perform any one of the methods provided in the second aspect.
The technical effects caused by any implementation manner of the third aspect to the twelfth aspect may refer to the technical effects caused by different implementation manners of the first aspect, and are not repeated herein.
Drawings
FIG. 1 is a system architecture diagram of a computer device according to an embodiment of the present application;
FIG. 2 is a flowchart of a fault log recording method according to an embodiment of the present application;
FIG. 3 is a flowchart of another fault log recording method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a log file according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of another fault log recording device according to an embodiment of the present application;
FIG. 6 is a schematic diagram of another fault log recording device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
In the description of the present application, unless otherwise specified, "/" indicates an "or" relationship between the associated objects; for example, A/B may mean A or B. The term "and/or" in this application merely describes an association between associated objects and means that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone, where A and B may each be singular or plural.
Also, in the description of the present application, unless otherwise indicated, "a plurality" means two or more. "At least one of" or a similar expression means any combination of the listed items, including any combination of a single item or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or plural.
In addition, in order to clearly describe the technical solutions of the embodiments of the present application, the words "first", "second", and the like are used in the embodiments of the present application to distinguish identical or similar items having substantially the same function and effect. It will be appreciated by those of skill in the art that the words "first", "second", and the like do not limit the quantity or the order of execution, and that items so labeled are not necessarily different. Meanwhile, in the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as examples, illustrations, or descriptions. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion that may be readily understood.
First, an application scenario of the embodiment of the present application is described in an exemplary manner.
With the increasing number of computer devices running on a network, computer device failures (such as system downtime, crashes, or reboots) become an unavoidable phenomenon. In order to record key information when a computer device fails, so as to analyze abnormal parameters of the computer device and thereby locate a root cause of the computer device failure, a log recording technology is proposed in the related art. Specifically, a log file is stored in a management controller of a computer device, and the log file is used for storing a fault log for recording key information when the computer device fails.
Currently, after a computer device fails, a central processing unit (central processing unit, CPU) of the computer device generates a fault log before performing a fault processing procedure (such as a crash or downtime processing procedure) for recording key information when the fault occurs. Then, the CPU sends the fault log to the management controller, and the management controller writes the fault log into a log file.
However, communication between the CPU and the management controller follows a handshake mechanism. Specifically, after the CPU transmits part of the data of the fault log to the management controller, the CPU sends query information to the management controller to ask whether the management controller has received that part of the data. The CPU then polls for the reply of the management controller; only after the reply is read, confirming that the management controller has received that part of the data, can the CPU continue to transmit the next part of the fault log.
It will be appreciated that the CPU performs multiple tasks at the same time, one of which is querying for the reply of the management controller, and that the CPU executes these tasks using a polling mechanism. For example, suppose the CPU executes task 1, task 2, task 3 and task 4, where task 4 is the task of querying for the reply of the management controller. The CPU executes task 2 during the idle period of task 1, task 3 during the idle period of task 2, and task 4 during the idle period of task 3. If the reply of the management controller is not read in the current polling round, the CPU must wait until the next polling round to read the reply and confirm that the management controller has successfully received that part of the data, and only then continues to transmit the next part of the fault log. Obviously, this makes the transmission time of the fault log particularly long, which seriously delays the CPU in continuing the fault processing procedure.
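By way of illustration only, the cost of the polling mechanism can be modeled with simple arithmetic. The following is a minimal sketch and not part of the described method: the function names, the time unit (microseconds) and the assumption that the CPU checks for the reply exactly once per polling round are all hypothetical.

```python
def polling_delay(reply_ready_at: int, round_time: int) -> int:
    """Return how long the CPU waits after the reply has become available.

    Assumption: the "read reply" task runs once per polling round, at times
    round_time, 2 * round_time, ... (all times in microseconds).
    """
    # First polling round at or after the moment the reply became available.
    rounds = max(1, -(-reply_ready_at // round_time))  # integer ceiling division
    observed_at = rounds * round_time
    return observed_at - reply_ready_at


def polled_transfer_time(fragments: int, reply_ready_at: int, round_time: int) -> int:
    """Total time to send all parts when each part waits for a polled reply."""
    per_fragment = reply_ready_at + polling_delay(reply_ready_at, round_time)
    return fragments * per_fragment
```

Under these assumptions, a reply that is ready after 200 microseconds is only observed at the 1000-microsecond polling instant, so most of each round is spent waiting, and the waste is repeated for every part of the fault log.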
In view of this, the embodiment of the application provides a fault log recording method. After a computer device fails, the CPU generates a fault log to record fault information of the computer device. After receiving the first fragmented data of the fault log sent by the CPU, the management controller writes the at least one first byte of the fault log included in the first fragmented data into a log file, and actively sends a target interrupt signal to the CPU to instruct the CPU to continue sending the fault log. The management controller can thus quickly receive the second fragmented data of the fault log, which the CPU sends according to the target interrupt signal and which includes at least one second byte of the fault log. Because the CPU, after sending the first fragmented data, continues to send the second fragmented data according to the target interrupt signal rather than by polling for the reply of the management controller, the polling wait time is saved. The time interval between the first fragmented data and the second fragmented data can therefore be greatly shortened, which improves the transmission speed of the fault log, effectively shortens the time the management controller takes to receive the fault log, and helps the CPU quickly enter the subsequent fault processing flow.
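The interaction described above can be sketched as an in-memory model. This is only an illustrative sketch under stated assumptions: the class names `ManagementController` and `Cpu`, the method names, and the use of a direct method call to stand in for the target interrupt signal are hypothetical and are not taken from the application.

```python
class ManagementController:
    """Hypothetical model of the management controller side."""

    def __init__(self):
        self.log_file = bytearray()  # stands in for the stored log file
        self.cpu = None              # set when a Cpu attaches itself

    def receive(self, fragment: bytes) -> None:
        # Write the received bytes into the log file, then actively signal
        # the CPU so that it continues sending the fault log.
        self.log_file += fragment
        self.cpu.on_target_interrupt()


class Cpu:
    """Hypothetical model of the CPU side."""

    def __init__(self, bmc: ManagementController, fault_log: bytes, fragment_size: int):
        self.bmc = bmc
        bmc.cpu = self
        # Split the fault log into pieces of fragmented data.
        self.pending = [fault_log[i:i + fragment_size]
                        for i in range(0, len(fault_log), fragment_size)]
        self.interrupts_seen = 0

    def send_next_fragment(self) -> None:
        if self.pending:
            self.bmc.receive(self.pending.pop(0))

    def on_target_interrupt(self) -> None:
        # No polling: the next fragment is sent as soon as the signal arrives.
        self.interrupts_seen += 1
        self.send_next_fragment()
```

In this sketch the CPU only sends the first fragment on its own initiative; every later fragment is triggered by the signal from the management controller. The nested calls are acceptable for a short illustration, whereas a real implementation would handle the interrupt asynchronously.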
Next, an exemplary description is given of a system architecture of an embodiment of the present application.
FIG. 1 is a system architecture diagram of a computer device.
In terms of hardware, the computer device may include a CPU and a management controller. The CPU is connected with the management controller through a target bus. Based on this, a target bus is used between the CPU and the management controller, and data (such as a fault log) is transferred based on a target interface of a target protocol supported by the target bus.
The computer device may specifically be a terminal device or a network device. The terminal device may be referred to as a terminal, user equipment (UE), an access terminal, a subscriber unit, a subscriber station, a mobile station, a remote terminal, a mobile device, a user terminal, a wireless communication device, a user agent, or the like. The terminal device may specifically be an augmented reality (AR) device, a virtual reality (VR) device, a tablet computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or the like. The network device may specifically be a server or the like. The server may be one physical or logical server, or may be two or more physical or logical servers that share different responsibilities and cooperate to implement the various functions of the server.
It should be noted that the embodiments of the present application do not limit the specific form of the computer device, and the above are merely exemplary descriptions. In the following embodiments, the computer device is described using a server as an example.
Optionally, the target bus is an enhanced serial peripheral interface (enhanced serial peripheral interface, ESPI) bus. The target protocol is an intelligent platform management interface (intelligent platform management interface, IPMI) protocol, and the target interface is a Block Transfer (BT) interface.
It can be understood that, compared with a PCIe bus connection, connecting the CPU and the management controller of this application through the ESPI bus does not require installing and adapting corresponding drivers on the CPU and the management controller. This avoids complicated system compatibility problems and effectively lowers the threshold for information transmission between the CPU and the management controller.
It should be noted that, the type of the target bus is not limited in the embodiment of the present application, and any bus supporting transmission of the fault log according to the interrupt signal in the related art may be used as the target bus in the embodiment of the present application.
In addition, the embodiment of the present application does not limit the target transmission protocol and the target interface supported by the target bus, and any transmission protocol and interface supporting transmission of the fault log according to the interrupt signal in the related art may be used as the target transmission protocol and the target interface used by the target bus in the embodiment of the present application.
In the following embodiments, the target bus is an ESPI bus, the target transport protocol is an IPMI protocol, and the target interface is a BT interface.
Optionally, the CPU further comprises an interrupt register, a cache register and a control register. The ESPI bus is connected to the interrupt register, the cache register, and the control register, respectively.
In the embodiment of the present application, the value stored in the interrupt register is used to indicate whether the interrupt signal is in a valid state. The information stored in the cache register is used for caching information to be transmitted by using the ESPI bus. The value stored in the control register is used for indicating the states of the CPU, the management controller and the cache register.
Alternatively, the management controller is a controller entirely independent of the CPU.
By way of example, the management controller may be a supervisory management controller external to the computer device, a management chip external to the CPU, a baseboard management controller (BMC), a system management module (system management mode, SMM), or the like. It should be noted that the embodiments of the present application do not limit the specific form of the management controller, and the above is merely exemplary. In the following embodiments, the management controller is described using a BMC as an example.
It should be noted that the BMC may be named differently in different computer devices; for example, some computer devices call it BMC, some call it iLO, and others call it iDRAC. Whether it is called BMC, iLO or iDRAC, it may be understood as the BMC in the embodiments of the present invention.
Optionally, the management controller includes a memory. The memory is used for storing log files. The memory may be, for example, a flash memory (also referred to as flash).
Optionally, the management controller further includes a software program such as a log module, a first driving module, and the like. The log module and the first driving module are both stored in the memory.
In some embodiments, the management controller implements writing the received fault log to the log file by executing the log module. For example, the management controller monitors the ESPI bus by executing the log module, and if data is transmitted through the ESPI bus, the data is written into the log file in the memory.
In some embodiments, when the target transport protocol is the IPMI protocol and the target interface is a BT interface, the first driver module includes a first IPMI driver and a first BT interface. That is, the first driver module is a BT interface driver module of the IPMI protocol.
The management controller realizes BT interface transmission data based on IPMI protocol through the first driving module. Specifically, the management controller loads the first IPMI driver in the first driver module, initializes the first BT interface to implement data transmission of the BT interface based on the IPMI protocol, for example, receives a fault log sent by the CPU, and transmits an interrupt signal to the CPU.
In terms of software, the computer device includes an Operating System (OS) and processor firmware, which are run by a CPU.
In the process of operating the OS, the CPU may generate a fault log in the following embodiment. The ESPI bus may be initially configured while the CPU is running the processor firmware.
By way of example, the processor firmware (also referred to as a processor firmware program) may be firmware such as a basic input output system (BIOS), a management engine (ME), microcode, or an intelligent management unit (IMU).
It should be noted that the embodiments of the present application are not limited to the specific form of the processor firmware, and the above are merely exemplary illustrations. In the following embodiments, only the BIOS is taken as an example for the processor firmware.
Optionally, the BIOS includes software modules such as a bus configuration module. The CPU performs initialization configuration on the ESPI bus by executing the bus configuration module.
Optionally, the OS includes a second driver module.
In the case where the first driver module is a BT interface driver module of the IPMI protocol, the second driver module includes a second IPMI driver and a second BT interface. After the OS is started, the CPU loads a second IPMI driver to initialize the second BT interface. Thus, during the operation of the OS by the CPU, data transmission with the management controller can be performed through the second BT interface.
Optionally, in terms of software, the computer device further comprises an application program, the application program being run by the CPU. Wherein the application may be used to generate a fault log in the embodiments described below.
It should be noted that, the system architecture and the application scenario described in the embodiments of the present application are for more clearly describing the technical solution of the embodiments of the present application, and do not constitute a limitation on the technical solution provided in the embodiments of the present application, and those skilled in the art can know that, with the evolution of the system architecture and the appearance of a new service scenario, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
For ease of understanding, the fault logging method provided in the present application is described below by way of example with reference to the accompanying drawings. The method is applicable to the computer device shown in fig. 1.
The following embodiments of the present application will exemplarily describe a scheme of a fault logging method in two parts.
The first section, in conjunction with fig. 2, describes a configuration procedure for causing the CPU to transmit the fault log in accordance with the interrupt signal sent by the management controller. The interrupt signal may be a target interrupt signal, a first interrupt signal, or the like in the following embodiments.
In the embodiment of the present application, the interrupt signal is named as a target interrupt signal, a first interrupt signal, etc. to distinguish the interrupt signals sent by the management controller at different times.
The second section, in conjunction with fig. 3 to 4, describes a process in which the CPU transmits the fault log according to the interrupt signal transmitted from the management controller.
FIG. 2 is a flow chart illustrating a fault logging method according to an exemplary embodiment. Illustratively, the method shown in FIG. 2 includes the following S201-S202.
S201: the management controller sends the target information to the CPU.
The target information is used for indicating the CPU to transmit the fault log according to the interrupt signal sent by the management controller.
Optionally, the interrupt signal is a preset type of interrupt signal.
For example, the interrupt signal of the preset type may be an IRQ9 interrupt signal. In other words, the CPU may transmit the fault log according to the preset type of interrupt signal, but not transmit the fault log according to other types of interrupt signals.
Alternatively, the CPU may transmit the fault log when the interrupt signal of the preset type is in an active state, and not transmit the fault log when the interrupt signal of the preset type is in a non-active state. For example, when the value of the interrupt register connected to the ESPI bus is a target value, the interrupt signal of the preset type is in an active state.
In one example, it may be determined whether the interrupt signal is a preset type of interrupt signal according to a format of the interrupt signal. For example, if the format of the interrupt signal conforms to the preset format, the interrupt signal is a preset type of interrupt signal.
In another example, it may be determined whether the interrupt signal is a preset type of interrupt signal according to a type identification of the interrupt signal. For example, if the interrupt signal includes a type identifier for indicating a preset type, the interrupt signal is the interrupt signal of the preset type.
In yet another example, it may be determined whether the interrupt signal is a preset type of interrupt signal according to the type of interface transmitting the interrupt signal. For example, a correspondence between an interrupt signal of a preset type and a target interface is pre-established, and if the interrupt signal is received through the target interface, the interrupt signal is an interrupt signal of the preset type.
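The checks described in the preceding examples can be sketched as follows. This is a minimal sketch under stated assumptions: the `InterruptSignal` structure, its fields, and the constant names are hypothetical; "IRQ9" is the example type named in the text, and the association of the preset type with a "BT" interface is an assumed correspondence for illustration.

```python
from dataclasses import dataclass
from typing import Optional

PRESET_TYPE_ID = "IRQ9"    # example preset type named in the text
TARGET_INTERFACE = "BT"    # assumption: interface associated with the preset type


@dataclass
class InterruptSignal:
    interface: str                 # interface over which the signal was received
    type_id: Optional[str] = None  # type identifier carried by the signal, if any


def is_preset_type(signal: InterruptSignal) -> bool:
    """Decide whether a signal is the preset type of interrupt signal."""
    if signal.type_id is not None:
        # Check by type identification.
        return signal.type_id == PRESET_TYPE_ID
    # Otherwise fall back to the pre-established interface correspondence.
    return signal.interface == TARGET_INTERFACE
```

Only a signal for which `is_preset_type` returns true would cause the CPU to transmit the fault log in this sketch; other interrupt signals would be ignored for that purpose.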
Hereinafter, the implementation of S201 is described by way of example for the case where the management controller and the CPU are connected by an ESPI bus and transmit the fault log through the BT interface of the IPMI protocol.
In some embodiments, the OS includes a second driver module. After the OS is started, while running the OS, the CPU loads the second IPMI driver included in the second driver module to initialize the second BT interface included in the second driver module. In the process of loading the second IPMI driver, the second IPMI driver sends a configuration command to the management controller to determine whether the interrupt signal is supported or enabled, that is, whether the CPU needs to transmit the fault log according to the interrupt signal sent by the management controller.
As described above, the management controller stores the first driver module. After the management controller is powered on, the first IPMI driver included in the first driver module is loaded to initialize the first BT interface included in the first driver module. In the process of loading the first IPMI driver, the first IPMI driver receives the configuration command sent by the second IPMI driver and, in response to the configuration command, sends target information to the second IPMI driver so as to instruct the CPU to transmit the fault log according to the interrupt signal sent by the management controller.
In one example, the configuration command that the management controller receives from the CPU may be the "set BMC global enables" command, and the target information that the management controller sends to the CPU in response to the configuration command may be "00h".
Optionally, in the process of loading the second IPMI driver, the CPU further sends a first command to the management controller, where the first command is used to determine the data format of the fragmented data of the fault log transmitted between the management controller and the CPU, such as the first fragmented data and the second fragmented data of the fault log. The management controller returns, in response to the first command, first information indicating the data format of the fragmented data of the fault log to the CPU.
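The configuration exchange in this example can be sketched as a simple request and reply. This is only an illustrative sketch: the function names and the representation of the command as a string are hypothetical; only the command name and the "00h" value come from the text.

```python
from typing import Callable, Optional

SET_BMC_GLOBAL_ENABLES = "set BMC global enables"  # configuration command named in the text
TARGET_INFO = 0x00                                  # "00h" reply named in the text


def bmc_handle_command(command: str) -> Optional[int]:
    """Management controller side: answer the configuration command."""
    if command == SET_BMC_GLOBAL_ENABLES:
        return TARGET_INFO
    return None  # unknown command: no target information


def cpu_configure(send_command: Callable[[str], Optional[int]]) -> bool:
    """CPU side: return True if interrupt-driven transmission should be used."""
    reply = send_command(SET_BMC_GLOBAL_ENABLES)
    return reply == TARGET_INFO
```

For example, `cpu_configure(bmc_handle_command)` evaluates to true in this sketch, after which the CPU would transmit the fault log according to the interrupt signal sent by the management controller.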
In some embodiments, the first information indicates that the fragmented data of the fault log includes an identifier of the management controller. The identifier of the management controller is used for indicating whether the data format of the fragmented data meets the requirement. For example, bits 5 to 7 of the fragmented data may be used to store the identifier of the management controller.
In this embodiment, the fragmented data of the fault log is set to include the identifier of the management controller, and the identifier is used to indicate whether the data format of the fragmented data meets the requirement. Thus, after receiving the first fragmented data, the management controller can determine, based on the identifier of the management controller, whether the data format of the first fragmented data meets the requirement. This helps prevent the management controller from storing fragmented data whose data format does not meet the requirement, which would leave the management controller unable to accurately identify the content of the fault log.
In some embodiments, the first information indicates that the fragmented data of the fault log includes a first checksum.
The first checksum is determined by the CPU based on the content of the fragmented data and is used to indicate whether the content of the fragmented data has been changed. For example, bit 9 of the fragmented data of the fault log may be used to store the first checksum.
In this embodiment, the fragmented data of the fault log is set to further include a first checksum, which is determined by the CPU based on the content of the fragmented data and can indicate whether that content has been changed. Therefore, after the management controller receives the fragmented data, it can judge from the first checksum whether the content of the fragmented data has been changed. This prevents the management controller from storing fragmented data whose content has changed, which would make it impossible to accurately analyze the failure of the computer device from the fault log later.
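The text does not specify how the first checksum is calculated. Purely as an assumption for illustration, the sketch below uses an 8-bit two's-complement checksum, in which the content bytes plus the checksum sum to zero modulo 256; the function names are hypothetical.

```python
def first_checksum(content: bytes) -> int:
    """CPU side: 8-bit two's-complement checksum of the content (assumed algorithm)."""
    return (-sum(content)) & 0xFF


def content_unchanged(content: bytes, checksum: int) -> bool:
    """Management controller side: verify the content against the checksum."""
    return ((sum(content) + checksum) & 0xFF) == 0
```

With this choice, a single changed byte in the content makes the verification fail, so the management controller can decline to store the affected fragmented data.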
In some embodiments, the first information indicates that the fragmented data of the fault log includes a data offset.
The data offset is used to indicate the recording position, within the fault log, of the at least one byte of the fault log carried by the fragmented data. For example, bits 10 to 13 of the fragmented data of the fault log may be used to store the data offset.
In this embodiment, the fragmented data of the fault log further includes the data offset. When multiple different pieces of fragmented data of the fault log are transmitted, the order in which they are received may differ from the order in which they were sent; for example, the CPU sends one piece of fragmented data first and another afterwards, but the management controller receives the latter first. In that case the management controller can still write each piece of fragmented data into the log file at the recording position indicated by its data offset. This helps ensure that the order of the bytes in the fault log stored by the management controller is the same as in the fault log generated by the CPU, and thus helps ensure the accuracy of the stored fault log.
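Writing by data offset can be sketched as follows. This is a minimal sketch under stated assumptions: the log file is modeled as an in-memory `bytearray`, the data offset is taken to be a byte position counted from zero, and the function name is hypothetical.

```python
def write_fragment(log_file: bytearray, data_offset: int, payload: bytes) -> None:
    """Write payload into log_file at the position given by data_offset."""
    end = data_offset + len(payload)
    if len(log_file) < end:
        # Grow the file with placeholder bytes so that later data can land first.
        log_file.extend(b"\x00" * (end - len(log_file)))
    log_file[data_offset:end] = payload
```

For example, if the fragment carrying `b"error"` at offset 6 arrives before the fragment carrying `b"fatal "` at offset 0, the log file still ends up holding `b"fatal error"` once both have been written.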
In some embodiments, the first information indicates that the fragmented data of the fault log includes operation information.
The operation information is used for indicating the operation type to which the fragmented data belongs, and the operation type is any one of write preparation (00h), write data (01h) or write end (03h), where "h" denotes hexadecimal. For example, bit 8 of the fragmented data of the fault log may be used to store the operation information.
In this embodiment, the fragmented data of the fault log is set to further include operation information indicating whether the operation type to which the fragmented data belongs is write preparation, write data or write end, which helps the management controller accurately determine the transmission stage of the fault log according to the operation information.
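The three operation types can be represented as an enumeration. The values 00h, 01h and 03h come from the text; the names `OperationType` and `transfer_finished` are hypothetical and serve only as an illustration.

```python
from enum import IntEnum


class OperationType(IntEnum):
    WRITE_PREPARE = 0x00  # write preparation
    WRITE_DATA = 0x01     # write data
    WRITE_END = 0x03      # write end


def transfer_finished(operation_byte: int) -> bool:
    """Return True if the fragmented data marks the end of the fault log.

    Raises ValueError for a value that is not one of the three operation types.
    """
    return OperationType(operation_byte) is OperationType.WRITE_END
```

A management controller following this sketch would treat a value such as 02h as invalid rather than guess the transmission stage.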
In some embodiments, the first information further indicates that the fragmented data of the fault log includes at least one byte of the fault log.
For example, the 14th to nth bits of the fragmented data may be used to store the at least one byte of the fault log, where n is a positive integer greater than 14. The at least one byte of the fault log is in ASCII code form.
Optionally, the first command is further used for determining a frame length of the fragmented data of the fault log. The first information further indicates a preset value of the frame length of the fragmented data. For example, bit 1 of the fragmented data may be used to indicate the frame length.
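Taken together, the example field positions suggest a frame layout that can be sketched as below. Several points are assumptions made only for this illustration and are not stated in the text: the positions are treated as 1-based byte positions, positions 2 to 4 (which the text does not describe) are zero-filled, the frame length counts the whole frame, the data offset is stored least significant byte first, and the checksum is an 8-bit two's-complement sum over the log bytes. The function names are hypothetical.

```python
import struct

HEADER_LEN = 13       # positions 1 to 13 precede the log bytes
MAX_FRAME_LEN = 255   # example maximum frame length mentioned for the BT interface


def build_fragment(controller_id: bytes, operation: int, data_offset: int,
                   log_bytes: bytes) -> bytes:
    """Pack one piece of fragmented data using the assumed layout."""
    if len(controller_id) != 3:
        raise ValueError("controller identifier occupies positions 5 to 7")
    frame_len = HEADER_LEN + len(log_bytes)
    if frame_len > MAX_FRAME_LEN:
        raise ValueError("frame too long")
    checksum = (-sum(log_bytes)) & 0xFF           # assumed checksum algorithm
    return (bytes([frame_len])                     # position 1: frame length
            + b"\x00\x00\x00"                      # positions 2-4: not described, zero-filled
            + controller_id                        # positions 5-7: controller identifier
            + bytes([operation, checksum])         # positions 8 and 9
            + struct.pack("<I", data_offset)       # positions 10-13: data offset
            + log_bytes)                           # positions 14-n: fault log bytes


def parse_fragment(frame: bytes) -> dict:
    """Unpack a frame built by build_fragment and verify length and checksum."""
    if frame[0] != len(frame):
        raise ValueError("frame length mismatch")
    log_bytes = frame[HEADER_LEN:]
    if (sum(log_bytes) + frame[8]) & 0xFF:
        raise ValueError("checksum mismatch")
    return {"controller_id": frame[4:7], "operation": frame[7],
            "data_offset": struct.unpack("<I", frame[9:13])[0],
            "log_bytes": log_bytes}
```

Under these assumptions each frame carries at most 242 bytes of the fault log, since 13 of the 255 bytes are taken up by the other fields.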
In some embodiments, the management controller and the CPU transmit the fault log based on the target interface, and the preset value is a maximum value of a frame length of data transmitted by the target interface.
For example, when the management controller and the CPU transmit the fault log based on the BT interface of the IPMI protocol, the maximum frame length of the data transmitted by the BT interface may be 255 bytes, so the preset value is 255 bytes.
In this embodiment, by setting the preset value as the maximum value of the frame length of the data transmitted by the target interface, the number of fragmented data of the fault log is reduced, so that the transmission speed of the fault log is improved, and the transmission time of the fault log is further shortened.
In other embodiments, the management controller and the CPU transmit the fault log based on the target interface, and the preset value belongs to the interval formed by a preset threshold and the maximum frame length of the data transmitted by the target interface. The interval may be closed or open.
For example, when the management controller and the CPU transmit the fault log based on the BT interface of the IPMI protocol, the maximum frame length of the data transmitted by the BT interface may be 255 bytes, so the maximum frame length of the data transmitted by the target interface is 255 bytes. On this basis, the preset threshold may be 50 bytes, 100 bytes, 150 bytes, 200 bytes, 250 bytes, or the like.
It should be noted that the specific value of the preset threshold is not limited in the embodiments of the present application, and it may be set dynamically according to the number of bytes of historically transmitted fault logs: the larger the number of bytes of the historically transmitted fault logs, the larger the preset threshold is set, so as to reduce the number of fragments of the fault log.
In this embodiment, the preset value is set within the interval formed by the preset threshold and the maximum frame length of the data transmitted by the target interface. This helps to ensure that the frame lengths of the first and second fragmented data are sufficiently large, which reduces the number of fragments of the fault log, improves its transmission speed, and shortens its transmission time.
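The effect of the frame length on the number of fragments can be illustrated with a minimal sketch. The 255-byte maximum comes from the BT example above; the per-frame header overhead `HEADER_LEN`, the function name `split_log` and the 4096-byte sample log are assumptions made only for illustration and are not specified in this application.

```python
from typing import List

BT_MAX_FRAME_LEN = 255  # maximum BT frame length cited in the text
HEADER_LEN = 13         # hypothetical per-frame overhead (not specified in the text)


def split_log(log: bytes, frame_len: int, header_len: int = HEADER_LEN) -> List[bytes]:
    """Split a fault log into payloads that each fit in one frame."""
    payload_len = frame_len - header_len
    if payload_len <= 0:
        raise ValueError("frame length must exceed the header length")
    return [log[i:i + payload_len] for i in range(0, len(log), payload_len)]


# Hypothetical 4096-byte fault log, fragmented with two different frame lengths.
log = bytes(4096)
fragments_at_max = split_log(log, BT_MAX_FRAME_LEN)
fragments_at_50 = split_log(log, 50)
print(len(fragments_at_max), len(fragments_at_50))
```

Under these assumptions, the same log needs 17 fragments at a frame length of 255 bytes and 111 fragments at a frame length of 50 bytes, which is the reduction in fragment count that the embodiment relies on.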
S202: the CPU receives the target information sent by the management controller.
In some embodiments, after the CPU receives the target information sent by the management controller, that is, after the IPMI driver (i.e., the second IPMI driver) on the CPU side receives the target information, the CPU starts a response function to the interrupt signal sent by the management controller, so that the CPU can respond to the interrupt signal sent by the management controller to execute the interrupt program corresponding to the interrupt signal, and further, the CPU can transmit the fault log according to the interrupt signal sent by the management controller.
In an exemplary embodiment, after starting a response function to an interrupt signal sent by the management controller, the CPU receives a target interrupt signal sent by the management controller, and then can successfully respond to the target interrupt signal to execute a first interrupt program corresponding to the target interrupt signal, and read response information sent by the management controller, so that the CPU can transmit a fault log according to the interrupt signal sent by the management controller.
In the above embodiment, before the CPU generates the fault log, the CPU and the management controller agree, through the target information, that the CPU transmits the fault log according to the interrupt signal sent by the management controller. Based on this setting, after the CPU has transmitted part of the fragmented data of the fault log, the CPU can continue to transmit the remaining data of the fault log according to the interrupt signal sent by the management controller.
Optionally, the fault log recording method further includes: initializing and configuring the interrupt register, the cache register and the control register connected to the ESPI bus.
Optionally, the CPU configures the value of the interrupt register to which the ESPI bus is connected as the target value.
Wherein the target value is used for indicating that the interrupt signal is in an active state.
In some embodiments, the interrupt register includes a plurality of bits. In the BIOS starting process, the CPU configures each bit in the plurality of bits to be a preset numerical value, so that the value of the interrupt register is configured to be a target value.
Illustratively, the interrupt register includes 8 bits, and the CPU configures the 8 bits to 00000001 (binary), so that the value of the interrupt register is configured to 01h to indicate that the interrupt signal is in a valid state.
In the above embodiment, the management controller sends the interrupt signal over the ESPI bus, and configuring the value of the interrupt register connected to the ESPI bus as the target value places the interrupt signal in a valid state.
Because the CPU and the management controller transmit the fault log over the ESPI bus, configuring the value of the interrupt register connected to the ESPI bus as the target value ensures that the interrupt signals that the management controller sends to the CPU over the ESPI bus (such as the target interrupt signal and the first interrupt signal in the following embodiments) are in a valid state. This ensures that the CPU can successfully respond to the interrupt signal, and helps to ensure that the CPU can continue transmitting the fault log according to the interrupt signal.
The above-mentioned "the CPU configures the value of the interrupt register connected to the ESPI bus as the target value" may be regarded as being executed by the CPU calling the bus configuration module.
Based on the above-described configuration of S201-S202, and the configuration of "configure the value of the interrupt register connected to the ESPI bus to the target value", the management controller uses the ESPI bus and the BT interface based on the IPMI protocol to send an interrupt signal, such as the target interrupt signal, the first interrupt signal, etc., to the CPU, and then the CPU sends the fault log to the management controller according to the interrupt signal.
On this basis, when the CPU sends the fault log to the management controller using the ESPI bus and the BT interface based on the IPMI protocol, and the frame length of the first and second fragmented data is 255 bytes, tests show that the transmission speed between the CPU and the management controller over the ESPI bus can reach 4.0 megabytes per second (MB/s) to 8.0 MB/s, which fully meets the transmission requirement of the fault log.
Optionally, the CPU configures the value of the cache register to which the ESPI bus is connected to zero.
In some implementations, when the CPU sends the fault log to the management controller using the ESPI bus, the fault log is written to the cache register, and the management controller reads the fault log from the cache register, thereby enabling the fault log to be transferred to the management controller.
Optionally, the cache register comprises a plurality of bits. For example, the plurality of bits may be 8 bits for storing 1 byte. Alternatively, the plurality of bits is N times 8 bits for storing a plurality of bytes, where N is a positive integer greater than 1.
In some embodiments, during BIOS boot, the CPU configures each bit of the plurality of bits to be 0.
Optionally, the CPU configures the value of the control register connected to the ESPI bus to zero.
In some embodiments, during BIOS boot, the CPU configures each bit of the control register to 0.
Optionally, the control register comprises a plurality of bits. Illustratively, the control register includes 8 bits, and the definition of each bit is shown in Table 1.
TABLE 1
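(Table 1 is provided as an image in the original publication; the definition of each bit is described in the following paragraph.)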
The 0 th bit is used for indicating whether the write buffer area is cleared, and if the value of the bit is set to 0, the control register clears the write buffer area. The 1 st bit is used for indicating whether the read buffer is cleared, and if the value of the bit is set to 0, the control register clears the read buffer. Bit 2 is used to indicate whether the CPU can use the ESPI bus to send messages, and if the value of the bit is 0, the CPU can use the ESPI bus to send messages. Bit 3 is used to indicate whether the management controller can send a message using the ESPI bus, and if the value of this bit is 0, it indicates that the management controller can send a message using the ESPI bus. The 4 th bit is used for indicating whether an event occurs, if the value of the bit is 0, the fault log transmission event does not occur, and if the value of the bit is 1, the fault log transmission event occurs. The 6 th bit is used to indicate whether the CPU is busy, if the value of the bit is 0, it indicates that the CPU is not busy, and if the value of the bit is 1, it indicates that the CPU is busy. The 7 th bit is used to indicate whether the management controller is busy, if the value of the bit is 0, the management controller is not busy, and if the value of the bit is 1, the management controller is busy.
Wherein, the writing buffer area and the reading buffer area refer to buffer registers. When the CPU writes information to be transmitted (such as fragment data of the fault log, inquiry information, etc.) to the cache register to transmit the information to the management controller, the cache register is a write cache of the CPU. When the management controller reads the buffer register to receive the information sent by the CPU, the buffer register is a read buffer area of the management controller. Similarly, when the management controller writes information to be sent (such as response information) into the cache register to send the information to the CPU, the cache register is a write cache area of the management controller. When the CPU reads the buffer register to receive the information sent by the management controller, the buffer register is a read buffer area of the CPU.
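The bit definitions described above can be summarized in a short decoding sketch. The bit positions and their meanings follow the description of Table 1; the function name `decode_control_register` and the dictionary keys are hypothetical, and bit 5 is left out because its meaning is not described in the text.

```python
def decode_control_register(value: int) -> dict:
    """Decode an 8-bit control register value per the description of Table 1."""
    if not 0 <= value <= 0xFF:
        raise ValueError("control register is 8 bits wide")

    def bit(n: int) -> int:
        return (value >> n) & 1

    return {
        "write_buffer_cleared": bit(0) == 0,  # bit 0: 0 means write buffer cleared
        "read_buffer_cleared": bit(1) == 0,   # bit 1: 0 means read buffer cleared
        "cpu_can_send": bit(2) == 0,          # bit 2: 0 means CPU may send
        "controller_can_send": bit(3) == 0,   # bit 3: 0 means controller may send
        "event_occurred": bit(4) == 1,        # bit 4: 1 means transmission event
        "cpu_busy": bit(6) == 1,              # bit 6: 1 means CPU busy
        "controller_busy": bit(7) == 1,       # bit 7: 1 means controller busy
    }


idle_state = decode_control_register(0x00)
busy_state = decode_control_register(0x90)  # bits 7 and 4 set
```

With all bits at 0, as configured during BIOS boot, the sketch reports both buffers as cleared, both sides allowed to send, no event, and neither side busy.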
Optionally, the address of the control register, the address of the buffer register, and the address of the interrupt register are consecutive addresses.
For example, the address of the control register is E4h, the address of the buffer register is E5h, and the address of the interrupt register is E6h.
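The initialization described in this part can be sketched as follows. The addresses E4h to E6h and the values (01h for the interrupt register, zero for the other two) come from the examples above; modelling the registers as a dictionary, and the function name `init_espi_registers`, are assumptions made only for illustration, since real firmware would write to hardware registers.

```python
CONTROL_REG_ADDR = 0xE4    # control register address from the example
BUFFER_REG_ADDR = 0xE5     # buffer (cache) register address from the example
INTERRUPT_REG_ADDR = 0xE6  # interrupt register address from the example
INTERRUPT_VALID = 0x01     # target value: interrupt signal in a valid state


def init_espi_registers(registers: dict) -> dict:
    """Apply the BIOS-boot configuration to a simulated register map."""
    registers[CONTROL_REG_ADDR] = 0x00
    registers[BUFFER_REG_ADDR] = 0x00
    registers[INTERRUPT_REG_ADDR] = INTERRUPT_VALID
    return registers


registers = init_espi_registers({})
```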
The above is the first part of the embodiments of the present application. Hereinafter, the second part of the embodiment of the present application will be described with reference to fig. 3 to 4.
FIG. 3 is a flow chart illustrating a fault logging method according to an exemplary embodiment. Illustratively, the method includes S301-S309 as follows.
S301: the CPU generates a fault log.
The fault log is used for recording fault information of the computer equipment.
Optionally, the fault information includes operation state information, operation parameter information of the computer device, and index information of a fault component on the computer device, index information of an operating system, and the like. In addition, the fault information may further include fault location information, fault type information, and the like.
It should be noted that the failure information may include all information that can be used to determine the cause of the failure of the computer device.
The fault information may be fault information collected within a first preset time period immediately before the computer device fails, during the fault, and within a second preset time period immediately after the computer device fails. For example, the operational status information may be the operational status information within the first preset time period immediately before the computer device failed.
In some embodiments, after the OS running on the CPU fails, such as a system crash, downtime, restart, etc., the OS executes a failure processing flow. After the CPU monitors that the OS enters a fault processing flow, the running state information, the running parameter information, the index information of a fault component, the index information of an operating system, the fault position information, the fault type information and the like of the computer equipment are collected, and a fault log is generated. After the OS breaks down, the collected fault information is sent to the management controller, and the management controller writes the log file, so that even if the OS cannot be started normally, the reason of the OS breaking down can be analyzed based on the fault log stored by the management controller.
In other embodiments, a computer device includes a plurality of components, and a CPU generates a fault log after detecting a fault in a target component of the plurality of components. After the target component fails, the collected failure information is generated into a failure log and sent to the management controller, and the management controller writes the log file, so that the failure log of the target component can be prevented from being lost when the target component is restarted outside the OS.
In one example, the fault log generated by the CPU may be a fault log generated by the OS run by the CPU. In another example, the fault log generated by the CPU may be a fault log generated by an application run by the CPU.
S302: the CPU sends first fragmented data of the fault log to the management controller.
Wherein the first fragmented data includes at least one first byte of the fault log, and the first fragmented data indicates that the at least one first byte is to be written to the log file. Therefore, after receiving the first fragmented data, the management controller writes the at least one first byte into the log file rather than into other files, so that the fault logs of the computer device are stored centrally. This helps to obtain all fault logs accurately when the fault cause is subsequently analyzed, and avoids missing fault logs, which would affect the accuracy of the fault cause analysis.
Optionally, the CPU determines a plurality of pieces of data based on the fault log. The plurality of slice data comprises first slice data and second slice data, and the first slice data and the second slice data are different slice data.
The first and second pieces of slice data may be any two pieces of slice data among the plurality of pieces of slice data.
In some embodiments, the first shard data includes only at least one first byte of the fault log. For example, the first fragmented data may include one byte of the fault log, or may also include a plurality of bytes of the fault log.
In other embodiments, the first shard data includes not only at least one first byte of the fault log, but also the target byte. Illustratively, the target byte includes an identification of the management controller, a first checksum of the first sliced data, a data offset of at least one first byte, and operational information of the first sliced data, etc. The description about the target byte will be described in the following embodiments, and will not be described in detail here.
Illustratively, after generating the fault log, the CPU queries the control register connected to the ESPI bus. It determines whether the buffer register is cleared based on bit 0 of the control register, whether the CPU can send a message based on bit 2, and whether the management controller is busy based on bit 7. If the buffer register is cleared, the CPU can send a message, and the management controller is not busy, the CPU sends the first fragmented data to the management controller.
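The pre-send check described above can be expressed as a small predicate. The three bit positions and their polarity follow the text; the function name `cpu_may_send_fragment` and the mask names are hypothetical.

```python
WRITE_BUFFER_NOT_CLEARED = 1 << 0  # bit 0: 0 means the write buffer is cleared
CPU_SEND_BLOCKED = 1 << 2          # bit 2: 0 means the CPU may send a message
CONTROLLER_BUSY = 1 << 7           # bit 7: 1 means the management controller is busy


def cpu_may_send_fragment(control_value: int) -> bool:
    """Return True only if all three conditions checked by the CPU hold."""
    blocking_bits = WRITE_BUFFER_NOT_CLEARED | CPU_SEND_BLOCKED | CONTROLLER_BUSY
    return (control_value & blocking_bits) == 0


ready = cpu_may_send_fragment(0x00)
blocked_by_busy_controller = cpu_may_send_fragment(0x80)
```

In this sketch the remaining bits, such as the event bit, do not affect the decision, because the text names only bits 0, 2 and 7 as conditions for sending.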
Optionally, the frame length of the first fragmented data is a preset value, and the preset value is greater than or equal to a preset threshold. By setting the frame lengths of the first fragmented data and the second fragmented data to the preset value and choosing a suitable preset threshold, the frame lengths of the first and second fragmented data can be kept large, which reduces the number of fragments of the fault log, improves the transmission speed of the fault log, and shortens its transmission time.
It should be noted that, the data format of the second slice data is the same as the data format of the first slice data, for example, the frame length of the second slice data is also a preset value, in addition, the second slice data may only include at least one second byte of the fault log, where the second byte is different from the first byte, or the second slice data may also further include the target byte, so the related description of the second slice data may refer to the related description of the first slice data, which is not described in detail later.
In some embodiments, the CPU turns on the BT interface and sends the first fragmented data of the fault log to the management controller using the ESPI bus, e.g., writes the first fragmented data to a cache register to which the ESPI bus is connected, thereby enabling the first fragmented data of the fault log to be sent to the management controller.
S303: the management controller receives first fragment data of the fault log sent by the CPU.
In some embodiments, the management controller monitors a chip select signal line on the ESPI bus to determine whether the CPU is transmitting fragmented data of the fault log; when it determines that fragmented data is being transmitted, the management controller reads the buffer register to receive the first fragmented data. For example, a high level on the chip select signal line may indicate that the management controller is to receive data (i.e., the fragmented data of the fault log).
S304: the management controller writes at least one first byte of the first fragmented data to the log file.
In some embodiments, after receiving the first shard data, the management controller writes the first shard data to the log file.
In other embodiments, after receiving the first sliced data, the management controller determines whether the first sliced data meets the requirement, if yes, writes the first sliced data into the log file, and if no, ends.
Whether the first fragmented data meets the requirement can be determined in various ways, which are described below by way of example as modes 1 and 2.
Mode 1: judging whether the first fragmented data meets the requirement according to the identification of the management controller included in the first fragmented data.
In some embodiments, after receiving the first fragmented data of the fault log, the management controller first determines whether the identification of the management controller included in the first fragmented data is the same as the actual identification of the management controller. If they are the same, the data format of the first fragmented data meets the requirement, and the management controller writes the first fragmented data of the fault log into the log file. If they are different, the data format of the first fragmented data does not meet the requirement, and the first fragmented data of the fault log is not written into the log file.
In the above mode 1, the first fragmented data further includes the identifier of the management controller, and when the management controller verifies that the data format of the first fragmented data meets the requirement based on the identifier of the management controller, at least one first byte is written into the log file, so that the management controller is facilitated to accurately identify the content of the fault log.
Mode 2: judging whether the first fragmented data meets the requirement according to the first checksum included in the first fragmented data.
Wherein the first checksum is determined by the CPU based on the content of the first fragment data.
In some embodiments, after receiving the first fragmented data of the fault log, the management controller generates a second checksum according to the content of the first fragmented data and determines whether the second checksum is the same as the first checksum. If they are the same, the content of the first fragmented data has not changed, and the management controller writes the first fragmented data of the fault log into the log file. If they are different, the content of the first fragmented data has changed, and the first fragmented data of the fault log is not written into the log file.
Note that, the above-described modes 1 and 2 may be executed separately, for example, only mode 1 or only mode 2 may be executed. Of course, the above-described mode 1 and mode 2 may also be performed simultaneously, which is not limited in the embodiment of the present application.
In mode 2 above, the first fragmented data further includes a first checksum generated by the CPU according to the content of the first fragmented data, and the management controller writes the at least one first byte into the log file only after verifying, according to the first checksum, that the content of the first fragmented data has not changed. This helps to avoid storing a first byte whose content is erroneous, which would prevent the cause of the failure of the computer device from being accurately analyzed from the fault log later.
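Modes 1 and 2 can be combined in a single acceptance check, as sketched below. The application does not specify the checksum algorithm or the field layout, so the 8-bit additive checksum `checksum8`, the `Fragment` class, the controller identification value 0x20 and the function name `accept_fragment` are all assumptions chosen only for illustration.

```python
from dataclasses import dataclass


def checksum8(data: bytes) -> int:
    """Hypothetical checksum: sum of all bytes modulo 256."""
    return sum(data) & 0xFF


@dataclass
class Fragment:
    controller_id: int  # identification of the management controller (mode 1)
    checksum: int       # first checksum computed by the CPU (mode 2)
    payload: bytes      # at least one byte of the fault log


def accept_fragment(fragment: Fragment, own_id: int) -> bool:
    """Accept the fragment only if both the ID and the checksum match."""
    if fragment.controller_id != own_id:
        return False  # mode 1: data format does not meet the requirement
    second_checksum = checksum8(fragment.payload)
    return second_checksum == fragment.checksum  # mode 2: content unchanged


good = Fragment(0x20, checksum8(b"panic"), b"panic")
corrupted = Fragment(0x20, good.checksum, b"panik")
```

Here `accept_fragment(good, 0x20)` succeeds, while a fragment addressed to a different identification, or one whose payload changed in transit, is rejected and would not be written to the log file.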
Optionally, the first slice data includes a data offset of the at least one first byte, the data offset being used to indicate a recording position of the at least one first byte in the fault log.
In some embodiments, when the management controller receives the first sliced data and writes at least one first byte in the first sliced data into the log file, a recording position of the at least one first byte in the fault log file may be determined according to the data offset, so as to ensure that the writing position of the at least one first byte is correct.
In the above embodiment, the first sliced data further includes the data offset of at least one first byte, so if the receiving order of the first sliced data and other sliced data is different from the sending order of the first sliced data and other sliced data, for example, the CPU sends the first sliced data first and then sends other sliced data, and the management controller receives the other sliced data first and then receives the first sliced data, the management controller may write at least one first byte into the log file according to the data offset of at least one first byte, which helps to ensure that the order of a plurality of bytes in the fault log stored by the management controller is the same as the order of a plurality of bytes in the fault log generated by the CPU, and further helps to ensure the accuracy of the fault log stored by the management controller.
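The offset-based write can be sketched as follows, with the log file modelled as an in-memory `bytearray`. The function name `write_at_offset` and the zero-filling of not-yet-received regions are assumptions made for illustration.

```python
def write_at_offset(log_file: bytearray, offset: int, payload: bytes) -> None:
    """Write payload at the recording position given by its data offset."""
    end = offset + len(payload)
    if len(log_file) < end:
        # Grow the file so that a later-positioned fragment can arrive first.
        log_file.extend(b"\x00" * (end - len(log_file)))
    log_file[offset:end] = payload


# Fragments arrive in the opposite order to the one in which they were sent.
log_file = bytearray()
write_at_offset(log_file, 6, b"world")
write_at_offset(log_file, 0, b"hello ")
```

Although the second fragment is received first, the stored bytes end up in the order generated by the sender, namely `hello world`.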
Optionally, the first fragment data includes operation information; the operation information is used for indicating an operation type to which the first piece of data belongs, and the operation type comprises any one of write preparation, write data or write end.
In some embodiments, the management controller determines the transmission stage of the fault log according to the operation indicated by the operation information included in the first piece of data after receiving the first piece of data.
Illustratively, if the operation information indicates write preparation, the first fragmented data is the initial fragment of the fault log transmitted by the CPU. If the operation information indicates write end, the first fragmented data is the final fragment of the fault log transmitted by the CPU. If the operation information indicates write data, the first fragmented data belongs to the middle part of the fault log, that is, it is neither the initial fragment nor the final fragment.
In the above embodiment, by setting that the first slice data further includes operation information, and the operation information indicates that the operation type to which the first slice data belongs is write preparation, write data, or write end, the management controller is facilitated to accurately determine the transmission stage of the fault log according to the operation information.
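The mapping from operation information to transmission stage can be sketched with an enumeration. The codes 00h, 01h and 03h are the ones given earlier in this description; the names `Operation` and `transmission_stage` and the stage labels are hypothetical.

```python
from enum import IntEnum


class Operation(IntEnum):
    WRITE_PREPARE = 0x00  # initial fragment of the fault log
    WRITE_DATA = 0x01     # fragment from the middle part of the fault log
    WRITE_END = 0x03      # final fragment of the fault log


def transmission_stage(operation_code: int) -> str:
    """Map the operation information to a transmission stage label."""
    operation = Operation(operation_code)  # raises ValueError for unknown codes
    stages = {
        Operation.WRITE_PREPARE: "start",
        Operation.WRITE_DATA: "middle",
        Operation.WRITE_END: "end",
    }
    return stages[operation]


stage_of_last_fragment = transmission_stage(0x03)
```

A code outside the three listed values, such as 02h, is treated as invalid in this sketch, since the text does not assign it a meaning.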
Optionally, the log file includes a plurality of subfiles. Wherein the number of bytes writable by each of the plurality of subfiles is a target value. In this way, by limiting the number of bytes that each subfile can write, it helps to avoid excessive occupation of the memory space of the management controller.
In some embodiments, the total remaining space of the plurality of subfiles has a writable byte count greater than or equal to a total byte count of the at least one first byte, and the management controller writes the at least one first byte to the total remaining space.
For example, the plurality of subfiles includes a first subfile including a portion of a total remaining space, and the number of bytes writable by the portion of the remaining space is greater than or equal to the total number of bytes of the at least one first byte, and the management controller writes the at least one first byte to the first subfile.
In other embodiments, the total remaining space of the plurality of subfiles may be written with a number of bytes that is less than the total number of bytes of the at least one first byte, and the management controller writes a portion of the at least one first byte into the total remaining space.
In addition, as shown in FIG. 4, the other part of the at least one first byte is written into a target space, where the fault log in the target space was written earlier than the fault log in the non-target space. In this way, the earliest-written fault log is cyclically overwritten, which helps to reduce the storage space occupied by the fault log.
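The cyclic overwrite across sub-files can be sketched as below. This is a simplified model under stated assumptions: sub-files are in-memory buffers of equal capacity, and when no space remains the oldest sub-file is cleared as a whole before being reused, whereas the embodiment may overwrite at a finer granularity. The class name `RotatingLog` and its parameters are hypothetical.

```python
class RotatingLog:
    """Fixed number of capacity-limited sub-files, reused oldest first."""

    def __init__(self, subfile_count: int, subfile_capacity: int) -> None:
        if subfile_count <= 0 or subfile_capacity <= 0:
            raise ValueError("count and capacity must be positive")
        self.capacity = subfile_capacity
        self.subfiles = [bytearray() for _ in range(subfile_count)]
        self.current = 0  # index of the sub-file currently being written

    def write(self, data: bytes) -> None:
        while data:
            free = self.capacity - len(self.subfiles[self.current])
            if free == 0:
                # Move to the next sub-file, which holds the earliest data.
                self.current = (self.current + 1) % len(self.subfiles)
                self.subfiles[self.current].clear()
                free = self.capacity
            self.subfiles[self.current] += data[:free]
            data = data[free:]


rotating_log = RotatingLog(subfile_count=2, subfile_capacity=4)
rotating_log.write(b"abcdefgh")  # fills both sub-files
rotating_log.write(b"ij")        # no space left: the oldest sub-file is reused
```

After the second write, the first sub-file holds `ij` and the second still holds `efgh`, so total storage never exceeds the configured number of sub-files times their capacity.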
S305: the management controller sends a target interrupt signal to the CPU to instruct the CPU to continue sending fault logs.
Optionally, instructing the CPU to continue sending the fault log by means of the target interrupt signal can be implemented in various ways, as exemplified by modes 1 and 2 below.
In mode 1, the target interrupt signal is used to instruct the CPU to perform an operation of reading response information sent by the management controller, the response information being used to indicate that the fault log has been successfully received.
In some embodiments, after the CPU sends the first fragment data to the management controller, the CPU also sends query information to the management controller, where the query information is used to query whether the fault log is successfully received. After successfully receiving the first fragment data, the management controller returns response information to the CPU in response to the inquiry information, wherein the response information is used for indicating that the fault log is successfully received. After the management controller finishes sending the response information, the management controller also sends a target interrupt signal to the CPU, wherein the target interrupt signal corresponds to a first interrupt program, and the first interrupt program is used for reading the response information sent by the management controller.
For example, the CPU executes task 1, task 2, task 3 and task 4 (the task of querying the reply of the management controller) concurrently. In the related art, the CPU executes task 2 in the idle period of task 1, task 3 in the idle period of task 2, and task 4 in the idle period of task 3, and reads the reply of the management controller only when the polling reaches task 4. If the reply is not yet available in the current polling round, the CPU must wait until the next polling round to read it, and continues to transmit the remaining data of the fault log only after confirming that the management controller has successfully received the previous part. With the solution of the present application, the CPU can execute task 4 preferentially after receiving the target interrupt signal, read the response information sent by the management controller, and continue to send the fault log after confirming that the fault log has been successfully received. Task 4 therefore does not need to wait for the next polling round, which can greatly shorten the time interval between sending different fragmented data.
In mode 1, because the CPU preferentially executes the interrupt program corresponding to an interrupt signal after receiving it, setting the target interrupt signal to instruct the CPU to read the response information sent by the management controller allows the CPU to read that response information preferentially after receiving the target interrupt signal. This avoids the CPU polling for the response information returned by the management controller, speeds up the CPU's confirmation that the management controller has successfully received the fault log, shortens the time interval between the CPU sending different fragmented data, and thus shortens the sending time of the fault log.
Mode 2, the target interrupt signal is used to instruct the CPU to perform an operation of sending the fault log.
In some embodiments, the management controller sends the target interrupt signal to the CPU after successfully receiving the first fragmented data. The target interrupt signal corresponds to a second interrupt program, and the second interrupt program is used for sending the fault log to the management controller.
In the mode 2, since the CPU executes the interrupt program corresponding to the interrupt signal after receiving the interrupt signal, by setting the target interrupt signal for instructing the CPU to execute the operation of transmitting the fault log, it is possible to achieve that the CPU is instructed to continue transmitting the fault log by the target interrupt signal. In addition, after the CPU receives the interrupt signal, the interrupt program corresponding to the interrupt signal can be preferentially executed, so that the CPU is instructed to continuously send the fault log through the target interrupt signal, the time interval between different pieces of data sent by the CPU can be shortened, and the sending time of the fault log is shortened.
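The interaction in mode 2 can be sketched as an event-driven simulation in which the management controller raises the target interrupt after each received fragment and the CPU's handler sends the next one. The class and method names are hypothetical, the interrupt is modelled as a plain callback, and the nested calls that result are acceptable only because the example uses a handful of fragments.

```python
from collections import deque
from typing import Callable, Iterable, Optional


class ManagementController:
    def __init__(self) -> None:
        self.log_file = bytearray()
        self.raise_interrupt: Optional[Callable[[], None]] = None

    def receive(self, fragment: bytes) -> None:
        self.log_file += fragment       # write the received bytes to the log file
        if self.raise_interrupt is not None:
            self.raise_interrupt()      # target interrupt signal to the CPU


class Cpu:
    def __init__(self, controller: ManagementController,
                 fragments: Iterable[bytes]) -> None:
        self.controller = controller
        self.pending = deque(fragments)
        controller.raise_interrupt = self.on_target_interrupt

    def start(self) -> None:
        self.send_next()                # send the first fragmented data

    def on_target_interrupt(self) -> None:
        self.send_next()                # second interrupt program: keep sending

    def send_next(self) -> None:
        if self.pending:
            self.controller.receive(self.pending.popleft())


controller = ManagementController()
cpu = Cpu(controller, [b"ab", b"cd", b"ef"])
cpu.start()
```

No polling loop appears in this sketch: each fragment after the first is sent as a direct consequence of the interrupt, which is the source of the shortened interval between fragments.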
In some embodiments, the management controller sends the target interrupt signal to the CPU over the ESPI bus once it has received every byte of the first fragmented data. In this way, the target interrupt signal is sent sooner, which helps to further shorten the time interval between sending the first fragmented data and sending the second fragmented data.
In other embodiments, the management controller sends a target interrupt signal to the CPU after writing at least one first byte of the first fragmented data to the log file, helping to ensure the integrity of the at least one first byte written to the log file.
S306: the CPU receives a target interrupt signal sent by the management controller.
S307: the CPU sends second fragment data of the fault log to the management controller according to the target interrupt signal.
The second fragment data includes at least one second byte of the fault log, and instructs the management controller to write the at least one second byte to the log file.
In some embodiments, based on mode 1 in S305, after receiving the target interrupt signal the CPU responds by executing the first interrupt routine corresponding to the target interrupt signal, that is, it reads the response information sent by the management controller. Having read the response information, the CPU confirms that the management controller has successfully received the fault log (i.e., the first fragment data) and therefore continues by sending the second fragment data to the management controller.
In other embodiments, based on mode 2 in S305, after receiving the target interrupt signal the CPU responds by executing the second interrupt routine corresponding to the target interrupt signal, that is, it sends the fault log to the management controller. The CPU therefore continues by sending the second fragment data to the management controller.
In some embodiments, if the OS has failed and the second fragment data is the last fragment data of the fault log, the OS run by the CPU executes the fault handling flow after the second fragment data has been sent to the management controller.
S308: the management controller receives the second fragment data of the fault log sent by the CPU.
Note that S308 works on the same principle as S303; for its implementation and related description, refer to S303 above, which is not repeated here.
S309: the management controller writes the at least one second byte of the second fragment data to the log file.
Note that S309 works on the same principle as S304; for its implementation and related description, refer to S304 above, which is not repeated here.
In the above embodiment, after the computer device fails, the CPU generates a fault log to record fault information of the computer device. After receiving the first fragment data of the fault log, the management controller writes the at least one first byte of the fault log included in the first fragment data into the log file, and actively sends the target interrupt signal to the CPU to instruct the CPU to continue sending the fault log. After receiving the target interrupt signal, the CPU sends the second fragment data of the fault log, which includes at least one second byte of the fault log, to the management controller according to the target interrupt signal, so that the management controller receives the second fragment data quickly. Because the CPU continues with the second fragment data in response to the target interrupt signal rather than polling the management controller for its reply, the waiting time of polling is saved. The interval between the first fragment data and the second fragment data is therefore greatly shortened, which increases the transmission speed of the fault log, shortens its transmission time, and helps the CPU enter the subsequent fault handling flow quickly.
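The flow of S301 to S309 can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class names, the `interrupt_pending` flag standing in for the target interrupt signal, and the 4-byte fragment size are all hypothetical and do not come from the embodiments; a real system would use a hardware interrupt and a bus transfer rather than direct method calls.

```python
from collections import deque


class ManagementController:
    """Stores the log file and raises an interrupt after each fragment is written."""

    def __init__(self):
        self.log_file = bytearray()      # stands in for the stored log file
        self.interrupt_pending = False   # models the target interrupt signal

    def receive(self, fragment: bytes) -> None:
        self.log_file.extend(fragment)   # S303/S304 (or S308/S309): receive and write
        self.interrupt_pending = True    # S305: signal the CPU to continue


class Cpu:
    """Holds the fault log as a queue of fragments (fragment size is an assumption)."""

    def __init__(self, fault_log: bytes, fragment_size: int):
        self.fragments = deque(
            fault_log[i:i + fragment_size]
            for i in range(0, len(fault_log), fragment_size)
        )

    def send_next(self, controller: ManagementController) -> None:
        controller.interrupt_pending = False          # the interrupt has been taken
        controller.receive(self.fragments.popleft())  # S302 or S307


def transfer(fault_log: bytes, fragment_size: int = 4) -> bytes:
    controller = ManagementController()
    cpu = Cpu(fault_log, fragment_size)
    if cpu.fragments:
        cpu.send_next(controller)                     # first fragment, no interrupt needed
    # S306: each later fragment is sent only in response to an interrupt.
    while controller.interrupt_pending and cpu.fragments:
        cpu.send_next(controller)
    return bytes(controller.log_file)
```

Calling `transfer(b"example fault log")` returns the same bytes once every fragment has been written, which shows that the CPU never has to poll: each send after the first is triggered by the controller's signal.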
The following takes the first fragment data as an example to describe how each of the multiple pieces of fragment data of the fault log is transmitted.
Optionally, the first fragment data may include a plurality of data items, including first data and second data. The first data and the second data are any two different data items among the plurality of data items.
Each of the first data and the second data may comprise one byte or a plurality of bytes. The embodiments of the present application are not limited in this regard.
The process of transferring the first fragment data between the CPU and the management controller may include the following steps one to six.
Step one: the CPU sends the first data to the management controller.
Step two: the management controller receives first data sent by the CPU.
Step three: the management controller sends a first interrupt signal to the CPU to instruct the CPU to continue sending fault logs.
Step four: the CPU receives a first interrupt signal sent by the management controller.
Step five: the CPU sends second data to the management controller according to the first interrupt signal.
Step six: the management controller receives the second data sent by the CPU.
The implementation principle of steps one to six is the same as that of S302 to S308; for the implementation process and related description, refer to S302 to S308, which are not repeated here.
In the above embodiment, the first fragment data includes first data and second data, each of which includes at least one byte. After receiving the first data, the management controller sends a first interrupt signal to the CPU to instruct the CPU to continue sending the fault log, and thus quickly receives the second data that the CPU sends according to the first interrupt signal. Because the CPU continues with the second data in response to the first interrupt signal rather than polling the management controller for its reply, the waiting time of polling is saved. The interval between the first data and the second data is therefore shortened, which increases the transmission speed of the first fragment data, shortens the time the management controller needs to receive the first fragment data, and in turn shortens the time it needs to receive the fault log.
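The time saving described above can be made concrete with a rough timing model. The model and all numbers below are assumptions for illustration only: it supposes that under polling the CPU waits up to one full polling interval between two data items, while under the interrupt scheme it waits only for the interrupt latency. The embodiments do not give any such figures.

```python
def polling_transfer_time(n_items: int, send_time_us: int, poll_interval_us: int) -> int:
    """Worst-case total time when the CPU polls for the controller's reply.

    Between two consecutive items the CPU may wait a full polling interval.
    """
    gaps = max(n_items - 1, 0)
    return n_items * send_time_us + gaps * poll_interval_us


def interrupt_transfer_time(n_items: int, send_time_us: int, irq_latency_us: int) -> int:
    """Total time when each following item is triggered by an interrupt."""
    gaps = max(n_items - 1, 0)
    return n_items * send_time_us + gaps * irq_latency_us


# Hypothetical figures: 4 items, 10 us per item, 100 us polling interval, 5 us interrupt latency.
polling_total = polling_transfer_time(4, 10, 100)      # 4*10 + 3*100 = 340
interrupt_total = interrupt_transfer_time(4, 10, 5)    # 4*10 + 3*5   = 55
```

Under these assumed figures the gap term dominates the polling case, which is why replacing the polling wait with an interrupt shortens the overall transfer.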
The foregoing has described the solution provided in the embodiments of the present application mainly from the perspective of the method. To implement the above functions, the fault log recording device includes hardware structures and/or software modules for performing the respective functions. Those of skill in the art will readily appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented as hardware or as a combination of hardware and computer software. Whether a function is implemented as hardware or as computer software driving hardware depends on the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiment of the present application, according to the above method, the fault log recording device may be exemplarily divided into functional modules, for example, the fault log recording device may include each functional module corresponding to each functional division, or two or more functions may be integrated into one processing module. The integrated modules may be implemented in hardware or in software functional modules. It should be noted that, in the embodiment of the present application, the division of the modules is schematic, which is merely a logic function division, and other division manners may be implemented in actual implementation.
By way of example, fig. 5 shows a schematic diagram of one possible structure of the fault log recording device referred to in the above embodiments (denoted as fault log recording device 500). The fault log recording device 500 comprises a receiving unit 501, a writing unit 502 and a sending unit 503.
The receiving unit 501 is configured to control the management controller to receive first fragment data of the fault log sent by the CPU; the fault log is used to record fault information of the computer device, and the first fragment data includes at least one first byte of the fault log. For example, S303 shown in fig. 3.
The writing unit 502 is configured to control the management controller to write the at least one first byte in the first fragment data into the log file. For example, S304 shown in fig. 3.
The sending unit 503 is configured to control the management controller to send a target interrupt signal to the CPU; the target interrupt signal is used to instruct the CPU to continue sending the fault log. For example, S305 shown in fig. 3.
The receiving unit 501 is further configured to control the management controller to receive second fragment data of the fault log sent by the CPU; the second fragment data is sent by the CPU according to the target interrupt signal, and includes at least one second byte of the fault log. For example, S308 shown in fig. 3.
Optionally, the first fragment data includes first data and second data, each of which includes at least one byte; the receiving unit 501 is specifically configured to control the management controller to: receive the first data sent by the CPU; send a first interrupt signal to the CPU, the first interrupt signal being used to instruct the CPU to continue sending the fault log; and receive the second data sent by the CPU, the second data being sent by the CPU in response to the first interrupt signal.
Optionally, the sending unit 503 is further configured to control the management controller to send target information to the CPU, where the target information is used to instruct the CPU to transmit the fault log according to the interrupt signal sent by the management controller.
Optionally, the frame lengths of the first fragment data and the second fragment data are a preset value, where the preset value is greater than or equal to a preset threshold.
Optionally, the first fragment data further includes an identifier of the management controller; the writing unit 502 is specifically configured to control the management controller to: write the first fragment data of the fault log into the log file if the identifier of the management controller is the same as the actual identifier of the management controller. The identifier being the same as the actual identifier indicates that the data format of the first fragment data meets the requirement.
Optionally, the first fragment data further includes a first checksum, the first checksum being determined by the CPU based on the content of the first fragment data; the writing unit 502 is specifically configured to control the management controller to: determine a second checksum of the first fragment data based on the content of the first fragment data; and write the first fragment data of the fault log into the log file if the second checksum is the same as the first checksum. The second checksum being the same as the first checksum indicates that the content of the first fragment data has not been altered.
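The identifier check and the checksum check performed before writing can be sketched as follows. The frame layout (one identifier byte, then the payload, then one checksum byte), the identifier value `0x20`, and the checksum algorithm (byte sum modulo 256) are all hypothetical choices made for this illustration; the embodiments do not specify them.

```python
CONTROLLER_ID = 0x20  # hypothetical actual identifier of the management controller


def checksum(data: bytes) -> int:
    """Assumed checksum: sum of all bytes, truncated to 8 bits."""
    return sum(data) & 0xFF


def build_fragment(controller_id: int, payload: bytes) -> bytes:
    """CPU side: prepend the identifier and append the first checksum."""
    body = bytes([controller_id]) + payload
    return body + bytes([checksum(body)])


def accept_fragment(fragment: bytes, log_file: bytearray) -> bool:
    """Management controller side: write the payload only if both checks pass."""
    if len(fragment) < 2:
        return False
    body, first_checksum = fragment[:-1], fragment[-1]
    if body[0] != CONTROLLER_ID:          # identifier differs: format not as required
        return False
    if checksum(body) != first_checksum:  # second checksum differs: content altered
        return False
    log_file.extend(body[1:])             # write the fault log bytes to the log file
    return True
```

For example, `accept_fragment(build_fragment(CONTROLLER_ID, b"ERR"), log)` appends `b"ERR"` to `log`, whereas a fragment carrying another identifier or a modified payload is rejected and leaves the log file unchanged.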
Optionally, the first slice data further includes a data offset of the at least one first byte, the data offset being used to indicate a recording position of the at least one first byte in the fault log.
Optionally, the first fragment data further includes operation information; the operation information is used for indicating an operation type to which the first piece of data belongs, and the operation type comprises any one of write preparation, write data or write end.
For a specific description of the above alternative modes, reference may be made to the foregoing method embodiments, and details are not repeated here. In addition, any explanation and description of the beneficial effects of the fault log recording device 500 provided above may refer to the corresponding method embodiments described above, and will not be repeated.
By way of example, fig. 6 shows a schematic diagram of one possible structure of the fault log recording device referred to in the above embodiments (denoted as fault log recording device 600). The fault log recording device 600 comprises a processing unit 601, a sending unit 602 and a receiving unit 603.
The processing unit 601 is configured to control the CPU to generate a fault log; the fault log is used to record fault information of the computer device. For example, S301 shown in fig. 3.
The sending unit 602 is configured to control the CPU to send first fragment data of the fault log to the management controller; the first fragment data includes at least one first byte of the fault log. For example, S302 shown in fig. 3.
The receiving unit 603 is configured to control the CPU to receive a target interrupt signal sent by the management controller, where the target interrupt signal instructs the CPU to continue sending the fault log. For example, S306 shown in fig. 3.
The sending unit 602 is further configured to control the CPU to send second fragment data of the fault log to the management controller according to the target interrupt signal; the second fragment data includes at least one second byte of the fault log. For example, S307 shown in fig. 3.
Optionally, the first fragment data includes first data and second data, each of which includes at least one byte; the sending unit 602 is specifically configured to control the CPU to: send the first data to the management controller; receive a first interrupt signal sent by the management controller, the first interrupt signal instructing the CPU to continue sending the fault log; and send the second data to the management controller according to the first interrupt signal.
Optionally, the receiving unit 603 is further configured to control the CPU to receive the target information sent by the management controller; the target information is used to instruct the CPU to transmit the fault log according to the interrupt signal sent by the management controller.
Optionally, the frame lengths of the first fragment data and the second fragment data are a preset value, where the preset value is greater than or equal to a preset threshold.
Optionally, the first shard data further includes an identification of the management controller; the identification of the management controller is used for indicating whether the data format of the first fragment data meets the requirement.
Optionally, the first sliced data further includes a first checksum, the first checksum being determined by the CPU based on the content of the first sliced data; the first checksum is used to indicate whether the contents of the first piece of data are changed.
Optionally, the first slice data further includes a data offset of the at least one first byte, the data offset being used to indicate a recording position of the at least one first byte in the fault log.
Optionally, the first fragment data further includes operation information; the operation information is used for indicating an operation type to which the first piece of data belongs, and the operation type comprises any one of write preparation, write data or write end.
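The data offset and the operation information can be illustrated with a sender-side framing sketch. The header layout used here (1-byte operation type, 4-byte data offset, 2-byte payload length, little-endian), the numeric values of the operation types, and the use of separate frames for write preparation and write end are assumptions made for the example, not details given in the embodiments.

```python
import struct
from enum import IntEnum


class Op(IntEnum):
    WRITE_PREPARE = 0   # hypothetical encoding of "write preparation"
    WRITE_DATA = 1      # hypothetical encoding of "write data"
    WRITE_END = 2       # hypothetical encoding of "write end"


# Assumed header: operation type, data offset, payload length.
HEADER = struct.Struct("<BIH")


def make_fragments(fault_log: bytes, payload_size: int) -> list:
    """CPU side: split the fault log into frames that carry their own offset."""
    frames = [HEADER.pack(Op.WRITE_PREPARE, 0, 0)]
    for offset in range(0, len(fault_log), payload_size):
        chunk = fault_log[offset:offset + payload_size]
        frames.append(HEADER.pack(Op.WRITE_DATA, offset, len(chunk)) + chunk)
    frames.append(HEADER.pack(Op.WRITE_END, len(fault_log), 0))
    return frames


def apply_fragments(frames) -> bytes:
    """Management controller side: place each payload at its recorded position."""
    log = bytearray()
    for frame in frames:
        op, offset, length = HEADER.unpack_from(frame)
        if op != Op.WRITE_DATA:
            continue
        end = offset + length
        if len(log) < end:
            log.extend(bytes(end - len(log)))  # grow the buffer with zero bytes
        log[offset:end] = frame[HEADER.size:HEADER.size + length]
    return bytes(log)
```

Because every write-data frame records where its bytes belong in the fault log, the receiver in this sketch reconstructs the log correctly even if the frames are applied in a different order.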
Optionally, the CPU and the management controller transmit the fault log through a target bus; the processing unit 601 is further configured to control the CPU to configure the value of the interrupt register that controls the target bus to a target value, where the target value indicates that the interrupt signal sent by the management controller is in an active state.
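The effect of configuring the interrupt register can be sketched with a simulated register. The bit position `IRQ_ENABLE_BIT` and the `InterruptRegister` class are hypothetical; the actual register, its address, and its target value depend on the platform and the target bus and are not stated in the embodiments.

```python
IRQ_ENABLE_BIT = 0x01  # hypothetical bit that marks the controller's interrupt as active


class InterruptRegister:
    """Simulated interrupt register of the target bus."""

    def __init__(self):
        self.value = 0  # interrupt signal inactive by default

    def configure(self, target_value: int) -> None:
        self.value = target_value

    def interrupt_active(self) -> bool:
        return bool(self.value & IRQ_ENABLE_BIT)


def deliver_interrupt(register: InterruptRegister, handler) -> bool:
    """Run the interrupt routine only if the register marks the signal as active."""
    if not register.interrupt_active():
        return False   # the signal is ignored, so the CPU would not continue sending
    handler()
    return True
```

In this sketch, an interrupt raised before `configure(IRQ_ENABLE_BIT)` is dropped, which is why the register is configured before the interrupt-driven transfer begins.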
For a specific description of the above alternative modes, reference may be made to the foregoing method embodiments, and details are not repeated here. In addition, any explanation and description of the beneficial effects of the fault log recording device 600 provided above may refer to the corresponding method embodiments described above, and will not be repeated.
The embodiments of the present application also provide a computer device comprising a central processing unit (CPU) and a management controller, wherein the management controller stores a log file. The management controller is configured to receive first fragment data of the fault log sent by the CPU; the fault log is used to record fault information of the computer device, and the first fragment data includes at least one first byte of the fault log. The management controller is further configured to write the at least one first byte in the first fragment data into the log file. The management controller is further configured to send a target interrupt signal to the CPU; the target interrupt signal is used to instruct the CPU to continue sending the fault log. The management controller is further configured to receive second fragment data of the fault log sent by the CPU, wherein the second fragment data is sent by the CPU according to the target interrupt signal and includes at least one second byte of the fault log.
The embodiment of the application also provides computer equipment, which comprises a Central Processing Unit (CPU) and a management controller, wherein the management controller stores log files; a CPU for generating a fault log; the fault log is used for recording fault information of the computer equipment; the CPU is also used for sending the first fragment data of the fault log to the management controller; the first shard data includes at least one first byte of the fault log; the CPU is also used for receiving a target interrupt signal sent by the management controller, wherein the target interrupt signal indicates that the fault log is continuously sent; the CPU is further used for sending second piece of data of the fault log to the management controller according to the target interrupt signal, wherein the second piece of data comprises at least one second byte of the fault log.
The embodiments of the present application also provide a computer device comprising a processor and a memory, wherein the processor is connected to the memory, the memory stores computer-executable instructions, and the processor implements the method in the above embodiments when executing the computer-executable instructions. The embodiments of the present application do not limit the specific form of the computer device. For example, the computer device may be a terminal device or a network device. The terminal device may be referred to as a terminal, user equipment (UE), an access terminal, a subscriber unit, a subscriber station, a mobile station, a remote terminal, a mobile device, a user terminal, a wireless communication device, a user agent, or the like. The terminal device may be a mobile phone, an augmented reality (AR) device, a virtual reality (VR) device, a tablet, a notebook, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or the like. The network device may specifically be a server or the like. The server may be one physical or logical server, or two or more physical or logical servers that share different responsibilities and cooperate to implement the functions of the server.
Embodiments of the present application also provide a computer-readable storage medium having stored thereon a computer program which, when run on a computer, causes the computer to perform a method performed by any one of the computer devices provided above.
For the explanation of the relevant content and the description of the beneficial effects in any of the above-mentioned computer-readable storage media, reference may be made to the above-mentioned corresponding embodiments, and the description thereof will not be repeated here.
The embodiments of the present application also provide a chip. The chip integrates control circuitry and one or more ports for implementing the functions of the computer device described above. Optionally, for the functions supported by the chip, refer to the description above, which is not repeated here. Those of ordinary skill in the art will appreciate that all or some of the steps of the above embodiments may be implemented by a program instructing associated hardware. The program may be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a random access memory, or the like. The processing unit or processor may be a central processing unit, a general purpose processor, an application specific integrated circuit (ASIC), a digital signal processor (DSP), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
Embodiments of the present application also provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform any of the methods of the above embodiments. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions in accordance with the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)).
It should be noted that the devices provided in the embodiments of the present application for storing computer instructions or computer programs, such as, but not limited to, the above-mentioned memories, computer-readable storage media, and communication chips, are all non-volatile (non-transitory).
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using a software program, they may be implemented in whole or in part in the form of a computer program product.
Although the present application has been described herein in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed application, from a review of the figures, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Although the present application has been described in connection with specific features and embodiments thereof, it will be apparent that various modifications and combinations can be made without departing from the spirit and scope of the application. Accordingly, the specification and drawings are merely exemplary illustrations of the present application as defined in the appended claims and are considered to cover any and all modifications, variations, combinations, or equivalents that fall within the scope of the present application. It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (10)

1. A fault log recording method, characterized in that the method is applied to a computer device, the computer device comprises a central processing unit (CPU) and a management controller, and the management controller stores a log file; the method comprises the following steps:
receiving first fragment data of a fault log sent by the CPU; the fault log is used for recording fault information of the computer device, and the first fragment data comprises at least one first byte of the fault log;
writing the at least one first byte in the first sliced data to the log file;
sending a target interrupt signal to the CPU; the target interrupt signal is used for indicating to continue sending the fault log;
receiving second fragment data of the fault log sent by the CPU; the second slice data is sent by the CPU according to the target interrupt signal, and the second slice data includes at least one second byte of the fault log.
2. The method according to claim 1, wherein the method further comprises:
sending target information to the CPU, wherein the target information is used for instructing the CPU to transmit the fault log according to the interrupt signal sent by the management controller.
3. A method according to claim 1 or 2, characterized in that,
the frame lengths of the first fragment data and the second fragment data are preset values; and the management controller and the CPU transmit the fault log based on a target interface, wherein the preset value is the maximum value of the frame length of data transmitted by the target interface.
4. A method according to any of claims 1-3, wherein the first shard data further comprises an identification of the management controller; the writing the at least one first byte in the first sliced data to the log file includes:
if the identifier of the management controller is the same as the actual identifier of the management controller, writing the at least one first byte in the first fragment data into the log file.
5. The method according to any one of claims 1 to 4, wherein,
the first slice data comprises a data offset of the at least one first byte, the data offset being used to indicate a recording position of the at least one first byte in the fault log; and/or
The first fragment data comprises operation information; the operation information is used for indicating an operation type of the first sliced data, and the operation type comprises any one of write preparation, write data or write end.
6. A fault log recording method, characterized in that the method is applied to a computer device, the computer device comprises a central processing unit (CPU) and a management controller, and the management controller stores a log file; the method is performed by the CPU; the method comprises the following steps:
generating a fault log; the fault log is used for recording fault information of the computer equipment;
sending first fragmented data of the fault log to the management controller; the first shard data includes at least one first byte of the fault log;
receiving a target interrupt signal sent by the management controller; the target interrupt signal indicates to continue sending the fault log;
according to the target interrupt signal, sending second fragment data of the fault log to the management controller; the second shard data includes at least one second byte of the fault log.
7. The method of claim 6, wherein the method further comprises:
receiving target information sent by the management controller; the target information is used for indicating the CPU to transmit the fault log according to the interrupt signal sent by the management controller.
8. The method according to claim 6 or 7, wherein the CPU and the management controller transmit the fault log using a target bus, the target bus being connected to an interrupt register of the CPU; the method further comprises the steps of:
configuring the value of the interrupt register connected to the target bus as a target value; the target value is used to indicate that the interrupt signal is in an active state.
9. A system method, characterized by being used for a computer device, the computer device comprising a central processing unit CPU and a management controller, the management controller storing a log file; the method comprises the following steps:
the CPU generates a fault log; the fault log is used for recording fault information of the computer equipment;
the CPU sends first fragment data of the fault log to the management controller; the first shard data includes at least one first byte of the fault log;
the management controller receives first sliced data of the fault log sent by the CPU and writes at least one first byte in the first sliced data into the log file;
the management controller sends a target interrupt signal to the CPU; the target interrupt signal is used for indicating the CPU to continue sending the fault log;
The CPU receives a target interrupt signal sent by the management controller and sends second fragment data of the fault log to the management controller according to the target interrupt signal; the second shard data includes at least one second byte of the fault log;
and the management controller receives second fragment data of the fault log sent by the CPU.
10. A computer device, characterized by comprising a central processing unit (CPU) and a management controller, wherein the management controller stores a log file, and the CPU is connected with the management controller;
the CPU generates a fault log; the fault log is used for recording fault information of the computer equipment;
the CPU sends first fragment data of the fault log to the management controller; the first shard data includes at least one first byte of the fault log;
the management controller receives first sliced data of the fault log sent by the CPU and writes at least one first byte in the first sliced data into the log file;
the management controller sends a target interrupt signal to the CPU; the target interrupt signal is used for indicating the CPU to continue sending the fault log;
The CPU receives a target interrupt signal sent by the management controller and sends second fragment data of the fault log to the management controller according to the target interrupt signal; the second shard data includes at least one second byte of the fault log;
and the management controller receives second fragment data of the fault log sent by the CPU.
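
The exchange recited in the claims above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the claimed implementation: the class names `ManagementController` and `Cpu`, the `fragment_size` parameter, the `interrupts_sent` counter, and the choice to model the target interrupt signal as a boolean return value are all hypothetical and do not come from the claims. The claims only require that each fragment carries at least one byte of the fault log and that the CPU sends the next fragment after receiving the interrupt.

```python
from dataclasses import dataclass, field


@dataclass
class ManagementController:
    """Hypothetical stand-in for the management controller."""

    log_file: bytearray = field(default_factory=bytearray)
    interrupts_sent: int = 0

    def receive_fragment(self, fragment: bytes) -> bool:
        """Write the fragment's bytes into the log file, then signal the CPU.

        The True return value models the target interrupt signal that
        instructs the CPU to continue sending the fault log (an assumption
        made for this sketch; real hardware would raise an interrupt line).
        """
        self.log_file.extend(fragment)
        self.interrupts_sent += 1
        return True


@dataclass
class Cpu:
    """Hypothetical stand-in for the CPU side of the exchange."""

    fragment_size: int = 1  # assumed value; the claims require at least one byte

    def __post_init__(self) -> None:
        if self.fragment_size < 1:
            raise ValueError("fragment_size must be at least 1 byte")

    def send_fault_log(self, fault_log: bytes, controller: ManagementController) -> int:
        """Send the fault log fragment by fragment and return the fragment count."""
        sent = 0
        offset = 0
        while offset < len(fault_log):
            fragment = fault_log[offset:offset + self.fragment_size]
            interrupt = controller.receive_fragment(fragment)
            sent += 1
            offset += len(fragment)
            if not interrupt:
                break  # no interrupt received, so the CPU stops sending
        return sent


if __name__ == "__main__":
    controller = ManagementController()
    cpu = Cpu(fragment_size=4)
    fragments = cpu.send_fault_log(b"example fault record", controller)
    print(fragments, bytes(controller.log_file))
```

In this sketch the controller acknowledges every fragment, including the last one. How the end of the fault log is detected is not stated in these claims, so the loop simply stops when the CPU has no bytes left to send.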
CN202211581975.9A 2022-12-09 2022-12-09 Fault log recording method, system method and equipment Pending CN116185678A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211581975.9A CN116185678A (en) 2022-12-09 2022-12-09 Fault log recording method, system method and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211581975.9A CN116185678A (en) 2022-12-09 2022-12-09 Fault log recording method, system method and equipment

Publications (1)

Publication Number Publication Date
CN116185678A CN116185678A (en) 2023-05-30

Family

ID=86437291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211581975.9A Pending CN116185678A (en) 2022-12-09 2022-12-09 Fault log recording method, system method and equipment

Country Status (1)

Country Link
CN (1) CN116185678A (en)

Similar Documents

Publication Publication Date Title
JP5128708B2 (en) SAS controller with persistent port configuration
JP4641546B2 (en) Method and system for handling input / output (I / O) errors
US20070165520A1 (en) Port trunking between switches
CN114868117B (en) Peer-to-peer storage device messaging over control bus
WO2014150352A1 (en) Network controller sharing between smm firmware and os drivers
CA3129982A1 (en) Method and system for accessing distributed block storage system in kernel mode
CN114817105A (en) Method and device for device enumeration, computer device and storage medium
CN116724297A (en) Fault processing method, device and system
CN115576613A (en) PXE-based operating system installation method, computing device and communication system
CN116483600A (en) Memory fault processing method and computer equipment
CN116107690A (en) Virtual machine memory management method and computing device
CN115292077A (en) Kernel exception handling method and system
US20240086339A1 (en) Systems, methods, and devices for accessing a device operating system over an interconnect
CN117453242A (en) Application updating method of virtual machine, computing equipment and computing system
CN116185678A (en) Fault log recording method, system method and equipment
CN114880266B (en) Fault processing method and device, computer equipment and storage medium
CN116302625A (en) Fault reporting method, device and storage medium
CN116204214A (en) BMC upgrading method, device and system, electronic equipment and storage medium
CN116578316A (en) Firmware updating method, device, server and storage medium of equipment
CN115454896A (en) SMBUS-based SSD MCTP control message verification method and device, computer equipment and storage medium
CN111767082A (en) Computing chip starting method and device and computer system
CN115202803A (en) Fault processing method and device
CN111427814A (en) Inter-core communication method based on AMP system, terminal and storage medium
CN117667465B (en) Code sharing method, device, switch, multi-host system, equipment and medium
CN118051366A (en) Fault processing method and computing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination