CN114780271A - Hardware fault processing method and electronic equipment - Google Patents

Publication number
CN114780271A
CN114780271A CN202210315783.7A CN202210315783A CN114780271A CN 114780271 A CN114780271 A CN 114780271A CN 202210315783 A CN202210315783 A CN 202210315783A CN 114780271 A CN114780271 A CN 114780271A
Authority
CN
China
Prior art keywords
hardware
information
address
address information
fault
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210315783.7A
Other languages
Chinese (zh)
Inventor
黄树福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CN202210315783.7A
Publication of CN114780271A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0775 Content or structure details of the error report, e.g. specific table structure, specific error fields
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses a hardware fault processing method and an electronic device. The method comprises: in response to a hardware fault of the electronic device triggering a machine check mechanism, acquiring fault information that is generated by the machine check mechanism and describes the hardware fault; calling a pre-constructed relation table; and acquiring hardware information of the faulty hardware based on the correspondence between first address information and second address information and the association between the second address information and the hardware information. With this method, the hardware information of the faulty hardware can be acquired directly at the operating system level, so that maintenance personnel can remove the fault quickly and the overhaul time is shortened.

Description

Hardware fault processing method and electronic equipment
Technical Field
The present disclosure relates to the field of electronic devices, and in particular, to a hardware fault handling method and an electronic device.
Background
A Machine Check Exception (MCE) mechanism is usually built into the processor core of an electronic device. When the processor detects a hardware fault, the MCE is triggered and the generated fault information is stored in model-specific registers (MSRs), so that maintenance personnel can analyze and handle the fault. However, the fault information usually does not include hardware information that uniquely identifies the faulty hardware, so the operating system layer cannot determine directly from the fault information which hardware is faulty. Maintenance personnel usually have to obtain low-level information to work out which hardware has failed, which slows fault removal and hinders maintenance work.
Disclosure of Invention
The application provides a hardware fault processing method and electronic equipment, and the technical scheme adopted by the embodiment of the application is as follows:
a hardware fault handling method comprises the following steps:
in response to a hardware fault of the electronic device triggering a machine check mechanism, acquiring fault information that is generated by the machine check mechanism and describes the hardware fault; the fault information comprises first address information, and the first address information identifies the storage location, in a memory space or an IO space, of the data that triggered the hardware fault;
calling a pre-constructed relation table; the relation table comprises hardware information of each piece of hardware of the electronic device and an association between each piece of hardware information and second address information, wherein the hardware information uniquely identifies the hardware, and the second address information identifies the storage space allocated to the hardware in the memory space or the IO space;
and acquiring hardware information of the faulty hardware based on the correspondence between the first address information and the second address information and the association between the second address information and the hardware information.
In some embodiments, the method further comprises:
executing initialization operation on the electronic equipment through BIOS or UEFI, and acquiring the hardware information and the second address information of each piece of hardware;
and constructing the relation table through a BIOS or UEFI based on the association relation between the hardware information and the second address information.
In some embodiments, the method further comprises:
acquiring hardware information of a first type of hardware and an address range of a storage space allocated to the first type of hardware in a memory space or an IO space;
constructing a first relation table based on the address range and the hardware information of the first type of hardware; and/or
acquiring hardware information of a second type of hardware and an allocation rule for allocating storage space to the second type of hardware in the memory space or the IO space;
and constructing a second relation table based on the distribution rule and the hardware information of the second type of hardware.
In some embodiments, the obtaining hardware information of the faulty hardware based on the corresponding relationship between the first address information and the second address information and the association relationship between the second address information and the hardware information includes:
determining the type of the fault hardware based on the first address information and an address space mapping table;
under the condition that the fault hardware is first-class hardware, determining an address range corresponding to the first address information in the first relation table, and acquiring hardware information associated with the address range;
and under the condition that the fault hardware is second-class hardware, acquiring the hardware information of the second-class hardware based on the first address information and the allocation rule recorded in the second relation table.
In some embodiments, the failure information further includes a first identifier for identifying validity of the first address information;
the calling of the pre-constructed relationship table comprises the following steps:
in the case where it is determined that the first address information is valid based on the first identifier, a pre-constructed relationship table is retrieved.
In some embodiments, the method further comprises:
and recording an error log based on the hardware information of the failed hardware.
An electronic device, comprising:
a first acquisition module, configured to, in response to a hardware fault of the electronic device triggering a machine check mechanism, acquire fault information that is generated by the machine check mechanism and describes the hardware fault; the fault information comprises first address information, and the first address information identifies the storage location, in a memory space or an IO space, of the data that triggered the hardware fault;
a calling module, configured to call a pre-constructed relation table; the relation table comprises hardware information of each piece of hardware of the electronic device and an association between each piece of hardware information and second address information, wherein the hardware information uniquely identifies the hardware, and the second address information identifies the storage space allocated to the hardware in the memory space or the IO space;
and a determining module, configured to acquire hardware information of the faulty hardware based on the correspondence between the first address information and the second address information and the association between the second address information and the hardware information.
In some embodiments, the electronic device further comprises:
the second obtaining module is used for executing an initialization operation on the electronic equipment through a BIOS (Basic Input/Output System) or UEFI (Unified Extensible Firmware Interface), and obtaining the hardware information and the second address information of each piece of hardware;
and the first construction module is used for constructing the relation table based on the association relation between the hardware information and the second address information through BIOS or UEFI.
In some embodiments, the electronic device further comprises:
the third acquisition module is used for acquiring hardware information of the first type of hardware and an address range of a storage space allocated to the first type of hardware in a memory space or an IO space;
the second building module is used for building a first relation table based on the address range and the hardware information of the first type of hardware; and/or
The fourth acquisition module is used for acquiring the hardware information of the second type of hardware and the allocation rule for allocating the storage space for the second type of hardware in the memory space or the IO space;
and the third building module is used for building a second relation table based on the distribution rule and the hardware information of the second type of hardware.
In some embodiments, the determining module is specifically configured to:
determining the type of the fault hardware based on the first address information and an address space mapping table;
under the condition that the fault hardware is first-class hardware, determining an address range corresponding to the first address information in the first relation table, and acquiring hardware information associated with the address range;
and under the condition that the fault hardware is second-class hardware, acquiring the hardware information of the second-class hardware based on the first address information and the allocation rule recorded in the second relation table.
In the hardware fault processing method of the embodiment of the application, a relation table is constructed in advance. The relation table comprises hardware information of each piece of hardware of the electronic device, second address information of each piece of hardware, and the association between the hardware information and the second address information. The hardware information uniquely identifies the hardware, and the second address information identifies the storage space allocated to the hardware in a memory space or an IO space. When the electronic device has a hardware fault that triggers the Machine Check Exception (MCE) mechanism, the fault information generated by the MCE mechanism and the relation table are acquired. The fault information comprises first address information, which identifies the storage location, in the memory space or the IO space, of the data that triggered the hardware fault. Based on the correspondence between the first address information and the second address information and the association between the second address information and the hardware information, the hardware information of the faulty hardware can be determined. The faulty hardware can therefore be determined directly at the operating system level, so that maintenance personnel can remove the hardware fault quickly and the overhaul time is shortened.
Drawings
FIG. 1 is a flowchart of an embodiment of a hardware fault handling method according to an embodiment of the present application;
FIG. 2 is a flowchart of another embodiment of a hardware fault handling method according to an embodiment of the present application;
FIG. 3 is a block diagram of an embodiment of an electronic device according to an embodiment of the present application;
FIG. 4 is a block diagram of another embodiment of an electronic device according to an embodiment of the present application.
Detailed Description
Various aspects and features of the present application are described herein with reference to the drawings.
It will be understood that various modifications may be made to the embodiments of the present application. Accordingly, the foregoing description should not be considered as limiting, but merely as exemplifications of embodiments. Those skilled in the art will envision other modifications within the scope and spirit of the application.
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the application and, together with a general description of the application given above, and the detailed description of the embodiments given below, serve to explain the principles of the application.
These and other characteristics of the present application will become apparent from the following description of a preferred form of embodiment, given as a non-limiting example, with reference to the attached drawings.
It should also be understood that, although the present application has been described with reference to some specific examples, a person of skill in the art shall certainly be able to achieve many other equivalent forms of application, having the characteristics as set forth in the claims and hence all coming within the field of protection defined thereby.
The above and other aspects, features and advantages of the present application will become more apparent in view of the following detailed description when taken in conjunction with the accompanying drawings.
Specific embodiments of the present application are described hereinafter with reference to the drawings; however, it is to be understood that the disclosed embodiments are merely exemplary of the application, which can be embodied in various forms. Well-known and/or repeated functions and constructions are not described in detail, to avoid obscuring the application with unnecessary detail. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the present application in virtually any appropriately detailed structure.
The specification may use the phrases "in one embodiment," "in another embodiment," "in yet another embodiment," or "in other embodiments," which may each refer to one or more of the same or different embodiments in accordance with the application.
The embodiment of the application provides a hardware fault processing method that is executed at the operating system level of an electronic device. It processes the fault information generated by the Machine Check Exception (MCE) mechanism and determines the faulty hardware, so that maintenance personnel can remove hardware faults quickly and maintenance time is shortened.
Fig. 1 is a flowchart of a hardware fault processing method according to an embodiment of the present application, and referring to fig. 1, the hardware fault processing method according to the embodiment of the present application may specifically include the following steps.
S110, in response to a hardware fault of the electronic device triggering the machine check mechanism, acquiring fault information that is generated by the machine check mechanism and describes the hardware fault. The fault information comprises first address information, and the first address information identifies the storage location, in a memory space or an IO space, of the data that triggered the hardware fault.
When a processor (CPU) of the electronic device encounters an error while executing an instruction, the processor may trigger fault detection. If the instruction or operation error is determined to be caused by a hardware fault of the electronic device, the processor core may raise a Machine Check Exception (MCE). The MCE mechanism detects the hardware fault and generates fault information describing it. This fault information is typically stored in the Machine Check Architecture (MCA) register bank, which includes a number of model-specific registers (MSRs), for example IA32_MCG_CAP, IA32_MCG_STATUS, IA32_MCG_CTL, IA32_MCG_EXT_CTL, IA32_MCi_CTL, IA32_MCi_STATUS, and IA32_MCi_ADDR.
The fault information includes the information stored in the MCA register bank, and the first address information is typically stored in the IA32_MCi_ADDR register as part of the fault information. The first address information points to the storage location, in the memory space or the IO space, of the data that triggered the hardware fault, that is, the data that caused the processor to perform fault detection. The data may be the instruction executed by the processor, in which case the first address information points to the storage location of the instruction in the memory space or the IO space. The data may also be the target data operated on by an instruction, in which case the first address information points to the storage location of the target data in the memory space or the IO space.
Optionally, after a hardware fault of the electronic device triggers the MCE mechanism and the MCE mechanism detects the fault and generates the fault information, the operating system layer of the electronic device may respond to this event by acquiring all or part of the fault information from the MCA register bank. The acquired information includes at least the first address information in the IA32_MCi_ADDR register.
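As a rough illustration of where this information lives, the sketch below computes the MSR addresses of one machine-check bank. The numeric addresses (0x179 for IA32_MCG_CAP, 0x400 + 4 * i for the first register of bank i) and the bank count in bits 7:0 of IA32_MCG_CAP follow the Intel architecture manual; the helper names `bank_msr_addresses` and `bank_count` are illustrative assumptions and are not part of the application.

```python
# Architectural MSR addresses; helper names below are illustrative only.
IA32_MCG_CAP = 0x179
IA32_MCG_STATUS = 0x17A
IA32_MC0_CTL = 0x400  # registers of bank i start at 0x400 + 4 * i


def bank_msr_addresses(bank: int) -> dict:
    """Return the MSR addresses of the CTL, STATUS, ADDR and MISC registers of one bank."""
    if bank < 0:
        raise ValueError("bank index must be non-negative")
    base = IA32_MC0_CTL + 4 * bank
    return {"CTL": base, "STATUS": base + 1, "ADDR": base + 2, "MISC": base + 3}


def bank_count(mcg_cap: int) -> int:
    """The number of reporting banks is held in bits 7:0 of IA32_MCG_CAP."""
    return mcg_cap & 0xFF
```

How the raw register values are actually read (for example by a kernel machine-check handler) is outside the scope of this sketch.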
S120, calling a pre-constructed relation table. The relation table includes hardware information of each piece of hardware of the electronic device and an association between each piece of hardware information and second address information, where the hardware information uniquely identifies the hardware, and the second address information identifies the storage space allocated to the hardware in the memory space or the IO space.
The hardware of the electronic device includes, but is not limited to, a memory, a network card, a sound card, a graphics card (GPU), an optical disc drive, or other hardware devices inside the electronic device, such as other hardware devices connected to a motherboard through a PCI interface or a PCI-e interface. In the initialization process of the electronic device, a memory address space or an IO address space is allocated to each hardware, the memory address space may include one or more address ranges, the address ranges are used to identify a storage space allocated to the hardware in the memory space, the IO address space may also include one or more address ranges, and the address ranges are used to identify a storage space allocated to the hardware in the IO space.
On this basis, the second address information may include a memory address space or an IO address space of the hardware, and may also include an address range for identifying a storage space allocated to the hardware in the memory space or the IO space, or the second address information may also include a rule or the like for allocating a storage space to the hardware in the memory space or the IO space, as long as the included information can determine the storage space allocated to the hardware in the memory space or the IO space.
The processor and hardware may interoperate based on the memory space during operation of the electronic device. For example, data in hardware may be called into the memory space for the processor to read; or the data which needs to be sent to the hardware by the processor can be firstly stored in the storage space and sent to the hardware through the storage space; or instructions for controlling the hardware are sent to the storage space, so that a driver of the hardware can call the instructions from the storage space to execute the operation.
Wherein the hardware information is used to uniquely identify the hardware. Alternatively, the hardware information may include connection location information of hardware on the motherboard, vendor information, field replaceable unit information (FRU info), and the like. For example, the hardware information may include a location tag of the hardware, a Part Number (Part Number), a Serial Number (Serial Number), a manufacturer, a model Number, and so on, the location tag being used to identify a connection location of the hardware on the motherboard.
Optionally, the relation table may be pre-constructed by the BIOS, UEFI, or operating system (OS) from the hardware information and the second address information, either during initialization or after the operating system has started. In response to a hardware fault of the electronic device triggering the MCE mechanism, the method acquires the fault information and calls the relation table. It should be noted that the order of these two steps is not limited: the fault information may be acquired first and the relation table called afterwards, the relation table may be called first and the fault information acquired afterwards, or the two may be done at the same time.
S130, acquiring hardware information of the faulty hardware based on the correspondence between the first address information and the second address information and the association between the second address information and the hardware information.
Once the first address information and the relation table have been acquired, it can be determined in which hardware's storage space the data that triggered the hardware fault is located. It follows that the MCE mechanism was triggered by a fault of that hardware, so that hardware is the faulty hardware and its hardware information can be acquired, for example its location label, part number, or serial number.
Optionally, when both the first address information and the second address information include address ranges, it can be determined directly which address range in the second address information contains the address range in the first address information. If the address range in the first address information falls within an address range in the second address information, the hardware corresponding to that second address information is determined to be the faulty hardware.
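The range comparison described above can be sketched as a sorted-table lookup. This is a minimal sketch under stated assumptions: the first address information is treated as a single address, the table holds non-overlapping ranges sorted by start address, and the `RangeEntry` type, the `find_hardware` function, and the example addresses and labels are all hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RangeEntry:
    start: int      # first address of the range (inclusive)
    end: int        # last address of the range (inclusive)
    hardware: str   # hardware information, e.g. a location label


def find_hardware(table: List[RangeEntry], address: int) -> Optional[str]:
    """Return the hardware whose range contains address, or None.

    The table must be sorted by start address and must not overlap.
    """
    starts = [entry.start for entry in table]
    i = bisect.bisect_right(starts, address) - 1  # last range starting at or before address
    if i >= 0 and table[i].start <= address <= table[i].end:
        return table[i].hardware
    return None


# Hypothetical example data.
table = [
    RangeEntry(0xC0000000, 0xC0FFFFFF, "Slot 3 / network card"),
    RangeEntry(0xD0000000, 0xDFFFFFFF, "Slot 1 / graphics card"),
]
```

An address that falls in a gap between ranges returns `None`, which corresponds to the case where no hardware can be identified from the table.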
Optionally, when the second address information includes a rule for allocating storage space to the hardware in the memory space or the IO space, the memory address space or IO address space of each piece of hardware can be derived from the rule. It can then be determined which memory address space or IO address space contains the first address information, which identifies the faulty hardware, and the hardware information of the faulty hardware can be acquired from the relation table based on the association recorded there.
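To make the rule-based case concrete, the sketch below assumes one possible allocation rule: memory interleaved across several modules at a fixed granularity. The application does not specify the form of the rule, so the `InterleaveRule` type, its fields, and the module names are purely hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class InterleaveRule:
    base: int                 # first address covered by the rule
    size: int                 # number of bytes covered by the rule
    granularity: int          # bytes placed on one module before moving to the next
    modules: Tuple[str, ...]  # hardware information of each module, in interleave order

    def module_for(self, address: int) -> Optional[str]:
        """Apply the rule to one address; return None if the address is not covered."""
        offset = address - self.base
        if not 0 <= offset < self.size:
            return None
        return self.modules[(offset // self.granularity) % len(self.modules)]


# Hypothetical example: 1 GiB interleaved across two modules in 4 KiB chunks.
rule = InterleaveRule(base=0, size=1 << 30, granularity=4096,
                      modules=("DIMM A1", "DIMM B1"))
```

Storing the rule instead of every individual range keeps the table small when the address space of the hardware is fragmented into many ranges.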
In the hardware fault processing method of the embodiment of the application, a relation table is constructed in advance. The relation table comprises hardware information of each piece of hardware of the electronic device, second address information of each piece of hardware, and the association between the hardware information and the second address information. The hardware information uniquely identifies the hardware, and the second address information identifies the storage space allocated to the hardware in a memory space or an IO space. When the electronic device has a hardware fault that triggers the Machine Check Exception (MCE) mechanism, the fault information generated by the MCE mechanism and the relation table are acquired. The fault information comprises first address information, which identifies the storage location, in the memory space or the IO space, of the data that triggered the hardware fault. Based on the correspondence between the first address information and the second address information and the association between the second address information and the hardware information, the hardware information of the faulty hardware can be determined. The faulty hardware can therefore be determined directly at the operating system level, so that maintenance personnel can remove the hardware fault quickly and the overhaul time is shortened.
In some embodiments, the failure information further includes a first identifier for identifying validity of the first address information; step S120, calling a pre-constructed relation table, comprising:
and in the case that the first address information is determined to be effective based on the first identifier, calling a pre-constructed relation table.
Bit 58 (BIT58) of the IA32_MCi_STATUS register serves as the flag that indicates whether the first address information is valid. A value of 1 in BIT58 indicates that the first address information is valid, and a value of 0 indicates that it is invalid.
On this basis, the value of BIT58 in the IA32_MCi_STATUS register can be read before the first address information is compared with the relation table. If BIT58 is 1, the first address information is compared with the relation table. If BIT58 is 0, the comparison is skipped, and the hardware information of the faulty hardware can be obtained in another way, which avoids detection errors and the processing of invalid data.
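The validity check can be sketched as a single bit test on the raw status value. The function name `valid_fault_address` is an illustrative assumption; only the position of the flag (bit 58) comes from the description above.

```python
from typing import Optional

ADDRV_BIT = 58  # address-valid flag in IA32_MCi_STATUS, as described above


def valid_fault_address(status: int, addr: int) -> Optional[int]:
    """Return addr if bit 58 of the status value is set, otherwise None."""
    if (status >> ADDRV_BIT) & 1:
        return addr
    return None
```

A caller would proceed to the relation-table lookup only when the result is not `None`.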
As shown in conjunction with fig. 2, in some embodiments, the method further comprises:
executing initialization operation on the electronic equipment through BIOS or UEFI, and acquiring the hardware information and the second address information of each piece of hardware;
and constructing the relation table through a BIOS or UEFI based on the association relation between the hardware information and the second address information.
Optionally, after the electronic device is started, the BIOS or UEFI may perform an initialization operation on the electronic device, which includes two steps: detecting the hardware and allocating resources to it. During hardware detection the BIOS or UEFI can acquire the hardware information of each piece of hardware, such as its location tag and manufacturer information. After hardware detection is completed, the BIOS or UEFI allocates resources to each piece of hardware, which includes allocating a memory address space or an IO address space to the hardware, that is, allocating storage space to the hardware in the memory space or the IO space. The BIOS or UEFI can therefore record the hardware information and the memory address space or IO address space of each piece of hardware, and construct the relation table from them. Because the BIOS or UEFI already has access to the hardware information and the second address information during initialization, the relation table can be pre-constructed with little extra effort, and the operating system layer can call it directly when the MCE mechanism is triggered.
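The construction step can be sketched as follows, modelled in Python for readability rather than as firmware code. The record layout (start address, end address, hardware information) and the function name `build_relation_table` are hypothetical; the application does not prescribe a table format.

```python
from typing import Dict, Iterable, List, Tuple

# One record per address range assigned during initialization:
# (start, end inclusive, hardware information). The layout is hypothetical.
Record = Tuple[int, int, Dict[str, str]]


def build_relation_table(records: Iterable[Record]) -> List[Record]:
    """Sort the assigned ranges by start address and reject overlapping ranges."""
    table = sorted(records, key=lambda record: record[0])
    for (_, prev_end, _), (next_start, _, _) in zip(table, table[1:]):
        if next_start <= prev_end:
            raise ValueError(f"overlapping ranges at {next_start:#x}")
    return table


# Hypothetical example: two devices recorded in discovery order.
example = build_relation_table([
    (0xD0000000, 0xDFFFFFFF, {"location": "Slot 1", "serial": "SN-0002"}),
    (0xC0000000, 0xC0FFFFFF, {"location": "Slot 3", "serial": "SN-0001"}),
])
```

Sorting at construction time lets the later lookup run as a binary search, and the overlap check guards against an address matching two pieces of hardware.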
In some embodiments, the method further comprises:
acquiring hardware information of a first type of hardware and an address range of a storage space allocated to the first type of hardware in a memory space or an IO space;
constructing a first relation table based on the address range and the hardware information of the first type of hardware; and/or
acquiring hardware information of a second type of hardware and an allocation rule for allocating storage space to the second type of hardware in the memory space or the IO space;
and constructing a second relation table based on the distribution rule and the hardware information of the second type of hardware.
In a specific implementation, the electronic device applies different rules when allocating memory address space or IO address space to different types of hardware. For some hardware (i.e., the first type of hardware), the storage space is relatively concentrated, and the address space of this type of hardware consists of a small number of address ranges. For such hardware, the hardware information and the address ranges can be obtained directly, and the first relation table is constructed based on them. Optionally, the first type of hardware may include hardware devices such as a network card, a sound card, and a video card.
For another part of the hardware (i.e., the second type of hardware), the storage space may be relatively dispersed: the address space of this type of hardware contains a large number of address ranges, and recording every address range one by one would make the relation table slow to construct and large in size. Therefore, for the second type of hardware, the hardware information is obtained together with the allocation rule used to allocate storage space to that hardware in the memory space or the IO space, and the second relation table is constructed based on the hardware information and the allocation rule. Optionally, the second type of hardware may include, for example, a memory device. In that case, hardware information such as the memory tag and serial number of the memory device can be obtained, along with the System Address Decoder (SAD) or Target Address Decoder (TAD), which contains the address space allocation rule of the memory device. The hardware information, such as the memory tag and serial number, and the SAD or TAD are recorded in the second relation table.
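The difference between the two tables can be illustrated with a short sketch. It assumes, purely for illustration, that the allocation rule is a simple two-way 4 KiB interleave; real SAD/TAD decoding is platform-specific and more involved. All names, tags and serial numbers below are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# First relation table: a handful of explicit (start, end, info) ranges.
first_relation_table = [
    (0xF000_0000, 0xF000_FFFF, {"tag": "NIC Slot 1", "vendor": "VendorA"}),
    (0xF001_0000, 0xF001_FFFF, {"tag": "GPU Slot 2", "vendor": "VendorB"}),
]


@dataclass
class SecondRelationTable:
    devices: list               # hardware info of each memory device
    rule: Callable[[int], int]  # allocation rule: address -> device index


def make_interleave_rule(base, granule, ways):
    """Hypothetical rule: consecutive granules rotate across 'ways' devices."""
    def rule(addr):
        return ((addr - base) // granule) % ways
    return rule


second_relation_table = SecondRelationTable(
    devices=[
        {"tag": "DIMM A1", "serial": "SN-0001"},
        {"tag": "DIMM B1", "serial": "SN-0002"},
    ],
    rule=make_interleave_rule(base=0, granule=4096, ways=2),
)

# Listing every range explicitly for 1 GiB of interleaved memory would take
# one entry per granule, whereas the rule is a single record.
ranges_needed = (1 << 30) // 4096
print(f"explicit ranges for 1 GiB: {ranges_needed}, rule records: 1")
```

Storing the rule instead of the ranges keeps the second table the same size regardless of how finely the memory is interleaved.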
In some embodiments, the step S130, acquiring hardware information of the faulty hardware based on the corresponding relationship between the first address information and the second address information and the association relationship between the second address information and the hardware information, may include:
determining the type of the faulty hardware based on the first address information and an address space mapping table;
in the case that the faulty hardware is of the first type, determining the address range corresponding to the first address information in the first relation table, and acquiring the hardware information associated with the address range;
and in the case that the faulty hardware is of the second type, acquiring the hardware information of the second type of hardware based on the first address information and the allocation rule recorded in the second relation table.
Optionally, the electronic device may allocate address spaces to hardware according to hardware type. For example, the electronic device may predetermine a first partition and a second partition in the memory space, each of which is a relatively large region. When a piece of hardware is determined to be of the first type, its storage space is allocated from the first partition; when a piece of hardware is determined to be of the second type, its storage space is allocated from the second partition. On this basis, an address space mapping table can be constructed from the address range of the first partition and the address range of the second partition. The address space mapping table contains the correspondence between the address range of the first partition and the first type of hardware, and the correspondence between the address range of the second partition and the second type of hardware. The address space mapping table need not, however, contain the mapping between a specific piece of hardware and its specific address space.
On this basis, in response to the MCE mechanism being triggered, the address space mapping table can be called and the first address information compared against it to make a preliminary determination of whether the faulty hardware belongs to the first type or the second type. If the faulty hardware is determined to be of the first type, the first relation table is called, the address range corresponding to the first address information is looked up in it, and the hardware information associated with that address range is obtained. If the faulty hardware is determined to be of the second type, the second relation table is called and the allocation rule is obtained from it; the address range of each piece of second-type hardware is determined based on the allocation rule and compared with the first address information to determine which piece of second-type hardware is faulty, and the hardware information of the faulty hardware is then obtained from the second relation table.
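The two-stage lookup can be sketched as below. The partition boundaries, table contents and the interleave rule are hypothetical, and as a simplification the sketch applies the rule directly to the faulting address rather than first expanding it into per-device address ranges.

```python
# Address space mapping table: partition range -> hardware type only.
ADDRESS_SPACE_MAP = [
    (0x0000_0000, 0x7FFF_FFFF, "second"),  # hypothetical memory-device partition
    (0xF000_0000, 0xFFFF_FFFF, "first"),   # hypothetical NIC/sound/video partition
]

FIRST_TABLE = [
    (0xF000_0000, 0xF000_FFFF, {"tag": "NIC Slot 1"}),
    (0xF001_0000, 0xF001_FFFF, {"tag": "GPU Slot 2"}),
]

SECOND_TABLE = {
    "devices": [{"tag": "DIMM A1"}, {"tag": "DIMM B1"}],
    "granule": 4096,  # assumed interleave granularity of the allocation rule
}


def classify(addr):
    """Stage 1: decide the hardware type from the partition the address is in."""
    for start, end, kind in ADDRESS_SPACE_MAP:
        if start <= addr <= end:
            return kind
    return None


def locate_faulty_hardware(addr):
    """Stage 2: consult only the relation table that matches the type."""
    kind = classify(addr)
    if kind == "first":
        for start, end, info in FIRST_TABLE:
            if start <= addr <= end:
                return info
        return None
    if kind == "second":
        devices = SECOND_TABLE["devices"]
        index = (addr // SECOND_TABLE["granule"]) % len(devices)
        return devices[index]
    return None


print(locate_faulty_hardware(0xF001_0040))
print(locate_faulty_hardware(0x3000))
```

Classifying first means a fault in the memory partition never scans the first relation table, and vice versa.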
It is to be understood that, in implementation, the first address information may instead be compared with the first relation table first, to determine whether the first relation table contains an address range corresponding to the first address information. If it does, the faulty hardware can be determined to be of the first type, and the hardware information associated with that address range can be obtained. If it does not, the faulty hardware is not of the first type; the second relation table can then be called, the allocation rule obtained, and the address range of each piece of second-type hardware determined based on the allocation rule. The first address information is then compared with the address range of each piece of second-type hardware to obtain the hardware information of the faulty second-type hardware.
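This alternative order, which needs no address space mapping table, might look like the following sketch. It assumes a hypothetical allocation rule that gives each memory device one contiguous 1 GiB block, so that the rule can be expanded into per-device address ranges as the paragraph describes; all names and values are illustrative.

```python
FIRST_TABLE = [
    (0xF000_0000, 0xF000_FFFF, {"tag": "NIC Slot 1"}),
    (0xF001_0000, 0xF001_FFFF, {"tag": "GPU Slot 2"}),
]

SECOND_TABLE = {
    "rule": {"base": 0x0, "size_per_device": 0x4000_0000},  # assumed rule
    "devices": [{"tag": "DIMM A1"}, {"tag": "DIMM B1"}],
}


def expand_allocation_rule(table):
    """Derive the address range of each second-type device from the rule."""
    rule = table["rule"]
    for index, info in enumerate(table["devices"]):
        start = rule["base"] + index * rule["size_per_device"]
        yield start, start + rule["size_per_device"] - 1, info


def locate_first_then_second(addr):
    """Try the first relation table; fall back to the second on a miss."""
    for start, end, info in FIRST_TABLE:
        if start <= addr <= end:
            return "first", info
    for start, end, info in expand_allocation_rule(SECOND_TABLE):
        if start <= addr <= end:
            return "second", info
    return None, None


print(locate_first_then_second(0xF000_0010))
print(locate_first_then_second(0x4000_1000))
```

The rule is only expanded when the first table misses, so faults on first-type hardware are resolved without touching the second table.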
In some embodiments, the method further comprises: recording an error log based on the hardware information of the faulty hardware. Optionally, after the hardware information of the faulty hardware is obtained, it may be recorded in an error log, so that maintenance personnel can retrieve it directly from the error log. Other information related to the fault, such as the fault information and the time of the fault, may of course also be recorded in the error log.
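One possible shape for such a log entry is sketched below. The JSON layout and field names are assumptions made for illustration; the application does not prescribe a log format.

```python
import json
from datetime import datetime, timezone


def format_error_log_entry(hw_info, fault_addr, fault_desc, when=None):
    """Serialize one fault record: hardware info plus fault details and time."""
    when = when or datetime.now(timezone.utc)
    record = {
        "time": when.isoformat(),
        "fault": fault_desc,
        "address": f"{fault_addr:#x}",
        "hardware": hw_info,
    }
    return json.dumps(record, sort_keys=True)


line = format_error_log_entry(
    {"tag": "DIMM B1", "serial": "SN-0002"},  # hypothetical hardware info
    0x3000,
    "uncorrected memory error",
    when=datetime(2022, 3, 28, tzinfo=timezone.utc),
)
print(line)
```

Writing the location tag and serial number into the entry lets maintenance personnel identify the part to replace without decoding the raw address themselves.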
Referring to fig. 3, an embodiment of the present application further provides an electronic device, including:
a first obtaining module 201, configured to, in response to a hardware fault occurring in the electronic device and triggering a machine anomaly checking mechanism, obtain fault information that is generated by the machine anomaly checking mechanism and describes the hardware fault; the fault information comprises first address information, and the first address information is used for identifying the storage location, in a memory space or an IO space, of the data that triggered the hardware fault;
a calling module 202, configured to call a pre-constructed relation table; the relation table comprises hardware information of each piece of hardware of the electronic device and an association relationship between each piece of hardware information and second address information, wherein the hardware information is used for uniquely identifying the hardware, and the second address information is used for identifying the storage space allocated to the hardware in the memory space or the IO space;
a determining module 203, configured to obtain hardware information of the faulty hardware based on the correspondence between the first address information and the second address information and the association relationship between the second address information and the hardware information.
In some embodiments, the electronic device further comprises:
the second obtaining module is used for executing an initialization operation on the electronic device through a BIOS (Basic Input/Output System) or UEFI (Unified Extensible Firmware Interface), and obtaining the hardware information and the second address information of each piece of hardware;
and the first construction module is used for constructing the relation table based on the association relation between the hardware information and the second address information through BIOS or UEFI.
In some embodiments, the electronic device further comprises:
the third acquisition module is used for acquiring the hardware information of the first type of hardware and the address range of the storage space allocated to the first type of hardware in the memory space or the IO space;
the second building module is used for building a first relation table based on the address range and the hardware information of the first type of hardware; and/or
The fourth acquisition module is used for acquiring the hardware information of the second type of hardware and the allocation rule for allocating the storage space for the second type of hardware in the memory space or the IO space;
and the third building module is used for building a second relation table based on the distribution rule and the hardware information of the second type of hardware.
In some embodiments, the determining module 203 is specifically configured to:
determining the type of the faulty hardware based on the first address information and an address space mapping table;
in the case that the faulty hardware is of the first type, determining the address range corresponding to the first address information in the first relation table, and acquiring the hardware information associated with the address range;
and in the case that the faulty hardware is of the second type, acquiring the hardware information of the second type of hardware based on the first address information and the allocation rule recorded in the second relation table.
In some embodiments, the fault information further includes a first identifier for identifying the validity of the first address information; the calling module 202 is specifically configured to:
call the pre-constructed relation table in the case that the first address information is determined to be valid based on the first identifier.
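The validity gate can be sketched as follows. As an assumption for illustration, the first identifier is modelled on the ADDRV flag (bit 58) of the x86 `IA32_MCi_STATUS` register, which indicates whether the accompanying address register holds a valid address; the application itself does not tie the identifier to a particular bit. The function and table names are hypothetical.

```python
ADDRV_BIT = 1 << 58  # assumed identifier position, as in x86 IA32_MCi_STATUS


def lookup_if_address_valid(status, addr, lookup):
    """Call the relation-table lookup only when the address is flagged valid."""
    if not status & ADDRV_BIT:
        return None  # address not valid: do not consult the relation table
    return lookup(addr)


# Hypothetical one-entry relation table used as the lookup target.
TABLE = [(0xF000_0000, 0xF000_FFFF, {"tag": "NIC Slot 1"})]


def table_lookup(addr):
    for start, end, info in TABLE:
        if start <= addr <= end:
            return info
    return None


valid_status = (1 << 63) | ADDRV_BIT  # record valid, address valid
invalid_status = 1 << 63              # record valid, address not valid
print(lookup_if_address_valid(valid_status, 0xF000_0100, table_lookup))
print(lookup_if_address_valid(invalid_status, 0xF000_0100, table_lookup))
```

Checking the identifier first avoids attributing a fault to whatever hardware happens to own a stale or meaningless address.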
In some embodiments, the electronic device further comprises:
and the recording module is used for recording the error log based on the hardware information of the fault hardware.
Referring to Fig. 4, an embodiment of the present application further provides an electronic device, which includes at least a memory 301 and a processor 302, where the memory 301 stores a program, and the processor 302 implements the method according to any of the above embodiments when executing the program stored in the memory 301.
As will be appreciated by one of skill in the art, embodiments of the present application may be provided as a method, electronic device, computer-readable storage medium, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied in the media. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The processor may be a general purpose processor, a digital signal processor, an application-specific integrated circuit (ASIC), a Programmable Logic Device (PLD), or a combination thereof. The PLD may be a Complex Programmable Logic Device (CPLD), a field-programmable gate array (FPGA), a Generic Array Logic (GAL), or any combination thereof. A general purpose processor may be a microprocessor or any conventional processor or the like.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
The readable storage medium may be a magnetic disk, an optical disk, a DVD, a USB flash drive, a Read Only Memory (ROM), a Random Access Memory (RAM), etc.; the specific form of the storage medium is not limited in this application.
The above embodiments are only exemplary embodiments of the present application, and are not intended to limit the present application, and the protection scope of the present application is defined by the claims. Various modifications and equivalents may be made to the disclosure by those skilled in the art within the spirit and scope of the disclosure, and such modifications and equivalents should also be considered as falling within the scope of the disclosure.

Claims (10)

1. A hardware fault handling method, comprising:
in response to a hardware fault occurring in an electronic device and triggering a machine anomaly checking mechanism, acquiring fault information that is generated by the machine anomaly checking mechanism and describes the hardware fault; the fault information comprises first address information, and the first address information is used for identifying the storage location, in a memory space or an IO space, of the data that triggered the hardware fault;
calling a pre-constructed relation table; the relation table comprises hardware information of each piece of hardware of the electronic device and an association relationship between each piece of hardware information and second address information, wherein the hardware information is used for uniquely identifying the hardware, and the second address information is used for identifying the storage space allocated to the hardware in the memory space or the IO space;
and acquiring hardware information of the faulty hardware based on the correspondence between the first address information and the second address information and the association relationship between the second address information and the hardware information.
2. The method of claim 1, wherein the method further comprises:
executing initialization operation on the electronic equipment through BIOS or UEFI, and acquiring the hardware information and the second address information of each hardware;
and constructing the relation table through a BIOS or UEFI based on the association relation between the hardware information and the second address information.
3. The method of claim 1, wherein the method further comprises:
acquiring hardware information of a first type of hardware and an address range of a storage space allocated to the first type of hardware in a memory space or an IO space;
constructing a first relation table based on the address range and the hardware information of the first type of hardware; and/or
acquiring hardware information of a second type of hardware and an allocation rule for allocating a storage space to the second type of hardware in the memory space or the IO space;
and constructing a second relation table based on the distribution rule and the hardware information of the second type of hardware.
4. The method according to claim 3, wherein the acquiring hardware information of the faulty hardware based on the correspondence between the first address information and the second address information and the association between the second address information and the hardware information comprises:
determining the type of the faulty hardware based on the first address information and an address space mapping table;
in the case that the faulty hardware is of the first type, determining the address range corresponding to the first address information in the first relation table, and acquiring the hardware information associated with the address range;
and in the case that the faulty hardware is of the second type, acquiring the hardware information of the second type of hardware based on the first address information and the allocation rule recorded in the second relation table.
5. The method of claim 1, wherein the fault information further comprises a first identifier for identifying the validity of the first address information;
the calling of the pre-constructed relation table comprises:
calling the pre-constructed relation table in the case that the first address information is determined to be valid based on the first identifier.
6. The method of claim 1, wherein the method further comprises:
and recording an error log based on the hardware information of the failed hardware.
7. An electronic device, comprising:
a first acquisition module, used for, in response to a hardware fault occurring in the electronic device and triggering a machine anomaly checking mechanism, acquiring fault information that is generated by the machine anomaly checking mechanism and describes the hardware fault; the fault information comprises first address information, and the first address information is used for identifying the storage location, in a memory space or an IO space, of the data that triggered the hardware fault;
a calling module, used for calling a pre-constructed relation table; the relation table comprises hardware information of each piece of hardware of the electronic device and an association relationship between each piece of hardware information and second address information, wherein the hardware information is used for uniquely identifying the hardware, and the second address information is used for identifying the storage space allocated to the hardware in the memory space or the IO space;
and a determining module, used for acquiring hardware information of the faulty hardware based on the correspondence between the first address information and the second address information and the association relationship between the second address information and the hardware information.
8. The electronic device of claim 7, wherein the electronic device further comprises:
the second obtaining module is used for executing an initialization operation on the electronic device through a BIOS (Basic Input/Output System) or UEFI (Unified Extensible Firmware Interface), and obtaining the hardware information and the second address information of each piece of hardware;
and the first construction module is used for constructing the relation table based on the association relation between the hardware information and the second address information through the BIOS or UEFI.
9. The electronic device of claim 7, wherein the electronic device further comprises:
the third acquisition module is used for acquiring hardware information of the first type of hardware and an address range of a storage space allocated to the first type of hardware in a memory space or an IO space;
the second building module is used for building a first relation table based on the address range and the hardware information of the first type of hardware; and/or
The fourth acquisition module is used for acquiring the hardware information of the second type of hardware and the allocation rule for allocating the storage space for the second type of hardware in the memory space or the IO space;
and the third building module is used for building a second relation table based on the distribution rule and the hardware information of the second type of hardware.
10. The electronic device of claim 9, wherein the determination module is specifically configured to:
determining the type of the faulty hardware based on the first address information and an address space mapping table;
in the case that the faulty hardware is of the first type, determining the address range corresponding to the first address information in the first relation table, and acquiring the hardware information associated with the address range;
and in the case that the faulty hardware is of the second type, acquiring the hardware information of the second type of hardware based on the first address information and the allocation rule recorded in the second relation table.
CN202210315783.7A 2022-03-28 2022-03-28 Hardware fault processing method and electronic equipment Pending CN114780271A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210315783.7A CN114780271A (en) 2022-03-28 2022-03-28 Hardware fault processing method and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210315783.7A CN114780271A (en) 2022-03-28 2022-03-28 Hardware fault processing method and electronic equipment

Publications (1)

Publication Number Publication Date
CN114780271A true CN114780271A (en) 2022-07-22

Family

ID=82424719

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210315783.7A Pending CN114780271A (en) 2022-03-28 2022-03-28 Hardware fault processing method and electronic equipment

Country Status (1)

Country Link
CN (1) CN114780271A (en)

Similar Documents

Publication Publication Date Title
US5410545A (en) Long-term storage of controller performance
US7971112B2 (en) Memory diagnosis method
US6934879B2 (en) Method and apparatus for backing up and restoring data from nonvolatile memory
US6883116B2 (en) Method and apparatus for verifying hardware implementation of a processor architecture in a logically partitioned data processing system
US7219258B2 (en) Method, system, and product for utilizing a power subsystem to diagnose and recover from errors
CN109117327A (en) A kind of hard disk detection method and device
US10430267B2 (en) Determine when an error log was created
US20120124420A1 (en) Reset method and monitoring apparatus
US20020095625A1 (en) Identifying field replaceable units responsible for faults detected with processor timeouts utilizing IPL boot progress indicator status
US20060277444A1 (en) Recordation of error information
CN103164316B (en) Hardware monitor
US8510611B2 (en) Computer apparatus
US20230009868A1 (en) Error information processing method and device, and storage medium
US6976191B2 (en) Method and apparatus for analyzing hardware errors in a logical partitioned data processing system
CN111221775B (en) Processor, cache processing method and electronic equipment
CN117149644A (en) Memory overflow detection method, device, operating system, equipment and storage medium
US6934888B2 (en) Method and apparatus for enhancing input/output error analysis in hardware sub-systems
CN114780271A (en) Hardware fault processing method and electronic equipment
CN115292082A (en) Method and system for processing Assert downtime fault in BIOS starting process
US11593209B2 (en) Targeted repair of hardware components in a computing device
US7308616B2 (en) Method, apparatus, and computer program product for enhanced diagnostic test error reporting utilizing fault isolation registers
US10922023B2 (en) Method for accessing code SRAM and electronic device
JP2015130023A (en) Information recording device, information processor, information recording method and information recording program
US11797373B2 (en) System and method for managing faults in integrated circuits
CN113645056B (en) Method and system for positioning fault of intelligent network card

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination