CN116954972A - Fault positioning method, positioning device, terminal equipment and storage medium - Google Patents


Info

Publication number
CN116954972A
Authority
CN
China
Prior art keywords
version
basic input
output system
management controller
server
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310930165.8A
Other languages
Chinese (zh)
Inventor
曹永禄
党光跃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yiyike Data Equipment Technology Co ltd
Original Assignee
Shenzhen Yiyike Data Equipment Technology Co ltd
Application filed by Shenzhen Yiyike Data Equipment Technology Co ltd
Priority to CN202310930165.8A
Publication of CN116954972A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2205 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
    • G06F11/2221 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested to test input/output devices or peripheral units

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computer Hardware Design (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application relates to the field of computer technology and provides a fault locating method, a fault locating device, a terminal device and a storage medium. The fault locating method is applied to a server that includes a basic input/output system (BIOS) and a baseboard management controller (BMC), and comprises the following steps: the baseboard management controller obtains the startup time of the basic input/output system; after it is determined, based on the startup time and a preset startup duration, that the server is in a fault state, the basic input/output system is switched from a first version to a second version, where the first version does not include log information of the basic input/output system and the second version does; the basic input/output system outputs the log information; and the baseboard management controller locates the fault position of the server based on the log information. The technical solution provided by the application improves the efficiency of locating server faults.

Description

Fault positioning method, positioning device, terminal equipment and storage medium
Technical Field
The present application belongs to the field of computer technology, and in particular, relates to a fault positioning method, a fault positioning device, a terminal device, and a storage medium.
Background
Typically, the basic input/output system (BIOS) of a server is provided in two versions: a formal (release) version and a debug version. A build compiled from the debug version contains debug information and applies no optimization to the code as a whole, which gives developers strong debugging capabilities. The formal version is intended for customers: it stores no debug information, and its code is optimized for minimum size and maximum speed, which makes it convenient for customers to use.
However, when a problem such as a startup hang occurs during BIOS startup and debug information is needed to locate the fault, a debug version of the BIOS must be flashed onto the server, the fault is located using the debug information it provides, and after the problem is solved the formal version must be flashed again for the customer. The whole process is cumbersome, and fault location is inefficient.
Therefore, how to improve the efficiency of server fault location is an urgent problem to be solved.
Disclosure of Invention
The embodiment of the application provides a fault positioning method, a fault positioning device, terminal equipment and a storage medium, which improve the efficiency of server fault positioning.
In a first aspect, an embodiment of the present application provides a fault locating method applied to a server, where the server includes a basic input/output system and a baseboard management controller. The method includes: the baseboard management controller obtains the startup time of the basic input/output system; the baseboard management controller determines, based on the startup time and a preset startup duration, that the server is in a fault state; the basic input/output system switches from a first version to a second version, where the first version is a version that does not include log information of the basic input/output system and the second version is a version that includes it; the basic input/output system outputs the log information; and the baseboard management controller locates the fault position of the server based on the log information.
In one possible implementation, the method further includes: the baseboard management controller obtains version switching indication information, which instructs the basic input/output system to switch from the first version to the second version; and the basic input/output system switches from the first version to the second version based on the version switching indication information.
In one possible implementation, the server further includes a complex programmable logic device (CPLD) and a first general-purpose input/output (GPIO) port, and the method further includes: the baseboard management controller obtains print level indication information, which indicates the print level of the log information; the complex programmable logic device determines the combined state of the first general-purpose input/output port based on the print level indication information, where different print levels correspond to different combined states; and the basic input/output system outputs the log information based on the combined state of the first general-purpose input/output port.
In one possible implementation, the server further includes a central processing unit, a memory and a second general-purpose input/output port, and the method includes: during the initialization stage of the central processing unit or of the memory, the basic input/output system monitors the initialization state of the central processing unit or the memory; the basic input/output system determines change information of the second general-purpose input/output port based on that initialization state, where the change information indicates a level change of the second general-purpose input/output port; and the baseboard management controller locates the fault position of the server at the central processing unit or the memory based on the change information of the second general-purpose input/output port.
In one possible implementation, the locating by the baseboard management controller includes: if, during the initialization stage of the central processing unit, the change information of the second general-purpose input/output port indicates that its signal changed from low level to high level, the baseboard management controller locates the fault position of the server at the central processing unit; and if, during the initialization stage of the memory, the change information indicates that the signal changed from low level to high level, the baseboard management controller locates the fault position of the server at the memory.
In one possible implementation, the server further includes an external device, and the method includes: during the initialization stage of the external device, the basic input/output system monitors the initialization state of the external device; the basic input/output system determines an indication value based on that initialization state, where different indication values correspond to different initialization states of the external device; and the indication value is written into a shared memory, which is a memory region agreed upon by the baseboard management controller and the basic input/output system.
In one possible implementation, the method further includes: during the initialization stage of the external device, the baseboard management controller reads the indication value from the shared memory at a preset period; and the baseboard management controller locates the fault position of the server at the external device based on the indication value.
In a second aspect, an embodiment of the present application provides a fault locating device applied to a server, where the server includes a basic input/output system and a baseboard management controller. The device includes: an acquisition module, configured for the baseboard management controller to obtain the startup time of the basic input/output system; a determining module, configured for the baseboard management controller to determine, based on the startup time and a preset startup duration, that the basic input/output system is in a fault state; a switching module, configured to switch the basic input/output system from a first version to a second version, where the first version does not include log information of the basic input/output system and the second version does; an output module, configured for the basic input/output system to output the log information; and a positioning module, configured for the baseboard management controller to locate the fault position of the server based on the log information.
In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method according to the first aspect or any implementation manner of the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to the first aspect or any implementation of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to perform the method according to the first aspect or any implementation of the first aspect.
Compared with the prior art, the embodiments of the present application have the following beneficial effects: the baseboard management controller in the server obtains the startup time of the basic input/output system; after it determines, based on the startup time and the preset startup duration, that the server is in a fault state, the basic input/output system switches from the first version, which does not include its log information, to the second version, which does, and outputs the log information based on the second version; and the baseboard management controller locates the fault position of the server based on that log information. In the prior approach, when the basic input/output system of the server hangs or otherwise fails during startup and debug information is needed to locate the fault, a debug version of the basic input/output system must be re-flashed and the fault located from its debug information. The present solution avoids this re-flashing and therefore improves the efficiency of server fault location.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a server according to an embodiment of the present application;
FIG. 2 is a flow chart of a fault locating method according to an embodiment of the present application;
FIG. 3 is a flow chart of another fault locating method according to an embodiment of the present application;
FIG. 4 is a block diagram of a fault locating device according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
For ease of understanding, the technical solution of the present application is described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of a server according to an embodiment of the application. As shown in fig. 1, the server 100 includes a basic input output system 110, a baseboard management controller 120, a complex programmable logic device 130, a first general purpose input output port 140, a second general purpose input output port 150, a shared memory 160, an intelligent platform management interface 170, and a system management bus 180.
In one possible implementation, the baseboard management controller 120 is configured to obtain the startup time of the BIOS 110 and to determine, based on the startup time and a preset startup duration, that the BIOS 110 is abnormal during startup, that is, that the server 100 is in a fault state. After the baseboard management controller 120 determines that the server 100 is in a fault state, the BIOS 110 automatically switches from the first version to the second version and outputs log information based on the second version.
As an example, when the BIOS 110 is powered on and starts, the baseboard management controller 120 automatically starts timing. When the BIOS 110 starts successfully, it notifies the baseboard management controller 120 through the intelligent platform management interface (IPMI) 170; on receiving this notification, the baseboard management controller 120 stops timing and obtains the startup duration of the BIOS 110. If the startup duration is less than or equal to the preset startup duration, the startup of the BIOS 110 is determined to be normal, that is, the server 100 is in a normal state. If the baseboard management controller 120 does not receive the notification of successful startup through the intelligent platform management interface 170 within the preset startup duration measured from power-on, the BIOS 110 is determined to be abnormal during startup, that is, the server 100 is in a fault state, and an alarm indicating the BIOS 110 startup abnormality is displayed on a page.
As one example, the first version is a version that does not include log information of the BIOS 110, such as a formal version; the second version is a version that includes log information of the BIOS 110, such as a debug version. In actual operation, to ensure the user experience, the BIOS 110 typically runs the first version.
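The timing check above can be sketched as follows. This is a minimal illustration under assumptions: the BMC is modeled as receiving plain timestamps in seconds, the IPMI "startup succeeded" notification is represented only by its arrival time, and the 120-second preset is a hypothetical value (the text does not give one). A third state, "pending", is added for the period before the preset duration has elapsed.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical default; the text does not specify the preset startup duration.
PRESET_STARTUP_DURATION_S = 120.0


@dataclass
class BootCheck:
    state: str                   # "normal", "fault" or "pending"
    duration_s: Optional[float]  # measured startup duration, None if no notification


def check_bios_startup(power_on_ts: float,
                       success_notify_ts: Optional[float],
                       now_ts: float,
                       preset_s: float = PRESET_STARTUP_DURATION_S) -> BootCheck:
    """Classify the server state from BMC-side timestamps (seconds).

    power_on_ts: when the BMC started timing (BIOS power-on).
    success_notify_ts: arrival time of the success notification, or None.
    now_ts: current BMC time, used while no notification has arrived.
    """
    if success_notify_ts is not None:
        duration = success_notify_ts - power_on_ts
        # Notification within the preset duration means a normal startup;
        # a late notification counts as "not received within the preset time".
        state = "normal" if duration <= preset_s else "fault"
        return BootCheck(state, duration)
    # No notification yet: declare a fault only once the preset time has passed.
    if now_ts - power_on_ts > preset_s:
        return BootCheck("fault", None)
    return BootCheck("pending", None)
```

For instance, a notification 90 s after power-on against a 120 s preset yields `"normal"`, while no notification by 130 s yields `"fault"`, at which point the BIOS would switch to the second version.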
As one example, when the baseboard management controller 120 determines that the server 100 is in a failure state, the bios 110 automatically switches from the first version to the second version and outputs log information of a default print level based on the second version.
The print level of the log information of the BIOS 110 is divided into three levels: low, middle and high. The low level prints only log information for error events, so that a developer can quickly locate the stage at which the BIOS 110 became abnormal during startup. The middle level prints log information for error events and also for potential errors, that is, program branches that should not be executed during a normal startup, so its output is more detailed than that of the low level. The high level is the most detailed: in addition to the middle-level information, it prints important information about program execution, such as program entry information and calculation results. Typically, the default print level is the low level.
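The nesting of the three print levels can be illustrated with a simple threshold filter. This is only a sketch: the message-kind names (`ERROR`, `WARNING`, `INFO`) are hypothetical labels for "error event", "potential error" and "important execution information", and the text does not say how the BIOS actually implements the filtering.

```python
from enum import IntEnum


class PrintLevel(IntEnum):
    LOW = 1     # error events only (the default)
    MIDDLE = 2  # errors plus potential errors
    HIGH = 3    # everything above plus important execution details


class MsgKind(IntEnum):
    ERROR = 1    # hypothetical label: error event
    WARNING = 2  # hypothetical label: potential error (unexpected branch taken)
    INFO = 3     # hypothetical label: entry information, calculation results, etc.


def should_print(kind: MsgKind, level: PrintLevel) -> bool:
    # Each level includes everything printed by the levels below it.
    return int(kind) <= int(level)


def filter_log(messages, level=PrintLevel.LOW):
    """Return the texts of the (kind, text) pairs visible at `level`."""
    return [text for kind, text in messages if should_print(kind, level)]
```

Because each level is a superset of the one below, a single ordered comparison is enough; raising the level from low to high only ever adds messages.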
In another possible implementation, the baseboard management controller 120 obtains version switch indication information, where the version switch indication information is used to indicate that the bios 110 is switched from the first version to the second version, and the bios 110 is switched from the first version to the second version based on the version switch indication information.
As an example, a version option virtual control switch is added to the hypertext markup language (HTML) web page of the baseboard management controller 120; this switch selects the version of the BIOS 110. When the BIOS 110 is running the first version and a worker sets the version option virtual control switch to the second version, the baseboard management controller 120 obtains version switching indication information instructing it to switch the BIOS 110 from the first version to the second version. Based on this information, the BIOS 110 switches from the first version to the second version and outputs log information based on the second version.
As an example, when the baseboard management controller 120 obtains version switching indication information instructing it to switch the BIOS 110 from the first version to the second version, the BIOS 110 switches to the second version based on that information and outputs log information at the default print level.
In one possible implementation, the baseboard management controller 120 is further configured to obtain print grade indication information, where the print grade indication information is used to indicate a print grade of the log information; the complex programmable logic device 130 is configured to determine a combination status of the first general purpose input output port 140 based on the print level indication information, and the bios 110 is configured to output log information based on the combination status of the first general purpose input output port 140.
As an example, a print-level virtual control switch is added to the HTML web page of the baseboard management controller 120; this switch controls the print level of the log information output by the BIOS 110. When the version option virtual control switch is set to the first version, the print-level virtual control switch is disabled and no log information is output by default; that is, the BIOS 110 outputs no log information while running the first version. When the version option virtual control switch is set to the second version, a worker adjusts the print level of the log information output by the BIOS 110 through the print-level virtual control switch, and the baseboard management controller 120 obtains print level indication information from the worker's setting of that switch.
As an example, the baseboard management controller 120 transfers the acquired print level indication information to the complex programmable logic device 130 through the system management bus (SMBus) 180. The complex programmable logic device 130 parses the print level indication information, sets the combined state of the first general-purpose input/output port 140 accordingly, and resets the BIOS 110. After restarting, the BIOS 110 reads the combined state of the first general-purpose input/output port 140 during startup and outputs log information based on that state.
For example, the combined state of the first general-purpose input/output port 140 includes, but is not limited to, 00, 01, 10 and 11, where 00 represents the formal version of the BIOS 110, 01 the debug version with the low print level, 10 the debug version with the middle print level, and 11 the debug version with the high print level.
Accordingly, when the combination status of the first general input/output port 140 is 00, the bios 110 is normally started according to the formal version, and does not output log information; when the combination status of the first general input/output port 140 is 01, the bios 110 is started according to the debug version, and outputs log information of the low print level; when the combination status of the first general input/output port 140 is 10, the bios 110 is started according to the debug version, and outputs log information of the middle print level; when the combination status of the first general input output port 140 is 11, the bios 110 is started according to the debug version and outputs log information of the high print level.
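The 2-bit combined state can be modeled as a small lookup table shared by the CPLD side (encoding the BMC settings) and the BIOS side (decoding after reset). This is a sketch under assumptions: the mapping is taken from the example above, but the bit ordering on the physical pins, the string labels and the function names are hypothetical. The encoder also reflects the rule that the print-level switch is ignored while the formal version is selected, and it falls back to the low (default) level when none is given.

```python
from typing import Optional, Tuple

# Combined state of the first GPIO port -> (BIOS version, print level),
# as in the example; pin/bit ordering on real hardware is an assumption.
_DECODE = {
    0b00: ("formal", None),
    0b01: ("debug", "low"),
    0b10: ("debug", "middle"),
    0b11: ("debug", "high"),
}
_ENCODE = {v: k for k, v in _DECODE.items()}


def encode_gpio_state(version: str, level: Optional[str]) -> int:
    """CPLD side (sketch): derive the 2-bit state from the BMC settings."""
    if version == "formal":
        # Print-level switch is disabled for the formal version: no log output.
        return 0b00
    # Debug version with no explicit level falls back to the default (low).
    return _ENCODE[("debug", level or "low")]


def decode_gpio_state(bits: int) -> Tuple[str, Optional[str]]:
    """BIOS side (sketch): read the state after reset and choose a mode."""
    return _DECODE[bits & 0b11]
```

Keeping a single table and deriving the inverse from it means the CPLD and BIOS sides cannot disagree about which code means which level.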
In one possible implementation, during the initialization stage of the central processing unit (not shown in FIG. 1) or of the memory (not shown in FIG. 1) of the server 100, the BIOS 110 is configured to monitor the initialization state of the central processing unit or the memory and to determine change information of the second general-purpose input/output port 150 based on that state, where the change information indicates a level change of the second general-purpose input/output port 150. The baseboard management controller 120 is configured to locate the fault position of the server 100 at the central processing unit or the memory based on the change information of the second general-purpose input/output port 150, and to display the fault information of the central processing unit or of the memory on a web page. Meanwhile, the baseboard management controller 120 resets the BIOS 110; if the number of resets exceeds a preset number (for example, 3) and the BIOS 110 still cannot continue starting, the baseboard management controller 120 powers down the server 100.
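The reset-then-power-down policy can be sketched as a bounded retry loop. The hooks `reset_bios` and `power_down` are hypothetical stand-ins for whatever BMC mechanism actually performs these operations, and the exact boundary (power down after 3 failed resets) is an assumed reading of "exceeds the preset number".

```python
from typing import Callable

PRESET_RESET_LIMIT = 3  # example value given in the text


def recover_bios(reset_bios: Callable[[], bool],
                 power_down: Callable[[], None],
                 limit: int = PRESET_RESET_LIMIT) -> str:
    """Reset the BIOS up to `limit` times; power the server down if it never recovers.

    reset_bios: hypothetical hook that resets the BIOS and returns True if
        startup then proceeds past the failing stage.
    power_down: hypothetical hook that powers the server off.
    """
    for _ in range(limit):
        if reset_bios():
            return "recovered"
    # Every allowed reset failed: stop retrying and power the server down.
    power_down()
    return "powered_down"
```

Passing the hardware actions in as callables keeps the policy testable without a real BMC.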
As an example, at the initialization stage of the central processor of the server 100, the change information of the second general purpose input output port 150 indicates that the signal of the second general purpose input output port 150 is changed from low level to high level, and the baseboard management controller 120 locates the fault location of the server 100 as the central processor and displays the fault information of the central processor in the web page.
As another example, if, during the initialization stage of the memory of the server 100, the change information of the second general-purpose input/output port 150 indicates that its signal changed from low level to high level, the baseboard management controller 120 locates the fault position of the server 100 at the memory and displays the fault information of the memory on the web page.
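The two examples above amount to detecting a low-to-high edge on the second GPIO port and attributing it to whichever stage is in progress. In this sketch the BMC is assumed to already know the current stage and to have a list of sampled levels (0 = low, 1 = high); the stage labels and the sampling scheme are hypothetical, since the text does not say how the BMC observes the port.

```python
from typing import List, Optional


def locate_cpu_mem_fault(stage: str, samples: List[int]) -> Optional[str]:
    """Return the faulty component if the second GPIO rose from low to high.

    stage: "cpu_init" or "memory_init" (hypothetical stage labels).
    samples: successive level readings of the port during that stage.
    """
    # A rising edge is any adjacent pair of readings going from 0 to 1.
    rose = any(prev == 0 and cur == 1 for prev, cur in zip(samples, samples[1:]))
    if not rose:
        return None
    return {"cpu_init": "central processing unit", "memory_init": "memory"}[stage]
```

Checking adjacent pairs rather than just the last reading means a port that is already high at the start of the stage is not mistaken for a new fault.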
In one possible implementation, during the initialization stage of an external device (not shown in FIG. 1) of the server 100, the BIOS 110 is configured to monitor the initialization state of the external device, determine an indication value based on that state, and write the indication value into the shared memory 160, which is a memory region agreed upon by the baseboard management controller 120 and the BIOS 110. The baseboard management controller 120 reads the indication value from the shared memory 160 at a preset period, locates the fault position of the server 100 at the external device based on the indication value, and displays the fault information of the external device on a web page.
As an example, after the central processing unit and the memory of the server 100 have been initialized normally, when the BIOS 110 starts scanning external devices during startup, it writes a start indication value (e.g., 0x01) into the shared memory 160; after the external devices are initialized successfully, it writes an initialization-complete indication value (e.g., 0x02); and if an external device becomes abnormal during initialization, it writes an abnormal indication value (e.g., 0x03). The baseboard management controller 120 reads the indication value from the shared memory 160 by periodic polling and determines that an external device of the server 100 is in a fault state when the value read is 0x03.
For example, the external devices of the server 100 include, but are not limited to, external hard disks and peripheral component interconnect express (PCIe) devices.
It should be understood that the system architecture shown in FIG. 1 is only one example of a server provided by the present application. In other embodiments, the server 100 may include more or fewer components than illustrated, certain components may be combined or split, or the components may be arranged differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware. For example, the server 100 may also include a central processing unit, a memory and external devices; the present application is not limited in this regard.
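The BMC side of this protocol is a periodic poll over one agreed shared-memory value. In the sketch below, `read_shared` is a hypothetical accessor standing in for however the BMC actually reads the shared memory 160; the indication values are the example ones (0x01, 0x02, 0x03), while the 1-second period and the 60-poll cap are assumptions. The `sleep` function is injectable so the loop can be exercised without real delays.

```python
import time
from typing import Callable, Optional

SCAN_STARTED = 0x01   # BIOS began scanning external devices
INIT_DONE = 0x02      # external devices initialized successfully
INIT_ABNORMAL = 0x03  # an external device failed during initialization


def poll_external_device_status(read_shared: Callable[[], int],
                                period_s: float = 1.0,
                                max_polls: int = 60,
                                sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """BMC-side polling loop over the agreed shared-memory indication value.

    Returns "fault" on 0x03, "ok" on 0x02, or None if neither appears
    within `max_polls` reads.
    """
    for _ in range(max_polls):
        value = read_shared()
        if value == INIT_ABNORMAL:
            return "fault"  # locate the fault at the external device
        if value == INIT_DONE:
            return "ok"
        sleep(period_s)     # still 0x01 (or unset): wait one period and retry
    return None
```

Returning `None` on timeout keeps "no answer yet" distinct from an explicit 0x03, so a caller could fall back to the startup-duration check instead.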
Fig. 2 is a flow chart of a fault locating method according to an embodiment of the present application. As shown in fig. 2, the method includes at least S201 to S205. The method is applied to a server, an example of which may be the server 100 in fig. 1.
S201, the baseboard management controller obtains the starting time of the basic input and output system.
In one possible implementation, the baseboard management controller obtains the start time of the basic input output system when the basic input output system is powered on.
As an example, the baseboard management controller and the bios in the present embodiment may be the baseboard management controller 120 and the bios 110 in fig. 1, respectively.
S202, the baseboard management controller determines, based on the start time and a preset startup duration, that the server is in a fault state.
In one possible implementation, when the basic input/output system is powered on and starts, the baseboard management controller obtains its start time and automatically starts timing. When the basic input/output system starts successfully, it notifies the baseboard management controller through the intelligent platform management interface. The baseboard management controller stops timing upon receiving this notification, thereby obtaining the startup duration of the basic input/output system.
As an example, if the startup duration determined by the baseboard management controller is less than or equal to the preset startup duration, the basic input/output system is determined to have started without abnormality. That is, the server is in a normal state.
As another example, the baseboard management controller starts timing at the moment the basic input/output system is powered on. If it does not receive the startup-success notification from the basic input/output system through the intelligent platform management interface within the preset startup duration, it determines that the basic input/output system is abnormal during startup. That is, the server is in a fault state, and the baseboard management controller displays an alarm for the abnormal startup of the basic input/output system in the web page.
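A minimal sketch of this timing logic is shown below, assuming an injectable monotonic clock. The class name `BiosStartupTimer` and its methods are hypothetical. The IPMI notification is modelled as a plain method call rather than any real IPMI API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BiosStartupTimer:
    """Hypothetical BMC-side timer for S201/S202 (names are illustrative)."""
    preset_duration_s: float
    clock: Callable[[], float] = time.monotonic
    start_time: Optional[float] = None
    startup_duration: Optional[float] = None

    def on_power_on(self) -> None:
        # S201: record the BIOS start time and begin timing.
        self.start_time = self.clock()
        self.startup_duration = None

    def on_startup_success(self) -> None:
        # Startup-success notification from the BIOS (e.g. over IPMI) stops the timer.
        if self.start_time is not None and self.startup_duration is None:
            self.startup_duration = self.clock() - self.start_time

    def is_fault(self) -> bool:
        # S202: fault if startup took, or is still taking, longer than the preset duration.
        if self.start_time is None:
            return False
        elapsed = (self.startup_duration if self.startup_duration is not None
                   else self.clock() - self.start_time)
        return elapsed > self.preset_duration_s
```

With an assumed preset of 120 s, a BIOS that reports success after 60 s is judged normal. One that has not reported after 121 s is judged faulty, which matches the "less than or equal to" rule above.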
As an example, the intelligent platform management interface in this embodiment may be the intelligent platform management interface 170 in fig. 1.
S203, the basic input output system is switched from a first version to a second version, wherein the first version is a version which does not comprise log information of the basic input output system, and the second version is a version which comprises log information of the basic input output system.
In one possible implementation, the bios automatically switches from the first version to the second version when the baseboard management controller determines that the server is in a failure state.
As an example, the first version is the formal (release) version of the basic input/output system, and the second version is the debug version. In actual operation, the basic input/output system typically runs the first version to ensure a good user experience.
S204, the basic input and output system outputs log information.
In one possible implementation, when the baseboard management controller determines that the server is in a fault state, the bios automatically switches from the first version to the second version, and outputs log information of a default print level based on the second version.
The print level of the log information of the basic input/output system has three levels: low, middle, and high.

The low level prints only log information about error events. This lets a developer quickly locate the stage at which the basic input/output system became abnormal during startup.

The middle level prints the error-event log information and also potential errors, that is, program branches that should not be executed during a normal startup. It is therefore more detailed than the low level.

The high level is the most detailed. In addition to the middle-level information, it prints important information from program execution, such as program entry information and calculation results.

Typically, the default print level is the low level.
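Because each level includes everything printed at the levels below it, the three levels can be modelled as an ordered enumeration. The sketch below assumes that each log message is tagged with the lowest level at which it appears. `PrintLevel` and `filter_log` are illustrative names, not part of any BIOS codebase.

```python
from enum import IntEnum
from typing import List, Tuple


class PrintLevel(IntEnum):
    LOW = 1     # error events only
    MIDDLE = 2  # + potential errors (branches not reached in a normal startup)
    HIGH = 3    # + key execution details (program entry, calculation results)


def filter_log(messages: List[Tuple[PrintLevel, str]], configured: PrintLevel) -> List[str]:
    """Keep messages whose level is at or below the configured print level.

    Each message is tagged with the lowest print level at which it appears, so a
    higher configured level includes everything printed at the lower levels.
    """
    return [text for level, text in messages if level <= configured]
```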
In another possible implementation, the baseboard management controller obtains print level indication information, which indicates the print level of the log information. The complex programmable logic device determines the combination state of the first general purpose input/output (GPIO) port based on the print level indication information, with different print levels corresponding to different combination states. The basic input/output system then outputs log information based on the combination state of the first general purpose input/output port.
As an example, a print-level virtual control switch is added to the HTML web page of the baseboard management controller. This switch controls which print level of log information the basic input/output system outputs.

When the basic input/output system is running the first version, the print-level virtual control switch is unavailable, and by default no log information is output.

When the basic input/output system is running the second version, staff adjust the print level through the print-level virtual control switch. The baseboard management controller obtains the print level indication information from the staff's selection on that switch.
As an example, the baseboard management controller sends the acquired print level indication information to the complex programmable logic device through the system management bus. The complex programmable logic device parses the information, sets the combination state of the first general purpose input/output port accordingly, and resets the basic input/output system. After the basic input/output system is reset and restarts, it reads the combination state of the first general purpose input/output port during startup and outputs log information based on that state.
For example, the combination states of the first general purpose input/output port include, but are not limited to, 00, 01, 10, and 11:
00 represents the formal version of the BIOS 110.
01 represents the debug version of the BIOS with the low print level.
10 represents the debug version with the middle print level.
11 represents the debug version with the high print level.
Correspondingly, the basic input/output system behaves as follows for each combination state of the first general purpose input/output port:
00: starts normally with the formal version and outputs no log information.
01: starts with the debug version and outputs log information at the low print level.
10: starts with the debug version and outputs log information at the middle print level.
11: starts with the debug version and outputs log information at the high print level.
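On the BIOS side, decoding the two-pin combination state is a simple table lookup. The sketch below assumes that the left digit of each state (for example, the "1" in "10") corresponds to a hypothetical "high" pin and the right digit to a "low" pin. The names `BootConfig`, `GPIO_STATE_TABLE`, and `decode_gpio_state` are illustrative only.

```python
from typing import NamedTuple, Optional


class BootConfig(NamedTuple):
    version: str                # "formal" or "debug"
    print_level: Optional[str]  # None means no log output


# Combination states from the example; the left digit is assumed to be the high pin.
GPIO_STATE_TABLE = {
    0b00: BootConfig("formal", None),
    0b01: BootConfig("debug", "low"),
    0b10: BootConfig("debug", "middle"),
    0b11: BootConfig("debug", "high"),
}


def decode_gpio_state(high_pin: int, low_pin: int) -> BootConfig:
    """BIOS side: combine the two sampled pin levels (0 or 1) and look up the boot configuration."""
    if high_pin not in (0, 1) or low_pin not in (0, 1):
        raise ValueError("pin levels must be 0 or 1")
    return GPIO_STATE_TABLE[(high_pin << 1) | low_pin]
```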
As an example, the complex programmable logic device, the first general purpose input output port, and the system management bus in this embodiment may be the complex programmable logic device 130, the first general purpose input output port 140, and the system management bus 180 in fig. 1, respectively.
S205, the baseboard management controller locates the fault location of the server based on the log information.
In one possible implementation, the baseboard management controller checks software and hardware devices in the server based on log information output by the basic input/output system, and locates the fault location of the server.
It should be noted that, for the specific method by which the baseboard management controller locates the fault based on the log information, reference may be made to the prior art; details are not described herein.
In another possible implementation, during the initialization stage of the central processing unit or of the memory of the server, the basic input/output system monitors the initialization state of the central processing unit or the memory. Based on that state, it determines change information of the second general purpose input/output port, which indicates a level change on that port.

The baseboard management controller locates the fault of the server to the central processing unit or the memory based on this change information, and displays the corresponding fault information in the web page.

Meanwhile, the baseboard management controller resets the basic input/output system. If the basic input/output system still cannot proceed after the number of resets exceeds a preset number, the baseboard management controller powers down the server. For example, the preset number is 3.
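The sketch below models the two BMC-side decisions described here: mapping a low-to-high transition on the second GPIO port to the component being initialized, and resetting before powering down. It rests on two assumptions. The phase names `"cpu_init"` and `"memory_init"` are hypothetical. The text's "exceeds the preset number" is interpreted as "power down once the preset number of resets has failed". The reset, progress check, and power-down actions are injected callables, not a real BMC API.

```python
from typing import Callable, Optional

PRESET_RESET_LIMIT = 3  # example preset number from the text

# Hypothetical phase names for the two monitored initialization stages.
PHASE_TO_COMPONENT = {"cpu_init": "central processing unit", "memory_init": "memory"}


def locate_early_fault(phase: str, previous_level: int, current_level: int) -> Optional[str]:
    """A low-to-high transition on the second GPIO port during a monitored phase marks that component as faulty."""
    if previous_level == 0 and current_level == 1:
        return PHASE_TO_COMPONENT.get(phase)
    return None


def reset_then_power_down(
    reset_bios: Callable[[], None],
    bios_proceeds: Callable[[], bool],
    power_down: Callable[[], None],
    limit: int = PRESET_RESET_LIMIT,
) -> bool:
    """Reset the BIOS up to `limit` times; power the server down if it never proceeds."""
    for _ in range(limit):
        reset_bios()
        if bios_proceeds():
            return True
    power_down()
    return False
```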
As an example, suppose that during the initialization stage of the central processing unit of the server, the change information indicates that the signal of the second general purpose input/output port changes from low level to high level. The baseboard management controller then locates the fault of the server to the central processing unit and displays the fault information of the central processing unit in the web page.
As another example, suppose that during the initialization stage of the memory of the server, the change information indicates that the signal of the second general purpose input/output port changes from low level to high level. The baseboard management controller then locates the fault of the server to the memory and displays the fault information of the memory in the web page.
In another possible implementation, during the initialization stage of an external device of the server, the basic input/output system monitors the initialization state of the external device, determines an indication value based on that state, and writes the indication value into a shared memory. The shared memory is a memory region agreed upon by the baseboard management controller and the basic input/output system. The baseboard management controller reads the indication value from the shared memory at a preset period, locates the fault of the server to the external device based on the indication value, and displays the fault information of the external device in the web page.
As an example, the central processing unit and the memory of the server are initialized normally first. The basic input/output system then writes a start indication value (for example, 0x01) into the shared memory when it starts scanning any external device during startup. It writes an initialization-complete indication value (for example, 0x02) after the external device is initialized successfully, and an abnormal indication value (for example, 0x03) if the external device encounters an exception during initialization. The baseboard management controller reads the indication value from the shared memory by periodic polling. When the value read is 0x03, it determines that the external device of the server is in a fault state and displays the fault information of the external device in the web page.
Examples of the external devices of the server include, but are not limited to, external hard disks and PCIe devices.
As an example, the second general purpose input output port and the shared memory in this embodiment may be the second general purpose input output port 150 and the shared memory 160 in fig. 1, respectively.
According to the technical solution provided by the present application, the baseboard management controller in the server obtains the start time of the basic input/output system. After it determines, based on the start time and the preset startup duration, that the server is in a fault state, the basic input/output system is switched from a first version that does not output its log information to a second version that does. The basic input/output system outputs log information under the second version, and the baseboard management controller locates the fault of the server based on that log information and displays the fault information in the web page. This improves the efficiency of server fault location.
Fig. 3 is a flow chart of another fault locating method according to an embodiment of the present application. As shown in fig. 3, the method includes at least S301 to S304. The method is applied to a server, an example of which may be the server 100 in fig. 1.
S301, the baseboard management controller obtains version switching indication information, wherein the version switching indication information is used for indicating a basic input and output system to be switched from a first version to a second version.
In one possible implementation, a version option virtual control switch is added to the HTML web page of the baseboard management controller; this switch allows the version of the basic input/output system to be switched. The baseboard management controller obtains the version switching indication information through the version option virtual control switch.
As an example, suppose the basic input/output system is running the first version and the operator sets the version option virtual control switch to the second version. The baseboard management controller then obtains version switching indication information instructing the basic input/output system to switch from the first version to the second version.
As an example, the baseboard management controller and the bios in the present embodiment may be the baseboard management controller 120 and the bios 110 in fig. 1, respectively.
S302, the basic input/output system is switched from the first version to the second version based on the version switching indication information.
In one possible implementation manner, after the baseboard management controller obtains the version switching indication information based on the version option virtual control switch, the bios performs version switching based on the version switching indication information.
As an example, in actual operation the basic input/output system usually runs the formal version. When the operator sets the version option virtual control switch to the debug version, the baseboard management controller obtains version switching indication information instructing the basic input/output system to switch from the formal version to the debug version. The basic input/output system then switches to the debug version based on that version switching indication information.
S303, the basic input/output system outputs log information.
In one possible implementation, after the bios switches from the first version to the second version based on the version switch indication information, log information of a default print level is output based on the second version.
The print level of the log information of the basic input/output system has three levels: low, middle, and high.

The low level prints only log information about error events. This lets a developer quickly locate the stage at which the basic input/output system became abnormal during startup.

The middle level prints the error-event log information and also potential errors, that is, program branches that should not be executed during a normal startup. It is therefore more detailed than the low level.

The high level is the most detailed. In addition to the middle-level information, it prints important information from program execution, such as program entry information and calculation results.

Typically, the default print level is the low level.
In another possible implementation, the baseboard management controller obtains print level indication information, which indicates the print level of the log information. The complex programmable logic device determines the combination state of the first general purpose input/output port based on the print level indication information, with different print levels corresponding to different combination states. The basic input/output system then outputs log information based on the combination state of the first general purpose input/output port.
As an example, a print-level virtual control switch is added to the HTML web page of the baseboard management controller. This switch controls which print level of log information the basic input/output system outputs.

When the version option virtual control switch is set to the first version, the print-level virtual control switch is unavailable and by default no log information is output. That is, the basic input/output system outputs no log information when running the first version.

When the version option virtual control switch is set to the second version, staff adjust the print level through the print-level virtual control switch. The baseboard management controller obtains the print level indication information from the staff's selection on that switch.
As an example, the baseboard management controller sends the acquired print level indication information to the complex programmable logic device through the system management bus. The complex programmable logic device parses the information, sets the combination state of the first general purpose input/output port accordingly, and resets the basic input/output system. After the basic input/output system is reset and restarts, it reads the combination state of the first general purpose input/output port during startup and outputs log information based on that state.
For example, the combination states of the first general purpose input/output port include, but are not limited to, 00, 01, 10, and 11:
00 represents the formal version of the BIOS 110.
01 represents the debug version of the BIOS with the low print level.
10 represents the debug version with the middle print level.
11 represents the debug version with the high print level.
When the version option virtual control switch is set to the first version, the combination state of the first general purpose input/output port defaults to 00.
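The interaction between the two web-page switches and the resulting combination state can be sketched as a small function. This is an illustration only, under two assumptions. First, the print-level selection is ignored when the first version is chosen, mirroring the unavailable switch. Second, an unselected print level under the second version falls back to the default low level. The function and parameter names are hypothetical.

```python
from typing import Optional

# Print level -> combination state, following the example states above.
LEVEL_TO_STATE = {"low": 0b01, "middle": 0b10, "high": 0b11}


def settings_to_gpio_state(version_option: str, print_level: Optional[str]) -> int:
    """Map the version option and print-level selections to the 2-bit combination state."""
    if version_option == "first":
        # Print-level switch unavailable: default state 00 (formal version, no log output).
        return 0b00
    if version_option == "second":
        # Assumed fallback to the default low print level when nothing is selected.
        level = print_level if print_level is not None else "low"
        if level not in LEVEL_TO_STATE:
            raise ValueError(f"unknown print level: {level!r}")
        return LEVEL_TO_STATE[level]
    raise ValueError(f"unknown version option: {version_option!r}")
```

In this sketch the CPLD would receive the returned state and drive the first GPIO port. The BIOS would read the port back after the reset, as described in the preceding paragraphs.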
Correspondingly, the basic input/output system behaves as follows for each combination state of the first general purpose input/output port:
00: starts normally with the formal version and outputs no log information.
01: starts with the debug version and outputs log information at the low print level.
10: starts with the debug version and outputs log information at the middle print level.
11: starts with the debug version and outputs log information at the high print level.
As an example, the complex programmable logic device, the first general purpose input output port, and the system management bus in this embodiment may be the complex programmable logic device 130, the first general purpose input output port 140, and the system management bus 180 in fig. 1, respectively.
S304, the baseboard management controller locates the fault position of the server based on the log information.
It should be noted that, for the specific implementation of S304, reference may be made to S205; details are not repeated herein.
According to the technical solution provided by the present application, the basic input/output system switches from the first version to the second version based on the version switching indication information obtained by the baseboard management controller. After switching to the second version, it outputs log information at different print levels based on the print level indication information obtained by the baseboard management controller. This enables fault location of the server, improves the efficiency and accuracy of that fault location, and improves the user experience.
Fig. 4 is a block diagram of a fault locating device according to an embodiment of the present application, and for convenience of explanation, only a portion related to the embodiment of the present application is shown. Referring to fig. 4, the fault locating device 400 may include an acquisition module 401, a determination module 402, a switching module 403, an output module 404, and a locating module 405.
In one implementation, the apparatus 400 may be used to implement the method illustrated in fig. 2 described above. For example, the acquisition module 401 is used to implement S201, the determination module 402 is used to implement S202, the switching module 403 is used to implement S203, the output module 404 is used to implement S204, and the positioning module 405 is used to implement S205.
In another implementation, the apparatus 400 may also be used to implement the method illustrated in fig. 3 described above. For example, the acquisition module 401 is used to implement S301, the switching module 403 is used to implement S302, the output module 404 is used to implement S303, and the positioning module 405 is used to implement S304.
According to the technical solution provided by the embodiments of the present application, the baseboard management controller in the server obtains the start time of the basic input/output system. After it determines, based on the start time and the preset startup duration, that the server is in a fault state, the basic input/output system is switched from a first version that does not output its log information to a second version that does. The basic input/output system outputs log information under the second version, and the baseboard management controller locates the fault of the server based on that log information. This improves the efficiency of server fault location.
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein.
Fig. 5 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 5, the terminal device 5 of this embodiment includes: at least one processor 50 (only one shown in fig. 5), a memory 51 and a computer program 52 stored in the memory 51 and executable on the at least one processor 50, the processor 50 implementing the steps in any of the method embodiments of fig. 2-3 described above when executing the computer program 52.
The terminal device 5 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a server. The terminal device may include, but is not limited to, a processor 50 and a memory 51. Those skilled in the art will appreciate that fig. 5 is merely an example of the terminal device 5 and does not constitute a limitation on it. The terminal device 5 may include more or fewer components than shown, combine certain components, or include different components, for example, input/output devices and network access devices.
The processor 50 may be a central processing unit (CPU). It may also be another general purpose processor, such as a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general purpose processor may be a microprocessor or any conventional processor.
The memory 51 may in some embodiments be an internal storage unit of the terminal device 5, such as a hard disk or a memory of the terminal device 5. The memory 51 may in other embodiments also be an external storage device of the terminal device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the terminal device 5. Further, the memory 51 may also include both an internal storage unit and an external storage device of the terminal device 5. The memory 51 is used for storing an operating system, application programs, boot loader (BootLoader), data, other programs, etc., such as program codes of the computer program. The memory 51 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
The embodiment of the application also provides a network device, which comprises: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, which when executed by the processor performs the steps of any of the various method embodiments described above.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Embodiments of the present application provide a computer program product which, when run on a mobile terminal, causes the mobile terminal to perform the steps of the method embodiments described above.
If the integrated units are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer readable storage medium. Based on this understanding, all or part of the flow of the methods in the above embodiments may be implemented by a computer program instructing related hardware. The computer program may be stored in a computer readable storage medium and, when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like.

The computer readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing device/terminal apparatus, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc. In some jurisdictions, according to legislation and patent practice, computer readable media may not include electrical carrier signals and telecommunications signals.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts that are not described or illustrated in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A fault locating method for a server, the server including a basic input/output system and a baseboard management controller, the method comprising:
the baseboard management controller obtains the start-up time of the basic input/output system;
the baseboard management controller determines that the server is in a fault state based on the start-up time and a preset start-up duration;
the basic input/output system is switched from a first version to a second version, wherein the first version is a version that does not include log information of the basic input/output system, and the second version is a version that includes log information of the basic input/output system;
the basic input/output system outputs log information;
the baseboard management controller locates a fault location of the server based on the log information.
2. The method according to claim 1, further comprising:
the baseboard management controller obtains version switching indication information, wherein the version switching indication information is used to instruct the basic input/output system to switch from the first version to the second version;
the basic input/output system switches from the first version to the second version based on the version switching indication information.
3. The method according to claim 1 or 2, wherein the server further comprises a complex programmable logic device and a first general-purpose input/output port, the method further comprising:
the baseboard management controller obtains print level indication information, wherein the print level indication information is used to indicate the print level of the log information;
the complex programmable logic device determines a combined state of the first general-purpose input/output port based on the print level indication information, different print levels corresponding to different combined states;
the basic input/output system outputs the log information based on the combined state of the first general-purpose input/output port.
4. The method according to claim 1, wherein the server further comprises a central processing unit, a memory, and a second general-purpose input/output port, the method further comprising:
in the initialization stage of the central processing unit or the initialization stage of the memory, the basic input/output system monitors the initialization state of the central processing unit or the memory;
the basic input/output system determines change information of the second general-purpose input/output port based on the initialization state of the central processing unit or the memory, wherein the change information of the second general-purpose input/output port indicates a level change of the second general-purpose input/output port;
the baseboard management controller locates the fault location of the server as the central processing unit or the memory based on the change information of the second general-purpose input/output port.
5. The method according to claim 4, wherein the locating, by the baseboard management controller, of the fault location of the server as the central processing unit or the memory based on the change information of the second general-purpose input/output port comprises:
in the initialization stage of the central processing unit, if the change information of the second general-purpose input/output port indicates that its signal changes from a low level to a high level, the baseboard management controller locates the fault location of the server as the central processing unit;
in the initialization stage of the memory, if the change information of the second general-purpose input/output port indicates that its signal changes from a low level to a high level, the baseboard management controller locates the fault location of the server as the memory.
6. The method according to claim 4, wherein the server further comprises an external device, the method further comprising:
in the initialization stage of the external device, the basic input/output system monitors the initialization state of the external device;
the basic input/output system determines an indication value based on the initialization state of the external device, different indication values corresponding to different initialization states of the external device;
the indication value is written into a shared memory, wherein the shared memory is a memory agreed upon by the baseboard management controller and the basic input/output system.
7. The method according to claim 6, further comprising:
in the initialization stage of the external device, the baseboard management controller reads the indication value from the shared memory at a preset period;
the baseboard management controller locates the fault location of the server as the external device based on the indication value.
8. A fault locating device for a server, the server including a basic input/output system and a baseboard management controller, the device comprising:
an acquisition module, used by the baseboard management controller to obtain the start-up time of the basic input/output system;
a determining module, used by the baseboard management controller to determine that the basic input/output system is in a fault state based on the start-up time and a preset start-up duration;
a switching module, used to switch the basic input/output system from a first version to a second version, wherein the first version is a version that does not include log information of the basic input/output system, and the second version is a version that includes log information of the basic input/output system;
an output module, used by the basic input/output system to output log information;
a positioning module, used by the baseboard management controller to locate the fault location of the server based on the log information.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 7.
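The claims define the method functionally and leave its implementation open. The following Python sketch models the decision logic of claims 1, 3, 5 and 7 under explicit assumptions. A fault is assumed when the elapsed start-up time exceeds the preset duration. The two-pin print-level encoding, the print-level names, and the indication values are hypothetical. `read_value` is a placeholder for whatever shared-memory access a real baseboard management controller uses. The BIOS version switch of claims 1 and 2 is firmware-specific and is not modeled. All function and constant names are illustrative and are not taken from the application.

```python
import time
from enum import Enum
from typing import Callable, Optional, Tuple


class InitStage(Enum):
    """Initialization stages watched through the second GPIO port (claims 4-5)."""
    CPU = "central processing unit"
    MEMORY = "memory"


# Hypothetical encoding: two pins of the first GPIO port carry the print
# level chosen by the CPLD (claim 3), ordered from least to most verbose.
PRINT_LEVEL_TO_PINS = {
    "error": (0, 0),
    "warning": (0, 1),
    "info": (1, 0),
    "debug": (1, 1),
}
LEVEL_ORDER = list(PRINT_LEVEL_TO_PINS)  # dicts preserve insertion order
PINS_TO_PRINT_LEVEL = {pins: level for level, pins in PRINT_LEVEL_TO_PINS.items()}

# Hypothetical indication values the BIOS writes into the agreed shared
# memory during external-device initialization (claims 6-7).
INDICATION_OK = 0x00
INDICATION_MEANINGS = {
    0x01: "external device not detected",
    0x02: "external device initialization timed out",
}


def boot_timed_out(start_time_s: float, now_s: float, preset_duration_s: float) -> bool:
    """BMC side of claim 1 (assumed reading): treat the server as faulty once
    the BIOS has been starting for longer than the preset duration."""
    return now_s - start_time_s > preset_duration_s


def should_output(message_level: str, pins: Tuple[int, int]) -> bool:
    """BIOS side of claim 3: print a message only if its level is no more
    verbose than the level encoded in the GPIO combined state."""
    configured = PINS_TO_PRINT_LEVEL[pins]
    return LEVEL_ORDER.index(message_level) <= LEVEL_ORDER.index(configured)


def locate_by_gpio(stage: InitStage, previous_level: int, current_level: int) -> Optional[str]:
    """BMC side of claim 5: a low-to-high transition on the second GPIO port
    during an initialization stage points at the component being initialized."""
    if previous_level == 0 and current_level == 1:
        return stage.value
    return None


def locate_by_shared_memory(indication_value: int) -> Optional[str]:
    """BMC side of claim 7: map one indication value to a fault description."""
    if indication_value == INDICATION_OK:
        return None
    return INDICATION_MEANINGS.get(indication_value, "external device (unknown indication value)")


def poll_shared_memory(read_value: Callable[[], int], period_s: float, max_polls: int) -> Optional[str]:
    """Read the shared memory at a preset period until a fault is reported
    or the poll budget runs out; returns None if no fault was seen."""
    for _ in range(max_polls):
        fault = locate_by_shared_memory(read_value())
        if fault is not None:
            return fault
        time.sleep(period_s)
    return None


if __name__ == "__main__":
    print(boot_timed_out(start_time_s=0.0, now_s=301.0, preset_duration_s=300.0))  # True
    print(should_output("debug", PRINT_LEVEL_TO_PINS["info"]))  # False
    print(locate_by_gpio(InitStage.MEMORY, previous_level=0, current_level=1))  # memory
    readings = iter([0x00, 0x00, 0x02])
    # prints "external device initialization timed out" on the third poll
    print(poll_shared_memory(lambda: next(readings), period_s=0.0, max_polls=5))
```

Keeping each decision in a small pure function lets the timing, GPIO, and shared-memory paths be checked without hardware. On a real controller, these functions would be driven by its own timers and register reads.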

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310930165.8A CN116954972A (en) 2023-07-27 2023-07-27 Fault positioning method, positioning device, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116954972A (en) 2023-10-27

Family

ID=88459997

Country Status (1)

Country Link
CN (1) CN116954972A (en)

Legal Events

Date Code Title Description
PB01 Publication