CN111722954A - Server abnormity positioning method and device, storage medium and server - Google Patents

Server abnormity positioning method and device, storage medium and server

Info

Publication number
CN111722954A
Authority
CN
China
Prior art keywords
server
bios
event log
system event
manager
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010623604.7A
Other languages
Chinese (zh)
Inventor
余新来
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dawning Information Industry Beijing Co Ltd
Original Assignee
Dawning Information Industry Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dawning Information Industry Beijing Co Ltd
Priority to CN202010623604.7A
Publication of CN111722954A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2268 Logging of test results

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computer Hardware Design (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides a server anomaly locating method, a device, a storage medium and a server. The method comprises the following steps: when the server fails, querying a system event log stored in the motherboard manager, wherein the system event log comprises a first system event log, the first system event log is generated by reading a count value of a counter and a restart type parameter stored in a complex programmable logic device, the count value of the counter is used to count the number of times the server has restarted, and the restart type parameter represents the type of the server's most recent start; acquiring restart state information of the server according to the system event log; and determining the fault location of the server according to the restart state information. Because the fault location is determined from the restart state information, the fault can be located quickly and conveniently, which improves server maintenance efficiency.

Description

Server abnormity positioning method and device, storage medium and server
Technical Field
The present disclosure relates to the field of server maintenance technologies, and in particular, to a method and an apparatus for locating server abnormality, a storage medium, and a server.
Background
With the rise of cloud computing, the number of x86 servers deployed in data centers has multiplied. Monitoring and diagnosing server anomalies, particularly abnormal downtime and restarts, is an important task for server research and development departments and for operation and maintenance departments. The server's motherboard manager is responsible for monitoring such failures and abnormal restarts.
In the currently used technology, the motherboard manager relies on recording the SEL (system event log) events sent by the BIOS, and uses those event records to determine which stage the server's startup has reached and whether an abnormal restart has occurred. However, when a real server fails abnormally, the BIOS may not yet have executed its first instruction. In that case it is difficult to determine why the system shows a black screen, and it cannot be determined whether the system has restarted, so the fault cannot be located.
In view of the above problems, no effective technical solution exists at present.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method and an apparatus for server exception location, a storage medium, and a server, so as to improve server maintenance efficiency.
In a first aspect, an embodiment of the present application provides a server exception location method, where the server includes a processor, a complex programmable logic device, a motherboard manager, a BIOS, and a south bridge chip, where the complex programmable logic device is connected to the motherboard manager, the south bridge chip, and the processor, the south bridge chip is connected to the BIOS and the processor, and the BIOS is connected to the motherboard manager; the method is applied to the motherboard manager and comprises the following steps:
when the server fails, inquiring a system event log stored in the mainboard manager, wherein the system event log comprises a first system event log, the first system event log is generated by reading a count value of a counter stored in a complex programmable logic device and a restarting type parameter, the count value of the counter is used for metering the restarting times of the server, and the restarting type parameter is used for representing the latest starting type of the server;
acquiring restarting state information of the server according to the system event log;
and judging the fault position of the server according to the restarting state information.
Optionally, in the method for locating an exception of a server according to the embodiment of the present application, before the step of querying a system event log stored in the motherboard manager, the method further includes:
when an EventTrigger interrupt signal of the complex programmable logic device is detected, reading a count value of a counter and a restart type parameter which are stored in the complex programmable logic device;
when the count value of the counter changes relative to the count value of the counter read last time, judging that the server is restarted, and generating a corresponding restart event record according to a restart type parameter;
and updating a first system event log in the system event logs according to the restart event record.
Optionally, in the server anomaly positioning method according to the embodiment of the present application, the system event log further includes a second system event log; the second system event log is used to determine which stage is reached after the system restarts and enters the BIOS, and the second system event log is generated based on a plurality of running event records of the BIOS.
Optionally, in the server abnormal location method according to the embodiment of the present application, the restart state information includes: the restart type of the server and the stage reached by the server restart.
Optionally, in the server exception location method according to the embodiment of the present application, the plurality of running event records include a BIOS start event record;
the method further comprises the steps of:
receiving a BIOS starting event record sent by the BIOS, wherein the BIOS starting event record is generated when the BIOS starts to start;
and updating the second system event log according to the BIOS starting event record.
Optionally, in the server exception positioning method according to the embodiment of the present application, the plurality of running event records further include a display parameter initialization completion event record;
the method further comprises the steps of:
receiving a display parameter initialization completion event record sent by the BIOS, wherein the display parameter initialization completion event record is generated after the BIOS completes initialization operation on display parameters;
and updating the second system event log according to the display parameter initialization completion event record.
Optionally, in the server exception location method according to the embodiment of the present application, the plurality of running event records further include a BIOS start completion event record;
the method further comprises the steps of:
receiving a BIOS start-up completion event record sent by the BIOS, wherein the BIOS start-up completion event record is generated after the BIOS finishes start-up and transmits a control right to an operating system of the server;
and updating the second system event log according to the BIOS starting completion event record.
Optionally, in the method for locating an abnormality of a server according to the embodiment of the present application, the determining a fault location of the server according to the restart status information includes:
preliminarily screening out a server module with higher fault probability according to the restart type and the stage of restarting the server;
and confirming the fault position of the server from the screened server module with higher fault probability.
In a second aspect, an embodiment of the present application further provides a server exception locating device, where the server includes a processor, a complex programmable logic device, a motherboard manager, a BIOS, and a south bridge chip, where the complex programmable logic device is connected to the motherboard manager, the south bridge chip, and the processor, the south bridge chip is connected to the BIOS and the processor, and the BIOS is connected to the motherboard manager; the device is applied to the motherboard manager and comprises:
the query module is used for querying a system event log stored in the motherboard manager when the server fails, wherein the system event log comprises a first system event log, the first system event log is generated by reading a count value of a counter and a restart type parameter stored in a complex programmable logic device, the count value of the counter is used to count the number of times the server has restarted, and the restart type parameter represents the type of the server's most recent start;
the acquisition module is used for acquiring the restarting state information of the server according to the system event log;
and the judging module is used for judging the fault position of the server according to the restarting state information.
In a third aspect, the present application further provides a storage medium having a computer program stored thereon, where the computer program, when executed by a processor, performs the method according to any one of the above descriptions.
In a fourth aspect, an embodiment of the present application further provides a server, including a processor, a complex programmable logic device, a motherboard manager, a BIOS, and a south bridge chip, where the complex programmable logic device is connected to the motherboard manager, the south bridge chip, and the processor, respectively, and the motherboard manager, the BIOS, the south bridge chip, and the processor are connected in sequence;
the mainboard manager is used for executing the method of any one of the above items.
As can be seen from the above, the server exception location method, the apparatus, the storage medium, and the server provided in the embodiments of the present application query a system event log stored in the motherboard manager when the server fails, where the system event log includes a first system event log, the first system event log is generated by reading a count value of a counter stored in a complex programmable logic device and a restart type parameter, the count value of the counter is used to measure the number of times of restarting the server, and the restart type parameter is used to characterize the latest start type of the server; acquiring restarting state information of the server according to the system event log; and judging the fault position of the server according to the restarting state information, thereby realizing fault positioning, being convenient for finding out the fault position quickly and improving the maintenance efficiency of the server.
Additional features and advantages of the present application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a flowchart of a server anomaly positioning method according to an embodiment of the present application.
Fig. 2 is a schematic structural diagram of a server according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of a server anomaly positioning device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Referring to fig. 1, fig. 1 is a flowchart illustrating a server anomaly locating method according to some embodiments of the present disclosure. Referring to fig. 2, fig. 2 is a schematic structural diagram of a server in the embodiment of the present application, where the server includes a processor 11, a complex programmable logic device 12, a motherboard manager 13, a BIOS (Basic Input Output System) 14, and a south bridge chip 15, where the complex programmable logic device 12 is connected to the motherboard manager 13, the south bridge chip 15, and the processor 11, the south bridge chip 15 is connected to the BIOS 14 and the processor 11, and the BIOS 14 is connected to the motherboard manager 13. The server anomaly locating method is applied to the motherboard manager 13.
The server abnormity positioning method comprises the following steps:
S101, when the server fails, querying a system event log stored in the motherboard manager, wherein the system event log comprises a first system event log, the first system event log is generated by reading a count value of a counter and a restart type parameter stored in a complex programmable logic device, the count value of the counter is used to count the number of times the server has restarted, and the restart type parameter represents the type of the server's most recent start.
S102, acquiring the restart state information of the server according to the system event log.
S103, determining the fault location of the server according to the restart state information.
In step S101, the system event log includes a first system event log and a second system event log. The first system event log is generated by reading a count value of a counter and a restart type parameter stored in a complex programmable logic device, wherein the count value of the counter is used to count the number of times the server has restarted, and the restart type parameter represents the type of the server's most recent start. The first system event log is used to determine the restart type and the stage reached before the restarted system enters the BIOS. The second system event log is generated based on event records sent by the BIOS and is used to determine the stage reached after the restarted system enters the BIOS.
It is understood that, in some embodiments, before step S101 is executed, the method further includes the following steps:
S1001, when an EventTrigger interrupt signal of the complex programmable logic device is detected, reading the count value of the counter and the restart type parameter stored in the complex programmable logic device.
S1002, when the count value of the counter has changed relative to the count value read last time, determining that the server has restarted, and generating a corresponding restart event record according to the restart type parameter.
S1003, updating the first system event log in the system event log according to the restart event record.
The complex programmable logic device monitors the PLTRST# signal and the SLP_SX# signal of the south bridge chip. When the server host system restarts, these two signals in the x86 architecture change according to the restart type: on a warm restart only the PLTRST# signal is asserted, and on a cold restart the PLTRST# signal and the SLP_SX# signal are asserted at the same time. Based on this pattern, the complex programmable logic device records the corresponding restart type in an internal register and at the same time increments the count value of the counter by 1; the count value represents the number of times the server has started. The complex programmable logic device then notifies the motherboard manager with an interrupt on the EventTrigger# GPIO signal. When the motherboard manager detects the EventTrigger interrupt signal of the complex programmable logic device, it reads the count value of the counter and the restart type parameter stored in the complex programmable logic device and compares the count value with the one read last time. If the two values are the same, no restart has occurred; if they differ, a restart has occurred. If a restart has occurred, a restart event record is generated and the first system event log is updated.
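The interaction described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the patent's firmware: the class names `Cpld` and `MotherboardManager`, the method names, and the dictionary layout of the restart event record are all hypothetical, and the two reset signals are reduced to booleans.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class RestartType(Enum):
    WARM = "warm"  # only PLTRST# asserted
    COLD = "cold"  # PLTRST# and SLP_SX# asserted together


@dataclass
class Cpld:
    """Hypothetical model of the CPLD's internal registers."""
    counter: int = 0
    restart_type: Optional[RestartType] = None

    def on_reset_signals(self, pltrst: bool, slp_sx: bool) -> bool:
        """Latch the restart type and increment the counter.

        Returns True when the EventTrigger# interrupt should be raised.
        """
        if not pltrst:
            return False
        self.restart_type = RestartType.COLD if slp_sx else RestartType.WARM
        self.counter += 1
        return True


@dataclass
class MotherboardManager:
    """Hypothetical model of the motherboard manager side."""
    last_counter: int = 0
    first_sel: list = field(default_factory=list)

    def on_event_trigger(self, cpld: Cpld) -> bool:
        """Read the CPLD registers; log a restart only if the counter changed."""
        counter, restart_type = cpld.counter, cpld.restart_type
        if counter == self.last_counter:
            return False  # same value as last read: no restart occurred
        self.last_counter = counter
        self.first_sel.append(
            {"event": "restart", "type": restart_type.value, "count": counter}
        )
        return True


cpld = Cpld()
bmc = MotherboardManager()
if cpld.on_reset_signals(pltrst=True, slp_sx=False):  # warm restart
    bmc.on_event_trigger(cpld)
```

Comparing against the previously read counter, rather than trusting the interrupt alone, means a spurious EventTrigger interrupt does not produce a false restart record.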
It is to be understood that the system event log further includes a second system event log; the second system event log is generated based on a plurality of running event records of the BIOS. The plurality of running event records include, but are not limited to: a BIOS start event record, a display parameter initialization completion event record, and a BIOS start completion event record.
Specifically, in some embodiments, before executing step S101, the following steps are further included:
S1004, receiving a BIOS start event record sent by the BIOS, wherein the BIOS start event record is generated when the BIOS begins to start.
S1005, updating the second system event log according to the BIOS start event record.
S1006, receiving a display parameter initialization completion event record sent by the BIOS, wherein the display parameter initialization completion event record is generated after the BIOS completes the initialization of the display parameters.
S1007, updating the second system event log according to the display parameter initialization completion event record.
S1008, receiving a BIOS start completion event record sent by the BIOS, wherein the BIOS start completion event record is generated after the BIOS finishes starting and hands control over to the operating system of the server.
S1009, updating the second system event log according to the BIOS start completion event record.
The event records in the system event log are sorted according to the occurrence time of the event, so that the fault node is convenient to find.
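As a rough illustration of keeping event records ordered by occurrence time, the sketch below inserts each incoming record at its sorted position. The `EventRecord` and `SystemEventLog` names, the numeric timestamps, and the event name strings are assumptions made for this example and do not come from the application.

```python
import bisect
from dataclasses import dataclass, field


@dataclass(order=True)
class EventRecord:
    timestamp: float                      # occurrence time; the only sort key
    name: str = field(compare=False)      # hypothetical event name


class SystemEventLog:
    """Hypothetical log that keeps its records sorted by occurrence time."""

    def __init__(self):
        self.records = []

    def update(self, record: EventRecord) -> None:
        # insort keeps the list ordered even if records arrive out of order
        bisect.insort(self.records, record)

    def names(self):
        return [r.name for r in self.records]


sel = SystemEventLog()
sel.update(EventRecord(3.0, "display_init_done"))
sel.update(EventRecord(1.0, "restart"))
sel.update(EventRecord(2.0, "bios_start"))
print(sel.names())
```

With the records in time order, the last record in the log is the latest stage that was reached, which is what makes the fault node easy to find.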
In step S102, the restart state information includes: the restart type of the server and the stage reached by the server restart. The stage reached by the restart may be, but is not limited to, one of the following: a restart initial stage, a BIOS start stage, a display parameter initialization completion stage, and a BIOS start completion stage. If the server shows a black screen and the system event log in the motherboard manager has not been updated, the black screen is caused by an operating system fault of the server and no restart has occurred. If only the restart event record has been updated in the system event log of the motherboard manager, the server has restarted but the restart did not reach the BIOS start stage. If only the restart event record and the BIOS start event record have been updated in the system event log of the motherboard manager, the system hung after restarting, before the display parameter initialization completion stage.
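The stage determination described above amounts to finding the furthest stage for which a record exists. A minimal sketch follows, assuming the newly updated records are available as a collection of event names; the names in `STAGES` and the function `stage_reached` are hypothetical.

```python
# Hypothetical event names, listed in the order the stages are reached.
STAGES = ["restart", "bios_start", "display_init_done", "bios_done"]


def stage_reached(new_records):
    """Return the furthest stage with a record, or None if nothing was logged.

    None corresponds to the case where the log was not updated,
    i.e. no restart occurred.
    """
    reached = None
    for stage in STAGES:
        if stage not in new_records:
            break  # stages are sequential, so stop at the first missing one
        reached = stage
    return reached


print(stage_reached([]))
print(stage_reached(["restart"]))
print(stage_reached(["restart", "bios_start"]))
```

A result of `"restart"` means the restart did not reach the BIOS start stage, and `"bios_start"` means the system hung before display parameter initialization completed.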
In step S103, when the fault location is determined from the restart status information, a preliminary determination is made based on the stage to which the restart has proceeded.
For example, if the server shows a black screen and the system event log in the motherboard manager has not been updated, the black screen occurred while the operating system of the server was running and no restart took place, which indicates a fault in the display screen or the display driver.
For example, if only the restart event record has been updated in the system event log of the motherboard manager, the server has restarted but the restart did not reach the BIOS start stage, which indicates a fault in the BIOS or the processor.
For example, if only the restart event record and the BIOS start event record have been updated in the system event log of the motherboard manager, the system hung after restarting, before the display parameter initialization completion stage, which indicates a fault in the display screen or the graphics card.
Of course, the position where the specific fault occurs can be judged by combining other parameters, so that the accuracy of fault positioning is improved.
For example, in some embodiments, step S103 includes: S1031, preliminarily screening out the server modules with a higher probability of failure according to the restart type and the stage reached by the server restart; S1032, confirming the fault location of the server from among the screened server modules. For example, if only the restart event record and the BIOS start event record have been updated in the system event log of the motherboard manager, the system hung after restarting, before the display parameter initialization completion stage, so the possibly faulty server modules can be preliminarily screened as the display and the graphics card. Maintenance personnel can then obtain status information for the display and the graphics card and determine the specific fault location.
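The two-step screening can be sketched as a lookup followed by a filter. The table below only encodes the three examples given in the text; the stage names, the `health` dictionary of per-module status, and both function names are assumptions for illustration, and the restart type is left out for brevity.

```python
# Candidate modules per stage reached, taken from the examples in the text.
# None means the log was not updated (no restart occurred).
CANDIDATES = {
    None: ["display", "display driver"],
    "restart": ["BIOS", "processor"],
    "bios_start": ["display", "graphics card"],
}


def screen_modules(stage):
    """S1031: preliminarily screen modules with a higher fault probability."""
    return CANDIDATES.get(stage, [])


def confirm_fault(candidates, health):
    """S1032: keep the candidates whose (hypothetical) status reads unhealthy.

    Modules missing from `health` are treated as healthy.
    """
    return [m for m in candidates if not health.get(m, True)]


suspects = screen_modules("bios_start")
faulty = confirm_fault(suspects, {"display": True, "graphics card": False})
print(suspects, faulty)
```

In practice the confirmation step is carried out by maintenance personnel using status information from the screened modules; the `health` dictionary merely stands in for that information.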
As can be seen from the above, in the embodiment of the present application, when the server fails, a system event log stored in the motherboard manager is queried, where the system event log includes a first system event log, the first system event log is generated by reading a count value of a counter stored in a complex programmable logic device and a restart type parameter, the count value of the counter is used to measure the number of times of restarting the server, and the restart type parameter is used to characterize a last start type of the server; acquiring restarting state information of the server according to the system event log; and judging the fault position of the server according to the restarting state information, thereby realizing fault positioning, being convenient for finding out the fault position quickly and improving the maintenance efficiency of the server.
Referring to fig. 3, fig. 3 is a structural diagram of a server anomaly locating device according to some embodiments of the present application.
The server anomaly locating device includes: a query module 201, an acquisition module 202 and a judgment module 203.
The query module 201 is configured to query a system event log stored in the motherboard manager when the server fails. The system event log includes a first system event log and a second system event log. The first system event log is generated by reading a count value of a counter and a restart type parameter stored in a complex programmable logic device, wherein the count value of the counter is used to count the number of times the server has restarted, and the restart type parameter represents the type of the server's most recent start. The second system event log is generated based on event records sent by the BIOS.
It will be appreciated that in some embodiments the query module 201 is further configured to: when an EventTrigger interrupt signal of the complex programmable logic device is detected, read the count value of the counter and the restart type parameter stored in the complex programmable logic device; when the count value of the counter has changed relative to the count value read last time, determine that the server has restarted, and generate a corresponding restart event record according to the restart type parameter; and update the first system event log in the system event log according to the restart event record. The complex programmable logic device monitors the PLTRST# signal and the SLP_SX# signal of the south bridge chip. When the server host system restarts, these two signals in the x86 architecture change according to the restart type: on a warm restart only the PLTRST# signal is asserted, and on a cold restart the PLTRST# signal and the SLP_SX# signal are asserted at the same time. Based on this pattern, the complex programmable logic device records the corresponding restart type in an internal register and at the same time increments the count value of the counter by 1; the count value represents the number of times the server has started. The complex programmable logic device then notifies the motherboard manager with an interrupt on the EventTrigger# GPIO signal.
When the motherboard manager detects the EventTrigger interrupt signal of the complex programmable logic device, it reads the count value of the counter and the restart type parameter stored in the complex programmable logic device and compares the count value with the one read last time. If the two values are the same, no restart has occurred; if they differ, a restart has occurred. If a restart has occurred, a restart event record is generated and the first system event log is updated.
It is to be understood that the system event log further includes a second system event log; the second system event log is generated based on a plurality of running event records of the BIOS 14. The plurality of running event records include, but are not limited to: a BIOS start event record, a display parameter initialization completion event record, and a BIOS start completion event record.
The query module is further configured to: receive a BIOS start event record sent by the BIOS, wherein the BIOS start event record is generated when the BIOS begins to start; update the second system event log according to the BIOS start event record; receive a display parameter initialization completion event record sent by the BIOS, wherein the display parameter initialization completion event record is generated after the BIOS completes the initialization of the display parameters; update the second system event log according to the display parameter initialization completion event record; receive a BIOS start completion event record sent by the BIOS, wherein the BIOS start completion event record is generated after the BIOS finishes starting and hands control over to the operating system of the server; and update the second system event log according to the BIOS start completion event record. The event records in the system event log are sorted by the time at which each event occurred, which makes the fault node easy to find.
The acquisition module 202 is configured to acquire the restart state information of the server according to the system event log. The restart state information includes: the restart type of the server and the stage reached by the server restart. The stage reached by the restart may be, but is not limited to, one of the following: a restart initial stage, a BIOS start stage, a display parameter initialization completion stage, and a BIOS start completion stage. If the server shows a black screen and the system event log in the motherboard manager has not been updated, the black screen is caused by an operating system fault of the server and no restart has occurred. If only the restart event record has been updated in the system event log of the motherboard manager, the server has restarted but the restart did not reach the BIOS start stage. If only the restart event record and the BIOS start event record have been updated in the system event log of the motherboard manager, the system hung after restarting, before the display parameter initialization completion stage.
The judging module 203 is configured to judge a fault location of the server according to the restart status information. When the fault location is judged according to the restart status information, preliminary judgment is performed based on the stage to which the restart has proceeded.
For example, if the server shows a blank screen and the system event log in the motherboard manager has not been updated, the blank screen occurred while the operating system of the server was running and no restart took place, which indicates a fault in the display screen or the display driver.
For example, if only the restart event record has been updated in the system event log of the motherboard manager, the server has restarted but the restart did not reach the BIOS start stage, which indicates a fault in the BIOS or the processor.
For example, if only the restart event record and the BIOS start event record have been updated in the system event log of the motherboard manager, the system hung after restarting, before the display parameter initialization completion point, which indicates a fault in the display screen or the graphics card.
Of course, the specific fault location can also be determined in combination with other parameters, which improves the accuracy of fault locating.
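The three examples above can be sketched as a simple lookup from the set of updated event records to the suspected components. The record kind strings and the function name are hypothetical. Combinations not covered by the examples return an empty list here, because the description leaves them to be judged with other parameters.

```python
# Hypothetical mapping built only from the three examples in the description.
_SUSPECTS = {
    frozenset(): ["display screen", "display driver"],
    frozenset({"RESTART"}): ["BIOS", "processor"],
    frozenset({"RESTART", "BIOS_START"}): ["display screen", "graphics card"],
}


def suspected_fault(new_record_kinds):
    """Return the preliminary list of suspected components, or [] if undetermined."""
    return list(_SUSPECTS.get(frozenset(new_record_kinds), []))


print(suspected_fault(["RESTART"]))
```

This is only a preliminary screening step; confirming the actual fault location would still require further checks on the listed components.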
As can be seen from the above, in the embodiment of the present application, when the server fails, the system event log stored in the motherboard manager is queried. The system event log includes a first system event log that is generated by reading a count value of a counter and a restart type parameter stored in the complex programmable logic device; the count value of the counter counts the number of times the server has restarted, and the restart type parameter indicates the most recent start type of the server. The restart status information of the server is then obtained from the system event log, and the fault location of the server is judged from the restart status information. This achieves fault locating, makes it easy to find the fault location quickly, and improves the maintenance efficiency of the server.
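The generation of a restart event record from the counter value can be illustrated with the following sketch, in which a restart is recognized when the count value differs from the value read last time. Reading the complex programmable logic device is simulated by passing the values in as arguments; the class name, the record fields, and the restart type string are assumptions made for illustration.

```python
class RestartDetector:
    """Hypothetical helper that turns counter readings into restart event records."""

    def __init__(self, initial_count):
        # Count value read the last time the counter was checked.
        self.last_count = initial_count

    def on_interrupt(self, count, restart_type):
        """Handle one interrupt; return a restart event record or None."""
        if count == self.last_count:
            # Counter unchanged: no restart since the last reading.
            return None
        self.last_count = count
        return {"event": "RESTART", "restart_type": restart_type, "count": count}


detector = RestartDetector(initial_count=5)
print(detector.on_interrupt(5, "WARM_RESET"))
print(detector.on_interrupt(6, "WARM_RESET"))
```

In a real motherboard manager the two arguments would come from registers of the complex programmable logic device, and the returned record would be appended to the first system event log.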
The embodiment of the present application provides a storage medium storing a computer program which, when executed by a processor, performs the method in any optional implementation manner of the above embodiment. The storage medium may be implemented by any type of volatile or nonvolatile storage device or combination thereof, such as a Static Random Access Memory (SRAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), an Erasable Programmable Read-Only Memory (EPROM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), a magnetic memory, a flash memory, a magnetic disk, or an optical disk.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A server exception positioning method, wherein the server comprises a processor, a complex programmable logic device, a mainboard manager, a BIOS and a south bridge chip, wherein the complex programmable logic device is respectively connected with the mainboard manager, the south bridge chip and the processor, the south bridge chip is respectively connected with the BIOS and the processor, and the BIOS is connected with the mainboard manager; the method is applied to the mainboard manager, and is characterized in that the method comprises the following steps:
when the server fails, querying a system event log stored in the mainboard manager, wherein the system event log comprises a first system event log, the first system event log is generated by reading a count value of a counter and a restart type parameter stored in the complex programmable logic device, the count value of the counter is used for counting the number of restarts of the server, and the restart type parameter is used for representing the most recent start type of the server;
acquiring restarting state information of the server according to the system event log;
and judging the fault position of the server according to the restarting state information.
2. The server exception positioning method of claim 1, wherein the step of querying a system event log stored in the mainboard manager is preceded by the steps of:
when an EventTrigger interrupt signal of the complex programmable logic device is detected, reading a count value of a counter and a restart type parameter which are stored in the complex programmable logic device;
when the count value of the counter changes relative to the count value of the counter read last time, judging that the server is restarted, and generating a corresponding restart event record according to a restart type parameter;
and updating a first system event log in the system event logs according to the restart event record.
3. The server exception positioning method according to claim 1 or 2, wherein the restart status information comprises: a restart type of the server and a stage reached by the restart of the server.
4. The server exception positioning method according to claim 2, wherein the system event log further comprises a second system event log; the second system event log is used for determining the stage reached after the system restarts and enters the BIOS, and the second system event log is generated based on a plurality of running event records of the BIOS.
5. The server exception positioning method of claim 4, wherein the plurality of running event records comprise a BIOS start event record;
the method further comprises the steps of:
receiving a BIOS starting event record sent by the BIOS, wherein the BIOS starting event record is generated when the BIOS starts to start;
and updating the second system event log according to the BIOS starting event record.
6. The server exception positioning method according to claim 4, wherein the plurality of running event records further comprise a display parameter initialization completion event record;
the method further comprises the steps of:
receiving a display parameter initialization completion event record sent by the BIOS, wherein the display parameter initialization completion event record is generated after the BIOS completes initialization operation on display parameters;
updating the second system event log according to the display parameter initialization completion event record;
or, the plurality of running event records further include a BIOS start completion event record;
the method further comprises the steps of:
receiving a BIOS start-up completion event record sent by the BIOS, wherein the BIOS start-up completion event record is generated after the BIOS finishes start-up and transmits a control right to an operating system of the server;
and updating the second system event log according to the BIOS starting completion event record.
7. The server exception positioning method according to claim 4, wherein the judging of the fault location of the server according to the restart status information comprises:
preliminarily screening out server modules with a higher fault probability according to the restart type and the stage reached by the restart of the server;
and confirming the fault position of the server from the screened server module with higher fault probability.
8. A server exception positioning device, wherein the server comprises a processor, a complex programmable logic device, a mainboard manager, a BIOS (basic input output system) and a south bridge chip, wherein the complex programmable logic device is respectively connected with the mainboard manager, the south bridge chip and the processor; the device is applied to the mainboard manager; characterized in that the device comprises:
the query module is used for querying a system event log stored in the mainboard manager when the server fails, wherein the system event log comprises a first system event log, the first system event log is generated by reading a count value of a counter and a restart type parameter stored in the complex programmable logic device, the count value of the counter is used for counting the number of restarts of the server, and the restart type parameter is used for representing the most recent start type of the server;
the acquisition module is used for acquiring the restarting state information of the server according to the system event log;
and the judging module is used for judging the fault position of the server according to the restarting state information.
9. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the method according to any of claims 1-7.
10. A server, characterized by comprising a processor, a complex programmable logic device, a mainboard manager, a BIOS and a south bridge chip, wherein the complex programmable logic device is respectively connected with the mainboard manager, the south bridge chip and the processor, the south bridge chip is respectively connected with the BIOS and the processor, and the BIOS is connected with the mainboard manager;
the motherboard manager is configured to perform the method of any of claims 1-7.
CN202010623604.7A 2020-06-30 2020-06-30 Server abnormity positioning method and device, storage medium and server Pending CN111722954A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010623604.7A CN111722954A (en) 2020-06-30 2020-06-30 Server abnormity positioning method and device, storage medium and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010623604.7A CN111722954A (en) 2020-06-30 2020-06-30 Server abnormity positioning method and device, storage medium and server

Publications (1)

Publication Number Publication Date
CN111722954A true CN111722954A (en) 2020-09-29

Family

ID=72571038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010623604.7A Pending CN111722954A (en) 2020-06-30 2020-06-30 Server abnormity positioning method and device, storage medium and server

Country Status (1)

Country Link
CN (1) CN111722954A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112667462A (en) * 2020-12-15 2021-04-16 苏州浪潮智能科技有限公司 System, method and medium for monitoring double flash memory operation of server
CN112948157A (en) * 2021-01-29 2021-06-11 苏州浪潮智能科技有限公司 Server fault positioning method, device and system and computer readable storage medium
CN113254304A (en) * 2021-04-28 2021-08-13 中国长城科技集团股份有限公司 Method for determining shutdown type of server, server and storage medium
CN113806123A (en) * 2021-08-14 2021-12-17 苏州浪潮智能科技有限公司 System and method for positioning downtime of server and server
CN117234812A (en) * 2023-11-16 2023-12-15 中科泓泰电子有限公司 System and method for controlling restarting of server

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1713156A (en) * 2004-06-25 2005-12-28 联想(北京)有限公司 Method and device for detecting and diagnosing fault of computer hardware
CN103176873A (en) * 2011-12-23 2013-06-26 鸿富锦精密工业(深圳)有限公司 Counting card
CN104391765A (en) * 2014-10-27 2015-03-04 浪潮电子信息产业股份有限公司 Method for automatically diagnosing starting fault of server
TW201706844A (en) * 2015-08-04 2017-02-16 英業達股份有限公司 Power failure detection system and method thereof
CN106598790A (en) * 2015-10-16 2017-04-26 中兴通讯股份有限公司 Server hardware failure detection method, apparatus of server, and server
CN107193708A (en) * 2017-05-17 2017-09-22 郑州云海信息技术有限公司 A kind of condition detection method and system
CN109086155A (en) * 2018-07-27 2018-12-25 郑州云海信息技术有限公司 Server failure localization method, device, equipment and computer readable storage medium
CN109634796A (en) * 2018-12-14 2019-04-16 郑州云海信息技术有限公司 A kind of method for diagnosing faults of computer, apparatus and system
CN110134540A (en) * 2019-05-21 2019-08-16 苏州浪潮智能科技有限公司 A kind of log information collection method, device, equipment and readable storage medium storing program for executing
CN110609778A (en) * 2019-08-16 2019-12-24 苏州浪潮智能科技有限公司 Method and system for storing server downtime log
CN111290918A (en) * 2020-02-26 2020-06-16 苏州浪潮智能科技有限公司 Server running state monitoring method and device and computer readable storage medium

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112667462A (en) * 2020-12-15 2021-04-16 苏州浪潮智能科技有限公司 System, method and medium for monitoring double flash memory operation of server
CN112948157A (en) * 2021-01-29 2021-06-11 苏州浪潮智能科技有限公司 Server fault positioning method, device and system and computer readable storage medium
WO2022160756A1 (en) * 2021-01-29 2022-08-04 苏州浪潮智能科技有限公司 Server fault positioning method, apparatus and system, and computer-readable storage medium
CN112948157B (en) * 2021-01-29 2022-12-23 苏州浪潮智能科技有限公司 Server fault positioning method, device and system and computer readable storage medium
CN113254304A (en) * 2021-04-28 2021-08-13 中国长城科技集团股份有限公司 Method for determining shutdown type of server, server and storage medium
CN113806123A (en) * 2021-08-14 2021-12-17 苏州浪潮智能科技有限公司 System and method for positioning downtime of server and server
CN113806123B (en) * 2021-08-14 2023-08-08 苏州浪潮智能科技有限公司 Server downtime positioning system and method and server
CN117234812A (en) * 2023-11-16 2023-12-15 中科泓泰电子有限公司 System and method for controlling restarting of server
CN117234812B (en) * 2023-11-16 2024-01-30 中科泓泰电子有限公司 System and method for controlling restarting of server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination