CN111722954A - Server abnormity positioning method and device, storage medium and server - Google Patents
Server abnormity positioning method and device, storage medium and server
- Publication number
- CN111722954A (application CN202010623604.7A)
- Authority
- CN
- China
- Prior art keywords
- server
- bios
- event log
- system event
- manager
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/079—Root cause analysis, i.e. error or fault diagnosis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/22—Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
- G06F11/2268—Logging of test results
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Computer Hardware Design (AREA)
- Debugging And Monitoring (AREA)
Abstract
The application provides a server anomaly locating method and device, a storage medium, and a server. The method comprises: when the server fails, querying a system event log stored in the motherboard manager, wherein the system event log comprises a first system event log, the first system event log is generated by reading a count value of a counter and a restart type parameter stored in a complex programmable logic device, the count value of the counter counts the number of times the server has restarted, and the restart type parameter indicates the type of the server's most recent restart; acquiring restart state information of the server according to the system event log; and determining the fault location of the server according to the restart state information. Because the fault location is determined from the restart state information, faults can be located quickly and conveniently, which improves server maintenance efficiency.
Description
Technical Field
The present disclosure relates to the field of server maintenance technologies, and in particular, to a method and an apparatus for locating server abnormality, a storage medium, and a server.
Background
With the rise of cloud computing, the number of x86 servers deployed in data centers has multiplied. Monitoring and diagnosing server anomalies, particularly abnormal downtime and restarts, is an important task for server research and development teams and for operation and maintenance departments. The server's motherboard manager is responsible for monitoring such failures and abnormal restarts.
In current practice, the motherboard manager relies on recording the SEL (system event log) events sent by the BIOS, and uses those event records to judge which stage the server's startup has reached and whether an abnormal restart has occurred. However, when a server actually fails abnormally, the BIOS may not yet have executed its first instruction. In that case it is difficult to determine why the system shows a black screen or whether the system has restarted, so the fault cannot be located.
In view of the above problems, no effective technical solution exists at present.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method and an apparatus for server exception location, a storage medium, and a server, so as to improve server maintenance efficiency.
In a first aspect, an embodiment of the present application provides a server anomaly locating method, where the server includes a processor, a complex programmable logic device, a motherboard manager, a BIOS, and a south bridge chip, where the complex programmable logic device is connected to the motherboard manager, the south bridge chip, and the processor, the south bridge chip is connected to the BIOS and the processor, and the BIOS is connected to the motherboard manager; the method is applied to the motherboard manager and comprises the following steps:
when the server fails, querying a system event log stored in the motherboard manager, wherein the system event log comprises a first system event log, the first system event log is generated by reading a count value of a counter and a restart type parameter stored in the complex programmable logic device, the count value of the counter counts the number of times the server has restarted, and the restart type parameter indicates the type of the server's most recent restart;
acquiring restart state information of the server according to the system event log;
and determining the fault location of the server according to the restart state information.
Optionally, in the method for locating an exception of a server according to the embodiment of the present application, before the step of querying a system event log stored in the motherboard manager, the method further includes:
when an EventTrigger interrupt signal of the complex programmable logic device is detected, reading a count value of a counter and a restart type parameter which are stored in the complex programmable logic device;
when the count value of the counter changes relative to the count value of the counter read last time, judging that the server is restarted, and generating a corresponding restart event record according to a restart type parameter;
and updating a first system event log in the system event logs according to the restart event record.
Optionally, in the server anomaly locating method according to the embodiment of the present application, the system event log further includes a second system event log; the second system event log is used for determining the stage reached after the server restarts and enters the BIOS, and is generated based on a plurality of running event records of the BIOS.
Optionally, in the server anomaly locating method according to the embodiment of the present application, the restart state information includes: the restart type of the server and the stage reached by the server's restart.
Optionally, in the server exception location method according to the embodiment of the present application, the plurality of running event records include a BIOS start event record;
the method further comprises the steps of:
receiving a BIOS starting event record sent by the BIOS, wherein the BIOS starting event record is generated when the BIOS starts to start;
and updating the second system event log according to the BIOS starting event record.
Optionally, in the server exception positioning method according to the embodiment of the present application, the plurality of running event records further include a display parameter initialization completion event record;
the method further comprises the steps of:
receiving a display parameter initialization completion event record sent by the BIOS, wherein the display parameter initialization completion event record is generated after the BIOS completes initialization operation on display parameters;
and updating the second system event log according to the display parameter initialization completion event record.
Optionally, in the server exception location method according to the embodiment of the present application, the plurality of running event records further include a BIOS start completion event record;
the method further comprises the steps of:
receiving a BIOS start-up completion event record sent by the BIOS, wherein the BIOS start-up completion event record is generated after the BIOS finishes start-up and transmits a control right to an operating system of the server;
and updating the second system event log according to the BIOS starting completion event record.
Optionally, in the method for locating an abnormality of a server according to the embodiment of the present application, the determining a fault location of the server according to the restart status information includes:
preliminarily screening out a server module with higher fault probability according to the restart type and the stage of restarting the server;
and confirming the fault position of the server from the screened server module with higher fault probability.
In a second aspect, an embodiment of the present application further provides a server anomaly locating device, where the server includes a processor, a complex programmable logic device, a motherboard manager, a BIOS, and a south bridge chip, where the complex programmable logic device is connected to the motherboard manager, the south bridge chip, and the processor, the south bridge chip is connected to the BIOS and the processor, and the BIOS is connected to the motherboard manager; the device is applied to the motherboard manager; the device comprises:
a query module, configured to query, when the server fails, a system event log stored in the motherboard manager, wherein the system event log comprises a first system event log, the first system event log is generated by reading a count value of a counter and a restart type parameter stored in the complex programmable logic device, the count value of the counter counts the number of times the server has restarted, and the restart type parameter indicates the type of the server's most recent restart;
an acquisition module, configured to acquire restart state information of the server according to the system event log;
and a judging module, configured to determine the fault location of the server according to the restart state information.
In a third aspect, the present application further provides a storage medium having a computer program stored thereon, where the computer program, when executed by a processor, performs the method according to any one of the above descriptions.
In a fourth aspect, an embodiment of the present application further provides a server, including a processor, a complex programmable logic device, a motherboard manager, a BIOS, and a south bridge chip, where the complex programmable logic device is connected to the motherboard manager, the south bridge chip, and the processor, respectively, and the motherboard manager, the BIOS, the south bridge chip, and the processor are connected in sequence;
the mainboard manager is used for executing the method of any one of the above items.
As can be seen from the above, the server exception location method, the apparatus, the storage medium, and the server provided in the embodiments of the present application query a system event log stored in the motherboard manager when the server fails, where the system event log includes a first system event log, the first system event log is generated by reading a count value of a counter stored in a complex programmable logic device and a restart type parameter, the count value of the counter is used to measure the number of times of restarting the server, and the restart type parameter is used to characterize the latest start type of the server; acquiring restarting state information of the server according to the system event log; and judging the fault position of the server according to the restarting state information, thereby realizing fault positioning, being convenient for finding out the fault position quickly and improving the maintenance efficiency of the server.
Additional features and advantages of the present application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a flowchart of a server anomaly positioning method according to an embodiment of the present application.
Fig. 2 is a schematic structural diagram of a server according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of a server anomaly positioning device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Referring to fig. 1, fig. 1 is a flowchart illustrating a server anomaly locating method according to some embodiments of the present disclosure. Referring to fig. 2, fig. 2 is a schematic structural diagram of a server in the embodiment of the present application. The server includes a processor 11, a complex programmable logic device 12, a motherboard manager 13, a BIOS (Basic Input Output System) 14, and a south bridge chip 15, where the complex programmable logic device 12 is connected to the motherboard manager 13, the south bridge chip 15, and the processor 11, the south bridge chip 15 is connected to the BIOS 14 and the processor 11, and the BIOS 14 is connected to the motherboard manager 13. The server anomaly locating method is applied to the motherboard manager 13.
The server abnormity positioning method comprises the following steps:
S101, when the server fails, querying a system event log stored in the motherboard manager, wherein the system event log comprises a first system event log, the first system event log is generated by reading a count value of a counter and a restart type parameter stored in a complex programmable logic device, the count value of the counter counts the number of times the server has restarted, and the restart type parameter indicates the type of the server's most recent restart.
S102, acquiring the restart state information of the server according to the system event log.
S103, judging the fault position of the server according to the restarting state information.
In step S101, the system event log includes a first system event log and a second system event log. The first system event log is generated by reading a count value of a counter and a restart type parameter stored in a complex programmable logic device, where the count value of the counter counts the number of times the server has restarted and the restart type parameter indicates the type of the server's most recent restart. The first system event log is used to determine the restart type and the stage reached before the restarted system enters the BIOS. The second system event log is generated based on event records sent by the BIOS and is used to determine the stage reached after the restarted system enters the BIOS.
It is understood that, in some embodiments, the following steps are performed before step S101:
S1001, when an EventTrigger interrupt signal of the complex programmable logic device is detected, reading the count value of the counter and the restart type parameter stored in the complex programmable logic device.
S1002, when the count value of the counter has changed relative to the count value read last time, judging that the server has restarted, and generating a corresponding restart event record according to the restart type parameter.
S1003, updating the first system event log in the system event logs according to the restart event record.
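Steps S1001 to S1003 can be sketched as follows. This is only an illustrative model of the motherboard-manager side, not the patented implementation: the class name `RestartMonitor`, the `read_cpld` accessor, the dictionary record layout, and the choice of an initial baseline count are all assumptions that do not appear in the source.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class RestartMonitor:
    """Hypothetical motherboard-manager handler for the EventTrigger interrupt."""

    # Hypothetical accessor returning (count value, restart type parameter)
    # as read from the complex programmable logic device.
    read_cpld: Callable[[], Tuple[int, str]]
    # Assumption: the baseline is the count value seen at the previous read.
    last_count: int = 0
    # Stand-in for the first system event log.
    first_sel: List[dict] = field(default_factory=list)

    def on_event_trigger(self) -> bool:
        """Return True if a restart was detected and logged."""
        count, restart_type = self.read_cpld()      # S1001: read CPLD values
        restarted = count != self.last_count        # S1002: compare with last read
        self.last_count = count
        if restarted:
            record = {"event": "restart", "type": restart_type, "count": count}
            self.first_sel.append(record)           # S1003: update first log
        return restarted
```

An unchanged count value produces no record, which matches the rule that a restart is recorded only when the count value differs from the one read last time.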
The complex programmable logic device monitors the PLTRST# signal and the SLP_SX# signal of the south bridge chip. When the server host system restarts, these two signals change according to the restart type under the x86 architecture: for a warm restart (hot restart), only the PLTRST# signal is asserted; for a cold restart, the PLTRST# signal and the SLP_SX# signal are asserted at the same time. Based on this behavior of the PLTRST# and SLP_SX# signals, the complex programmable logic device records the corresponding restart type in an internal register and increments the count value of the counter by 1; the count value of the counter represents the number of times the server has started. The complex programmable logic device then notifies the motherboard manager with an interrupt through the EventTrigger# GPIO signal. When the motherboard manager detects the EventTrigger interrupt signal of the complex programmable logic device, it reads the count value of the counter and the restart type parameter stored in the complex programmable logic device, and compares the count value just read with the count value read last time. If the two values are the same, no restart has occurred; if they differ, a restart has occurred. If a restart has occurred, a restart event record is generated and the first system event log is updated.
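The behavior of the complex programmable logic device described above can be modeled in a few lines. The sketch below is a software model for illustration only; real logic of this kind would be written in a hardware description language. The names `CpldModel`, `RestartType`, and `sample` are hypothetical, and two points are assumptions rather than statements from the source: the case where only SLP_SX# is asserted is ignored, and the counter is treated as unbounded.

```python
from enum import Enum
from typing import Optional


class RestartType(Enum):
    WARM = "warm"   # warm (hot) restart: only PLTRST# asserted
    COLD = "cold"   # cold restart: PLTRST# and SLP_SX# asserted together


class CpldModel:
    """Hypothetical software model of the restart-detection logic."""

    def __init__(self) -> None:
        self.restart_count = 0                      # counter register
        self.restart_type: Optional[RestartType] = None
        self.event_trigger_pending = False          # models the EventTrigger# GPIO

    def sample(self, pltrst_asserted: bool, slp_sx_asserted: bool) -> None:
        """Process one observation of the two south bridge signals."""
        if not pltrst_asserted:
            return  # assumption: without PLTRST# no restart is recorded
        self.restart_type = (
            RestartType.COLD if slp_sx_asserted else RestartType.WARM
        )
        self.restart_count += 1             # count value is incremented by 1
        self.event_trigger_pending = True   # interrupt the motherboard manager
```

For instance, calling `sample(True, False)` records a warm restart, while `sample(True, True)` records a cold restart; both raise the interrupt flag that the motherboard manager would then service.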
It is to be understood that the system event log further includes a second system event log; the second system event log is generated based on a plurality of running event records of the BIOS. The plurality of running event records include, but are not limited to: a BIOS start event record, a display parameter initialization completion event record, and a BIOS start completion event record.
Specifically, in some embodiments, before executing step S101, the following steps are further included:
S1004, receiving a BIOS start event record sent by the BIOS, wherein the BIOS start event record is generated when the BIOS begins to start.
S1005, updating the second system event log according to the BIOS start event record.
S1006, receiving a display parameter initialization completion event record sent by the BIOS, wherein the display parameter initialization completion event record is generated after the BIOS completes the initialization of the display parameters.
S1007, updating the second system event log according to the display parameter initialization completion event record.
S1008, receiving a BIOS start completion event record sent by the BIOS, wherein the BIOS start completion event record is generated after the BIOS finishes starting and hands control over to the operating system of the server.
S1009, updating the second system event log according to the BIOS start completion event record.
The event records in the system event log are sorted according to the occurrence time of the event, so that the fault node is convenient to find.
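As a minimal sketch of the time ordering just described, the records of the first and second system event logs can be merged and sorted by occurrence time. The `EventRecord` type, its field names, and the numeric timestamps are hypothetical; the source does not specify a record format or a clock source.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class EventRecord:
    timestamp: float  # occurrence time; unit and clock source are assumptions
    name: str         # e.g. "restart", "bios_start" (illustrative names)


def merged_log(first_sel: Iterable[EventRecord],
               second_sel: Iterable[EventRecord]) -> List[EventRecord]:
    """Return all records from both logs ordered by occurrence time."""
    return sorted([*first_sel, *second_sel], key=lambda r: r.timestamp)


first = [EventRecord(10.0, "restart")]
second = [EventRecord(12.5, "display_init_done"), EventRecord(11.0, "bios_start")]
timeline = [r.name for r in merged_log(first, second)]
```

Reading the merged timeline from the most recent restart record onward shows the last event that was logged before the server stopped making progress.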
In step S102, the restart state information includes the restart type of the server and the stage reached by the restart. The stage reached by the restart may be one of the following, without being limited thereto: the initial restart stage, the BIOS start stage, the display parameter initialization completion stage, and the BIOS start completion stage. If the server shows a black screen and the system event log in the motherboard manager has not been updated, the black screen was caused by a fault in the server's operating system and no restart occurred. If only a restart event record has been added to the system event log of the motherboard manager, the server has restarted but the restart did not reach the initial BIOS start stage. If only a restart event record and a BIOS start event record have been added to the system event log of the motherboard manager, the system hung after restarting, before the display parameter initialization completion stage.
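The stage determination in step S102 can be sketched as a walk through the expected event sequence. The stage names and the function `stage_reached` are hypothetical labels chosen for this illustration, and the input is assumed to be the list of event names added to the system event log since the fault was observed.

```python
from typing import Optional, Sequence

# Expected order of event records after a restart (names are illustrative).
STAGES = ("restart", "bios_start", "display_init_done", "bios_done")


def stage_reached(new_events: Sequence[str]) -> Optional[str]:
    """Return the furthest stage reached, or None if the log was not updated.

    None corresponds to the case where no restart occurred.
    """
    seen = set(new_events)
    reached = None
    for stage in STAGES:
        if stage not in seen:
            break          # the boot sequence stopped before this stage
        reached = stage
    return reached
```

With this sketch, an empty list yields `None` (no restart), a list containing only `"restart"` yields `"restart"` (the BIOS start stage was not reached), and a list containing `"restart"` and `"bios_start"` yields `"bios_start"` (the system stopped before display parameter initialization completed).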
In step S103, when the fault location is determined from the restart status information, a preliminary determination is made based on the stage to which the restart has proceeded.
For example, if the server shows a black screen and the system event log in the motherboard manager has not been updated, the black screen occurred while the server's operating system was running and no restart occurred, which indicates a fault in the display screen or the display driver.
For example, if only a restart event record has been added to the system event log of the motherboard manager, the server has restarted but the restart did not reach the initial BIOS start stage, which indicates a fault in the BIOS or the processor.
For example, if only a restart event record and a BIOS start event record have been added to the system event log of the motherboard manager, the system hung after restarting, before the display parameter initialization completion stage, which indicates a fault in the display screen or the graphics card.
Of course, the position where the specific fault occurs can be judged by combining other parameters, so that the accuracy of fault positioning is improved.
For example, in some embodiments, step S103 includes: S1031, preliminarily screening out the server modules with a higher probability of failure according to the restart type and the stage reached by the server's restart; S1032, confirming the fault location of the server from among the screened server modules with a higher probability of failure. For example, if only a restart event record and a BIOS start event record have been added to the system event log of the motherboard manager, the system hung after restarting, before the display parameter initialization completion stage, so the server modules that may have failed can be preliminarily screened as the display and the graphics card. Maintenance personnel can then obtain status information for the display and the graphics card and determine the specific location of the fault.
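Steps S1031 and S1032 amount to a lookup followed by a confirmation pass. The sketch below encodes only the three examples given in the description; the stage keys, the module names, and the `is_faulty` callback (standing in for whatever status information maintenance personnel collect) are hypothetical, and stages the description does not discuss return an empty candidate list.

```python
from typing import Callable, Dict, List, Optional

# Candidate modules per stage reached, taken from the examples in the text.
# Key None means the system event log was not updated (no restart occurred).
CANDIDATES: Dict[Optional[str], List[str]] = {
    None: ["display screen", "display driver"],
    "restart": ["BIOS", "processor"],
    "bios_start": ["display screen", "graphics card"],
}


def screen_candidates(stage: Optional[str]) -> List[str]:
    """S1031: preliminarily screen modules with higher fault probability."""
    return list(CANDIDATES.get(stage, []))


def confirm_fault(candidates: List[str],
                  is_faulty: Callable[[str], bool]) -> List[str]:
    """S1032: keep only candidates that the status check confirms as faulty."""
    return [module for module in candidates if is_faulty(module)]


suspects = screen_candidates("bios_start")
confirmed = confirm_fault(suspects, lambda m: m == "graphics card")
```

Keeping the screening table separate from the confirmation check lets the table grow as more stage and restart-type combinations are characterized, without changing the confirmation logic.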
As can be seen from the above, in the embodiment of the present application, when the server fails, a system event log stored in the motherboard manager is queried, where the system event log includes a first system event log, the first system event log is generated by reading a count value of a counter stored in a complex programmable logic device and a restart type parameter, the count value of the counter is used to measure the number of times of restarting the server, and the restart type parameter is used to characterize a last start type of the server; acquiring restarting state information of the server according to the system event log; and judging the fault position of the server according to the restarting state information, thereby realizing fault positioning, being convenient for finding out the fault position quickly and improving the maintenance efficiency of the server.
Referring to fig. 3, fig. 3 is a structural diagram of a server anomaly locating device according to some embodiments of the present application.
The server anomaly locating device includes: a query module 201, an acquisition module 202, and a judging module 203.
The query module 201 is configured to query a system event log stored in the motherboard manager when the server fails. The system event log includes a first system event log and a second system event log. The first system event log is generated by reading a count value of a counter and a restart type parameter stored in a complex programmable logic device, where the count value of the counter counts the number of times the server has restarted and the restart type parameter indicates the type of the server's most recent restart. The second system event log is generated based on event records sent by the BIOS.
It will be appreciated that in some embodiments the query module 201 is further configured to: when an EventTrigger interrupt signal of the complex programmable logic device is detected, read the count value of the counter and the restart type parameter stored in the complex programmable logic device; when the count value of the counter has changed relative to the count value read last time, judge that the server has restarted, and generate a corresponding restart event record according to the restart type parameter; and update the first system event log in the system event logs according to the restart event record. The complex programmable logic device monitors the PLTRST# signal and the SLP_SX# signal of the south bridge chip. When the server host system restarts, these two signals change according to the restart type under the x86 architecture: for a warm restart (hot restart), only the PLTRST# signal is asserted; for a cold restart, the PLTRST# signal and the SLP_SX# signal are asserted at the same time. Based on this behavior of the PLTRST# and SLP_SX# signals, the complex programmable logic device records the corresponding restart type in an internal register and increments the count value of the counter by 1; the count value of the counter represents the number of times the server has started. The complex programmable logic device then notifies the motherboard manager with an interrupt through the EventTrigger# GPIO signal.
When the motherboard manager detects the EventTrigger interrupt signal of the complex programmable logic device, it reads the count value of the counter and the restart type parameter stored in the complex programmable logic device, and compares the count value just read with the count value read last time. If the two values are the same, no restart has occurred; if they differ, a restart has occurred. If a restart has occurred, a restart event record is generated and the first system event log is updated.
It is to be understood that the system event log further includes a second system event log; the second system event log is generated based on a plurality of operational event records of the BIOS 14. Wherein the plurality of operational event records comprises: the BIOS start event record, the display parameter initialization completion event record, and the BIOS start completion event record are not limited thereto.
Wherein the query module is further configured to: receiving a BIOS starting event record sent by the BIOS, wherein the BIOS starting event record is generated when the BIOS starts to start; updating the second system event log according to the BIOS starting event record; receiving a display parameter initialization completion event record sent by a BIOS, wherein the display parameter initialization completion event record is generated after the BIOS completes initialization operation on display parameters; initializing a completion event record according to the display parameters and updating a second system event log; receiving a BIOS start-up completion event record sent by the BIOS, wherein the BIOS start-up completion event record is generated after the BIOS finishes start-up and transmits a control right to an operating system of a server; and updating the second system event log according to the BIOS starting completion event record. The event records in the system event log are sorted according to the occurrence time of the event, so that the fault node is convenient to find.
The obtaining module 202 is configured to obtain the restart status information of the server according to the system event log. The restart status information includes the restart type of the server and the stage that the restart of the server reached. The stage reached by the restart may be, but is not limited to, one of the following: a restart initial stage, a BIOS start stage, a display parameter initialization completion stage, and a BIOS start completion stage. If the server shows a blank screen and the system event log in the mainboard manager has not been updated, the blank screen was caused by a fault in the operating system of the server and no restart occurred. If only the restart event record has been updated in the system event log of the mainboard manager, the server restarted but the restart did not reach the BIOS start stage. If only the restart event record and the BIOS start event record have been updated in the system event log of the mainboard manager, the system hung after the restart, before the display parameter initialization completion stage.
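One way to read the stage reached from the set of newly updated records is sketched below. The `Phase` enum, the event name strings, and `phase_reached` are assumptions made for this example; the rule of stopping at the first missing record is also an assumption that follows the boot order described in the text.

```python
from enum import Enum
from typing import Iterable


class Phase(Enum):
    NO_RESTART = "no restart recorded"
    RESTART_INITIAL = "restart initial stage"
    BIOS_START = "BIOS start stage"
    DISPLAY_INIT_DONE = "display parameter initialization completion stage"
    BIOS_START_DONE = "BIOS start completion stage"


# Hypothetical event names, listed in the boot order described in the text.
_ORDER = [
    ("restart", Phase.RESTART_INITIAL),
    ("bios_start", Phase.BIOS_START),
    ("display_param_init_done", Phase.DISPLAY_INIT_DONE),
    ("bios_start_done", Phase.BIOS_START_DONE),
]


def phase_reached(new_events: Iterable[str]) -> Phase:
    """Return the furthest stage for which every earlier record is also present."""
    seen = set(new_events)
    reached = Phase.NO_RESTART
    for name, phase in _ORDER:
        if name not in seen:
            break  # the boot sequence stopped before this record was written
        reached = phase
    return reached


print(phase_reached(["restart", "bios_start"]).value)
```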
The judging module 203 is configured to judge a fault location of the server according to the restart status information. When the fault location is judged according to the restart status information, preliminary judgment is performed based on the stage to which the restart has proceeded.
For example, if the server shows a blank screen and the system event log in the motherboard manager has not been updated, the blank screen occurred while the operating system of the server was running and no restart occurred, which indicates a fault in the display screen or the display driver.
For example, if only the restart event record has been updated in the system event log of the motherboard manager, the server restarted but the restart did not reach the BIOS start stage, which indicates a fault in the BIOS or the processor.
For example, if only the restart event record and the BIOS start event record have been updated in the system event log of the motherboard manager, the system hung after the restart, before the display parameter initialization completion stage, which indicates a fault in the display screen or the display card.
Of course, the position where the specific fault occurs can be judged by combining other parameters, so that the accuracy of fault positioning is improved.
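The three examples above amount to a lookup from the set of updated records to a short list of suspect components. The following sketch encodes exactly those three cases; the event name strings, `SUSPECTS`, and `preliminary_suspects` are hypothetical, and any other combination returns an empty list because the text gives no rule for it.

```python
from typing import Dict, FrozenSet, Iterable, List

# Keys are the sets of event records updated since the failure (hypothetical names);
# values are the suspect components named in the three examples of the text.
SUSPECTS: Dict[FrozenSet[str], List[str]] = {
    frozenset(): ["display screen", "display driver"],
    frozenset({"restart"}): ["BIOS", "processor"],
    frozenset({"restart", "bios_start"}): ["display screen", "display card"],
}


def preliminary_suspects(updated_records: Iterable[str]) -> List[str]:
    """Return the preliminary suspects, or [] when the text gives no rule."""
    return list(SUSPECTS.get(frozenset(updated_records), []))


for case in ([], ["restart"], ["restart", "bios_start"]):
    print(case, "->", preliminary_suspects(case))
```

An empty result is the point at which other parameters would have to be consulted to narrow down the fault location.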
As can be seen from the above, in the embodiment of the present application, when the server fails, the system event log stored in the motherboard manager is queried. The system event log includes a first system event log, which is generated by reading the count value of a counter and a restart type parameter stored in the complex programmable logic device; the count value of the counter records the number of times the server has restarted, and the restart type parameter characterizes the most recent start type of the server. The restart status information of the server is then obtained from the system event log, and the fault location of the server is judged according to the restart status information. This achieves fault positioning, makes it easier to find the fault location quickly, and improves the maintenance efficiency of the server.
The embodiment of the present application provides a storage medium storing a computer program which, when executed by a processor, performs the method in any optional implementation of the above embodiment. The storage medium may be implemented by any type of volatile or nonvolatile storage device or combination thereof, such as a Static Random Access Memory (SRAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), an Erasable Programmable Read-Only Memory (EPROM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), a magnetic memory, a flash memory, a magnetic disk, or an optical disk.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (10)
1. A server exception positioning method is disclosed, wherein the server comprises a processor, a complex programmable logic device, a mainboard manager, a BIOS and a south bridge chip, wherein the complex programmable logic device is respectively connected with the mainboard manager, the south bridge chip and the processor, the south bridge chip is respectively connected with the BIOS and the processor, and the BIOS is connected with the mainboard manager; the method is applied to the mainboard manager, and is characterized in that the method comprises the following steps:
when the server fails, inquiring a system event log stored in the mainboard manager, wherein the system event log comprises a first system event log, the first system event log is generated by reading a count value of a counter stored in a complex programmable logic device and a restarting type parameter, the count value of the counter is used for metering the restarting times of the server, and the restarting type parameter is used for representing the latest starting type of the server;
acquiring restarting state information of the server according to the system event log;
and judging the fault position of the server according to the restarting state information.
2. The server exception location method of claim 1, wherein the step of querying a system event log stored in the motherboard manager is preceded by the step of:
when an EventTrigger interrupt signal of the complex programmable logic device is detected, reading a count value of a counter and a restart type parameter which are stored in the complex programmable logic device;
when the count value of the counter changes relative to the count value of the counter read last time, judging that the server is restarted, and generating a corresponding restart event record according to a restart type parameter;
and updating a first system event log in the system event logs according to the restart event record.
3. The server abnormal positioning method according to claim 1 or 2, wherein the restart status information comprises: a reboot type of the server and a phase to which the server reboots.
4. The server anomaly location method according to claim 2, wherein said system event log further comprises a second system event log; the second system event log is used for judging the stage reached after the system restarts and enters the BIOS, and the second system event log is generated based on a plurality of running event records of the BIOS.
5. The server exception location method of claim 4, wherein the plurality of run event records comprises a BIOS start event record;
the method further comprises the steps of:
receiving a BIOS starting event record sent by the BIOS, wherein the BIOS starting event record is generated when the BIOS starts to start;
and updating the second system event log according to the BIOS starting event record.
6. The server anomaly locating method according to claim 4, wherein said plurality of running event records further comprises a display parameter initialization completion event record;
the method further comprises the steps of:
receiving a display parameter initialization completion event record sent by the BIOS, wherein the display parameter initialization completion event record is generated after the BIOS completes initialization operation on display parameters;
updating the second system event log according to the display parameter initialization completion event record;
or, the plurality of running event records further include a BIOS start completion event record;
the method further comprises the steps of:
receiving a BIOS start-up completion event record sent by the BIOS, wherein the BIOS start-up completion event record is generated after the BIOS finishes start-up and transmits a control right to an operating system of the server;
and updating the second system event log according to the BIOS starting completion event record.
7. The method for locating server abnormality according to claim 4, wherein said determining a fault location of the server according to the restart status information includes:
preliminarily screening out a server module with higher fault probability according to the restart type and the stage of restarting the server;
and confirming the fault position of the server from the screened server module with higher fault probability.
8. A server exception positioning device, wherein the server comprises a processor, a complex programmable logic device, a mainboard manager, a BIOS (basic input output system), and a south bridge chip, wherein the complex programmable logic device is respectively connected with the mainboard manager, the south bridge chip and the processor; the device is applied to the mainboard manager; characterized in that the device comprises:
the query module is used for querying a system event log stored in the mainboard manager when the server fails, wherein the system event log comprises a first system event log, the first system event log is generated by reading a count value of a counter stored in a complex programmable logic device and a restart type parameter, the count value of the counter is used for metering the restart times of the server, and the restart type parameter is used for representing the latest starting type of the server;
the acquisition module is used for acquiring the restarting state information of the server according to the system event log;
and the judging module is used for judging the fault position of the server according to the restarting state information.
9. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the method according to any of claims 1-7.
10. A server, characterized by comprising a processor, a complex programmable logic device, a mainboard manager, a BIOS and a south bridge chip, wherein the complex programmable logic device is respectively connected with the mainboard manager, the south bridge chip and the processor, the south bridge chip is respectively connected with the BIOS and the processor, and the BIOS is connected with the mainboard manager;
the motherboard manager is configured to perform the method of any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010623604.7A CN111722954A (en) | 2020-06-30 | 2020-06-30 | Server abnormity positioning method and device, storage medium and server |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111722954A (en) | 2020-09-29 |
Family
ID=72571038
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202010623604.7A | Pending | CN111722954A (en) | 2020-06-30 | 2020-06-30 | Server abnormity positioning method and device, storage medium and server |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111722954A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||