CN114816022B - Method, system and storage medium for monitoring server power supply abnormality - Google Patents

Method, system and storage medium for monitoring server power supply abnormality

Info

Publication number
CN114816022B
Authority
CN
China
Prior art keywords
power
abnormal
bmc
server
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210463541.2A
Other languages
Chinese (zh)
Other versions
CN114816022A (en)
Inventor
于淏宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202210463541.2A priority Critical patent/CN114816022B/en
Publication of CN114816022A publication Critical patent/CN114816022A/en
Application granted granted Critical
Publication of CN114816022B publication Critical patent/CN114816022B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26Power supply means, e.g. regulation thereof
    • G06F1/28Supervision thereof, e.g. detecting power-supply failure by out of limits supervision
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766Error or fault reporting or storing
    • G06F11/0778Dumping, i.e. gathering error/state information after a fault for later diagnosis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079Root cause analysis, i.e. error or fault diagnosis
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a server power supply abnormality monitoring method, a system and a storage medium, and relates to the technical field of computers. The method comprises the following steps: starting a server, and starting to power up according to a power-up time sequence; in the starting process of the server, the CPLD continuously monitors the power-on state of the server; when the power-on state is abnormal, recording power-on abnormal information and detecting the starting state of the BMC; and determining the storage position of the power-on abnormal information according to the starting state of the BMC. The invention can record the power supply state when the abnormality occurs, so as to quickly analyze and locate the cause of the problem.

Description

Method, system and storage medium for monitoring server power supply abnormality
Technical Field
The invention relates to the technical field of computers, in particular to a server power supply abnormality monitoring method, a system and a storage medium.
Background
In the big data era, data centers carry massive volumes of operational data, servers are deployed ever more densely, and the requirements on server stability and reliability keep rising. Because servers must run continuously around the clock, the factors that can cause a server to fail multiply as service time increases. When a server deployed in a data center suffers an abnormal power failure, the power state at that moment needs to be recorded for subsequent analysis by engineers, so that the cause of the failure can be located quickly. Developing a fast, accurate and stable fault recording mechanism is therefore a technical problem that those skilled in the art urgently need to solve.
As shown in fig. 1, the fault detection and recording mechanism adopted in the prior art is implemented by a CPLD (complex programmable logic device) working together with a BMC (baseboard management controller). The key power supply signals of the server are wired to the CPLD in hardware, and when the server is started the CPLD pulls these signals up or down in turn according to a pre-designed power-on time sequence, thereby completing the power-on process of the server. During the startup and operation of the server, the CPLD continuously monitors the state of the connected power signals, including the EN, PWRGD and Alert signals of devices such as the CPU (central processing unit), PSU (power supply unit), DIMM (dual in-line memory module) and intelligent network card (OCP), and reports the real-time state to the BMC over the I2C bus. When an abnormality occurs, a technician can locate the cause of the fault from the log recorded by the BMC. However, this monitoring process has the following problems:
(1) If an abnormality occurs during power-on startup, for example a key power supply signal times out during power-on or power is lost abnormally, the CPLD can detect the abnormality, but the BMC may not yet have started successfully or may still be starting, so it cannot receive the abnormal state reported by the CPLD and the cause of the abnormality cannot be recorded. If the fault occurs only intermittently, locating its cause afterwards is very difficult;
(2) While the server is running, an abnormality in some electrical signals can cause the BMC to hang, so the abnormality cannot be recorded;
(3) Some customers require the server to shut down immediately after an abnormal power failure in order to protect the equipment. As a result, the server shuts down before the BMC has acquired the power signal states from the CPLD, so the BMC cannot record the abnormal power failure.
Disclosure of Invention
In order to solve at least one of the problems mentioned in the background art, the present invention provides a method, a system and a storage medium for monitoring server power supply abnormality, wherein the CPLD can record the power supply state when abnormality occurs, so as to quickly analyze and locate the cause of the problem.
The specific technical scheme provided by the embodiment of the invention is as follows:
In a first aspect, a method for monitoring server power anomalies is provided, the method comprising:
starting a server, and starting to power up according to a power-up time sequence;
in the starting process of the server, the CPLD continuously monitors the power-on state of the server;
when the power-on state is abnormal, recording power-on abnormal information and detecting the starting state of the BMC;
and determining the storage position of the power-on abnormal information according to the starting state of the BMC.
Further, the determining the storage location of the abnormal information according to the starting state of the BMC includes:
if the BMC is started, the power-on abnormal information is stored in a first storage module so as to be read by the BMC and locate a fault reason;
if the BMC is not started, the power-on abnormal information is stored in a first flash memory module so that the BMC can read and locate the fault reason after the starting is completed.
Further, the method further comprises the following steps:
if the power-on state is not abnormal, the power-on state data are stored in the second storage module, and after the BMC is started, the power-on state data are read from the second storage module and whether the abnormality exists is checked.
Further, the method further comprises:
after the starting of the server is completed, the CPLD continuously monitors abnormal power-down information of the server;
and determining the storage position of the abnormal power-down information according to the influence of the abnormal power-down information on the BMC.
Further, the determining the storage location of the abnormal power-down information according to the influence of the abnormal power-down information on the BMC includes:
if the abnormal power-down information does not cause the BMC to hang, the abnormal power-down information is recorded in a third storage module so as to be read by the BMC to locate the cause of the fault;
if the abnormal power-down information can cause the BMC to hang, the abnormal power-down information is stored in the third storage module and a second flash memory module at the same time, so that the BMC can read it and locate the cause of the fault after starting.
Further, recording the abnormal power-down information in the third storage module if it does not cause the BMC to hang further includes:
reading the abnormal power-down information through the BMC and sending a clearing instruction;
and the CPLD clears the abnormal power-down information in the third storage module and shuts down the server according to the clearing instruction.
Further, the method further comprises the following steps:
and if the abnormal power-off information does not appear, storing the data after the server is started in the second storage module, reading the data after the server is started from the second storage module after the BMC is started, and checking whether the server is abnormal or not.
In a second aspect, a server power anomaly monitoring system is provided, the system comprising:
the control module is used for starting the server and starting to power up according to the power-up time sequence;
the power-on monitoring module is used for continuously monitoring the power-on state of the server in the starting process of the server, recording power-on abnormal information and detecting the starting state of the BMC when the power-on state is abnormal, and determining the storage position of the power-on abnormal information according to the starting state of the BMC;
and the operation monitoring module is used for continuously monitoring the abnormal power-down information of the server after the server is started, and determining the storage position of the abnormal power-down information according to the influence of the abnormal power-down information on the BMC.
In a third aspect, a computer device is provided, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
starting a server, and starting to power up according to a power-up time sequence;
in the starting process of the server, the CPLD continuously monitors the power-on state of the server;
when the power-on state is abnormal, recording power-on abnormal information and detecting the starting state of the BMC;
and determining the storage position of the power-on abnormal information according to the starting state of the BMC.
In a fourth aspect, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
starting a server, and starting to power up according to a power-up time sequence;
in the starting process of the server, the CPLD continuously monitors the power-on state of the server;
when the power-on state is abnormal, recording power-on abnormal information and detecting the starting state of the BMC;
and determining the storage position of the power-on abnormal information according to the starting state of the BMC.
The embodiment of the invention has the following beneficial effects:
1. In the invention, the CPLD monitors the power supply of the server. When a power-on abnormality occurs during power-on, or an abnormal power-down occurs after the server has started, the abnormal server state data are recorded in the CPLD; after the BMC restarts, it reads the abnormal data from the CPLD and the cause of the fault is located. This prevents abnormal data from going unrecorded because the BMC has not started or is still starting, reduces the time needed to reproduce the problem, and improves the efficiency of locating and handling faults;
2. When the BMC has not started during power-on, or hangs when an abnormal power-down occurs while the server is running, the CPLD records the abnormal state data in the first flash memory module and the second flash memory module. Because these flash memory modules (for example a UFM) are non-volatile, the BMC can read the abnormal data from them after the server is powered off and restarted.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram for embodying fault detection in the background art;
FIG. 2 is a schematic overall structure for embodying the monitoring method in the present application;
FIG. 3 is a schematic illustration of a specific flow for embodying the monitoring method in the present application;
Fig. 4 is an internal structural diagram of a computer device used to embody the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the big data era, data centers carry massive volumes of operational data, servers are deployed ever more densely, and the requirements on server stability and reliability keep rising. Because servers must run continuously around the clock, the factors that can cause a server to fail multiply as service time increases. When a server deployed in a data center suffers an abnormal power failure, the power state at that moment needs to be recorded for subsequent analysis by engineers, so that the cause of the failure can be located quickly. Developing a fast, accurate and stable fault recording mechanism is therefore a technical problem that those skilled in the art urgently need to solve. The existing monitoring process has the following problems. If an abnormality occurs during power-on startup, for example a key power supply signal times out during power-on or power is lost abnormally, the CPLD can detect the abnormality, but the BMC may not yet have started successfully or may still be starting, so it cannot receive the abnormal state reported by the CPLD and the cause of the abnormality cannot be recorded; if the fault occurs only intermittently, locating its cause afterwards is very difficult. While the server is running, an abnormality in some electrical signals can cause the BMC to hang, so the abnormality cannot be recorded. Some customers require the server to shut down immediately after an abnormal power failure in order to protect the equipment; as a result, the server shuts down before the BMC has acquired the power signal states from the CPLD, so the BMC cannot record the abnormal power failure.
Based on the above problems, the application provides a method, a system and a storage medium for monitoring server power supply abnormality, which can record the power supply state when abnormality occurs, so as to quickly analyze and locate the cause of the problem.
Example 1
The server power abnormality monitoring method, as shown in fig. 2 and 3, comprises the following steps:
Step S1: starting a server, and starting to power up according to a power-up time sequence; in the starting process of the server, the CPLD continuously monitors the power-on state of the server.
The key power supply signals of the server are connected to the CPLD through hardware. When the server is started, the CPLD pulls them up or down in turn according to a pre-designed power-on time sequence so as to complete the power-on process of the server. Specifically, the EN signal and PG signal of the CPU in the server are connected to the CPLD; the PWROK signal and PG signal of the PSU are connected to the CPLD; the PWRGD signal of the DIMM is connected to the CPLD; and the EN signal and PWRGD signal of the OCP are connected to the CPLD. The CPLD continuously monitors the state of the connected power signals and records the related data.
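The sequencing and monitoring behaviour described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the real logic runs in CPLD hardware, and the `Rail` class, the `power_up` function, the rail order, and the tick-based timeout are hypothetical names and values introduced for illustration, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rail:
    """One monitored supply: an enable output and a power-good input."""
    name: str
    # Hypothetical callback: True once the rail reports power-good at the
    # given tick after its enable was asserted.
    power_good_at: Callable[[int], bool]


def power_up(rails: List[Rail], timeout_ticks: int) -> Optional[str]:
    """Enable rails in order; return the first rail that times out, else None."""
    for rail in rails:
        for tick in range(timeout_ticks):
            if rail.power_good_at(tick):
                break  # rail is up, continue with the next one in the sequence
        else:
            return rail.name  # power-good never arrived: power-on timeout
    return None


# Illustrative sequence; the order and timings are assumptions.
rails = [
    Rail("PSU", lambda t: t >= 2),
    Rail("CPU", lambda t: t >= 3),
    Rail("DIMM", lambda t: False),  # never reports PWRGD
    Rail("OCP", lambda t: t >= 1),
]
failed = power_up(rails, timeout_ticks=10)
healthy = power_up([r for r in rails if r.name != "DIMM"], timeout_ticks=10)
```

In this model `failed` names the rail whose power-good signal never arrived, which is the kind of power-on abnormal information the later steps store for the BMC.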
Step S2: when the power-on state is abnormal, recording power-on abnormal information and detecting the starting state of the BMC; and determining the storage position of the power-on abnormal information according to the starting state of the BMC.
When the power-on state is abnormal, specifically when a power-on timeout or an abnormal power-off occurs during the power-on process, the CPLD judges whether the BMC has started according to a heartbeat signal of the BMC. If the BMC has started, the power-on abnormal information is recorded in the first storage module, the BMC reads the power-on abnormal information in the first storage module through the I2C bus, and after the reading is completed the CPLD controls the server to power down and shut off, thereby realizing power-down protection of the server. The first storage module includes, but is not limited to, a FIFO module, and is disposed in the CPLD.
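The text states only that the CPLD uses a heartbeat signal to judge whether the BMC has started. One possible reading, sketched below as an assumption, is that the heartbeat is a toggling pin and the BMC counts as started while an edge has been seen within a recent window. The `HeartbeatMonitor` class and `window_ticks` parameter are hypothetical.

```python
from typing import Optional


class HeartbeatMonitor:
    """Tracks whether the BMC heartbeat pin has toggled recently."""

    def __init__(self, window_ticks: int) -> None:
        self.window_ticks = window_ticks
        self._last_level: Optional[int] = None
        self._ticks_since_edge: Optional[int] = None  # None until first edge

    def sample(self, level: int) -> None:
        """Call once per clock tick with the current heartbeat pin level."""
        if self._last_level is not None and level != self._last_level:
            self._ticks_since_edge = 0  # edge seen: heartbeat is alive
        elif self._ticks_since_edge is not None:
            self._ticks_since_edge += 1  # no edge this tick: age the last one
        self._last_level = level

    def bmc_started(self) -> bool:
        """True if an edge was seen within the last window_ticks ticks."""
        return (self._ticks_since_edge is not None
                and self._ticks_since_edge <= self.window_ticks)


monitor = HeartbeatMonitor(window_ticks=3)
for level in (0, 0, 0):
    monitor.sample(level)
before_first_edge = monitor.bmc_started()
monitor.sample(1)
after_edge = monitor.bmc_started()
for level in (1, 1, 1, 1):
    monitor.sample(level)
after_silence = monitor.bmc_started()
```

The same check would also flag a BMC that stops toggling later, although the patent does not say whether the heartbeat is used that way during operation.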
At this time, the data stored in the first storage module includes at least: the CPLD can judge the abnormal position according to the data in the first storage module to determine the cause of the fault problem. Specifically, the first preset time period is any time period of 1 to 200 microseconds.
If the BMC is not started, the power-on abnormal information is recorded in the first flash memory module. The first flash memory module includes, but is not limited to, a UFM (user flash memory) module and is disposed in the CPLD. The server is then powered off and restarted; after the restart, once the BMC has started, the BMC reads the power-on abnormal information from the UFM module and analyzes and locates the cause of the fault.
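The choice between the first storage module and the first flash memory module can be sketched as follows. This is a minimal model, assuming a volatile FIFO that is lost on a power cycle and a flash area that survives it; the `CpldStorage` class, its method names, and the record string are hypothetical.

```python
from collections import deque
from typing import Deque, List


class CpldStorage:
    """Models the volatile FIFO and the non-volatile flash inside the CPLD."""

    def __init__(self, fifo_depth: int = 16) -> None:
        self.fifo: Deque[str] = deque(maxlen=fifo_depth)  # first storage module
        self.ufm: List[str] = []  # first flash memory module

    def record_power_on_anomaly(self, record: str, bmc_started: bool) -> str:
        """Store the record where the BMC will be able to read it later."""
        if bmc_started:
            self.fifo.append(record)  # BMC is alive and can read it right away
            return "fifo"
        self.ufm.append(record)  # BMC not up yet: keep it across the restart
        return "ufm"

    def power_cycle(self) -> None:
        """Power-off restart: volatile contents are lost, flash is retained."""
        self.fifo.clear()


storage = CpldStorage()
where = storage.record_power_on_anomaly("CPU PG timeout", bmc_started=False)
storage.power_cycle()
recovered = list(storage.ufm)  # what the BMC reads after it has started
```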
If the power-on state is not abnormal, the power-on state data are stored in the second storage module. After the BMC is started, the BMC reads the server state data in the second storage module through the I2C bus and checks whether the power-on process was abnormal, so as to judge the state of the server power supply. The second storage module includes, but is not limited to, a FIFO module disposed in the CPLD.
After the power-on process is successfully completed, the server starts to operate, and when an abnormality occurs in the operation process, the following steps are started.
Step S3: after the starting of the server is completed, the CPLD continuously monitors abnormal power-down information of the server; and determining the storage position of the abnormal power-down information according to the influence of the abnormal power-down information on the BMC.
If the running state of the server becomes abnormal, in particular if an abnormal power-down occurs, the CPLD receives abnormal power-down information, which includes an abnormal signal, and judges whether it can cause the BMC to hang. If the abnormal signal does not cause the BMC to hang, the abnormal power-down information is recorded in the third storage module. The third storage module includes, but is not limited to, a FIFO module, and the data recorded in the FIFO module at this time include at least: the power state data before the abnormal operation, the power state data at the moment of the abnormal operation, and the power state data within a second preset time period after the abnormal operation.
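Capturing the power state before, at, and after the abnormality is commonly done with a pre-trigger history buffer. The sketch below shows one way this could work, assuming the power state is sampled as an integer bit field once per tick and the second preset time period is expressed as a number of samples; `AnomalyRecorder` and its parameters are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional


class AnomalyRecorder:
    """Keeps recent samples and freezes a window around the first anomaly."""

    def __init__(self, pre_samples: int, post_samples: int) -> None:
        self._history: Deque[int] = deque(maxlen=pre_samples)
        self._post_samples = post_samples
        self._remaining: Optional[int] = None  # None until an anomaly is seen
        self.capture: List[int] = []

    def sample(self, state: int, abnormal: bool) -> bool:
        """Feed one power-state sample; return True once capture is complete."""
        if self._remaining is None:
            if abnormal:
                # Snapshot: states before the anomaly plus the state at it.
                self.capture = list(self._history) + [state]
                self._remaining = self._post_samples
            else:
                self._history.append(state)
        elif self._remaining > 0:
            self.capture.append(state)  # states after the anomaly
            self._remaining -= 1
        return self._remaining == 0


recorder = AnomalyRecorder(pre_samples=2, post_samples=2)
for state in (0b1111, 0b1111, 0b1110):
    recorder.sample(state, abnormal=False)
recorder.sample(0b1100, abnormal=True)
recorder.sample(0b1000, abnormal=False)
done = recorder.sample(0b0000, abnormal=False)
```

Freezing the window on the first anomaly keeps the evidence from being overwritten by later samples, which matters when the server is about to be shut down.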
At this time, the BMC reads the server state data of the abnormal operation through the I2C bus and issues a clearing instruction. On receiving the clearing instruction, the CPLD clears the server state data of the abnormal operation from the third storage module and shuts down the server, thereby realizing power-down protection of the server.
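The read-then-clear handshake orders the steps so that the server is shut down only after the BMC has logged the data. A minimal sketch of that ordering follows, with the I2C transfer replaced by a plain method call; the `Cpld` and `Bmc` classes and their method names are hypothetical.

```python
from typing import List


class Cpld:
    """Holds the anomaly records and controls server power."""

    def __init__(self) -> None:
        self.fifo: List[str] = []  # third storage module
        self.server_on = True

    def record(self, entry: str) -> None:
        self.fifo.append(entry)

    def read_fifo(self) -> List[str]:
        """Stands in for the BMC's I2C read; does not modify the FIFO."""
        return list(self.fifo)

    def clear(self) -> None:
        """Clearing instruction: drop the records, then shut the server down."""
        self.fifo.clear()
        self.server_on = False


class Bmc:
    """Copies anomaly records into its own log before acknowledging them."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def handle_anomaly(self, cpld: Cpld) -> None:
        self.log.extend(cpld.read_fifo())  # 1. persist the evidence
        cpld.clear()                       # 2. only then allow the shutdown


cpld = Cpld()
bmc = Bmc()
cpld.record("PSU PWROK deasserted")
bmc.handle_anomaly(cpld)
```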
If the abnormal signal can cause the BMC to hang, the abnormal power-down information is recorded in the third storage module and the second flash memory module at the same time, so that the BMC can read it and locate the cause of the fault after it starts. Specifically, the CPLD records the abnormal power-down information in the UFM and powers the server off and restarts it; after the restart, once the BMC has started, the BMC reads the abnormal power-down information from the UFM and analyzes and locates the cause of the fault.
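The runtime decision differs from the power-on case in that the record always goes to the FIFO and is additionally written to flash when the BMC may hang. The text does not say how the CPLD judges that a signal can hang the BMC; the sketch below assumes a predefined set of signals, and `BMC_CRITICAL_SIGNALS`, the signal names, and `store_power_down_anomaly` are all hypothetical.

```python
from typing import List, Set

# Assumption: signals whose loss is taken to bring the BMC down.
# The source does not list them; this set is purely illustrative.
BMC_CRITICAL_SIGNALS: Set[str] = {"PSU_PWROK"}


def store_power_down_anomaly(signal: str, fifo: List[str], ufm: List[str]) -> bool:
    """Record the anomaly; return True if it was also written to flash."""
    fifo.append(signal)  # third storage module: always written
    may_hang_bmc = signal in BMC_CRITICAL_SIGNALS
    if may_hang_bmc:
        ufm.append(signal)  # second flash memory module: survives the restart
    return may_hang_bmc


fifo: List[str] = []
ufm: List[str] = []
hang_case = store_power_down_anomaly("PSU_PWROK", fifo, ufm)
benign_case = store_power_down_anomaly("OCP_PWRGD", fifo, ufm)
```

Writing to flash only in the hang case limits flash writes to the situations where the volatile copy cannot be relied on.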
If abnormal power-off information does not occur in the running process of the server, the data after the server is started are stored in the second storage module, after the BMC is started, the data after the server is started are read from the second storage module, and whether the server is abnormal or not is checked.
Through the arrangement, the first storage module, the second storage module, the third storage module, the first flash memory module and the second flash memory module in the CPLD are utilized to temporarily store abnormal states in the power-on state and the running state, so that a technician can quickly locate the fault cause by utilizing the information after recovering the BMC, the time required for the problem reproduction is reduced, and the efficiency of locating the fault cause is improved.
Example two
Corresponding to the above embodiment, the present application provides a server power abnormality monitoring system, including:
the control module is used for starting the server and starting to power up according to the power-up time sequence;
the power-on monitoring module is used for continuously monitoring the power-on state of the server in the starting process of the server, recording power-on abnormal information and detecting the starting state of the BMC when the power-on state is abnormal, and determining the storage position of the power-on abnormal information according to the starting state of the BMC;
the operation monitoring module is used for continuously monitoring abnormal power-off information of the server by the CPLD after the server is started, and determining the storage position of the abnormal power-off information according to the influence of the abnormal power-off information on the BMC;
the first verification module is used for storing the power-on state data in the second storage module when no abnormality occurs in the power-on state, reading the power-on state data from the second storage module after the BMC is started and verifying whether the abnormality exists or not;
and the second checking module is used for storing the data after the server is started in the second storage module when the abnormal power-off information does not appear, reading the data after the server is started from the second storage module after the BMC is started, and checking whether the server is abnormal or not.
In a preferred embodiment, the power-on monitoring module is further configured so that, when the power-on state is abnormal, the CPLD judges whether the BMC has started according to a heartbeat signal of the BMC. If the BMC has started, the power-on abnormal information is recorded in the first storage module, the BMC reads the power-on abnormal information in the first storage module through the I2C bus, and after the reading is completed the CPLD controls the server to power down and shut off, thereby realizing power-down protection of the server. If the BMC is not started, the power-on abnormal information is recorded in the first flash memory module. The first flash memory module includes, but is not limited to, a UFM module and is disposed in the CPLD. The server is then powered off and restarted; after the restart, once the BMC has started, the BMC reads the power-on abnormal information from the UFM module and analyzes and locates the cause of the fault.
In a preferred embodiment, if the power-on state is not abnormal, the power-on state data are stored in the second storage module. After the BMC is started, the BMC reads the server state data in the second storage module through the I2C bus and checks whether the power-on process was abnormal, so as to judge the state of the server power supply. The second storage module includes, but is not limited to, a FIFO module disposed in the CPLD.
In a preferred embodiment, the data stored in the first memory module includes at least: the CPLD can judge the abnormal position according to the data in the first storage module to determine the cause of the fault problem.
In a preferred embodiment, the operation monitoring module is further configured to judge, when the running state is abnormal, whether the abnormal signal can cause the BMC to hang. If the abnormal signal does not cause the BMC to hang, the abnormal power-down information is recorded in the third storage module; the BMC reads the abnormal power-down information through the I2C bus and issues a clearing instruction, and on receiving the clearing instruction the CPLD clears the server state data of the abnormal operation from the third storage module and shuts down the server, thereby realizing power-down protection of the server. If the abnormal signal can cause the BMC to hang, the server state data of the abnormal operation are recorded in the third storage module and the second flash memory module at the same time, so that the BMC can read them and locate the cause of the fault after it starts. The CPLD records the abnormal power-down information in the UFM and powers the server off and restarts it; after the restart, once the BMC has started, the BMC reads the abnormal power-down information from the UFM and analyzes and locates the cause of the fault.
In a preferred embodiment, the data recorded in the third storage module at this time includes at least: the power state data before the abnormal operation, the power state data at the moment of the abnormal operation and the power state data in a second preset time period after the abnormal operation.
Example III
There is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing the computer program implementing the steps of:
Step 101: starting a server, and starting to power up according to a power-up time sequence;
Step 102: in the starting process of the server, the CPLD continuously monitors the power-on state of the server;
step 103: when the power-on state is abnormal, recording power-on abnormal information and detecting the starting state of the BMC;
determining the storage position of the power-on abnormal information according to the starting state of the BMC;
step 104: after the starting of the server is completed, the CPLD continuously monitors abnormal power-down information of the server;
and determining the storage position of the abnormal power-down information according to the influence of the abnormal power-down information on the BMC.
In a preferred embodiment, step 103 further includes determining whether the BMC is started when the power-on status is abnormal; if the BMC is started, the starting abnormal information is recorded in the first storage module, after the BMC is started, the BMC reads the power-on abnormal information in the first storage module through the i2C bus to locate the fault reason, and after the reading is completed, the CPLD controls the server to be powered off and shut down, so that the power-off protection of the server is realized.
If the BMC is not started, the power-on abnormal information is recorded in the first flash memory module, the server is powered off and restarted, after the server is powered off and restarted and the BMC is started, the BMC reads the power-on abnormal information from the UFM, and the reason of the positioning fault is analyzed.
In a preferred embodiment, step 104 further includes determining, when the running state becomes abnormal after the server has started, whether the abnormal power-down causes the BMC to hang. If the abnormal power-down does not cause the BMC to hang, the abnormal power-down information is recorded in the third storage module. The BMC reads the abnormal power-down information through the I2C bus and issues a clearing instruction; on receiving the clearing instruction, the CPLD clears the abnormal server state data in the third storage module and shuts down the server, thereby implementing power-down protection of the server.
If the abnormal power-down causes the BMC to hang, the abnormal server state data is recorded in the third storage module and the second flash memory module at the same time. The CPLD records the abnormal power-down information in the UFM and powers the server off and restarts it; after the restart, once the BMC has started, the BMC reads the abnormal server state data from the UFM and analyzes and locates the cause of the fault.
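The run-time handling in step 104 can be sketched in the same way. The Python below is an illustrative model only: `RuntimeMonitor` and its method names are hypothetical, and treating the third storage module as volatile (lost on a power cycle) while the second flash memory module persists is an assumption, not something the text states explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class RuntimeMonitor:
    # Assumption: volatile area read by the BMC over I2C.
    third_storage: list = field(default_factory=list)
    # Assumption: non-volatile copy kept in the CPLD's UFM.
    second_flash: list = field(default_factory=list)
    server_on: bool = True

    def record_power_down(self, info: str, bmc_hangs: bool) -> None:
        """Always record in the third storage module; also record in
        flash when the event would hang the BMC."""
        self.third_storage.append(info)
        if bmc_hangs:
            self.second_flash.append(info)

    def bmc_read(self) -> list:
        """What a live BMC would read over I2C."""
        return list(self.third_storage)

    def on_clear_instruction(self) -> None:
        """BMC has read the data: clear it and shut the server down."""
        self.third_storage.clear()
        self.server_on = False

    def power_cycle(self) -> list:
        """Restart after a BMC hang; return what the BMC can read from flash."""
        self.third_storage.clear()  # assumed lost across the restart
        self.server_on = True
        return list(self.second_flash)
```

The dual write in the hang case is the point of the design as described: a hung BMC cannot issue the read, so a copy must outlive the restart.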
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in FIG. 4. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing abnormal data from the power-on process and the running process.
The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a method for monitoring server power anomalies.
Those skilled in the art will appreciate that the structure shown in FIG. 4 is only a block diagram and does not constitute a limitation on the computer device to which the present solution is applied; a particular computer device may include more or fewer components than those shown, may combine some of the components, or may have a different arrangement of components.
Example IV
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon; when executed by a processor, the computer program performs the following steps:
step 201: starting a server, and starting to power up according to a power-up time sequence;
step 202: in the starting process of the server, the CPLD continuously monitors the power-on state of the server;
step 203: when the power-on state is abnormal, recording power-on abnormal information and detecting the starting state of the BMC;
determining the storage position of the power-on abnormal information according to the starting state of the BMC;
step 204: after the starting of the server is completed, the CPLD continuously monitors abnormal power-down information of the server;
and determining the storage position of the abnormal power-down information according to the influence of the abnormal power-down information on the BMC.
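Steps 201 and 202 describe powering up in a fixed sequence while the power-on state is monitored. A minimal Python sketch of that check follows; the rail names, the `enable` and `power_good` callbacks, and the `PowerOnFault` record are all hypothetical, since the text does not specify how the CPLD samples each rail.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class PowerOnFault:
    rail: str   # rail that failed to report power-good (hypothetical name)
    step: int   # position of that rail in the power-up sequence


def power_up_in_sequence(
    rails: Sequence[str],
    enable: Callable[[str], None],
    power_good: Callable[[str], bool],
) -> Optional[PowerOnFault]:
    """Enable each rail in order and stop at the first one that does not
    report power-good. Returns None when the whole sequence succeeds."""
    for index, rail in enumerate(rails):
        enable(rail)
        if not power_good(rail):
            # Later rails are deliberately left disabled after a fault.
            return PowerOnFault(rail=rail, step=index)
    return None
```

A returned `PowerOnFault` corresponds to the abnormal power-on state of step 203, where the record would then be stored according to the BMC start state; a `None` result corresponds to a completed start, after which the run-time monitoring of step 204 applies.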
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium; when executed, the program may comprise the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (7)

1. A method for monitoring server power anomalies, the method comprising:
starting a server, and starting to power up according to a power-up time sequence;
in the starting process of the server, the CPLD continuously monitors the power-on state of the server;
when the power-on state is abnormal, recording power-on abnormal information and detecting the starting state of the BMC;
determining the storage position of the power-on abnormal information according to the starting state of the BMC;
the determining the storage location of the abnormal information according to the starting state of the BMC specifically includes:
if the BMC is started, the power-on abnormal information is stored in a first storage module so as to be read by the BMC and locate a fault reason; if the BMC is not started, the power-on abnormal information is stored in a first flash memory module so that the BMC can read and locate fault reasons after the starting is completed;
the data stored in the first storage module at least comprises: power state data before power-on abnormality occurs, power state data at the moment of power-on abnormality occurs, and power state data in a first preset time period after power-on abnormality occurs;
after the starting of the server is completed, the CPLD continuously monitors abnormal power-down information of the server; determining a storage position of the abnormal power-down information according to the influence of the abnormal power-down information on the BMC;
the determining the storage location of the abnormal power down information according to the influence of the abnormal power down information on the BMC specifically includes:
if the abnormal power failure information does not cause the BMC to hang up, the abnormal power failure information is recorded in a third storage module so as to be read by the BMC and locate a fault reason;
if the abnormal power failure information can cause the BMC to hang up, the abnormal power failure information is stored in the third storage module and the second flash memory module at the same time, so that the fault reason can be read and positioned after the BMC is started.
2. The server power anomaly monitoring method of claim 1, further comprising:
if the power-on state is not abnormal, the power-on state data are stored in the second storage module, and after the BMC is started, the power-on state data are read from the second storage module and whether the abnormality exists is checked.
3. The method for monitoring abnormal power supply of a server according to claim 1, wherein if the abnormal power-off information does not cause the BMC to hang up, recording the abnormal power-off information in a third storage module, further comprising:
reading the abnormal power-off information through the BMC and sending a clearing instruction;
and the CPLD clears the abnormal power-down information in the third storage module and closes the server according to the clearing instruction.
4. The server power anomaly monitoring method of claim 3, further comprising:
and if the abnormal power-off information does not appear, storing the data after the server is started in the second storage module, reading the data after the server is started from the second storage module after the BMC is started, and checking whether the server is abnormal or not.
5. A server power anomaly monitoring system, the system comprising:
the control module is used for starting the server and starting to power up according to the power-up time sequence;
the power-on monitoring module is used for continuously monitoring the power-on state of the server in the starting process of the server, recording power-on abnormal information and detecting the starting state of the BMC when the power-on state is abnormal, and determining the storage position of the power-on abnormal information according to the starting state of the BMC;
the operation monitoring module is used for continuously monitoring abnormal power-down information of the server after the server is started, and determining the storage position of the abnormal power-down information according to the influence of the abnormal power-down information on the BMC;
the operation monitoring module is used for storing the power-on abnormal information in the first storage module if the BMC is started so as to be read by the BMC and locate a fault reason; if the BMC is not started, the power-on abnormal information is stored in a first flash memory module so that the BMC can read and locate fault reasons after the starting is completed;
the data stored in the first storage module at least comprises: power state data before power-on abnormality occurs, power state data at the moment of power-on abnormality occurs, and power state data in a first preset time period after power-on abnormality occurs;
the operation monitoring module is also used for continuously monitoring abnormal power-down information of the server by the CPLD after the starting of the server is completed; determining a storage position of the abnormal power-down information according to the influence of the abnormal power-down information on the BMC;
the determining the storage location of the abnormal power down information according to the influence of the abnormal power down information on the BMC specifically includes: if the abnormal power failure information does not cause the BMC to hang up, the abnormal power failure information is recorded in a third storage module so as to be read by the BMC and locate a fault reason; if the abnormal power failure information can cause the BMC to hang up, the abnormal power failure information is stored in the third storage module and the second flash memory module at the same time, so that the fault reason can be read and positioned after the BMC is started.
6. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1-4 when the computer program is executed by the processor.
7. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1-4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210463541.2A CN114816022B (en) 2022-04-28 2022-04-28 Method, system and storage medium for monitoring server power supply abnormality

Publications (2)

Publication Number Publication Date
CN114816022A CN114816022A (en) 2022-07-29
CN114816022B true CN114816022B (en) 2023-08-04

Family

ID=82509324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210463541.2A Active CN114816022B (en) 2022-04-28 2022-04-28 Method, system and storage medium for monitoring server power supply abnormality

Country Status (1)

Country Link
CN (1) CN114816022B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117149548B (en) * 2023-09-07 2024-04-26 上海合芯数字科技有限公司 Method and device for measuring time sequence of server system, electronic equipment and storage medium
CN117008704B (en) * 2023-09-27 2023-12-01 天固信息安全系统(深圳)有限公司 Control method and device based on EC or CPLD, storage medium and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107066356A (en) * 2017-05-17 2017-08-18 郑州云海信息技术有限公司 A storage method for server BMC configuration data
CN108304299A (en) * 2018-03-02 2018-07-20 郑州云海信息技术有限公司 Server power-up state monitors system and method, computer storage and equipment
CN111258405A (en) * 2020-01-18 2020-06-09 苏州浪潮智能科技有限公司 Server mainboard burning prevention system and method
CN112948185A (en) * 2021-02-26 2021-06-11 浪潮电子信息产业股份有限公司 Server heat dissipation method and device and related components
WO2022078013A1 (en) * 2020-10-16 2022-04-21 苏州浪潮智能科技有限公司 Server power-down detection method and system, and device and medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201417536A (en) * 2012-10-24 2014-05-01 Hon Hai Prec Ind Co Ltd Method and system for automatically managing servers

Also Published As

Publication number Publication date
CN114816022A (en) 2022-07-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant