CN114816022A - Server power supply abnormity monitoring method, system and storage medium


Info

Publication number
CN114816022A
Authority
CN
China
Prior art keywords
power
abnormal
server
bmc
power failure
Prior art date
Legal status
Granted
Application number
CN202210463541.2A
Other languages
Chinese (zh)
Other versions
CN114816022B (en)
Inventor
于淏宁
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202210463541.2A
Publication of CN114816022A
Application granted
Publication of CN114816022B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/28 Supervision thereof, e.g. detecting power-supply failure by out of limits supervision
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0778 Dumping, i.e. gathering error/state information after a fault for later diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a server power supply abnormality monitoring method, system, and storage medium in the field of computer technology. The method comprises: starting the server and powering it on according to a power-on sequence; during startup, continuously monitoring the power-on state of the server with the CPLD; when the power-on state is abnormal, recording power-on abnormality information and detecting the startup state of the BMC; and determining the storage location of the power-on abnormality information according to the startup state of the BMC. The invention records the power supply state at the moment an abnormality occurs so that the cause of the problem can later be analyzed and located quickly.

Description

Server power supply abnormity monitoring method, system and storage medium
Technical Field
The invention relates to the field of computer technology, and in particular to a server power supply abnormality monitoring method, system, and storage medium.
Background
In the big data era, data centers carry massive amounts of operational data, servers are deployed ever more densely, and the requirements for server stability and reliability keep rising. Because servers run 24 hours a day, the number of factors that can cause a failure grows with service time. When a server in a data center suffers an abnormal power failure, the power state at that moment must be recorded so that an engineer can later analyze it and quickly locate the cause. Developing a fast, accurate, and stable fault-recording mechanism is therefore an urgent technical problem for those skilled in the art.
As shown in FIG. 1, the prior-art fault detection and recording mechanism relies on a CPLD (complex programmable logic device) working together with a BMC (baseboard management controller). The key power signals of the server are wired to the CPLD, and when the server starts, the CPLD pulls them up or down in turn according to a pre-designed power-on timing sequence, which completes the power-on process. While the server starts and runs, the CPLD continuously monitors the connected power signals, including the EN, PWRGD, and Alert signals of devices such as the CPU (central processing unit), the PSU (power supply unit), the DIMMs (dual in-line memory modules), and the OCP smart network card, and reports their real-time status to the BMC over the I2C bus. When an abnormality occurs, a technician locates the cause from the log recorded by the BMC. This monitoring process, however, has the following problems:
(1) If an abnormality occurs during power-on, such as a power-on timeout or an abnormal power failure of a key power signal, the CPLD detects it, but the BMC may not yet have started or may still be booting, so it cannot receive the abnormal state reported by the CPLD and the cause is never recorded. If the fault is intermittent, locating its cause later is very difficult;
(2) While the server is running, some power signal abnormalities cause the BMC itself to hang, so the abnormality cannot be recorded;
(3) Some customers require the server to be powered off immediately after an abnormal power failure to protect the equipment. The server may then lose power before the BMC has fetched the power signal states from the CPLD, so the BMC cannot record the abnormality.
Disclosure of Invention
To solve at least one of the problems described in the background, the present invention provides a server power supply abnormality monitoring method, system, and storage medium in which the CPLD records the power supply state when an abnormality occurs, so that the cause of the problem can later be analyzed and located quickly.
Embodiments of the invention provide the following technical solutions:
In a first aspect, a server power supply abnormality monitoring method is provided, including:
starting the server and powering it on according to a power-on sequence;
during server startup, continuously monitoring the power-on state of the server with the CPLD;
when the power-on state is abnormal, recording power-on abnormality information and detecting the startup state of the BMC;
and determining the storage location of the power-on abnormality information according to the startup state of the BMC.
Further, determining the storage location of the power-on abnormality information according to the startup state of the BMC includes:
if the BMC has started, storing the power-on abnormality information in a first storage module so that the BMC can read it and locate the cause of the fault;
and if the BMC has not started, storing the power-on abnormality information in a first flash memory module so that the BMC can read it and locate the cause of the fault after it starts.
Further, the method also includes:
if the power-on state is normal, storing the power-on state data in a second storage module; after the BMC starts, it reads the power-on state data from the second storage module and checks whether any abnormality exists.
Further, the method also includes:
after the server has started, continuously monitoring abnormal power failure information of the server with the CPLD;
and determining the storage location of the abnormal power failure information according to its effect on the BMC.
Further, determining the storage location of the abnormal power failure information according to its effect on the BMC includes:
if the abnormal power failure does not cause the BMC to hang, recording the abnormal power failure information in a third storage module for the BMC to read and locate the cause of the fault;
and if the abnormal power failure can cause the BMC to hang, storing the abnormal power failure information in both the third storage module and a second flash memory module so that the BMC can read it and locate the cause of the fault after it starts.
Further, when the abnormal power failure does not cause the BMC to hang and its information is recorded in the third storage module, the method further includes:
reading the abnormal power failure information and sending a clear instruction through the BMC;
and clearing, by the CPLD according to the clear instruction, the abnormal power failure information in the third storage module and shutting down the server.
Further, the method also includes:
if no abnormal power failure occurs, storing the post-startup data of the server in the second storage module; after the BMC starts, it reads this data from the second storage module and checks whether any abnormality exists.
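The storage-location rules in this first aspect amount to a small decision table. The Python sketch below encodes them for illustration only; the function name `storage_targets`, the `Phase` enum, and the module label strings are hypothetical and merely name the storage modules described above, not a real register map.

```python
from enum import Enum, auto


class Phase(Enum):
    POWER_ON = auto()  # server is still sequencing its power rails
    RUNTIME = auto()   # power-on finished, server is operating


# Hypothetical labels for the storage modules named in the text.
FIRST_STORAGE = "first storage module (FIFO)"
SECOND_STORAGE = "second storage module (FIFO)"
THIRD_STORAGE = "third storage module (FIFO)"
FIRST_FLASH = "first flash memory module (UFM)"
SECOND_FLASH = "second flash memory module (UFM)"


def storage_targets(phase, abnormal, bmc_started=False, hangs_bmc=False):
    """Return the modules the CPLD writes to under the rules of the first aspect."""
    if not abnormal:
        # Normal power-on and normal runtime data both go to the second storage module.
        return [SECOND_STORAGE]
    if phase is Phase.POWER_ON:
        # Power-on abnormality: volatile FIFO if the BMC is up, flash otherwise.
        return [FIRST_STORAGE] if bmc_started else [FIRST_FLASH]
    # Runtime abnormal power failure: add a flash copy only if the BMC may hang.
    if hangs_bmc:
        return [THIRD_STORAGE, SECOND_FLASH]
    return [THIRD_STORAGE]
```

For example, `storage_targets(Phase.RUNTIME, abnormal=True, hangs_bmc=True)` returns both the third storage module and the second flash memory module, matching the rule that a fault able to hang the BMC is written to both.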
In a second aspect, a server power supply abnormality monitoring system is provided, including:
a control module, configured to start the server and power it on according to a power-on sequence;
a power-on monitoring module, configured so that the CPLD continuously monitors the power-on state of the server during startup, records power-on abnormality information and detects the startup state of the BMC when the power-on state is abnormal, and determines the storage location of the power-on abnormality information according to the startup state of the BMC;
and an operation monitoring module, configured so that the CPLD continuously monitors abnormal power failure information of the server after it has started, and determines the storage location of that information according to its effect on the BMC.
In a third aspect, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the following steps:
starting the server and powering it on according to a power-on sequence;
during server startup, continuously monitoring the power-on state of the server with the CPLD;
when the power-on state is abnormal, recording power-on abnormality information and detecting the startup state of the BMC;
and determining the storage location of the power-on abnormality information according to the startup state of the BMC.
In a fourth aspect, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, implements the following steps:
starting the server and powering it on according to a power-on sequence;
during server startup, continuously monitoring the power-on state of the server with the CPLD;
when the power-on state is abnormal, recording power-on abnormality information and detecting the startup state of the BMC;
and determining the storage location of the power-on abnormality information according to the startup state of the BMC.
Embodiments of the invention have the following beneficial effects:
1. The CPLD monitors the server power supply. When a power-on abnormality occurs during power-on, or an abnormal power failure occurs after the server has started, the CPLD records the abnormal server state data, and once the BMC is running again it reads this data from the CPLD and locates the cause of the fault. This prevents abnormality data from being lost because the BMC has not started or is still starting, reduces the time spent reproducing the problem, and makes fault localization and handling more efficient;
2. If the BMC has not started during power-on, or hangs when an abnormal power failure occurs while the server is running, the CPLD records the abnormal state data in the first and second flash memory modules. Because these modules are non-volatile, the BMC can read the abnormality data from the UFM (user flash memory) after the server is powered off and restarted.
Drawings
To illustrate the technical solutions of the embodiments more clearly, the drawings used in describing them are briefly introduced below. The drawings show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of fault detection in the background art;
FIG. 2 is a schematic diagram of the overall architecture of the monitoring method of the present application;
FIG. 3 is a detailed flowchart of the monitoring method of the present application;
FIG. 4 is an internal structural diagram of a computer device of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments derived by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
As described in the background, data centers need a fast, accurate, and stable mechanism for recording server power faults, yet the existing CPLD-plus-BMC monitoring process has three problems: (1) if an abnormality such as a power-on timeout or an abnormal power failure of a key power signal occurs during power-on, the CPLD detects it, but the BMC may not have started or may still be booting, so the abnormality is never recorded, and an intermittent fault is then very hard to locate; (2) while the server is running, some power signal abnormalities cause the BMC to hang, so the abnormality cannot be recorded; and (3) when a customer requires the server to be powered off immediately after an abnormal power failure, the server may lose power before the BMC has fetched the power signal states from the CPLD, so the BMC cannot record the abnormality.
Based on these problems, the present application provides a server power supply abnormality monitoring method, system, and storage medium that record the power supply state when an abnormality occurs, so that the cause of the problem can later be analyzed and located quickly.
Example one
As shown in FIG. 2 and FIG. 3, a server power supply abnormality monitoring method includes the following steps:
Step S1: start the server and power it on according to the power-on sequence; during server startup, the CPLD continuously monitors the power-on state of the server.
The key power signals of the server are wired to the CPLD. When the server starts, the CPLD pulls these signals up or down in turn according to a pre-designed power-on timing sequence to complete the power-on process. Specifically, the EN and PG signals of the CPU, the PWROK and PG signals of the PSU, the PWRGD signals of the DIMMs, and the EN and PWRGD signals of the OCP card are all connected to the CPLD, which continuously monitors these signals and records the related data.
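A minimal Python sketch of the monitored signal set and a per-step power-on timeout check follows. The signal names come from the paragraph above; the step order, the `timeout_us` values, and the `check_sequence` helper are hypothetical placeholders, since the real order and limits come from the board's power-on timing design, and an actual CPLD implements this logic in HDL rather than in software.

```python
from dataclasses import dataclass

# Power signals the text says are wired to the CPLD, grouped by device.
MONITORED_SIGNALS = {
    "CPU": ("EN", "PG"),
    "PSU": ("PWROK", "PG"),
    "DIMM": ("PWRGD",),
    "OCP": ("EN", "PWRGD"),
}


@dataclass(frozen=True)
class SequenceStep:
    device: str
    good_signal: str  # power-good style signal expected to assert in this step
    timeout_us: int   # hypothetical limit for this step


# Hypothetical order and limits; a real board takes them from its timing spec.
POWER_ON_SEQUENCE = (
    SequenceStep("PSU", "PWROK", 100_000),
    SequenceStep("CPU", "PG", 10_000),
    SequenceStep("DIMM", "PWRGD", 10_000),
    SequenceStep("OCP", "PWRGD", 10_000),
)

# Every step must watch a signal that is actually wired to the CPLD.
assert all(s.good_signal in MONITORED_SIGNALS[s.device] for s in POWER_ON_SEQUENCE)


def check_sequence(assert_times_us):
    """Check each step in order.

    assert_times_us maps (device, signal) to the microseconds between the step's
    enable and its power-good assertion, or None if it never asserted.
    Returns None when every step passes, else a description of the first failure.
    """
    for step in POWER_ON_SEQUENCE:
        elapsed = assert_times_us.get((step.device, step.good_signal))
        if elapsed is None or elapsed > step.timeout_us:
            return f"power-on timeout: {step.device} {step.good_signal}"
    return None
```

A non-`None` result corresponds to the "power-on timeout" case that Step S2 treats as an abnormal power-on state.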
Step S2: when the power-on state is abnormal, record power-on abnormality information and detect the startup state of the BMC; then determine the storage location of the power-on abnormality information according to the startup state of the BMC.
Specifically, an abnormal power-on state means that a power-on step times out or a power rail fails abnormally. The CPLD determines from the BMC heartbeat signal whether the BMC has started. If the BMC has started, the power-on abnormality information is recorded in the first storage module; the BMC reads it from the first storage module over the I2C bus, and once the read is complete the CPLD powers the server off, providing power-off protection for the server. The first storage module includes, but is not limited to, a FIFO module and is located in the CPLD.
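The text does not say how the heartbeat is interpreted. One common approach, assumed here, treats the BMC as started if its heartbeat line has toggled within a recent window. The Python sketch below models that check and the resulting routing decision; `HeartbeatMonitor`, `window_us`, and `record_power_on_fault` are hypothetical names, and plain lists stand in for the first storage module and the first flash memory module.

```python
class HeartbeatMonitor:
    """Treat the BMC as started if its heartbeat line toggled within the last window_us."""

    def __init__(self, window_us):
        self.window_us = window_us
        self.last_level = None
        self.last_toggle_us = None

    def sample(self, now_us, level):
        # Record the time of every level change on the heartbeat line.
        if self.last_level is not None and level != self.last_level:
            self.last_toggle_us = now_us
        self.last_level = level

    def bmc_started(self, now_us):
        return (self.last_toggle_us is not None
                and now_us - self.last_toggle_us <= self.window_us)


def record_power_on_fault(monitor, now_us, fault, first_storage, first_flash):
    """Route a power-on fault: FIFO if the BMC is alive, flash otherwise."""
    if monitor.bmc_started(now_us):
        first_storage.append(fault)
        return "first_storage"
    first_flash.append(fault)
    return "first_flash"
```

Requiring a recent toggle rather than a static level avoids mistaking a stuck-high or stuck-low line for a running BMC.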
At this time, the data stored in the first storage module includes at least: … From the data in the first storage module, the CPLD can determine where the abnormality occurred and thus the cause of the fault. Specifically, the first preset time period is any duration from 1 to 200 microseconds.
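Because the list of stored fields is incomplete above, the following Python sketch only illustrates sizing. It assumes that the first storage module buffers state samples covering the first preset time period T at a fixed sampling interval dt, so it needs ceil(T / dt) entries. The sampling interval, the one-bit-per-signal entry width, and the function names are assumptions, not values from the text.

```python
import math

FIRST_WINDOW_MIN_US = 1    # lower bound of the first preset time period
FIRST_WINDOW_MAX_US = 200  # upper bound of the first preset time period


def fifo_entries(window_us, sample_period_us):
    """Entries needed to hold samples spanning window_us at one sample per sample_period_us."""
    if not FIRST_WINDOW_MIN_US <= window_us <= FIRST_WINDOW_MAX_US:
        raise ValueError("first preset time period must be 1-200 microseconds")
    if sample_period_us <= 0:
        raise ValueError("sample period must be positive")
    return math.ceil(window_us / sample_period_us)


def fifo_bits(window_us, sample_period_us, signal_count):
    """Raw storage if each entry holds one bit per monitored signal (an assumption)."""
    return fifo_entries(window_us, sample_period_us) * signal_count
```

As a worked example, a 200 µs window sampled every 0.5 µs needs 400 entries. With the seven signals listed in Step S1 (CPU EN and PG, PSU PWROK and PG, DIMM PWRGD, OCP EN and PWRGD) stored at one bit each, that comes to 2,800 bits.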
If the BMC has not started, the power-on abnormality information is recorded in the first flash memory module, which includes, but is not limited to, a UFM module and is located in the CPLD. The server is then powered off and restarted; once the BMC has started, it reads the power-on abnormality information from the UFM and analyzes it to locate the cause of the fault.
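Because the first flash memory module must survive the power cycle, whatever the CPLD writes there has to be recognizable as valid after the restart, including when the flash is still erased. The Python sketch below shows one way to do that with a fixed-size record and a CRC32 check. The record layout, the `PWRF` magic value, and the function names are hypothetical; they are not the UFM format used in this patent.

```python
import struct
import zlib

# Hypothetical layout: 4-byte magic, 1-byte phase, 2-byte fault code, 4-byte timestamp.
_RECORD = struct.Struct("<4sBHI")
_CRC = struct.Struct("<I")
_MAGIC = b"PWRF"


def encode_record(phase, fault_code, timestamp_us):
    """Serialize a fault record and append a CRC32 over its body."""
    body = _RECORD.pack(_MAGIC, phase, fault_code, timestamp_us)
    return body + _CRC.pack(zlib.crc32(body))


def decode_record(blob):
    """Return (phase, fault_code, timestamp_us), or None if blank or corrupt."""
    if len(blob) != _RECORD.size + _CRC.size:
        return None
    body = blob[:_RECORD.size]
    (crc,) = _CRC.unpack(blob[_RECORD.size:])
    if zlib.crc32(body) != crc:
        return None  # corrupted, or erased flash (all 0xFF bytes)
    magic, phase, fault_code, timestamp_us = _RECORD.unpack(body)
    if magic != _MAGIC:
        return None
    return phase, fault_code, timestamp_us
```

After the restart, the BMC would read the raw bytes from the UFM and pass them to `decode_record`; a `None` result means no valid fault was stored.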
If the power-on state is normal, the power-on state data is stored in the second storage module. After the BMC starts, it reads the server state data from the second storage module over the I2C bus and checks whether the power-on process contained any abnormality, thereby determining the state of the server power supply. The second storage module includes, but is not limited to, a FIFO module located in the CPLD.
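The second storage module behaves as a bounded FIFO that the BMC drains after it starts. The Python sketch below models that, assuming each entry is a bitmask of power-good signals with a set bit meaning "good". `StateFifo`, `bmc_check`, and the bitmask encoding are hypothetical, and `drain` stands in for the BMC's I2C reads.

```python
from collections import deque


class StateFifo:
    """Bounded FIFO: once full, the oldest entry is discarded for each new one."""

    def __init__(self, depth):
        self._entries = deque(maxlen=depth)

    def push(self, sample):
        self._entries.append(sample)

    def drain(self):
        """Return all entries oldest-first and empty the FIFO (stands in for BMC I2C reads)."""
        items = list(self._entries)
        self._entries.clear()
        return items


def bmc_check(samples, required_mask):
    """Return the indices of samples in which any required power-good bit was low."""
    return [i for i, s in enumerate(samples) if (s & required_mask) != required_mask]
```

Discarding the oldest entries when full keeps the most recent state, which is what the BMC most needs; a real design might instead stop writing once the FIFO is full.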
After the power-on process completes successfully, the server starts operating, and the following step applies if an abnormality occurs during operation.
Step S3: after the server has started, the CPLD continuously monitors abnormal power failure information of the server and determines its storage location according to its effect on the BMC.
If the running state of the server becomes abnormal, specifically when an abnormal power failure may have occurred, the CPLD receives the abnormal power failure information, which includes the abnormal signal, and determines whether it can cause the BMC to hang. If the abnormal signal cannot cause the BMC to hang, the abnormal power failure information is recorded in the third storage module, which includes, but is not limited to, a FIFO module. The data recorded in this FIFO module then includes at least: the power supply state data before the abnormality, the power supply state data at the moment of the abnormality, and the power supply state data during a second preset time period after the abnormality.
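The before/at/after record described above is essentially a pre-trigger ring buffer followed by a fixed number of post-trigger samples, similar to a logic analyzer capture. The Python sketch below models it; `TriggeredCapture` and the `pre`/`post` sample counts are hypothetical, with `post` standing in for the second preset time period expressed as a number of samples.

```python
from collections import deque


class TriggeredCapture:
    """Keep `pre` samples before a trigger, the trigger sample, and `post` samples after it."""

    def __init__(self, pre, post):
        self.history = deque(maxlen=pre)  # rolling pre-trigger window
        self.post = post
        self.captured = None              # list of samples once the trigger fires
        self._remaining = 0

    def sample(self, value, abnormal):
        if self.captured is not None:
            # Already triggered: collect post-trigger samples until the window closes.
            if self._remaining > 0:
                self.captured.append(value)
                self._remaining -= 1
            return
        if abnormal:
            # Freeze the pre-trigger history and add the sample at the moment of the fault.
            self.captured = list(self.history) + [value]
            self._remaining = self.post
        else:
            self.history.append(value)

    @property
    def complete(self):
        return self.captured is not None and self._remaining == 0
```

Once `complete` is true, the captured list holds exactly the three parts the text requires: state before the abnormality, state at the moment of the abnormality, and state during the period after it.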
The BMC then reads the abnormal server state data over the I2C bus and sends a clear instruction. On receiving the clear instruction, the CPLD clears the abnormal server state data from the third storage module and shuts the server down, providing power failure protection for the server.
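The read-then-clear handshake can be sketched as follows. `CpldModel`, the command value `CLEAR_CMD = 0xA5`, and `bmc_handle_fault` are hypothetical. The text specifies only that the BMC reads the data over I2C and sends a clear instruction, after which the CPLD clears the third storage module and shuts the server down.

```python
class CpldModel:
    """Toy CPLD: holds runtime fault data and acts on a BMC clear command."""

    CLEAR_CMD = 0xA5  # hypothetical command code

    def __init__(self):
        self.third_storage = []
        self.server_on = True

    def log_runtime_fault(self, record):
        self.third_storage.append(record)

    def read_third_storage(self):
        # BMC-side read; the text performs this over the I2C bus.
        return list(self.third_storage)

    def write_command(self, cmd):
        if cmd == self.CLEAR_CMD:
            self.third_storage.clear()
            self.server_on = False  # power failure protection: shut the server down
        # Any other command is ignored in this sketch.


def bmc_handle_fault(cpld):
    """Read the fault data first, then tell the CPLD to clear it and shut down."""
    records = cpld.read_third_storage()
    cpld.write_command(CpldModel.CLEAR_CMD)
    return records
```

Reading before clearing ensures the fault data is never discarded before the BMC holds a copy of it.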
If the abnormal signal can cause the BMC to hang, the abnormal power failure information is recorded in both the third storage module and the second flash memory module so that it can be read and the cause located after the BMC starts. Specifically, the CPLD records the abnormal power failure information in the UFM and power-cycles the server; after the server restarts and the BMC has started, the BMC reads the abnormal power failure information from the UFM and analyzes it to locate the cause of the fault.
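The text does not list which abnormal signals can hang the BMC; that depends on the board's power tree. The Python sketch below assumes a hypothetical set, `BMC_HANGING_SIGNALS`, and shows the resulting dual write and power cycle. `handle_runtime_fault` and its return strings are illustrative names only, and plain lists stand in for the third storage module and the second flash memory module.

```python
# Hypothetical: signals whose loss is assumed to take the BMC down with it.
BMC_HANGING_SIGNALS = frozenset({"PSU.PWROK", "PSU.PG"})


def handle_runtime_fault(signal, snapshot, third_storage, second_flash):
    """Log a runtime abnormal power failure and return the CPLD's next action."""
    record = {"signal": signal, "snapshot": snapshot}
    third_storage.append(record)      # volatile FIFO, read by a live BMC
    if signal in BMC_HANGING_SIGNALS:
        second_flash.append(record)   # non-volatile copy survives the power cycle
        return "power_cycle"          # CPLD powers the server off and restarts it
    return "await_bmc_clear"          # BMC reads the FIFO, then sends the clear instruction
```

The flash copy is what makes the hanging case recoverable: the FIFO contents are lost in the power cycle, while the UFM record is still there when the BMC comes back up.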
If no abnormal power failure occurs while the server is running, the post-startup data of the server is stored in the second storage module; after the BMC starts, it reads this data from the second storage module and checks whether any abnormality exists.
With this arrangement, the first, second, and third storage modules and the first and second flash memory modules in the CPLD temporarily hold the abnormal states that occur during both power-on and operation. Technicians can use this information to locate fault causes quickly once the BMC has recovered, which shortens the time spent reproducing problems and makes fault localization more efficient.
Example two
Corresponding to the foregoing embodiment, the present application provides a server power supply abnormality monitoring system, including:
a control module, configured to start the server and power it on according to a power-on timing sequence;
a power-on monitoring module, configured so that the CPLD continuously monitors the power-on state of the server during startup, records power-on abnormality information and detects the startup state of the BMC when the power-on state is abnormal, and determines the storage location of the power-on abnormality information according to the startup state of the BMC;
an operation monitoring module, configured so that the CPLD continuously monitors abnormal power failure information of the server after it has started, and determines the storage location of that information according to its effect on the BMC;
a first check module, configured to store the power-on state data in the second storage module when the power-on state is normal, and to have the BMC read this data from the second storage module after it starts and check whether any abnormality exists;
and a second check module, configured to store the post-startup data of the server in the second storage module when no abnormal power failure occurs, and to have the BMC read this data from the second storage module after it starts and check whether any abnormality exists.
In a preferred embodiment, the power-on monitoring module is further configured as follows: when the power-on state is abnormal, the CPLD determines from the BMC heartbeat signal whether the BMC has started. If it has, the power-on abnormality information is recorded in the first storage module; the BMC reads it over the I2C bus, and once the read is complete the CPLD powers the server off and shuts it down, providing power-off protection for the server. If the BMC has not started, the power-on abnormality information is recorded in the first flash memory module, which includes, but is not limited to, a UFM module located in the CPLD; the server is then powered off and restarted, and once the BMC has started, it reads the power-on abnormality information from the UFM and analyzes it to locate the cause of the fault.
In a preferred embodiment, if the power-on state is normal, the power-on state data is stored in the second storage module; after the BMC starts, it reads the server state data from the second storage module over the I2C bus and checks whether the power-on process contained any abnormality, thereby determining the state of the server power supply. The second storage module includes, but is not limited to, a FIFO module located in the CPLD.
In a preferred embodiment, the data stored in the first storage module includes at least: … From this data, the CPLD can determine where the abnormality occurred and thus the cause of the fault.
In a preferred embodiment, the operation monitoring module is further configured to determine, when the running state is abnormal, whether the abnormal signal can cause the BMC to hang. If it cannot, the abnormal power failure information is recorded in the third storage module; the BMC reads it over the I2C bus and sends a clear instruction, and on receiving the instruction the CPLD clears the abnormal server state data from the third storage module and shuts the server down, providing power failure protection for the server. If the abnormal signal can cause the BMC to hang, the abnormal server state data is recorded in both the third storage module and the second flash memory module so that it can be read and the cause located after the BMC starts. The CPLD records the abnormal power failure information in the UFM and power-cycles the server; after the server restarts and the BMC has started, the BMC reads the abnormal power failure information from the UFM and analyzes it to locate the cause of the fault.
In a preferred embodiment, the data recorded in the third storage module includes at least: the power supply state data before the abnormality, the power supply state data at the moment of the abnormality, and the power supply state data during a second preset time period after the abnormality.
Example three
A computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the following steps:
Step 101: start the server and power it on according to the power-on sequence;
Step 102: during server startup, the CPLD continuously monitors the power-on state of the server;
Step 103: when the power-on state is abnormal, record power-on abnormality information and detect the startup state of the BMC;
determine the storage location of the power-on abnormality information according to the startup state of the BMC;
Step 104: after the server has started, the CPLD continuously monitors abnormal power failure information of the server;
and determine the storage location of the abnormal power failure information according to its effect on the BMC.
In a preferred embodiment, step 103 further includes determining whether the BMC is started when the power-on state is abnormal; if the BMC is started, the starting abnormal information is recorded in the first storage module, after the BMC is started, the BMC reads the power-on abnormal information in the first storage module through the i2C bus to locate the fault reason, and after the reading is completed, the CPLD controls the server to be powered off and shut down, so that the power-off protection of the server is realized.
If the BMC is not started, the power-on abnormal information is recorded in the first flash memory module, the server is powered off and restarted, after the server is powered off and restarted and the BMC is started, the BMC reads the power-on abnormal information from the UFM and analyzes and positions the fault reason.
In a preferred embodiment, step 104 further includes after the server is powered on, when the running status is abnormal, determining whether the abnormal power-down information causes the BMC to hang up; if the abnormal power failure information cannot cause the BMC to be hung, recording the abnormal power failure information into a third storage module; the BMC reads abnormal power failure information through the i2C bus and sends out a clearing instruction, and the CPLD receives the clearing instruction and executes operation of clearing abnormal server state data in the third storage module and operation of closing the server so as to realize power failure protection of the server.
If the abnormal power failure will cause the BMC to hang, the abnormal server state data is recorded in both the third storage module and the second flash memory module. The CPLD writes the abnormal power failure information into the UFM and power-cycles the server. After the server has restarted and the BMC is running, the BMC reads the abnormal server state data from the UFM and analyzes it to locate the fault reason.
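The two branches of step 104 can be modelled as a short exchange between the CPLD and the BMC. The Python toy model below is a sketch, not firmware. It assumes that the third storage module is volatile, so its contents are lost on a power cycle, and that the UFM (holding the second flash memory module) is not. The class, method and record names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cpld:
    """Toy CPLD; the volatility of each store is an assumption of this sketch."""
    third_storage: list = field(default_factory=list)  # volatile registers (assumed)
    ufm: list = field(default_factory=list)            # non-volatile second flash module
    log: list = field(default_factory=list)            # actions taken, for inspection

    def record_power_failure(self, record: str, hangs_bmc: bool) -> None:
        self.third_storage.append(record)
        if hangs_bmc:
            self.ufm.append(record)   # copy that survives the restart
            self.power_cycle()

    def power_cycle(self) -> None:
        self.third_storage.clear()    # volatile contents are lost
        self.log.append("power-cycle")

    def handle_clear(self) -> None:
        """React to the BMC's clearing instruction: clear, then shut down."""
        self.third_storage.clear()
        self.log.append("shutdown")


def bmc_collect(cpld: Cpld) -> list:
    """BMC side, once it is running: return the records it can analyze."""
    if cpld.third_storage:
        records = list(cpld.third_storage)  # stands in for the I2C read
        cpld.handle_clear()                 # send the clearing instruction
        return records
    return list(cpld.ufm)                   # the BMC had hung: read the UFM copy


# Non-hanging failure: read, clear, shut down.
c = Cpld()
c.record_power_failure("PSU0 input lost", hangs_bmc=False)
print(bmc_collect(c), c.log)  # ['PSU0 input lost'] ['shutdown']
```

In the hanging case, the record survives only because it was also written to the UFM before the power cycle. This is why the description stores it in two places.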
In one embodiment, a computer device is provided, which may be a server; its internal structure may be as shown in fig. 4. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor provides computing and control capabilities. The memory comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database stores the abnormal data generated during power-on and during operation.
The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a server power anomaly monitoring method.
Those skilled in the art will appreciate that the architecture shown in fig. 4 is merely a block diagram of part of the structure related to the disclosed solution. It does not limit the computer devices to which the disclosed solution applies: a particular computer device may include more or fewer components than shown, combine certain components, or arrange its components differently.
Example four
In one embodiment, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, performs the following steps:
step 201: starting a server, and starting power-on according to a power-on sequence;
step 202: in the starting process of the server, the CPLD continuously monitors the power-on state of the server;
step 203: when the power-on state is abnormal, recording power-on abnormal information and detecting the starting state of the BMC;
determining the storage position of the power-on abnormal information according to the starting state of the BMC;
step 204: after the server is started, the CPLD continuously monitors abnormal power failure information of the server;
and determining the storage position of the abnormal power failure information according to the influence of the abnormal power failure information on the BMC.
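The continuous monitoring in step 202 is, at its simplest, a loop. It enables each rail of the power-on sequence in turn and waits a bounded time for that rail's power-good signal. On real hardware this logic lives in the CPLD, typically written in an HDL such as Verilog. The Python sketch below only illustrates the control flow. The rail names, timeout values and callback interface are assumptions, not details from the patent.

```python
import time
from typing import Callable, Optional

# Hypothetical rail order; the real power-on sequence is board-specific.
SEQUENCE = ("P12V", "P5V", "P3V3", "P1V8", "PVCCIN")


def run_sequence(enable: Callable[[str], None],
                 power_good: Callable[[str], bool],
                 timeout_s: float = 0.05,
                 poll_s: float = 0.001) -> Optional[str]:
    """Enable rails in order, waiting for each power-good signal.

    Returns the first rail whose power-good did not assert within
    timeout_s (the power-on abnormal information to record), or None
    if the whole sequence completed.
    """
    for rail in SEQUENCE:
        enable(rail)
        deadline = time.monotonic() + timeout_s
        while not power_good(rail):
            if time.monotonic() >= deadline:
                return rail   # stop sequencing; later rails stay off
            time.sleep(poll_s)
    return None


# Simulated board where the 1.8 V rail never comes up.
enabled = []
failed = run_sequence(enabled.append, lambda rail: rail != "P1V8", timeout_s=0.01)
print(failed, enabled)  # P1V8 ['P12V', 'P5V', 'P3V3', 'P1V8']
```

Stopping at the first failed rail, rather than continuing, is one reasonable design choice: it avoids enabling downstream rails on a bad supply. The returned rail name is the kind of information step 203 would record.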
It will be understood by those skilled in the art that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory may include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A server power supply abnormity monitoring method is characterized by comprising the following steps:
starting a server, and starting power-on according to a power-on sequence;
in the starting process of the server, the CPLD continuously monitors the power-on state of the server;
when the power-on state is abnormal, recording power-on abnormal information and detecting the starting state of the BMC;
and determining the storage position of the power-on abnormal information according to the starting state of the BMC.
2. The server power supply abnormality monitoring method according to claim 1, wherein determining the storage position of the power-on abnormal information according to the starting state of the BMC includes:
if the BMC is started, storing the power-on abnormal information in a first storage module so that the BMC can read and locate a fault reason;
and if the BMC is not started, storing the power-on abnormal information in a first flash memory module so that the BMC can read and locate the fault reason after starting.
3. The server power supply abnormality monitoring method according to claim 1 or 2, characterized by further comprising:
and if the power-on state is normal, storing the power-on state data in a second storage module, and, after the BMC has started, reading the power-on state data from the second storage module and checking whether any abnormality exists.
4. The server power supply abnormality monitoring method according to claim 3, characterized by further comprising:
after the server is started, the CPLD continuously monitors abnormal power failure information of the server;
and determining the storage position of the abnormal power failure information according to the influence of the abnormal power failure information on the BMC.
5. The server power supply abnormality monitoring method according to claim 4, wherein determining the storage position of the abnormal power failure information according to the influence of the abnormal power failure information on the BMC includes:
if the abnormal power failure does not cause the BMC to hang, recording the abnormal power failure information in a third storage module so that the BMC can read it and locate the fault reason;
and if the abnormal power failure causes the BMC to hang, storing the abnormal power failure information in both the third storage module and a second flash memory module so that the BMC can read it and locate the fault reason after starting.
6. The server power supply abnormality monitoring method according to claim 5, wherein, if the abnormal power failure does not cause the BMC to hang, the method further includes, after recording the abnormal power failure information in the third storage module:
reading, by the BMC, the abnormal power failure information and sending a clearing instruction;
and clearing, by the CPLD according to the clearing instruction, the abnormal power failure information in the third storage module and shutting down the server.
7. The server power supply abnormality monitoring method according to claim 6, characterized by further comprising:
and if no abnormal power failure occurs, storing the data generated after the server starts in a second storage module, and, after the BMC has started, reading that data from the second storage module and checking whether any abnormality exists.
8. A server power anomaly monitoring system, the system comprising:
the control module is used for starting the server and starting power-on according to a power-on sequence;
the power-on monitoring module is used for continuously monitoring the power-on state of the server by the CPLD in the starting process of the server, recording power-on abnormal information and detecting the starting state of the BMC when the power-on state is abnormal, and determining the storage position of the power-on abnormal information according to the starting state of the BMC;
and the operation monitoring module is used for continuously monitoring abnormal power failure information of the server by the CPLD after the server is started, and determining the storage position of the abnormal power failure information according to the influence of the abnormal power failure information on the BMC.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 7 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202210463541.2A 2022-04-28 2022-04-28 Method, system and storage medium for monitoring server power supply abnormality Active CN114816022B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210463541.2A CN114816022B (en) 2022-04-28 2022-04-28 Method, system and storage medium for monitoring server power supply abnormality

Publications (2)

Publication Number Publication Date
CN114816022A true CN114816022A (en) 2022-07-29
CN114816022B CN114816022B (en) 2023-08-04

Family

ID=82509324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210463541.2A Active CN114816022B (en) 2022-04-28 2022-04-28 Method, system and storage medium for monitoring server power supply abnormality

Country Status (1)

Country Link
CN (1) CN114816022B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140115386A1 (en) * 2012-10-24 2014-04-24 Hon Hai Precision Industry Co., Ltd. Server and method for managing server
CN107066356A (en) * 2017-05-17 2017-08-18 郑州云海信息技术有限公司 A storage method for server BMC configuration data
CN108304299A (en) * 2018-03-02 2018-07-20 郑州云海信息技术有限公司 Server power-up state monitors system and method, computer storage and equipment
CN111258405A (en) * 2020-01-18 2020-06-09 苏州浪潮智能科技有限公司 Server mainboard burning prevention system and method
CN112948185A (en) * 2021-02-26 2021-06-11 浪潮电子信息产业股份有限公司 Server heat dissipation method and device and related components
WO2022078013A1 (en) * 2020-10-16 2022-04-21 苏州浪潮智能科技有限公司 Server power-down detection method and system, and device and medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117149548A (en) * 2023-09-07 2023-12-01 上海合芯数字科技有限公司 Method and device for measuring time sequence of server system, electronic equipment and storage medium
CN117149548B (en) * 2023-09-07 2024-04-26 上海合芯数字科技有限公司 Method and device for measuring time sequence of server system, electronic equipment and storage medium
CN117008704A (en) * 2023-09-27 2023-11-07 天固信息安全系统(深圳)有限公司 Control method and device based on EC or CPLD, storage medium and electronic equipment
CN117008704B (en) * 2023-09-27 2023-12-01 天固信息安全系统(深圳)有限公司 Control method and device based on EC or CPLD, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN114816022B (en) 2023-08-04

Similar Documents

Publication Publication Date Title
US20240012706A1 (en) Method, system and apparatus for fault positioning in starting process of server
CN114816022B (en) Method, system and storage medium for monitoring server power supply abnormality
CN104850485A (en) BMC based method and system for remote diagnosis of server startup failure
CN110445638B (en) Switch system fault protection method and device
CN110609778A (en) Method and system for storing server downtime log
CN114116280B (en) Interactive BMC self-recovery method, system, terminal and storage medium
CN110457907B (en) Firmware program detection method and device
CN115129520A (en) Computer system, computer server and starting method thereof
CN116820820A (en) Server fault monitoring method and system
CN115098291A (en) Method, system, storage medium and equipment for recording system restart reason
CN116775141A (en) Abnormality detection method, abnormality detection device, computer device, and storage medium
CN113672306B (en) Server component self-checking abnormity recovery method, device, system and medium
CN111488050A (en) Power supply monitoring method, system and server
EP3534259B1 (en) Computer and method for storing state and event log relevant for fault diagnosis
CN117707884A (en) Method, system, equipment and medium for monitoring power management chip
CN116501705A (en) RAS-based memory information collecting and analyzing method, system, equipment and medium
CN111400153A (en) Serial port log starting method and device and computer readable storage medium
CN113868001B (en) Method, system and computer storage medium for checking memory repair result
CN115728665A (en) Power failure detection circuit, method and system
CN114816822A (en) Server management method, device and system based on memory fault
CN111865719A (en) Automatic testing method and device for fault injection of switch
CN108415788B (en) Data processing apparatus and method for responding to non-responsive processing circuitry
CN109491872B (en) Memory supervision method and device and computer readable storage medium
CN117687821A (en) Method and device for processing bad blocks of cache memory and electronic equipment
CN117112273A (en) Fault state management and control method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant