CN111488050B - Power supply monitoring method, system and server - Google Patents

Power supply monitoring method, system and server

Info

Publication number
CN111488050B
Authority
CN
China
Prior art keywords
power supply
monitoring device
fault
information
monitoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010300845.8A
Other languages
Chinese (zh)
Other versions
CN111488050A (en
Inventor
滕学军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010300845.8A
Publication of CN111488050A
Application granted
Publication of CN111488050B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/28 Supervision thereof, e.g. detecting power-supply failure by out of limits supervision
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0745 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in an input/output transactions management context
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3089 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • G06F11/3093 Configuration details thereof, e.g. installation, enabling, spatial arrangement of the probes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a power supply monitoring method that detects whether a communication link between a power supply and a monitoring device for monitoring the working condition of the power supply is interrupted. If the link is not interrupted, the power failure alarm of the monitoring device is determined to be valid. If the link is interrupted, the power failure alarm of the monitoring device is determined to be invalid, and the communication ports of the power supply and the monitoring device, together with the communication bus between them, are reset to repair the communication link. Thus, when communication between the monitoring device and the power supply is interrupted, the alarm of the monitoring device is identified as a false alarm and the communication link is repaired, which avoids false alarms caused by the communication interruption. The invention also discloses a power supply monitoring system and a server, which have the same beneficial effects as the power supply monitoring method.

Description

Power supply monitoring method, system and server
Technical Field
The invention relates to the field of power supply monitoring, in particular to a power supply monitoring method, a power supply monitoring system and a server.
Background
Servers in a data center play an important role in data computation and storage. Once a server's power supply fails, the server goes down, which easily causes data loss. At present, to avoid data loss caused by power failure, a server is configured with a power system consisting of two identical power supplies, and the power system is configured with a monitoring device that monitors the working condition of both power supplies. When the monitoring device detects that one power supply has failed, the other power supply takes over so that the server continues to operate normally. However, during communication between the monitoring device and the power supply, electromagnetic interference may cause a handshake failure and interrupt the communication. The monitoring device then cannot monitor the power supply and raises a power failure alarm, which is a false alarm.
Therefore, how to provide a solution to the above technical problem is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a power supply monitoring method, a system and a server which, when communication between a monitoring device and a power supply is interrupted, identify the alarm of the monitoring device as a false alarm and repair the communication link between the monitoring device and the power supply, thereby avoiding false alarms caused by the communication interruption.
In order to solve the technical problem, the invention provides a power supply monitoring method, which comprises the following steps:
detecting whether a communication link between a power supply and a monitoring device, which is directly connected with the power supply and is used for monitoring the working condition of the power supply, is interrupted;
if not, determining that the power failure alarm of the monitoring device is valid;
and if so, determining that the power failure alarm of the monitoring device is invalid, and resetting the communication ports of the power supply and the monitoring device and the communication bus between them to repair the communication link.
Preferably, the power supply monitoring method further includes:
pre-establishing an address information correspondence between the address of a register used for storing power failure information and the power failure information it stores;
after analyzing the operation parameter information of the power supply to obtain the actual fault information of the power supply, determining a target address corresponding to the actual fault information according to the address information correspondence, and writing a preset fault value into the target register corresponding to the target address, for the monitoring device to read.
Preferably, the operation parameter information includes input/output parameter information of the power supply and operation parameter information of key components inside the power supply;
and the power supply monitoring method further comprises:
when the actual fault information of the power supply is analyzed, recording the fault analysis condition of the power supply;
and periodically acquiring the current operation parameter information of the power supply, and predicting the future fault condition of the power supply by combining the fault analysis condition of the historical record.
Preferably, the power supply monitoring method further includes:
pre-establishing an index relation corresponding table for searching the power failure type and the failure processing mode according to the power failure information;
and after analyzing the operation parameter information of the power supply to obtain the actual fault information of the power supply, finding the power supply fault type and the fault processing mode corresponding to the actual fault information according to the index relation corresponding table.
Preferably, the power supply monitoring method further includes:
and when the searched fault processing mode is a firmware upgrading mode, triggering a chip for upgrading the firmware in the power supply to carry out online firmware upgrading.
Preferably, the chip comprises a first chip core and a second chip core;
correspondingly, the process of triggering a chip for firmware upgrade in the power supply to perform online firmware upgrade includes:
detecting whether a first chip core pre-designated to execute firmware upgrading operation fails;
if not, triggering the first chip core to execute firmware upgrading operation;
and if so, triggering the second chip core to execute firmware upgrading operation.
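The core selection described above can be sketched as follows. This is only a minimal Python illustration: the function names and the two callables (`first_core_failed`, `upgrade_on`) are hypothetical stand-ins for the power supply's internal checks and are not defined by the patent.

```python
from typing import Callable


def select_upgrade_core(first_core_failed: Callable[[], bool]) -> str:
    """Pick the chip core that should run the firmware upgrade.

    The first core is the pre-designated one; the second core is used
    only when the first core is detected as failed.
    """
    return "second" if first_core_failed() else "first"


def trigger_online_upgrade(first_core_failed: Callable[[], bool],
                           upgrade_on: Callable[[str], None]) -> str:
    """Run the upgrade on the selected core and report which core was used."""
    core = select_upgrade_core(first_core_failed)
    upgrade_on(core)  # hypothetical hook that starts the upgrade on that core
    return core
```

The fallback keeps the online upgrade available even when the pre-designated core has failed.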
In order to solve the above technical problem, the present invention further provides a power supply monitoring system, including:
the first communication fault-tolerant-resisting module is arranged in the power supply and used for resetting a communication port of the power supply when a communication link between the power supply and a monitoring device which is directly connected with the power supply and used for monitoring the working condition of the power supply is interrupted;
the second communication fault-tolerant-resisting module is arranged in the monitoring device and used for detecting whether the communication link between the power supply and the monitoring device is interrupted; if not, determining that the power failure alarm of the monitoring device is valid; if so, determining that the power failure alarm of the monitoring device is invalid, and resetting the communication port of the monitoring device and the communication bus between the monitoring device and the power supply so as to repair the communication link.
Preferably, the power supply monitoring system further comprises:
the register is arranged in the power supply and used for storing power supply fault information;
the fault processing module is arranged in the power supply and used for pre-establishing an address information correspondence between the address of the register and the power supply fault information it stores; after analyzing the operation parameter information of the power supply to obtain the actual fault information of the power supply, determining a target address corresponding to the actual fault information according to the address information correspondence, and writing a preset fault value into the target register corresponding to the target address, for the monitoring device to read.
In order to solve the technical problem, the invention also provides a server, which comprises a power supply and a monitoring device which is directly connected with the power supply and is used for monitoring the working condition of the power supply; the power supply is monitored using any one of the power supply monitoring methods described above.
Preferably, the monitoring device is specifically a BMC in the server.
The invention provides a power supply monitoring method that detects whether a communication link between a power supply and a monitoring device for monitoring the working condition of the power supply is interrupted. If the link is not interrupted, the power failure alarm of the monitoring device is determined to be valid. If the link is interrupted, the power failure alarm of the monitoring device is determined to be invalid, and the communication ports of the power supply and the monitoring device, together with the communication bus between them, are reset to repair the communication link. Thus, when communication between the monitoring device and the power supply is interrupted, the alarm of the monitoring device is identified as a false alarm and the communication link is repaired, which avoids false alarms caused by the communication interruption.
The invention also provides a power supply monitoring system and a server, and the power supply monitoring system and the server have the same beneficial effects as the power supply monitoring method.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the prior art and the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a power supply monitoring method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of power supply monitoring under an Intel chip topology according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of improved power supply monitoring according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of power failure monitoring according to an embodiment of the present invention.
Detailed Description
The core of the invention is to provide a power supply monitoring method, a system and a server which, when communication between a monitoring device and a power supply is interrupted, identify the alarm of the monitoring device as a false alarm and repair the communication link between the monitoring device and the power supply, thereby avoiding false alarms caused by the communication interruption.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating a power monitoring method according to an embodiment of the present invention.
The power supply monitoring method comprises the following steps:
Step S1: detecting whether a communication link between a power supply and a monitoring device, which is directly connected with the power supply and is used for monitoring the working condition of the power supply, is interrupted; if not, go to step S2; if yes, go to step S3.
Step S2: determining that the power failure alarm of the monitoring device is valid.
Step S3: determining that the power failure alarm of the monitoring device is invalid, and resetting the communication ports of the power supply and the monitoring device and the communication bus between them to repair the communication link.
Specifically, referring to fig. 2, fig. 2 is a schematic diagram of power supply monitoring under an Intel chip topology according to an embodiment of the present invention. In this topology, the ME (Management Engine) of the Intel chipset first reads the power supply information over an I2C bus, and the monitoring device (for example, a BMC (Baseboard Management Controller)) reads the power supply information from the ME over another I2C bus, so that the monitoring device can monitor the power supply information in real time. The ME acts as a bridge in this process. When the server is in the S5 state (a server motherboard state in which AC power is applied but the motherboard is not powered on), the ME does not work normally. After the server enters the S0 state (a server motherboard state in which the motherboard is powered on), the ME starts to work normally. When the server goes from the S5 state to the S0 state, the motherboard boot signal is sent to the monitoring device and the PCH (Platform Controller Hub, the integrated south bridge) at the same time; the monitoring device starts monitoring the power supply information after receiving the signal, and the PCH controls the server to boot after receiving it. During this process, when the monitoring device scans the power supply information, the ME is not yet working normally, so the monitoring device and the ME cannot communicate. After detecting that communication is impossible, the monitoring device records a power supply fault and raises an alarm. This alarm is false rather than a real fault, and it creates a large amount of work for operation and maintenance personnel.
To solve this problem, a direct-connection topology is adopted between the power supply and the monitoring device for monitoring the working condition of the power supply, as shown in fig. 3. That is, the monitoring device communicates directly with the power supply in any state, with no intermediate link, which eliminates false alarms caused by an intermediate link.
In addition, during communication between the monitoring device and the power supply, electromagnetic interference may cause a handshake failure and interrupt the communication. The monitoring device then cannot monitor the power supply and raises a power failure alarm, which is a false alarm. To address this, the application adopts the following technical means:
Detect whether the communication link between the power supply and the monitoring device is interrupted. If the link is not interrupted, the power failure alarm was not caused by a communication interruption, so the power failure alarm of the monitoring device is determined to be valid. If the link is interrupted, the power failure alarm was caused by the communication interruption, so the alarm is determined to be invalid, that is, a false alarm, and the communication between the monitoring device and the power supply is repaired.
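The decision in steps S1 to S3 can be sketched as a small Python function. This is an illustrative sketch only: `RecordingActions` and its method names are hypothetical placeholders for the real reset operations, which the patent does not specify at code level.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RecordingActions:
    """Hypothetical stand-in that records which reset operations were requested."""
    log: List[str] = field(default_factory=list)

    def reset_psu_port(self) -> None:
        self.log.append("reset_psu_port")

    def reset_monitor_port(self) -> None:
        self.log.append("reset_monitor_port")

    def reset_bus(self) -> None:
        self.log.append("reset_bus")


def handle_power_alarm(link_interrupted: bool, actions: RecordingActions) -> bool:
    """Return True if the power failure alarm is valid, False if it is a false alarm.

    When the link is interrupted, both communication ports and the bus
    between them are reset to repair the link (step S3).
    """
    if not link_interrupted:
        return True  # step S2: the alarm reflects a real power supply fault
    actions.reset_psu_port()
    actions.reset_monitor_port()
    actions.reset_bus()
    return False  # step S3: the alarm was caused by the broken link
```

In this sketch the caller is expected to suppress or discard the alarm whenever the function returns False.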
The communication between the monitoring device and the power supply is repaired as follows. A first communication fault-tolerant-resisting module is arranged in the power supply. It detects whether communication between the power supply and the monitoring device is interrupted; if so, it resets the communication port of the power supply to recover that port; if not, it does not reset the port. Similarly, a second communication fault-tolerant-resisting module is arranged in the monitoring device. It detects whether communication between the monitoring device and the power supply is interrupted; if so, it resets the communication port of the monitoring device to recover that port, and it also resets the communication bus between the monitoring device and the power supply to repair the communication link; if not, it resets neither the communication port of the monitoring device nor the communication bus.
More specifically, the first communication fault-tolerant-resisting module detects a communication interruption as follows: it records the time at which the monitoring device polls the power supply, and if the monitoring device has not accessed the power supply within 15 polling periods (the monitoring device generally polls the power supply once per second, so 15 polling periods are 15 seconds), it determines that communication between the power supply and the monitoring device is interrupted and resets the communication port of the power supply. This ensures that a fault in the power supply's communication port is recovered promptly. The second communication fault-tolerant-resisting module detects a communication interruption as follows: when the monitoring device, through periodic detection, finds that communication with the power supply gets no response, it determines that communication between the monitoring device and the power supply is interrupted and resets the communication port of the monitoring device, so that a fault in the monitoring device's communication port is recovered promptly. In addition, the second communication fault-tolerant-resisting module also resets the communication port of the monitoring device when a PEC (Packet Error Checking) transmission error is detected during communication.
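The two detection mechanisms in this paragraph can be illustrated with a short Python sketch. The class and function names are hypothetical. The timeout uses the 15 polling periods and one-second period stated above. The PEC routine is the CRC-8 (polynomial x^8 + x^2 + x + 1, initial value 0) that SMBus and PMBus use for packet error checking; the patent itself does not spell out the calculation, so treat that part as background rather than as the patented method.

```python
POLL_PERIOD_S = 1.0      # the monitoring device generally polls once per second
MAX_MISSED_POLLS = 15    # interruption is declared after 15 polling periods


class PollWatchdog:
    """Power-supply-side check: has the monitoring device stopped polling?"""

    def __init__(self, now: float) -> None:
        self.last_poll = now

    def record_poll(self, now: float) -> None:
        """Remember the time of the most recent access by the monitoring device."""
        self.last_poll = now

    def link_interrupted(self, now: float) -> bool:
        """True once no poll has been seen for 15 polling periods."""
        return (now - self.last_poll) >= MAX_MISSED_POLLS * POLL_PERIOD_S


def smbus_pec(data: bytes) -> int:
    """CRC-8 over the message bytes, as used for the SMBus/PMBus PEC byte."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def pec_ok(message: bytes, received_pec: int) -> bool:
    """A mismatch corresponds to the PEC transmission error that triggers a port reset."""
    return smbus_pec(message) == received_pec
```

Timestamps are passed in explicitly so the watchdog can be driven by any clock source.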
The second communication fault-tolerant-resisting module resets the communication bus between the monitoring device and the power supply by retransmitting to the power supply the signal (9 clocks) with which the monitoring device establishes communication with the power supply.
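A bus reset of this kind can be sketched in Python as below. The sketch assumes the 9 clocks are pulses on the bus clock line and that success can be judged by the data line being released afterwards; both points, along with the `SimulatedBus` class standing in for real pin access, are assumptions for illustration and are not stated in the patent.

```python
RECOVERY_CLOCKS = 9  # number of clocks named in the description


class SimulatedBus:
    """Hypothetical stand-in for pin-level bus access.

    Models a device that holds the data line low until it has seen
    `clocks_needed` clock pulses.
    """

    def __init__(self, clocks_needed: int) -> None:
        self.clocks_needed = clocks_needed
        self.pulses = 0

    def pulse_clock(self) -> None:
        self.pulses += 1

    def data_line_released(self) -> bool:
        return self.pulses >= self.clocks_needed


def reset_bus(bus: SimulatedBus) -> bool:
    """Send the 9 clocks, then report whether the data line is free again."""
    for _ in range(RECOVERY_CLOCKS):
        bus.pulse_clock()
    return bus.data_line_released()
```

On real hardware the two `SimulatedBus` methods would be replaced by whatever pin or controller interface the monitoring device provides.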
The invention provides a power supply monitoring method that detects whether a communication link between a power supply and a monitoring device for monitoring the working condition of the power supply is interrupted. If the link is not interrupted, the power failure alarm of the monitoring device is determined to be valid. If the link is interrupted, the power failure alarm of the monitoring device is determined to be invalid, and the communication ports of the power supply and the monitoring device, together with the communication bus between them, are reset to repair the communication link. Thus, when communication between the monitoring device and the power supply is interrupted, the alarm of the monitoring device is identified as a false alarm and the communication link is repaired, which avoids false alarms caused by the communication interruption.
On the basis of the above-described embodiment:
As an optional embodiment, the power supply monitoring method further includes:
pre-establishing an address information correspondence between the address of a register used for storing power failure information and the power failure information it stores;
after the operation parameter information of the power supply is analyzed to obtain the actual fault information of the power supply, determining a target address corresponding to the actual fault information according to the address information correspondence, and writing a preset fault value into the target register corresponding to the target address, for the monitoring device to read.
It should be noted that "preset" in the present application means set in advance. The setting needs to be made only once and need not be repeated unless it has to be modified according to actual conditions.
Further, the present application may also establish in advance a correspondence (the address information correspondence for short, which may take the form of a table) between the address of each register used for storing power failure information and the power failure information stored in it. That is, the address information correspondence indicates which kind of power supply fault information (such as an OVP overvoltage fault or a UVP undervoltage fault) each register stores. Based on this, after the actual fault information of the power supply is obtained by analyzing the operation parameter information of the power supply, the target address corresponding to that fault information, namely the address of the target register used for storing it, can be determined from the address information correspondence. The preset fault value is then written into the target register at that address to indicate that the power supply has the corresponding fault. Meanwhile, the monitoring device may interact with the power supply to read the information stored in the registers within the power supply and determine the fault condition of the power supply from it.
More specifically, referring to fig. 4, fig. 4 is a schematic diagram of power failure monitoring according to an embodiment of the present invention. The monitoring device monitors power failures as follows: the power supply contains a fault processing module and a register for storing power supply fault information. The fault processing module pre-establishes an address information correspondence between the address of the register and the power supply fault information it stores. After analyzing the operation parameter information of the power supply to obtain the actual fault information of the power supply, it determines a target address corresponding to the actual fault information according to the address information correspondence, and writes a preset fault value into the target register corresponding to the target address, for the monitoring device to read.
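The register mechanism can be illustrated with a minimal Python sketch. The register addresses, the fault value `0x01`, and all names below are hypothetical; the patent names OVP and UVP as example faults but does not give concrete addresses or values.

```python
from typing import Dict, List

# Hypothetical address information correspondence: fault name -> register address.
FAULT_REGISTER_MAP: Dict[str, int] = {
    "OVP": 0x10,  # overvoltage fault
    "UVP": 0x11,  # undervoltage fault
}
PRESET_FAULT_VALUE = 0x01  # hypothetical value meaning "fault present"


class FaultRegisters:
    """Models the fault registers inside the power supply."""

    def __init__(self) -> None:
        self._regs: Dict[int, int] = {addr: 0 for addr in FAULT_REGISTER_MAP.values()}

    def report_fault(self, fault: str) -> int:
        """Fault processing module side: write the preset value to the target register."""
        target_address = FAULT_REGISTER_MAP[fault]
        self._regs[target_address] = PRESET_FAULT_VALUE
        return target_address

    def read(self, address: int) -> int:
        """Monitoring device side: read one register."""
        return self._regs[address]

    def active_faults(self) -> List[str]:
        """Monitoring device side: decode every register that holds the fault value."""
        return sorted(name for name, addr in FAULT_REGISTER_MAP.items()
                      if self._regs[addr] == PRESET_FAULT_VALUE)
```

Because the monitoring device only reads fixed addresses, it needs the same correspondence table as the fault processing module to interpret what it reads.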
As an optional embodiment, the operation parameter information includes input/output parameter information of the power supply and operation parameter information of key components inside the power supply;
and the power supply monitoring method further comprises the following steps:
when the actual fault information of the power supply is analyzed, the fault analysis condition of the power supply is recorded;
the current operation parameter information of the power supply is periodically acquired, and the future fault condition of the power supply is predicted by combining the fault analysis condition of the historical record.
Further, analyzing the operation parameter information of the power supply to obtain the actual fault information specifically means analyzing the input/output parameter information of the power supply and the operation parameter information of key components inside the power supply. Analyzing the input/output parameter information reveals externally visible faults of the power supply; analyzing the operation parameter information of the internal key components reveals faults of the internal structure of the power supply, such as comprehensive faults and out-of-limit voltage, current and temperature of those components.
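One simple way to picture this two-part analysis is a limit check over both groups of parameters, sketched below in Python. The parameter names and limit values are invented for illustration; the patent does not specify thresholds or how the analysis is performed internally.

```python
from typing import Dict, List, Optional, Tuple

Limit = Tuple[Optional[float], Optional[float]]  # (lower bound, upper bound)

# Hypothetical limits for input/output parameters (externally visible faults).
EXTERNAL_LIMITS: Dict[str, Limit] = {"output_voltage_v": (11.4, 12.6)}
# Hypothetical limits for key internal components.
INTERNAL_LIMITS: Dict[str, Limit] = {"component_temp_c": (None, 100.0)}


def _check(params: Dict[str, float], limits: Dict[str, Limit],
           category: str) -> List[Tuple[str, str, str]]:
    """Return (category, parameter, 'under' or 'over') for each out-of-limit value."""
    faults = []
    for name, (low, high) in limits.items():
        value = params.get(name)
        if value is None:
            continue  # parameter not sampled this time
        if low is not None and value < low:
            faults.append((category, name, "under"))
        elif high is not None and value > high:
            faults.append((category, name, "over"))
    return faults


def analyze(params: Dict[str, float]) -> List[Tuple[str, str, str]]:
    """Check input/output parameters first, then internal component parameters."""
    return (_check(params, EXTERNAL_LIMITS, "external")
            + _check(params, INTERNAL_LIMITS, "internal"))
```

Keeping the two limit tables separate mirrors the distinction the text draws between externally visible faults and internal-structure faults.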
Therefore, when the actual fault information of the power supply is obtained by analysis, the fault analysis result can be recorded and used as the basis for subsequently predicting power supply faults. In addition, the present application periodically acquires the current operation parameter information of the power supply and predicts the future fault condition of the power supply in combination with the recorded fault analysis history.
More specifically, the fault processing module analyzes the operation parameter information of the power supply to obtain actual fault information of the power supply, records a fault analysis log of the power supply, and sends the fault analysis log of the power supply to the monitoring device for saving. The monitoring device periodically polls the current operation parameter information of the power supply from the fault processing module and predicts the future fault condition of the power supply by combining with a fault analysis log stored in history.
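The patent does not say how the prediction is computed. As one possible reading, the Python sketch below flags a fault as likely when a currently polled parameter comes within a margin of the value recorded when that fault occurred before. The record layout, the 5% margin, and all names are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FaultRecord:
    """One hypothetical entry of the fault analysis log kept by the monitoring device."""
    fault: str        # e.g. "OVP"
    parameter: str    # parameter that was out of limits
    value: float      # value of that parameter when the fault was recorded


def predict_faults(history: List[FaultRecord], current: Dict[str, float],
                   margin: float = 0.05) -> List[str]:
    """Return faults whose recorded trigger value the current reading is approaching.

    A fault is flagged when the current value lies within `margin`
    (as a fraction of the recorded value) of a past fault value.
    """
    predicted = set()
    for record in history:
        value = current.get(record.parameter)
        if value is None:
            continue  # parameter not included in this poll
        if abs(value - record.value) <= margin * abs(record.value):
            predicted.add(record.fault)
    return sorted(predicted)
```

A production implementation would likely use trend information as well; this sketch only shows how the stored log and the periodic poll can be combined.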
As an optional embodiment, the power supply monitoring method further includes:
pre-establishing an index relation corresponding table for searching the power failure type and the failure processing mode according to the power failure information;
after the operation parameter information of the power supply is analyzed to obtain the actual fault information of the power supply, the power supply fault type and the fault processing mode corresponding to the actual fault information are found according to the index relation corresponding table.
Further, the present application may also pre-establish an index relation corresponding table for searching the power supply fault type and the fault processing mode according to the power supply fault information; that is, the index relation corresponding table records the power supply fault type and the fault processing mode corresponding to each item of power supply fault information. On this basis, after the operation parameter information of the power supply is analyzed to obtain the actual fault information of the power supply, the power supply fault type and the fault processing mode corresponding to the obtained actual fault information can be found according to the index relation corresponding table.
More specifically, the fault processing module analyzes the operation parameter information of the power supply to obtain the actual fault information of the power supply, and sends the actual fault information to the monitoring device (such as the BMC). The BMC stores the index relation corresponding table in advance and, after receiving the actual fault information, looks up the corresponding power supply fault type and fault processing mode in the index relation corresponding table. Operation and maintenance personnel can then learn the fault type and how to handle the current fault by remotely accessing the BMC web interface, which reduces maintenance cost.
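For illustration only, the table lookup described above may be sketched in Python as follows. The fault information codes, fault types and processing modes in `INDEX_TABLE` are hypothetical and are not taken from the present application; the fallback entry for unknown fault information is likewise an assumption.

```python
# Hypothetical index relation corresponding table:
# fault information -> (power supply fault type, fault processing mode).
INDEX_TABLE = {
    "VOUT_OV": ("output over-voltage", "replace power supply"),
    "FAN_STALL": ("fan failure", "replace fan module"),
    "FW_CHECKSUM": ("firmware error", "firmware upgrade"),
}

# Assumed fallback for fault information that has no table entry.
UNKNOWN = ("unknown fault", "manual inspection")


def look_up(fault_info: str) -> tuple:
    """Return the (fault type, processing mode) pair for the given fault information."""
    return INDEX_TABLE.get(fault_info, UNKNOWN)


fault_type, processing_mode = look_up("FW_CHECKSUM")
print(fault_type, "->", processing_mode)
```

Keeping the table on the monitoring device means the power supply only has to report a short fault code, and the human-readable type and handling advice can be shown on the management interface.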
As an optional embodiment, the power supply monitoring method further includes:
when the fault processing mode found is a firmware upgrade mode, triggering a chip for firmware upgrade in the power supply to perform an online firmware upgrade.
Further, if the fault processing mode corresponding to the current fault information of the power supply is a firmware upgrade mode, the current fault of the power supply is eliminated by upgrading the firmware of the power supply. The existing upgrade approach is offline: the power supply is taken out of the system, and a tool set consisting of a jig board, a computer, a burner, a USB (Universal Serial Bus) cable, a USB adapter, and a PMBus (Power Management Bus) cable is used to upgrade the firmware of the power supplies one by one. The present application instead adopts online upgrading: specifically, the monitoring device sends an upgrade instruction to the power supply to trigger the chip for firmware upgrade in the power supply to upgrade the firmware online, which is simple and convenient.
As an alternative embodiment, the chip comprises a first chip core and a second chip core;
correspondingly, the process of triggering a chip for firmware upgrade in a power supply to perform online firmware upgrade comprises the following steps:
detecting whether a first chip core pre-designated to execute firmware upgrading operation fails;
if not, triggering the first chip core to execute firmware upgrading operation;
if yes, triggering the second chip core to execute the firmware upgrading operation.
Specifically, the chip for firmware upgrade is a dual-core chip whose two chip cores are mirror images of each other. If one chip core fails, the other chip core can continue to execute the firmware upgrade operation. This ensures that the power supply firmware can be upgraded online successfully, and prevents an upgrade failure, and a resulting system crash, caused by an abnormality during the firmware upgrade (such as interruption, interference, code errors, or sudden power loss during the upgrade).
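For illustration only, the selection between the two chip cores may be sketched in Python as follows. The class `ChipCore`, the function `trigger_online_upgrade` and the firmware image bytes are hypothetical; how a core failure is actually detected is not described here and is reduced to a boolean flag.

```python
class ChipCore:
    """Hypothetical stand-in for one core of the firmware-upgrade chip."""

    def __init__(self, name: str, failed: bool = False):
        self.name = name
        self.failed = failed
        self.firmware = b""

    def upgrade(self, image: bytes) -> None:
        # A real core would write the image to flash; here it is only stored.
        self.firmware = image


def trigger_online_upgrade(first: ChipCore, second: ChipCore, image: bytes) -> str:
    """Use the pre-designated first core unless it has failed; return the core used."""
    core = second if first.failed else first
    core.upgrade(image)
    return core.name


first_core = ChipCore("core-1", failed=True)
second_core = ChipCore("core-2")
used = trigger_online_upgrade(first_core, second_core, b"\x01\x02\x03")
print(used)
```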
The present application further provides a power monitoring system, including:
the first communication fault-tolerant-resisting module is arranged in the power supply and used for resetting a communication port of the power supply when a communication link between the power supply and a monitoring device which is directly connected with the power supply and used for monitoring the working condition of the power supply is interrupted;
the second communication fault-tolerant-resisting module is arranged in the monitoring device and used for detecting whether the communication link between the power supply and the monitoring device is interrupted; if not, determining that the power supply fault alarm of the monitoring device is valid; if so, determining that the power supply fault alarm of the monitoring device is invalid, and resetting the communication port of the monitoring device and the communication bus between the monitoring device and the power supply so as to repair the communication link.
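For illustration only, the cooperation of the two modules may be sketched in Python as follows. The class `CommLink` and both function names are hypothetical, and the resets are merely recorded in a list rather than performed on real hardware.

```python
from dataclasses import dataclass, field


@dataclass
class CommLink:
    """Hypothetical model of the link between power supply and monitoring device."""
    interrupted: bool
    reset_log: list = field(default_factory=list)

    def reset(self, target: str) -> None:
        # Only records which element was reset; no hardware is touched.
        self.reset_log.append(target)


def power_supply_side(link: CommLink) -> None:
    """First module: reset the power supply's communication port on interruption."""
    if link.interrupted:
        link.reset("power supply communication port")


def monitoring_device_side(link: CommLink) -> bool:
    """Second module: return True if a power supply fault alarm should be trusted."""
    if not link.interrupted:
        return True
    # The alarm may be caused by the broken link itself, so it is treated as invalid.
    link.reset("monitoring device communication port")
    link.reset("communication bus")
    return False


link = CommLink(interrupted=True)
power_supply_side(link)
alarm_valid = monitoring_device_side(link)
print(alarm_valid, link.reset_log)
```

Checking the link before trusting an alarm keeps a communication failure from being reported as a power supply fault.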
As an alternative embodiment, the power supply monitoring system further comprises:
the register is arranged in the power supply and used for storing power supply fault information;
the fault processing module is arranged in the power supply and used for pre-establishing an address information corresponding relation between the address of the register and the stored power supply fault information; and, after the operation parameter information of the power supply is analyzed to obtain the actual fault information of the power supply, determining a target address corresponding to the actual fault information according to the address information corresponding relation, and writing a preset fault value into the target register corresponding to the target address for the monitoring device to read.
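For illustration only, the register-based fault reporting may be sketched in Python as follows. The register addresses, the fault names and the preset fault value `0x01` are assumptions made for this sketch, and the registers are modeled as a plain dictionary.

```python
FAULT_VALUE = 0x01  # assumed preset fault value
CLEAR_VALUE = 0x00  # assumed value of a register with no fault recorded

# Hypothetical address information corresponding relation:
# power supply fault information -> register address.
ADDRESS_MAP = {
    "input_under_voltage": 0x10,
    "output_over_current": 0x11,
    "over_temperature": 0x12,
}

# Register file inside the power supply, modeled as address -> value.
registers = {address: CLEAR_VALUE for address in ADDRESS_MAP.values()}


def report_fault(fault_info: str) -> int:
    """Fault processing module: write the fault value to the target register."""
    target_address = ADDRESS_MAP[fault_info]
    registers[target_address] = FAULT_VALUE
    return target_address


def read_faults() -> list:
    """Monitoring device: read the registers and list the faults that are set."""
    return [name for name, address in ADDRESS_MAP.items()
            if registers[address] == FAULT_VALUE]


report_fault("over_temperature")
print(read_faults())
```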
For introduction of the power monitoring system provided in the present application, reference is made to the embodiments of the power monitoring method described above, and details of the power monitoring system are not repeated herein.
The present application also provides a server, which comprises a power supply and a monitoring device that is directly connected with the power supply and is used for monitoring the working condition of the power supply, wherein the power supply is monitored by any one of the power supply monitoring methods described above.
As an alternative embodiment, the monitoring device is embodied as a BMC within the server.
For the introduction of the server provided in the present application, please refer to the above embodiments of the power monitoring method, which are not described herein again.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A power supply monitoring method, comprising:
detecting whether a communication link between a power supply and a monitoring device which is directly connected with the power supply and is used for monitoring the working condition of the power supply is interrupted;
if not, determining that the power supply fault alarm of the monitoring device is valid;
if so, determining that the power supply fault alarm of the monitoring device is invalid, and resetting the communication ports of the power supply and the monitoring device and the communication bus between the two to repair the communication link;
pre-establishing an index relation corresponding table for searching the power supply fault type and the fault processing mode according to the power supply fault information;
and after analyzing the operation parameter information of the power supply to obtain the actual fault information of the power supply, finding the power supply fault type and the fault processing mode corresponding to the actual fault information according to the index relation corresponding table.
2. The power supply monitoring method of claim 1, further comprising:
pre-establishing an address information corresponding relation between the address of a register used for storing power supply fault information and the stored power supply fault information;
after analyzing the operation parameter information of the power supply to obtain the actual fault information of the power supply, determining a target address corresponding to the actual fault information according to the address information corresponding relation, and writing a preset fault value into the target register corresponding to the target address for the monitoring device to read.
3. The power supply monitoring method according to claim 2, wherein the operation parameter information includes input/output parameter information of the power supply and operation parameter information of key components inside the power supply;
and the power supply monitoring method further comprises:
when the actual fault information of the power supply is analyzed, recording the fault analysis condition of the power supply;
and periodically acquiring the current operation parameter information of the power supply, and predicting the future fault condition of the power supply by combining the fault analysis condition of the historical record.
4. The power supply monitoring method of claim 1, further comprising:
when the fault processing mode found is a firmware upgrade mode, triggering a chip for firmware upgrade in the power supply to perform an online firmware upgrade.
5. The power monitoring method of claim 4, wherein the chip comprises a first chip core and a second chip core;
correspondingly, the process of triggering a chip for firmware upgrade in the power supply to perform online firmware upgrade includes:
detecting whether a first chip core pre-designated to execute firmware upgrading operation fails;
if not, triggering the first chip core to execute firmware upgrading operation;
and if so, triggering the second chip core to execute firmware upgrading operation.
6. A power monitoring system, comprising:
the first communication fault-tolerant-resisting module is arranged in the power supply and used for resetting a communication port of the power supply when a communication link between the power supply and a monitoring device which is directly connected with the power supply and used for monitoring the working condition of the power supply is interrupted;
the second communication fault-tolerant-resisting module is arranged in the monitoring device and used for detecting whether the communication link between the power supply and the monitoring device is interrupted; if not, determining that the power supply fault alarm of the monitoring device is valid; if so, determining that the power supply fault alarm of the monitoring device is invalid, and resetting the communication port of the monitoring device and the communication bus between the monitoring device and the power supply so as to repair the communication link; pre-establishing an index relation corresponding table for searching the power supply fault type and the fault processing mode according to the power supply fault information; and after analyzing the operation parameter information of the power supply to obtain the actual fault information of the power supply, finding the power supply fault type and the fault processing mode corresponding to the actual fault information according to the index relation corresponding table.
7. The power monitoring system of claim 6, further comprising:
the register is arranged in the power supply and used for storing power supply fault information;
the fault processing module is arranged in the power supply and used for pre-establishing an address information corresponding relation between the address of the register and the stored power supply fault information; and, after analyzing the operation parameter information of the power supply to obtain the actual fault information of the power supply, determining a target address corresponding to the actual fault information according to the address information corresponding relation, and writing a preset fault value into the target register corresponding to the target address for the monitoring device to read.
8. A server, comprising a power supply and a monitoring device which is directly connected with the power supply and is used for monitoring the working condition of the power supply; wherein the power supply is monitored using the power supply monitoring method according to any one of claims 1 to 5.
9. The server according to claim 8, wherein the monitoring device is a BMC within the server.
CN202010300845.8A 2020-04-16 2020-04-16 Power supply monitoring method, system and server Active CN111488050B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010300845.8A CN111488050B (en) 2020-04-16 2020-04-16 Power supply monitoring method, system and server

Publications (2)

Publication Number Publication Date
CN111488050A CN111488050A (en) 2020-08-04
CN111488050B true CN111488050B (en) 2022-04-22

Family

ID=71791756

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010300845.8A Active CN111488050B (en) 2020-04-16 2020-04-16 Power supply monitoring method, system and server

Country Status (1)

Country Link
CN (1) CN111488050B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113625696B (en) * 2021-08-31 2023-03-24 东风商用车有限公司 Safety processing method and system for overcurrent protection of vehicle-mounted controller
CN117527478A (en) * 2024-01-05 2024-02-06 西安图为电气技术有限公司 Monitoring system for power module and power module management system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101102377A (en) * 2007-07-24 2008-01-09 北京意科通信技术有限责任公司 A communication power operation management and alert system and its method
CN102624584A (en) * 2012-03-01 2012-08-01 中兴通讯股份有限公司 Link detection method and link detection device
CN104656531A (en) * 2015-01-16 2015-05-27 张泽 Monitoring method and device for intelligent equipment
CN106292986A (en) * 2016-08-08 2017-01-04 浪潮电子信息产业股份有限公司 A kind of server power supply PSU fault determination method and device
CN106712287A (en) * 2016-11-21 2017-05-24 国家电网公司 Intelligent alarm analysis system of intelligent transformer substation
CN106788712A (en) * 2017-01-11 2017-05-31 山西恒海创盈科技有限公司 Electric power optical cable on-line intelligence monitoring system
CN108399116A (en) * 2018-03-02 2018-08-14 郑州云海信息技术有限公司 A kind of server power-up state monitoring system and method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6949916B2 (en) * 2002-11-12 2005-09-27 Power-One Limited System and method for controlling a point-of-load regulator
CN103792923A (en) * 2014-02-14 2014-05-14 浪潮电子信息产业股份有限公司 Method for detecting and controlling sets of power supplies of main board through digital chips
US10386425B2 (en) * 2014-03-24 2019-08-20 Lenovo Enterprise Solutions (Singapore) Pte. Ltd. Method and system for managing power faults
CN105897491A (en) * 2016-06-24 2016-08-24 努比亚技术有限公司 Method and device for filtering invalid monitoring alarm information
CN109885151A (en) * 2019-01-31 2019-06-14 郑州云海信息技术有限公司 A kind of server power supply monitoring method and system

Also Published As

Publication number Publication date
CN111488050A (en) 2020-08-04

Similar Documents

Publication Publication Date Title
US7589624B2 (en) Component unit monitoring system and component unit monitoring method
CN111324192A (en) System board power supply detection method, device, equipment and storage medium
CN106789306B (en) Method and system for detecting, collecting and recovering software fault of communication equipment
CN112286709B (en) Diagnosis method, diagnosis device and diagnosis equipment for server hardware faults
CN114328102B (en) Equipment state monitoring method, equipment state monitoring device, equipment and computer readable storage medium
CN111488050B (en) Power supply monitoring method, system and server
CN102880527B (en) Data recovery method of baseboard management controller
TW201119173A (en) Method of using power supply to execute remote monitoring of an electronic system
CN110032465A (en) A kind of BMC restarts log recording method and device
CN116126772A (en) UART serial port management system and method applied to ARM server
CN113672306B (en) Server component self-checking abnormity recovery method, device, system and medium
CN116775141A (en) Abnormality detection method, abnormality detection device, computer device, and storage medium
CN117573455B (en) PCIE equipment detection system, method, device and product
CN114816022A (en) Server power supply abnormity monitoring method, system and storage medium
CN116225812B (en) Baseboard management controller system operation method, device, equipment and storage medium
CN115562900B (en) AMD server system installation power-off processing method, device, equipment and medium
CN115728665A (en) Power failure detection circuit, method and system
CN115470056A (en) Method, system, device and medium for troubleshooting power-on starting of server hardware
CN100369009C (en) Monitor system and method capable of using interrupt signal of system management
CN115098342A (en) System log collection method, system, terminal and storage medium
CN115080132A (en) Information processing method, information processing apparatus, server, and storage medium
CN113162015A (en) Abnormal positioning protection method and device for main board power supply
CN111414274A (en) Far-end eliminating method for abnormal state of cabinet applied to data center
CN111416721A (en) Far-end eliminating method for abnormal state of cabinet applied to data center
CN108388488A (en) A kind of intelligent platform management system and fault handling method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant