CN116301276A - Device and method for detecting state of power module of server - Google Patents

Device and method for detecting state of power module of server

Publication number
CN116301276A
Authority
CN
China
Prior art keywords
power supply
server
signal
module
standby power
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310330277.XA
Other languages
Chinese (zh)
Inventor
王治大
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202310330277.XA
Publication of CN116301276A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/28 Supervision thereof, e.g. detecting power-supply failure by out of limits supervision
    • G06F1/30 Means for acting in the event of power-supply failure or interruption, e.g. power-supply fluctuations
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02B CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
    • Y02B70/00 Technologies for an efficient end-user side electric power management and consumption
    • Y02B70/30 Systems integrating technologies related to power network operation and communication or information technologies for improving the carbon footprint of the management of residential or tertiary loads, i.e. smart grids as climate change mitigation technology in the buildings sector, including also the last stages of power distribution and the control, monitoring or operating management systems at local level
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S20/00 Management or operation of end-user stationary applications or the last stages of power distribution; Controlling, monitoring or operating thereof
    • Y04S20/20 End-user application control systems

Abstract

The invention relates to the field of power state detection, and in particular discloses a device and a method for detecting the state of a server power module. The device comprises a power module state collector and a detection manager. The power module state collector is connected to the server motherboard and to the detection manager, and is configured to collect a standby power module status signal and a server standby power supply signal from the server motherboard and send them to the detection manager. The detection manager is connected to the power module state collector, and is configured to receive the standby power module status signal and the server standby power supply signal sent by the collector and to record them to generate a detection log. According to the invention, the power module status signal and the server standby power supply signal are collected in real time and recorded in a log, so that maintenance personnel can analyze the cause of a fault from these signals. This provides data support for determining the fault cause and for subsequent maintenance, and reduces the probability of subsequent downtime faults.

Description

Device and method for detecting state of power module of server
Technical Field
The invention relates to the field of power state detection, in particular to a device and a method for detecting the state of a server power module.
Background
Servers deployed at user sites currently generally use a dual power supply mode in which each power module supplies power independently: one supply is mains power and the other is an uninterruptible power supply. When the mains supply fails, power is switched to the uninterruptible power supply, which keeps the server running and avoids an outage. Fig. 1 is a schematic diagram of a current server power supply structure. As shown in Fig. 1, the server motherboard is connected to a main power module and a standby power module, where the main power module is powered by mains and the standby power module is powered by the uninterruptible power supply.
In theory, when one power supply environment becomes abnormal, the server automatically switches to the other, for example to the uninterruptible power supply when the mains supply fails. In practice, however, the following situation still occurs at customer sites: a mains failure causes large-scale downtime. Downtime affects customer use and leads to complaints. Because the two power supplies are redundant, the loss of one supply should not affect operation of the machine, so once downtime occurs it is difficult to determine whether the cause is power switching inside the server or the customer's on-site power supply environment. The downtime is instantaneous, and the customer typically restarts the server afterwards to clear the fault, so no effective record remains from which the cause can be analyzed and determined. This hinders subsequent maintenance of the server and of its power supply environment.
Disclosure of Invention
In order to solve the above problems, the invention provides a device and a method for detecting the state of a server power module, which provide data support for determining the cause of a fault and for subsequent maintenance, and reduce the probability of subsequent downtime faults.
In a first aspect, the present invention provides a server power module status detection apparatus, including: a power module state collector and a detection manager;
power module state collector: connected to the server motherboard and to the detection manager, and configured to collect a standby power module status signal and a server standby power supply signal from the server motherboard and send them to the detection manager;
detection manager: connected to the power module state collector, and configured to receive the standby power module status signal and the server standby power supply signal sent by the power module state collector and record them to generate a detection log.
In an optional embodiment, the detection manager is further connected to the server motherboard and configured to collect a server status signal and a power supply switching signal, determine from the server status signal whether the server is down, and, in response to the server going down when the main power module is switched to the standby power module, analyze the fault cause according to the standby power module status signal and the server standby power supply signal and record the fault cause in the log;
analyzing the fault cause according to the standby power module status signal and the server standby power supply signal specifically comprises: if the standby power module status signal and the server standby power supply signal are abnormal, determining that the power supply environment of the standby power module is abnormal; and if the standby power module status signal and the server standby power supply signal are normal, determining that the reliability of the server's redundant logic is abnormal.
In an alternative embodiment, the power module state collector is further configured to collect a state signal of the main power module from the server motherboard and send the state signal to the detection manager;
correspondingly, the detection manager is further configured to receive the main power module status signal sent by the power module status collector, and record the main power module status signal to the detection log.
In an optional embodiment, the detection manager is further configured to, in response to the server going down when the standby power module is switched to the main power module, analyze the fault cause according to the main power module status signal and the server standby power supply signal, and record the fault cause in the log;
analyzing the fault cause according to the main power module status signal and the server standby power supply signal specifically comprises: if the main power module status signal and the server standby power supply signal are abnormal, determining that the power supply environment of the main power module is abnormal; and if the main power module status signal and the server standby power supply signal are normal, determining that the reliability of the server's redundant logic is abnormal.
In an alternative embodiment, the status signal of a power module includes an output voltage normal indication signal and a power control enable signal; in response to the output voltage normal indication signal and the power control enable signal both being at a high level, the status signal of the power module is determined to be normal; in response to both being at a low level, the status signal of the power module is determined to be abnormal;
in response to the server standby power supply signal being at a high level, the server standby power supply signal is determined to be normal; in response to it being at a low level, the server standby power supply signal is determined to be abnormal.
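The level rules above can be sketched as a small classifier. This is a minimal sketch under stated assumptions, not the patent's implementation: the names `pwr_ok`, `ps_on`, and `stby` are hypothetical labels for the output voltage normal indication signal, the power control enable signal, and the server standby power supply signal, and the `UNDETERMINED` result for mixed levels is an assumption, since the text only defines the all-high and all-low cases.

```python
from enum import Enum


class SignalState(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"
    # Assumption: the text does not say how mixed levels are judged.
    UNDETERMINED = "undetermined"


def module_status(pwr_ok: bool, ps_on: bool) -> SignalState:
    """Classify a power module status signal from its two levels (True = high)."""
    if pwr_ok and ps_on:
        return SignalState.NORMAL
    if not pwr_ok and not ps_on:
        return SignalState.ABNORMAL
    return SignalState.UNDETERMINED


def standby_supply_status(stby: bool) -> SignalState:
    """Classify the server standby power supply signal (True = high)."""
    return SignalState.NORMAL if stby else SignalState.ABNORMAL
```

Keeping the mixed-level case separate avoids silently forcing a sample into one of the two outcomes the text defines.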
In an alternative embodiment, the power module state collector is a programmable logic device.
In an alternative embodiment, the device further comprises a main connector, and the server motherboard is provided with a secondary connector; the power module state collector is connected to the secondary connector through the main connector.
In an alternative embodiment, the device further comprises a battery module, and the battery module is respectively connected with the power module state collector and the detection manager and supplies power to the power module state collector and the detection manager.
In a second aspect, the present invention provides a method for detecting a state of a power module of a server, including the following steps:
receiving a main power supply module state signal, a standby power supply module state signal and a server standby power supply signal;
and recording the main power supply module state signal, the standby power supply module state signal and the server standby power supply signal into a log.
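The two steps above (receive the three signals, then record them) might look like the following. This is only an illustrative sketch: the JSON-lines layout, the field names, and the `record_signals` helper are assumptions, since the text does not specify a log format.

```python
import io
import json
from datetime import datetime, timezone


def record_signals(log, main_module, standby_module, stby_supply, now=None):
    """Append one sample of the three signals to `log` as a JSON line."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    entry = {
        "time": timestamp,                        # hypothetical field names
        "main_module_status": main_module,
        "standby_module_status": standby_module,
        "server_standby_supply": stby_supply,
    }
    log.write(json.dumps(entry) + "\n")
    return entry


log = io.StringIO()  # stands in for a real log file
record_signals(log, "high", "high", "high")
record_signals(log, "low", "high", "high")
print(log.getvalue(), end="")
```

One line per sample keeps the log append-only, which suits a record that must survive an instantaneous downtime event.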
In an alternative embodiment, the method further comprises the steps of:
receiving a server status signal and a power supply switching signal;
in response to the server going down when the power supply is switched, analyzing the fault cause according to the power module status signal and the server standby power supply signal, and recording the fault cause in the log, which specifically comprises:
in response to the server going down when the main power module is switched to the standby power module: if the standby power module status signal and the server standby power supply signal are abnormal, determining that the power supply environment of the standby power module is abnormal; if the standby power module status signal and the server standby power supply signal are normal, determining that the reliability of the server's redundant logic is abnormal;
in response to the server going down when the standby power module is switched to the main power module: if the main power module status signal and the server standby power supply signal are abnormal, determining that the power supply environment of the main power module is abnormal; and if the main power module status signal and the server standby power supply signal are normal, determining that the reliability of the server's redundant logic is abnormal.
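The two switching cases above reduce to one decision table keyed on the switching direction. The sketch below is a hedged illustration: the function name, the direction constants, and the boolean inputs (True = signal judged normal) are hypothetical, and the `"undetermined"` outcome is an assumption for combinations the text does not cover.

```python
from typing import Optional

MAIN_TO_STANDBY = "main_to_standby"
STANDBY_TO_MAIN = "standby_to_main"


def diagnose(direction: str, server_down: bool, main_ok: bool,
             standby_ok: bool, stby_supply_ok: bool) -> Optional[str]:
    """Return a fault cause for downtime at a power switch, or None if no downtime."""
    if not server_down:
        return None
    # The module being switched TO is the one whose status signal is examined.
    if direction == MAIN_TO_STANDBY:
        module_ok, target = standby_ok, "standby"
    elif direction == STANDBY_TO_MAIN:
        module_ok, target = main_ok, "main"
    else:
        raise ValueError(f"unknown switching direction: {direction}")
    if not module_ok and not stby_supply_ok:
        return f"{target} power module supply environment abnormal"
    if module_ok and stby_supply_ok:
        return "server redundant logic reliability abnormal"
    return "undetermined"  # assumption: mixed results are not defined in the text


print(diagnose(MAIN_TO_STANDBY, True, True, False, False))
print(diagnose(STANDBY_TO_MAIN, True, True, True, True))
```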
Compared with the prior art, the device and the method for detecting the state of a server power module provided by the invention have the following beneficial effects: the power module status signal and the server standby power supply signal are collected in real time and recorded in a log for maintenance personnel to review, so that maintenance personnel can analyze the fault cause from these signals. This provides data support for determining the fault cause and for subsequent maintenance, and reduces the probability of subsequent downtime faults.
Drawings
In order to describe the embodiments of the present application or the prior art more clearly, the drawings used in their description are briefly introduced below. The drawings described below are only some embodiments of the present application, and a person of ordinary skill in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of a current server power architecture.
Fig. 2 is a schematic diagram of application scenario one of the server power module state detection device according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of application scenario two of the server power module state detection device according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a server power module state detection device according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a server power module state detection device according to an embodiment of the present invention.
Fig. 6 is a schematic structural diagram of a specific embodiment of a server power module status detection device according to an embodiment of the present invention.
Fig. 7 is a flowchart of a method for detecting a state of a power module of a server according to an embodiment of the present invention.
Fig. 8 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
In order to enable those skilled in the art to better understand the present application, the application is described in further detail below with reference to the drawings and the detailed description. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the present disclosure without inventive effort fall within the scope of protection of the present disclosure.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The following explains key terms appearing in the present invention.
CPLD: Complex Programmable Logic Device, a programmable logic device.
PSU: Power Supply Unit, the power supply module.
P12V_STBY: The AC input of the server power supply is the 220 V alternating current fed to the power input of the server through a power cord (the AC line). Inside the server power supply, the 220 V AC is converted into two DC outputs, P12V_main and P12V_STBY. Both are 12 V. P12V_main provides the larger current, and it must receive a power enable signal before it is converted into system main power and outputs voltage to other chips or components. P12V_STBY provides a smaller current and outputs voltage automatically once the AC line is plugged in, without control by a power enable signal. P12V_STBY is typically converted to P3V3_STBY or another low voltage to power components on the board card that must run early, such as the CPLD. After the server is powered on, P12V_main is converted into the other system main voltages for the chips or components on the board card that work after power-on, such as the CPU, memory, hard disks, and fan lamps.
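The difference between the two 12 V outputs can be captured in an idealized model. This is a simplified sketch for illustration only: `rail_outputs` and its boolean inputs are hypothetical, and real supplies have ramp times and tolerances that are ignored here.

```python
def rail_outputs(ac_present: bool, power_enable: bool) -> dict:
    """Idealized voltages (in volts) of the two DC outputs of the server power supply."""
    # P12V_STBY follows the AC input alone; no enable signal is needed.
    stby = 12.0 if ac_present else 0.0
    # P12V_main additionally requires the power enable signal.
    main = 12.0 if (ac_present and power_enable) else 0.0
    return {"P12V_STBY": stby, "P12V_main": main}


print(rail_outputs(ac_present=True, power_enable=False))
```

With AC plugged in but the server not yet powered on, only the standby output is live, which is why standby-powered parts such as the CPLD can run before power-on.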
Fig. 2 is a schematic diagram of application scenario one of the server power module state detection device provided in an embodiment of the present invention. In application scenario one, a server power module state detection device is configured separately in each server, and may be installed in the server as a board card. Each device collects signals such as the power module status of the server in which it is installed and sends the generated log to a management end separately, and maintenance personnel perform fault analysis according to the log of each server.
Fig. 3 is a schematic diagram of application scenario two of the server power module state detection device provided in an embodiment of the present invention. In application scenario two, one server power module state detection device can collect signals such as the power module status of multiple servers, and the device runs on a separate computer device or as a separate terminal. When collecting information, the device identifies each server, associates the collected signals of each server with that server's identifier when generating logs, and sends the logs generated for all servers to the management end.
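In application scenario two, the essential bookkeeping is keeping each server's samples under that server's identifier. A minimal sketch, assuming samples arrive as `(server_id, signals)` pairs; the identifiers and field names below are made up for illustration.

```python
from collections import defaultdict


def group_by_server(samples):
    """Group collected signal samples into one log (a list) per server identifier."""
    logs = defaultdict(list)
    for server_id, signals in samples:
        logs[server_id].append(signals)
    return dict(logs)


samples = [
    ("server-A", {"standby_module": "high", "stby_supply": "high"}),
    ("server-B", {"standby_module": "low", "stby_supply": "low"}),
    ("server-A", {"standby_module": "high", "stby_supply": "high"}),
]
logs = group_by_server(samples)
print({server: len(entries) for server, entries in logs.items()})
```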
Fig. 4 is a schematic structural diagram of a device for detecting a state of a power module of a server according to an embodiment of the present invention, as shown in fig. 4, where the device includes: the power module state collector and the detection manager.
The server motherboard is connected to a main power module and a standby power module, where the main power module is powered by mains and the standby power module is powered by an uninterruptible power supply. Under normal conditions, the main power module powered by mains supplies power to the server; when the mains supply fails, power is switched to the standby power module powered by the uninterruptible power supply. If the server goes down during the switch, either the uninterruptible power supply environment is abnormal or the reliability of the server's redundant logic is abnormal.
Power module state collector: connected to the server motherboard and to the detection manager, and configured to collect a standby power module status signal and a server standby power supply signal from the server motherboard and send them to the detection manager.
Detection manager: connected to the power module state collector, and configured to receive the standby power module status signal and the server standby power supply signal sent by the power module state collector and record them to generate a detection log.
If the power supply environment of the standby power module is normal, that is, the uninterruptible power supply is normal, the standby power module status signal and the server standby power supply signal are both normal and no abnormality appears; if a downtime fault occurs in this case, the reliability of the server's redundant logic is abnormal. If the power supply environment of the standby power module is abnormal, that is, the uninterruptible power supply is abnormal, the standby power module status signal and the server standby power supply signal are abnormal; if a downtime fault occurs in this case, it can be determined that the power supply environment of the standby power module is abnormal. Maintenance personnel can review the standby power module status signal and the server standby power supply signal in the log and determine the fault cause from the state of these two signals.
In an alternative embodiment, the standby power module status signal may comprise an output voltage normal indication signal and a power control enable signal.
When the uninterruptible power supply is normal, the output voltage normal indication signal and the power control enable signal of the standby power module and the server standby power supply signal are at a high level; when the uninterruptible power supply is abnormal, the output voltage normal indication signal and the power control enable signal of the standby power module and the server standby power supply signal are at a low level.
Maintenance personnel review the high and low levels recorded in the log to determine the fault cause. When the output voltage normal indication signal, the power control enable signal, and the server standby power supply signal are all at a high level, the power module status signal and the server standby power supply signal are both determined to be normal, so the uninterruptible power supply environment is determined to be normal and the fault cause is that the reliability of the server's redundant logic is abnormal. When the output voltage normal indication signal, the power control enable signal, and the server standby power supply signal are all at a low level, the power module status signal and the server standby power supply signal are both determined to be abnormal, so the fault cause is that the uninterruptible power supply environment is abnormal.
In an alternative embodiment, the power module state collector is a programmable logic device.
The device is also provided with a main connector, and a secondary connector is arranged on the server motherboard; the power module state collector is connected to the secondary connector through the main connector. The server motherboard sends the standby power module status signal and the server standby power supply signal to the power module state collector through the connectors.
In an alternative embodiment, the detection manager is further connected to the server motherboard (the connection may be wireless) and is configured to collect a server status signal and a power supply switching signal, determine from the server status signal whether the server is down, and, in response to the server going down when the main power module is switched to the standby power module, analyze the fault cause according to the standby power module status signal and the server standby power supply signal and record the fault cause in the log.
In this embodiment, the device performs the logical judgment automatically, determines the fault cause, and provides the result to maintenance personnel. Specifically, whether the server is down is determined from the server status signal, and whether a power switching operation has taken place is determined from the power supply switching signal. When the server goes down as the main power module is switched to the standby power module: if the standby power module status signal and the server standby power supply signal are abnormal, the power supply environment of the standby power module is determined to be abnormal; and if the standby power module status signal and the server standby power supply signal are normal, the reliability of the server's redundant logic is determined to be abnormal.
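Detecting "the server is down when the power supply is switched" from the two collected signals could be sketched as a scan over time-ordered samples. Everything here is an assumption made for illustration: the sample layout (`server_up`, `switched`), the helper name, and in particular the `window` parameter, since the text does not define how close in time the downtime must be to the switch.

```python
def downtime_on_switch(samples, window=1):
    """Return indices of switch events followed by server downtime.

    `samples` is a time-ordered list of dicts with boolean keys
    "server_up" (server status signal) and "switched" (power supply
    switching signal). A switch counts as a hit if the server is down in
    the same sample or within the next `window` samples (an assumption).
    """
    hits = []
    for i, sample in enumerate(samples):
        if not sample["switched"]:
            continue
        following = samples[i:i + window + 1]
        if any(not s["server_up"] for s in following):
            hits.append(i)
    return hits


samples = [
    {"server_up": True, "switched": False},
    {"server_up": True, "switched": True},    # switch, server goes down next
    {"server_up": False, "switched": False},
    {"server_up": True, "switched": False},
    {"server_up": True, "switched": True},    # switch, server stays up
    {"server_up": True, "switched": False},
]
print(downtime_on_switch(samples))
```

Each returned index marks a sample at which the fault cause analysis described above would be triggered.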
In an alternative embodiment, the device further comprises a battery module, and the battery module is respectively connected with the power module state collector and the detection manager and supplies power to the power module state collector and the detection manager.
In this application scenario, the battery module powers the device independently, so the device can continue to monitor the power module status signal and other signals when the server goes down abnormally. The battery module may be a rechargeable battery connected to the main connector; it is charged by the server motherboard, drawing power from the motherboard through the main connector. In addition, a power supply switcher may be provided: one input of the power supply switcher is connected to the main connector and the other input is connected to the battery module; the output of the power supply switcher is connected to the power module state collector and to the detection manager; and the controller of the power supply switcher is connected to the detection manager. When the server is normal, the server motherboard powers the device; when the server is abnormal, the power supply switcher switches to the battery module to power the device.
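The selection rule of the power supply switcher can be written down in a few lines. This is a minimal sketch with hypothetical names; the `"none"` result when the server is abnormal and no battery is available is an assumption, as the text does not discuss that case.

```python
def select_supply(server_normal: bool, battery_available: bool) -> str:
    """Choose which input of the power supply switcher feeds the device."""
    if server_normal:
        return "mainboard"   # powered via the main connector
    if battery_available:
        return "battery"     # keeps monitoring alive during server downtime
    return "none"            # assumption: case not covered in the text


print(select_supply(server_normal=False, battery_available=True))
```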
Fig. 5 is a schematic structural diagram of a device for detecting a state of a power module of a server according to an embodiment of the present invention, as shown in fig. 5, where the device includes: the power module state collector and the detection manager.
Power module state collector: connected to the server motherboard and to the detection manager, and configured to collect a main power module status signal, a standby power module status signal, and a server standby power supply signal from the server motherboard and send them to the detection manager.
Detection manager: connected to the power module state collector, and configured to receive the main power module status signal, the standby power module status signal, and the server standby power supply signal sent by the power module state collector and record them to generate a detection log.
The device in this embodiment detects the main power module status signal, the standby power module status signal, and the server standby power supply signal at the same time. When the main power module is switched to the standby power module, the fault cause is analyzed according to the standby power module status signal and the server standby power supply signal, as described in detail in the above embodiment and not repeated here. When the standby power module is switched to the main power module, the fault cause is analyzed according to the main power module status signal and the server standby power supply signal.
If the power supply environment of the main power module is normal, that is, the mains supply is normal, the main power module status signal and the server standby power supply signal are both normal and no abnormality appears; if a downtime fault occurs in this case, the reliability of the server's redundant logic is abnormal. If the power supply environment of the main power module is abnormal, that is, the mains supply is abnormal, the main power module status signal and the server standby power supply signal are abnormal; if a downtime fault occurs in this case, it can be determined that the power supply environment of the main power module is abnormal. Maintenance personnel can review the main power module status signal and the server standby power supply signal in the log and determine the fault cause from the state of these two signals.
In an alternative embodiment, the main power module state signal may comprise an output voltage normal indication signal and a power control enable signal.
When the mains supply is normal, the output voltage normal indication signal and the power control enable signal of the main power supply module and the server standby power supply signal are at a high level; when the mains supply is abnormal, these three signals are at a low level.
Maintenance personnel check the high and low level signals in the log to determine the fault cause. When the output voltage normal indication signal, the power control enable signal and the server standby power supply signal are all at a high level, the power supply module state signal and the server standby power supply signal are judged to be normal, so the mains supply environment is judged to be normal and the fault cause is abnormal reliability of the server's redundancy logic. When all three signals are at a low level, the power supply module state signal and the server standby power supply signal are judged to be abnormal, so the fault cause is an abnormal mains supply environment.
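The level-based judgment above can be sketched as follows. This is a minimal illustration only, assuming that a downtime fault has already occurred and that each signal is read as a boolean (True for high level). The names `Cause` and `classify_main_side` are hypothetical, and the `UNDETERMINED` result for mixed levels is an assumption, since the description only covers the all-high and all-low cases.

```python
from enum import Enum


class Cause(Enum):
    REDUNDANCY_LOGIC = "server redundancy logic reliability abnormal"
    MAINS_ENVIRONMENT = "mains supply environment abnormal"
    UNDETERMINED = "undetermined"  # assumption: mixed levels are not covered by the text


def classify_main_side(power_good: bool, enable: bool, standby_supply: bool) -> Cause:
    """Classify the fault cause after a downtime fault from three signal levels.

    True means high level (normal), False means low level (abnormal).
    """
    levels = (power_good, enable, standby_supply)
    if all(levels):
        # All signals normal: the mains environment is fine, so the
        # redundancy logic of the server is suspected.
        return Cause.REDUNDANCY_LOGIC
    if not any(levels):
        # All signals low: the mains supply environment is abnormal.
        return Cause.MAINS_ENVIRONMENT
    return Cause.UNDETERMINED
```

For example, `classify_main_side(False, False, False)` yields `Cause.MAINS_ENVIRONMENT`.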
In an alternative embodiment, the detection manager is further connected to the server motherboard, optionally by wireless communication, and is configured to collect a server state signal and a power switching signal, determine from the server state signal whether the server is down, and, in response to the server going down when the main power supply module is switched to the standby power supply module, analyze the fault cause according to the standby power supply module state signal and the server standby power supply signal and record the fault cause to a log.
In this embodiment, the device performs the logical judgment automatically, determines the fault cause, and provides the data to maintenance personnel. Specifically, whether the server is down can be determined from the server state signal, and whether a power switching operation has been performed can be determined from the power switching signal. If the server goes down when the main power supply module is switched to the standby power supply module and the standby power supply module state signal and the server standby power supply signal are abnormal, the power supply environment of the standby power supply module is judged to be abnormal; if the standby power supply module state signal and the server standby power supply signal are normal, the reliability of the server's redundancy logic is judged to be abnormal.
In an alternative embodiment, the detection manager is further connected to the server motherboard, optionally by wireless communication, and is configured to collect a server state signal and a power switching signal, determine from the server state signal whether the server is down, and, in response to the server going down when the standby power supply module is switched to the main power supply module, analyze the fault cause according to the main power supply module state signal and the server standby power supply signal and record the fault cause to a log.
In this embodiment, the device performs the logical judgment automatically, determines the fault cause, and provides the data to maintenance personnel. Specifically, whether the server is down can be determined from the server state signal, and whether a power switching operation has been performed can be determined from the power switching signal. If the server goes down when the standby power supply module is switched to the main power supply module and the main power supply module state signal and the server standby power supply signal are abnormal, the power supply environment of the main power supply module is judged to be abnormal; if the main power supply module state signal and the server standby power supply signal are normal, the reliability of the server's redundancy logic is judged to be abnormal.
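The automatic judgment for both switching directions might be sketched as below. This is an illustrative sketch rather than the patented implementation: the `Snapshot` structure, the string values used for the switching direction, and the returned messages are all hypothetical, and the "undetermined" result for mixed signal states is an assumption not stated in the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    main_ok: bool            # main power supply module state signal normal
    standby_ok: bool         # standby power supply module state signal normal
    server_standby_ok: bool  # server standby power supply signal normal
    server_down: bool        # derived from the server state signal
    switch: Optional[str]    # "main_to_standby", "standby_to_main", or None


def analyze(s: Snapshot) -> Optional[str]:
    """Return a fault cause, or None if no downtime occurred during a switch."""
    if not s.server_down or s.switch is None:
        return None
    if s.switch == "main_to_standby":
        module_ok, side = s.standby_ok, "standby"
    elif s.switch == "standby_to_main":
        module_ok, side = s.main_ok, "main"
    else:
        raise ValueError(f"unknown switch direction: {s.switch}")
    if not module_ok and not s.server_standby_ok:
        return f"{side} power supply module environment abnormal"
    if module_ok and s.server_standby_ok:
        return "server redundancy logic reliability abnormal"
    return "undetermined"  # assumption: mixed states are not covered by the text
```

The module whose signals are examined is always the one being switched to, which matches the two cases described above.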
In an alternative embodiment, the power module state collector is a programmable logic device. The device is also provided with a main connector, and a secondary connector is provided on the server main board; the power module state collector is connected to the secondary connector through the main connector. The server main board sends the standby power module state signal and the server standby power supply signal to the power module state collector through the connectors.
In an alternative embodiment, the device further comprises a battery module, and the battery module is respectively connected with the power module state collector and the detection manager and supplies power to the power module state collector and the detection manager.
For further understanding of the present invention, a specific embodiment is described below; fig. 6 is a schematic structural diagram of this specific embodiment.
As shown in fig. 6, the device of this particular embodiment includes a CPLD, a detection manager, a battery module, and a main connector. The server main board is provided with a secondary connector, and is respectively connected with a first PSU and a second PSU, wherein the first PSU is powered by mains supply, and the second PSU is powered by UPS.
The main connector is connected with the auxiliary connector, the CPLD is connected with the detection manager and the main connector respectively, the battery module is connected with the CPLD and the detection manager respectively, and the server main board charges the battery module.
The CPLD monitors the states of the PSU1_POWER_GOOD, PSU1_EN, PSU2_POWER_GOOD and PSU2_EN signals of the first PSU and the second PSU, and the state of the P12V_STBY signal.
Working principle: when a user performs a power-off test by switching off one power supply (cutting the AC mains), or when one power supply suddenly becomes abnormal in the field, the CPLD monitors the signal states of both PSUs. Normally, after one power supply is disconnected, the other power supply takes over immediately according to the power redundancy mode and the server does not go down. In the abnormal case, after one power supply is lost the other does not take over in time, which causes large-scale server downtime. Monitoring the signal states of the two PSUs then makes it possible to confirm whether the problem lies on the server side or in the on-site power supply environment of the machine room. Under normal conditions, the PSU1_POWER_GOOD, PSU1_EN, PSU2_POWER_GOOD, PSU2_EN and P12V_STBY signals monitored by the CPLD are all at a high level. If any signal becomes abnormal, the CPLD notifies the detection manager, which generates a log and can raise an alarm through a buzzer. For example, suppose all five signals are normal, the AC mains is then cut off, and the server goes down. If the PSU2_POWER_GOOD, PSU2_EN and P12V_STBY signals are all pulled low at this time, the customer's power supply environment is at fault. If the PSU2_POWER_GOOD, PSU2_EN and P12V_STBY signals remain unchanged but the server still goes down, the fault lies in the reliability of the server's redundancy logic.
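The working principle can be illustrated with a short software model. It is only a sketch: in the embodiment the monitoring is done by a CPLD, not by Python code, and the function names, the 0/1 level encoding and the returned strings are assumptions made for illustration. The signal names are those given in the description.

```python
from typing import Dict, List

SIGNALS = ("PSU1_POWER_GOOD", "PSU1_EN", "PSU2_POWER_GOOD", "PSU2_EN", "P12V_STBY")


def abnormal_signals(levels: Dict[str, int]) -> List[str]:
    """Return the monitored signals that are not at a high level (1)."""
    return [name for name in SIGNALS if levels[name] != 1]


def diagnose_after_mains_cut(levels: Dict[str, int], server_down: bool) -> str:
    """Judge the cause after the AC mains of the first PSU has been cut."""
    if not server_down:
        return "no fault: the other power supply took over"
    psu2_side = [levels["PSU2_POWER_GOOD"], levels["PSU2_EN"], levels["P12V_STBY"]]
    if all(v == 0 for v in psu2_side):
        # The second PSU's signals were pulled low as well.
        return "power supply environment problem"
    if all(v == 1 for v in psu2_side):
        # The second PSU stayed healthy, yet the server went down.
        return "server redundancy logic reliability problem"
    return "undetermined"  # assumption: mixed levels are not covered by the text
```

In this model, any non-empty result from `abnormal_signals` corresponds to the point at which the CPLD would notify the detection manager.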
The above embodiments provide a server power module state detection device. Based on that device, an embodiment of the invention also provides a corresponding server power module state detection method.
Fig. 7 is a flowchart of a method for detecting a state of a power module of a server according to an embodiment of the present invention. As shown in fig. 7, the method includes the following steps.
S1, receiving a main power supply module state signal, a standby power supply module state signal and a server standby power supply signal.
S2, recording the main power supply module state signal, the standby power supply module state signal and the server standby power supply signal into a log.
Maintenance personnel can analyze the fault cause according to the signals recorded in the log. For example, when the main power supply module is switched to the standby power supply module: if the power supply environment of the standby power supply module is normal, that is, the uninterruptible power supply is normal, the standby power supply module state signal and the server standby power supply signal are both in a normal state, and a downtime fault occurring in this case indicates that the reliability of the server's redundancy logic is abnormal. If the power supply environment of the standby power supply module is abnormal, that is, the uninterruptible power supply is abnormal, the standby power supply module state signal and the server standby power supply signal are both abnormal, and a downtime fault occurring in this case can be attributed to the abnormal power supply environment of the standby power supply module. When the standby power supply module is switched to the main power supply module: if the power supply environment of the main power supply module is normal, that is, the mains supply is normal, the main power supply module state signal and the server standby power supply signal are both in a normal state, and a downtime fault occurring in this case indicates that the reliability of the server's redundancy logic is abnormal. If the power supply environment of the main power supply module is abnormal, that is, the mains supply is abnormal, the main power supply module state signal and the server standby power supply signal are both abnormal, and a downtime fault occurring in this case can be attributed to the abnormal power supply environment of the main power supply module.
Maintenance personnel can look up the main power supply module state signal and the server standby power supply signal in the log and determine the fault cause from their states.
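Steps S1 and S2 above amount to receiving the three signals and appending them to a log. A minimal sketch is given below, assuming a line-oriented JSON log and a caller-supplied timestamp; the patent does not specify a log format, so the function name `record_signals`, the field names and the format are all hypothetical.

```python
import io
import json
from typing import Dict, TextIO


def record_signals(log: TextIO, timestamp: str, signals: Dict[str, int]) -> None:
    """Append one log line containing the timestamp and the received signal levels."""
    entry = {"time": timestamp}
    entry.update(signals)
    log.write(json.dumps(entry, sort_keys=True) + "\n")


# Usage: write one sample to an in-memory log.
buffer = io.StringIO()
record_signals(
    buffer,
    "sample-0",
    {"main_module": 1, "standby_module": 1, "server_standby_supply": 1},
)
```

Maintenance personnel would then read such entries back to see which signals were high or low around the time of a downtime fault.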
In an alternative embodiment, the fault cause may be automatically determined, and specifically includes the following steps.
S1, receiving a main power supply module state signal, a standby power supply module state signal and a server standby power supply signal.
S2, receiving a server state signal and a power supply switching signal.
And S3, recording the main power supply module state signal, the standby power supply module state signal and the server standby power supply signal into a log.
And S4, responding to the downtime of the server during power supply switching, analyzing the fault reason according to the power supply module state signal and the server standby power supply signal, and recording the fault reason to a log.
Step S4 specifically includes the following. In response to the server going down when the main power supply module is switched to the standby power supply module: if the standby power supply module state signal and the server standby power supply signal are abnormal, the power supply environment of the standby power supply module is judged to be abnormal; if the standby power supply module state signal and the server standby power supply signal are normal, the reliability of the server's redundancy logic is judged to be abnormal. In response to the server going down when the standby power supply module is switched to the main power supply module: if the main power supply module state signal and the server standby power supply signal are abnormal, the power supply environment of the main power supply module is judged to be abnormal; if the main power supply module state signal and the server standby power supply signal are normal, the reliability of the server's redundancy logic is judged to be abnormal.
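Steps S1 to S4 can be put together in a small processing loop, sketched below under stated assumptions: samples arrive as dictionaries with hypothetical keys, the log is a plain list of strings, and mixed signal states (not covered by the description) are reported as undetermined.

```python
from typing import Dict, List


def run_detection(samples: List[Dict]) -> List[str]:
    """Process samples in order and return the log lines produced."""
    log: List[str] = []
    for s in samples:
        # S1: power module state signals and server standby power supply signal.
        main_ok, standby_ok = s["main_ok"], s["standby_ok"]
        stby_ok = s["server_standby_ok"]
        # S2: server state signal and power switching signal.
        down, switch = s["server_down"], s.get("switch")
        # S3: record the received signals.
        log.append(f"signals main={main_ok} standby={standby_ok} stby={stby_ok}")
        # S4: analyze only if the server went down during a power switch.
        if down and switch in ("main_to_standby", "standby_to_main"):
            to_standby = switch == "main_to_standby"
            module_ok = standby_ok if to_standby else main_ok
            side = "standby" if to_standby else "main"
            if not module_ok and not stby_ok:
                cause = f"{side} power supply module environment abnormal"
            elif module_ok and stby_ok:
                cause = "server redundancy logic reliability abnormal"
            else:
                cause = "undetermined"
            log.append(f"fault cause: {cause}")
    return log
```

Recording the raw signals before the analysis keeps the log useful to maintenance personnel even when the automatic judgment is undetermined.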
The server power module state detection method of this embodiment is implemented based on the foregoing server power module state detection device, so its specific implementation and functions correspond to those of the device. For details, see the description of the corresponding device embodiments, which is not repeated here.
Fig. 8 is a schematic structural diagram of a terminal 800 according to an embodiment of the present invention, including: processor 810, memory 820, and communication unit 830. The processor 810 is configured to implement the following steps when executing the server power module state detection program stored in the memory 820:
receiving a main power supply module state signal, a standby power supply module state signal and a server standby power supply signal;
and recording the main power supply module state signal, the standby power supply module state signal and the server standby power supply signal into a log.
According to the invention, the state signal of the power supply module and the standby power supply signal of the server are collected in real time, the collected signals are recorded into the log for maintenance personnel to check, the maintenance personnel can analyze the fault reason according to the state signal of the power supply module and the standby power supply signal of the server, data support is provided for determining the fault reason and subsequent maintenance, and the probability of subsequent downtime fault is reduced.
The terminal 800 includes a processor 810, a memory 820, and a communication unit 830. The components may communicate via one or more buses. It will be appreciated by those skilled in the art that the configuration of the terminal shown in the drawings does not limit the invention: it may be a bus structure or a star structure, may include more or fewer components than shown, may combine certain components, or may arrange the components differently.
The memory 820 stores instructions for execution by the processor 810 and may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. When the instructions in memory 820 are executed by processor 810, terminal 800 is enabled to perform some or all of the steps in the method embodiments described above.
The processor 810 is the control center of the terminal. It connects the various parts of the entire terminal using various interfaces and lines, and performs the functions of the terminal and/or processes data by running or executing software programs and/or modules stored in the memory 820 and invoking data stored in the memory. The processor may consist of an integrated circuit (IC), for example a single packaged IC, or of a plurality of packaged ICs with the same or different functions connected together. For example, the processor 810 may include only a central processing unit (CPU). In the embodiment of the invention, the CPU may have a single operation core or multiple operation cores.
The communication unit 830 is configured to establish a communication channel so that the terminal can communicate with other terminals, receiving user data sent by other terminals or sending user data to other terminals.
The invention also provides a computer storage medium, which can be a magnetic disk, an optical disk, a read-only memory (ROM) or a random access memory (random access memory, RAM) and the like.
The computer storage medium stores a server power module state detection program which when executed by the processor performs the steps of:
receiving a main power supply module state signal, a standby power supply module state signal and a server standby power supply signal;
and recording the main power supply module state signal, the standby power supply module state signal and the server standby power supply signal into a log.
According to the invention, the state signal of the power supply module and the standby power supply signal of the server are collected in real time, the collected signals are recorded into the log for maintenance personnel to check, the maintenance personnel can analyze the fault reason according to the state signal of the power supply module and the standby power supply signal of the server, data support is provided for determining the fault reason and subsequent maintenance, and the probability of subsequent downtime fault is reduced.
It will be apparent to those skilled in the art that the techniques of embodiments of the present invention may be implemented in software plus a necessary general purpose hardware platform. Based on such understanding, the technical solution in the embodiments of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, and includes several instructions for causing a computer terminal (which may be a personal computer, a server, a second terminal, a network terminal, etc.) to execute all or part of the steps of the method described in the embodiments of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The foregoing disclosure is merely illustrative of the preferred embodiments of the invention and the invention is not limited thereto, since modifications and variations may be made by those skilled in the art without departing from the principles of the invention.

Claims (10)

1. A server power module state detection apparatus, comprising: a power module state collector and a detection manager;
the power module state collector is connected to a server main board and to the detection manager, and is configured to acquire a standby power module state signal and a server standby power supply signal from the server main board and send them to the detection manager;
the detection manager is connected to the power module state collector, and is configured to receive the standby power module state signal and the server standby power supply signal sent by the power module state collector, and to record the standby power module state signal and the server standby power supply signal to generate a detection log.
2. The server power module state detection device according to claim 1, wherein the detection manager is further connected to a server motherboard, and is configured to collect a server state signal and a power switching signal, determine whether a server is down according to the server state signal, respond to the server down when a main power module is switched to a standby power module, analyze a fault cause according to the standby power module state signal and a server standby power supply signal, and record the fault cause to a log;
wherein analyzing the fault cause according to the standby power module state signal and the server standby power supply signal specifically comprises: if the standby power supply module state signal and the server standby power supply signal are abnormal, determining that the power supply environment of the standby power supply module is abnormal; and if the standby power supply module state signal and the server standby power supply signal are normal, determining that the reliability of the server's redundancy logic is abnormal.
3. The server power module state detection apparatus according to claim 1 or 2, wherein the power module state collector is further configured to collect a state signal of the main power module from the server motherboard and send the state signal to the detection manager;
correspondingly, the detection manager is further configured to receive the main power module status signal sent by the power module status collector, and record the main power module status signal to a detection log.
4. The server power module status detection apparatus according to claim 3, wherein the detection manager is further configured to analyze a cause of a fault from the main power module status signal and a server standby power signal and record the cause of the fault to a log in response to a server downtime when the standby power module is switched to the main power module;
wherein analyzing the fault cause according to the main power module state signal and the server standby power supply signal specifically comprises: if the main power supply module state signal and the server standby power supply signal are abnormal, determining that the power supply environment of the main power supply module is abnormal; and if the main power supply module state signal and the server standby power supply signal are normal, determining that the reliability of the server's redundancy logic is abnormal.
5. The server power module state detection apparatus according to claim 4, wherein the state signal of the power module includes an output voltage normal indication signal and a power control enable signal; in response to the output voltage normal indication signal and the power control enable signal being at a high level, the state signal of the power supply module is determined to be normal; in response to the output voltage normal indication signal and the power control enable signal being at a low level, the state signal of the power supply module is determined to be abnormal;
in response to the server standby power supply signal being at a high level, the server standby power supply signal is determined to be normal; and in response to the server standby power supply signal being at a low level, the server standby power supply signal is determined to be abnormal.
6. The server power module state detection apparatus of claim 5, wherein the power module state collector employs a programmable logic device.
7. The server power module state detection apparatus according to claim 6, further comprising a main connector, wherein a sub connector is provided on the server motherboard; the power module state collector is connected with the auxiliary connector through the main connector.
8. The server power module state detection apparatus of claim 7, further comprising a battery module, wherein the battery module is connected to the power module state collector and the detection manager, respectively, and supplies power to the power module state collector and the detection manager.
9. A method for detecting the state of a server power module, comprising the following steps:
receiving a main power supply module state signal, a standby power supply module state signal and a server standby power supply signal;
and recording the main power supply module state signal, the standby power supply module state signal and the server standby power supply signal to a log.
10. The method for detecting the status of a server power module according to claim 9, further comprising the steps of:
receiving a server status signal and a power supply switching signal;
in response to the server going down during power switching, analyzing the fault cause according to the power supply module state signal and the server standby power supply signal, and recording the fault cause to a log, which specifically comprises:
in response to the server going down when the main power supply module is switched to the standby power supply module, determining that the power supply environment of the standby power supply module is abnormal if the standby power supply module state signal and the server standby power supply signal are abnormal, and determining that the reliability of the server's redundancy logic is abnormal if the standby power supply module state signal and the server standby power supply signal are normal;
in response to the server going down when the standby power supply module is switched to the main power supply module, determining that the power supply environment of the main power supply module is abnormal if the main power supply module state signal and the server standby power supply signal are abnormal, and determining that the reliability of the server's redundancy logic is abnormal if the main power supply module state signal and the server standby power supply signal are normal.
CN202310330277.XA 2023-03-30 2023-03-30 Device and method for detecting state of power module of server Pending CN116301276A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310330277.XA CN116301276A (en) 2023-03-30 2023-03-30 Device and method for detecting state of power module of server

Publications (1)

Publication Number Publication Date
CN116301276A true CN116301276A (en) 2023-06-23

Family

ID=86828665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310330277.XA Pending CN116301276A (en) 2023-03-30 2023-03-30 Device and method for detecting state of power module of server

Country Status (1)

Country Link
CN (1) CN116301276A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination