CN115237719B - A method and system for early warning of server power supply reliability

A method and system for early warning of server power supply reliability

Info

Publication number
CN115237719B
CN115237719B CN202210902093.1A CN202210902093A CN115237719B CN 115237719 B CN115237719 B CN 115237719B CN 202210902093 A CN202210902093 A CN 202210902093A CN 115237719 B CN115237719 B CN 115237719B
Authority
CN
China
Prior art keywords
power supply
server
power
information
data
Legal status
Active
Application number
CN202210902093.1A
Other languages
Chinese (zh)
Other versions
CN115237719A (en
Inventor
刘坤
张鹏
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Suzhou Metabrain Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Suzhou Metabrain Intelligent Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Power Sources (AREA)
  • Remote Monitoring And Control Of Power-Distribution Networks (AREA)

Abstract

The invention provides a method and system for early warning of server power supply reliability. The method includes separately monitoring the power supply's state information, the characteristic parameter information of each power supply component, and the index data of power supply operation; comparing the monitored abnormal information with preset abnormal values to obtain the corresponding risk level; and selecting a preset coping strategy according to the risk level. The coping strategy includes on-site inspection, and an early warning prompt for the server power supply is issued based on the on-site inspection result combined with the risk level. The on-site inspection is used to acquire information about the server's external environment. The invention monitors the server power supply from multiple dimensions, covering state information, index data and characteristic parameters, all of which are monitored by polling. In addition, state information monitored by the BMC (Baseboard Management Controller) is verified by a CPLD (Complex Programmable Logic Device) to obtain accurate power supply state information, which avoids the false alarms that occur when the BMC is the only means of monitoring and ensures the accuracy of the early warning.

Description

Early warning method and system for reliability of server power supply
Technical Field
The invention relates to the technical field of server power management, in particular to a method and a system for early warning of server power reliability.
Background
With the rapid spread and development of the Internet, cloud computing has advanced considerably, cloud data centers have been built in many regions, and cloud service products have become part of daily life. As people rely more on network communication, the servers at the center of the network become increasingly important. With heavy server use, the number and scale of servers in data center machine rooms keep growing. To protect the data held on those servers, a stable power feed is essential, and the server power supply is the server's most important power module.
At present, data center machine rooms generally house servers in racks. Each rack holds multiple servers at high density, and each server is independently powered by at least 2 redundant power supplies. To cope with long periods of uninterrupted operation and complex front-end data processing workloads, a server power supply must be highly reliable. If a power supply shuts down because of an internal software or hardware fault or complex external operating conditions, redundancy is lost and the power feed may be interrupted, so the server itself may lose power, which puts customer data at risk.
In practice, the working state of a machine room server's power supply is monitored mainly by the server's BMC (Baseboard Management Controller). For example, when the power supply raises an alarm, the BMC records the alarm content and sends it over a network cable from the BMC communication port to a front-end monitoring interface, and machine room maintenance staff determine the cause of the failure by analyzing the log fed back by the BMC. However, this monitoring mode only supports replacing and repairing a power supply after it has failed. It cannot predict a power supply failure in advance and so cannot avoid the risk of the server power supply shutting down, which seriously affects the continuity and safety of server operation.
Disclosure of Invention
The invention provides a method and a system for early warning of server power supply reliability, to solve the problem that existing server power supply monitoring strategies lack accurate fault early warning, which affects the stability of the server power supply.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
The first aspect of the invention provides a method for early warning of the power supply reliability of a server, which comprises the following steps:
monitoring, respectively, state information of the power supply, characteristic parameter information of each power supply component, and index data of power supply operation;
comparing the monitored abnormal information with preset abnormal values to obtain a corresponding risk level;
and selecting a preset coping strategy corresponding to the risk level, the coping strategy comprising on-site inspection, and issuing an early warning prompt for the server power supply based on the on-site inspection result combined with the risk level, wherein the on-site inspection is used to acquire information about the server's external environment.
Further, the method also comprises the step of:
for a server power supply for which an early warning prompt has been issued, locating and repairing the fault point by means of a fault maintenance robot.
Further, the state information includes temperature information of the power supply, a power supply output overcurrent signal and an overvoltage signal.
Further, the monitoring of the state information specifically includes:
in response to an abnormal state information alarm reported by the baseboard management controller, invoking the complex programmable logic device to acquire the state information of the alarm item corresponding to the current alarm, comparing it with the state information acquired by the baseboard management controller, and forming abnormal information if the two are consistent.
Further, the power supply components monitored for characteristic parameter information comprise a power factor correction feedback circuit, a diode circuit, a communication optocoupler, the driver chip of each channel and a standby control circuit.
Further, the monitoring of the characteristic parameter information specifically includes:
the complex programmable logic device polls the characteristic parameter information in a server power supply register, the characteristic parameter information being collected in real time by sensors;
the characteristic parameter information is compared with preset values, the number of times abnormal information occurs is recorded, the count is marked according to a preset rule, and the marked value is taken as the abnormal information.
Further, the index data includes output power consumption of the power supply, output current, and voltage value of the output signal.
Further, the monitoring of the index data specifically includes:
the complex programmable logic device polls the index data in a server power supply register, the index data being acquired and/or calculated in real time by a power supply chip;
the index data is compared with preset values, the number of times abnormal information occurs is recorded, the count is marked according to a preset rule, and the marked value is taken as the abnormal information.
The second aspect of the present invention provides a server power reliability early warning system, the system comprising:
a power supply online monitoring module, used to monitor, respectively, state information of the power supply, characteristic parameter information of each power supply component, and index data of power supply operation;
a reliability early warning module, used to compare the monitored abnormal information with preset abnormal values to obtain a corresponding risk level;
a data center machine room control module, used to select a preset coping strategy corresponding to the risk level, the coping strategy comprising on-site inspection, and to issue an early warning prompt for the server power supply based on the on-site inspection result combined with the risk level, wherein the on-site inspection is used to acquire information about the server's external environment.
Further, the system also comprises a server power supply maintenance module, which is used, for a server power supply for which an early warning prompt has been issued, to locate the fault point by means of the fault maintenance robot and to repair it according to a preset strategy.
The early warning system for the power supply reliability of the server according to the second aspect of the present invention can implement the methods in the first aspect and the implementations of the first aspect, and achieve the same effects.
The effects provided in the summary of the invention are merely effects of embodiments, not all effects of the invention, and one of the above technical solutions has the following advantages or beneficial effects:
The invention monitors the server power supply from multiple dimensions, covering state information, index data and characteristic parameters, all of which are monitored by polling. State information monitored by the BMC is verified by the CPLD to obtain accurate power supply state information, which avoids the false alarms that occur when the BMC is the only means of monitoring and ensures the accuracy of the early warning. A power supply under early warning is located and overhauled by a machine room robot, so personnel do not need to enter the machine room, the machine room environment is not disturbed, and labor costs are reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic flow chart of an embodiment of the method of the present invention;
FIG. 2 is a schematic diagram of an embodiment of the system of the present invention.
Detailed Description
In order to clearly illustrate the technical features of the present solution, the present invention will be described in detail below with reference to the following detailed description and the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different structures of the invention. In order to simplify the present disclosure, components and arrangements of specific examples are described below. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and processes are omitted so as to not unnecessarily obscure the present invention.
The embodiment of the invention provides a method for early warning of server power supply reliability, which comprises the following steps:
S1, monitoring, respectively, state information of the power supply, characteristic parameter information of each power supply component, and index data of power supply operation;
S2, comparing the monitored abnormal information with preset abnormal values to obtain a corresponding risk level;
S3, selecting a preset coping strategy corresponding to the risk level, the coping strategy comprising on-site inspection, and issuing an early warning prompt for the server power supply based on the on-site inspection result combined with the risk level, wherein the on-site inspection is used to acquire information about the server's external environment.
In one implementation of the embodiment of the present invention, the method further includes the step of:
for a server power supply for which an early warning prompt has been issued, locating and repairing the fault point by means of a fault maintenance robot.
In step S1, the status information includes temperature information of the power supply, a power supply output overcurrent signal and an overvoltage signal.
Monitoring of the state information works as follows. The server BMC polls the state information of the server power supply and compares it with the specification values. If the state information does not meet the specification, the BMC reports a power supply alarm state. In response, the CPLD reads the parameter information of the alarming power supply and compares it with the alarm information fed back by the BMC. If the two differ, a command is issued to the BMC to read and feed back again, and this repeats until the two agree. Once the CPLD reading matches the fault alarm information fed back by the BMC, that fault alarm information is confirmed as abnormal information.
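The cross-check between the BMC and the CPLD can be sketched as a small retry loop. This is only an illustration under stated assumptions: `read_bmc` and `read_cpld` are hypothetical callables standing in for the real register or bus reads, alarm items are represented as plain strings, and the `max_retries` bound is an added safeguard (the description itself repeats the read until the results match).

```python
from typing import Callable, Optional


def verify_bmc_alarm(
    read_bmc: Callable[[], str],
    read_cpld: Callable[[], str],
    max_retries: int = 5,
) -> Optional[str]:
    """Return the alarm item once BMC and CPLD readings agree, else None.

    read_bmc / read_cpld are hypothetical stand-ins for the hardware reads.
    max_retries is an assumption; the source text retries without a bound.
    """
    bmc_alarm = read_bmc()
    for _ in range(max_retries):
        if read_cpld() == bmc_alarm:
            return bmc_alarm       # confirmed: treat as abnormal information
        bmc_alarm = read_bmc()     # mismatch: ask the BMC to read again
    return None                    # readings never agreed within the budget
```

Returning `None` when the readings never agree lets a caller treat the alarm as unconfirmed rather than forwarding a possible false alarm.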
The CPLD polls, in real time, the characteristic parameter data of each power supply component, which power supply sensors collect in real time and store in the server power supply register. These data include the real-time voltage and working state of the PFC feedback circuit, the PFC OVP detection loop and similar circuits, and the working temperature and state of the key circuit diodes, the communication optocoupler, the isolated driver ICs, the standby control chip and the standby integrated chip. The CPLD records the parameter data of each key component, analyzes it and compares it with the specification interval. If the polled data is within the standard specification range, the power state output is 00; if it is outside the range, the power state output is 01; and if it is outside the range 3 times in a row, the power state output is 10.
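The 00 / 01 / 10 marking rule can be illustrated with a small stateful encoder. This is a sketch, not the patented implementation: the class name `StateEncoder` is hypothetical, the codes are kept as two-character strings, the range bounds are treated as inclusive, and resetting the streak on an in-range reading is an assumption about what "3 times continuously" means.

```python
class StateEncoder:
    """Map successive polled readings of one parameter to a power state code."""

    def __init__(self, low: float, high: float, limit: int = 3) -> None:
        self.low = low        # lower bound of the specification range
        self.high = high      # upper bound of the specification range
        self.limit = limit    # consecutive violations that escalate to "10"
        self.streak = 0       # current run of out-of-range readings

    def update(self, value: float) -> str:
        if self.low <= value <= self.high:
            self.streak = 0   # assumption: an in-range reading resets the run
            return "00"
        self.streak += 1
        return "10" if self.streak >= self.limit else "01"


# Example: a 12V rail polled five times against the 12.0V-12.8V window.
encoder = StateEncoder(12.0, 12.8)
codes = [encoder.update(v) for v in [12.1, 13.0, 13.0, 13.0, 12.2]]
```

One encoder instance would be kept per monitored parameter, so that a run of violations on one signal does not affect the count for another.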
The CPLD also monitors, in real time, index data of the server power supply such as output power consumption, output current and output signal voltages (12V, vingood, alert, PG). The power supply chip acquires and/or calculates the index data in real time and stores it in the power supply register; the CPLD polls the register, records each parameter and compares it with its specification value. To preserve power redundancy for the whole machine, the output power of each machine room server power supply must stay below 50% of its rated power (the standard specification range). To keep the power feed of the whole machine stable, the output currents of the server's 2 power supplies must meet the current sharing requirement (standard specification range: below 20% load, current imbalance under 10%; above 20% load, current imbalance under 5%). To ensure reliable power delivery and communication, the output signals of the server power supply must stay within specification: 12V (standard specification range: 12.0V-12.8V), vingood (2.4V-3.46V), alert (2.4V-3.46V) and PG (2.4V-3.46V). If the polled data is within the standard specification range, the power state output is 00; if it is outside the range, the abnormal state output for that item is 01; and if it is outside the range 3 times in a row, the power state output is 10.
The server power supply state online monitoring module collects the power supply alarm information and the power state output values and passes them to the server power supply reliability early warning module.
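The index data checks can be written down as three small predicates. The thresholds come from the description; everything else is an assumption made for illustration: the function names are hypothetical, load is taken as total output current divided by a rated total current, current imbalance is taken as |I1 - I2| divided by the total current (the source does not give a formula), and a load of exactly 20% is given the 5% limit.

```python
from typing import Dict, List

# Voltage windows quoted in the description, in volts.
SIGNAL_RANGES_V = {
    "12V": (12.0, 12.8),
    "vingood": (2.4, 3.46),
    "alert": (2.4, 3.46),
    "PG": (2.4, 3.46),
}


def power_within_spec(output_w: float, rated_w: float) -> bool:
    """Redundancy rule: output power must stay below 50% of rated power."""
    return output_w < 0.5 * rated_w


def current_sharing_within_spec(i1_a: float, i2_a: float, rated_total_a: float) -> bool:
    """Current sharing rule for 2 supplies (imbalance formula is an assumption)."""
    total = i1_a + i2_a
    if total <= 0:
        return True                              # no load, nothing to compare
    load_fraction = total / rated_total_a        # assumed definition of load
    imbalance = abs(i1_a - i2_a) / total         # assumed definition of imbalance
    limit = 0.10 if load_fraction < 0.20 else 0.05
    return imbalance < limit


def signals_out_of_spec(readings_v: Dict[str, float]) -> List[str]:
    """Return the names of signals whose voltage is outside its window."""
    return [
        name
        for name, (low, high) in SIGNAL_RANGES_V.items()
        if not low <= readings_v[name] <= high
    ]
```

Each failing check would then feed the same 00 / 01 / 10 marking rule that is applied to the characteristic parameters.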
In step S2, the alarm information and power state output values of each server power supply, transmitted by the server power supply state online monitoring module, are received and aggregated into a summary table of the power supply reliability states of the servers in the machine room. Based on this table, the server power supplies are divided into a low risk zone (power state output value 00), a medium risk zone (power state output value nonzero and less than 10) and a high risk zone (power state output value greater than or equal to 10).
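The zoning rule can be sketched as follows. Two assumptions are made loudly here: the codes 00, 01 and 10 are compared as decimal numbers, which is how the "less than 10" and "greater than or equal to 10" wording reads, and a power supply with several monitored items is placed in the zone of its worst code (the source does not say how multiple codes per supply are combined). The names `risk_zone` and `build_summary` are hypothetical.

```python
from typing import Dict, List


def risk_zone(state_code: str) -> str:
    """Map one power state output value to a risk zone."""
    value = int(state_code)        # "00" -> 0, "01" -> 1, "10" -> 10
    if value == 0:
        return "low"
    if value < 10:
        return "medium"
    return "high"


def build_summary(codes_by_psu: Dict[str, List[str]]) -> Dict[str, str]:
    """Build a per-supply zone table, using the worst code of each supply."""
    return {
        psu: risk_zone(max(codes, key=int))   # worst-case aggregation (assumed)
        for psu, codes in codes_by_psu.items()
    }


summary = build_summary({
    "PSU1": ["00", "00"],
    "PSU2": ["00", "01"],
    "PSU3": ["01", "10"],
})
```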
For power supplies in the medium risk zone, a medium-risk power supply list and the corresponding alarm information are generated and sent to the data center machine room control module for power supply reliability identification and analysis. For power supplies in the high risk zone, the server power supply reliability early warning module sends a high-risk power supply list and the corresponding alarm information to the data center machine room control module for power supply overhaul flow analysis.
In step S3, for a medium-risk server power supply, the machine room manager determines whether each power supply's risk alarm is a valid alarm that needs to be handled, and makes the final assessment of the power supply's early warning level. If the final early warning level is medium risk and on-site inspection is required, the exact position of the faulty server component is located, and a server power supply index acquisition command is issued to the machine room automatic maintenance robot over Internet of Things wireless transmission. The robot moves to the faulty server according to the fault location, photographs the working state of the faulty power supply, collects indicators such as video and odor, and transmits the collected data to the data center machine room control module. The machine room manager views and processes this feedback through the control module's visual interface.
For a high-risk server power supply, the machine room manager likewise determines whether each power supply's risk alarm is a valid alarm that needs to be handled, and makes the final assessment of the power supply's early warning level. If the final early warning level is high risk and on-site power supply replacement is required, the machine room manager sends an on-site confirmation instruction for power supply replacement to the server power supply automatic maintenance module. The module receives the request, locates the exact position of the faulty server component, and sends a server power supply index acquisition instruction to the machine room automatic maintenance robot. The robot moves to the faulty server according to the fault location, photographs the working state of the faulty power supply, collects indicators such as video and odor, and transmits the collected data to the data center machine room control module. The machine room manager reviews the feedback, gives final confirmation of the maintenance requirement and issues a formal maintenance instruction. The robot then moves to the faulty power supply and uses its mechanical arm to unplug and plug the power cord and the power supply, completing automatic replacement of the faulty power supply and powering it on again.
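The medium-risk and high-risk handling flows share the same first steps and differ only in the final replacement. The sketch below lays that out as an ordered list of step names. All identifiers are hypothetical labels chosen for illustration, and the manager's decisions (alarm valid, on-site action required, repair confirmed) are modelled as plain boolean inputs.

```python
from typing import List


def plan_response(
    zone: str,
    alarm_valid: bool,
    onsite_needed: bool,
    repair_confirmed: bool = False,
) -> List[str]:
    """Return the ordered robot/manager steps for one server power supply."""
    if zone == "low" or not alarm_valid or not onsite_needed:
        return []                                 # nothing to dispatch
    steps = [
        "locate_faulty_psu",                      # position of the faulty component
        "robot_collect_photo_video_odor",         # on-site inspection data
        "manager_review",                         # feedback viewed in the control module
    ]
    if zone == "high" and repair_confirmed:
        # Replacement only follows the manager's formal maintenance instruction.
        steps += ["robot_replace_psu", "robot_power_on"]
    return steps
```

Keeping the replacement steps behind `repair_confirmed` mirrors the two-stage confirmation in the text: the robot first reports what it sees, and only then is the formal maintenance instruction issued.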
As shown in FIG. 2, the embodiment of the invention also provides an early warning system for server power supply reliability, which comprises a power supply online monitoring module 1, a reliability early warning module 2, a data center machine room control module 3 and a server power supply maintenance module 4.
The power supply online monitoring module 1 monitors, respectively, the state information of the power supply, the characteristic parameter information of each power supply component, and the index data of power supply operation. The reliability early warning module 2 compares the monitored abnormal information with preset abnormal values to obtain the corresponding risk level. The data center machine room control module 3 selects a preset coping strategy corresponding to the risk level, the coping strategy including on-site inspection, and issues an early warning prompt for the server power supply based on the on-site inspection result combined with the risk level; the on-site inspection is used to acquire information about the server's external environment. The server power supply maintenance module 4 locates, by means of the fault maintenance robot, the fault point of a server power supply for which an early warning prompt has been issued, and repairs it according to a preset strategy.
First, the server BMC polls the server power supply state information and compares it with the specification values. If the state information does not meet the specification, the BMC reports a power supply alarm state and sends the value 10 to the power supply online monitoring module. The power supply online monitoring module receives the abnormal alarm information fed back by the BMC and responds immediately: it reads the parameter information of the alarming power supply through the CPLD and compares it with the alarm information fed back by the BMC. If the two differ, a command is issued to the BMC to read and feed back again, and this repeats until the two agree. Once the CPLD reading matches the fault alarm information fed back by the BMC, the power supply online monitoring module confirms that the fault alarm information is correct and sends it, together with the power state output value, to the reliability early warning module.
Second, the CPLD polls, in real time, the characteristic parameter data of each power supply component, which power supply sensors collect in real time and store in the server power supply register. These data include the real-time voltage and working state of the PFC feedback circuit, the PFC OVP detection loop and similar circuits, and the working temperature and state of the key circuit diodes, the communication optocoupler, the isolated driver ICs, the standby control chip and the standby integrated chip. The CPLD records the parameter data of each key component and sends it to the power supply online monitoring module, which analyzes the data and compares it with the specification interval. If the polled data is within the standard specification range, the power state output is 00; if it is outside the range, the power state output is 01; and if it is outside the range 3 times in a row, the power state output is 10. The power supply online monitoring module collects the power supply alarm information and the power state output values and passes them to the reliability early warning module.
Third, the CPLD monitors, in real time, index data of the server power supply such as output power consumption, output current and output signal voltages (12V, vingood, alert, PG). The index data are read from the power supply register, where the power supply chip stores the values it acquires and/or calculates in real time; the CPLD polls and records each parameter and compares it with its specification value. To preserve power redundancy for the whole machine, the output power of each machine room server power supply must stay below 50% of its rated power (the standard specification range). To keep the power feed of the whole machine stable, the output currents of the server's 2 power supplies must meet the current sharing requirement (standard specification range: below 20% load, current imbalance under 10%; above 20% load, current imbalance under 5%). To ensure reliable power delivery and communication, the output signals of the server power supply must stay within specification: 12V (standard specification range: 12.0V-12.8V), vingood (2.4V-3.46V), alert (2.4V-3.46V) and PG (2.4V-3.46V). If the polled data is within the standard specification range, the power state output is 00; if it is outside the range, the abnormal state output for that item is 01; and if it is outside the range 3 times in a row, the power state output is 10. The power supply online monitoring module collects the power supply alarm information and the power state output values and passes them to the reliability early warning module.
The reliability early warning module receives and aggregates the alarm information and power state output values of each server power supply transmitted by the power supply online monitoring module, and generates a summary table of the power supply reliability states of the servers in the machine room. Based on this table, it divides the server power supplies into a low risk zone (power state output value 00), a medium risk zone (power state output value nonzero and less than 10) and a high risk zone (power state output value greater than or equal to 10).
For power supplies in the low risk zone, the reliability early warning module sends a low-risk power supply list to the data center machine room control module for display. For power supplies in the medium risk zone, the reliability early warning module sends the medium-risk power supply list and the corresponding alarm information to the data center machine room control module for power supply reliability identification and analysis. For power supplies in the high risk zone, the reliability early warning module sends a high-risk power supply list and the corresponding alarm information to the data center machine room control module for power supply overhaul flow analysis.
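The routing of the three lists to the control module can be sketched as a simple grouping step. The purpose labels are taken from the text; the function name `route_lists`, the dictionary layout, and the assumption that each supply has already been assigned a zone of "low", "medium" or "high" are illustrative only.

```python
from typing import Dict, List

# Purpose for which each zone's list is sent to the control module.
ROUTING = {
    "low": "display",
    "medium": "reliability identification and analysis",
    "high": "overhaul flow analysis",
}


def route_lists(zone_by_psu: Dict[str, str]) -> Dict[str, List[str]]:
    """Group power supplies by the handling their risk zone calls for."""
    routed: Dict[str, List[str]] = {purpose: [] for purpose in ROUTING.values()}
    for psu, zone in zone_by_psu.items():
        routed[ROUTING[zone]].append(psu)
    return routed


routed = route_lists({"PSU1": "low", "PSU2": "medium", "PSU3": "high", "PSU4": "low"})
```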
The data center machine room control module receives the server power supply risk lists and the corresponding alarm information fed back by the reliability early warning module, and machine room managers review them through the module's visual interface.
For a medium-risk server power supply, the machine room manager determines whether each power supply's risk alarm is a valid alarm that needs to be handled, and makes the final assessment of the power supply's early warning level. If the final early warning level is medium risk and on-site inspection is required, the machine room manager issues an inspection instruction to the inspection module. The inspection module receives the request, locates the exact position of the faulty server component, and issues a server power supply index acquisition command to the machine room automatic inspection robot over Internet of Things wireless transmission. The robot moves to the faulty server according to the fault location, photographs the working state of the faulty power supply, collects indicators such as video and odor, and transmits the collected data to the data center machine room control module. The machine room manager views and processes this feedback through the control module's visual interface.
For a high-risk server power supply, the machine room manager likewise determines whether each power supply's risk alarm is a valid alarm that needs to be handled, and makes the final assessment of the power supply's early warning level. If the final early warning level is high risk and on-site power supply replacement is required, the machine room manager sends an on-site confirmation instruction for power supply replacement to the maintenance module. The maintenance module receives the request, locates the exact position of the faulty server component, and sends a server power supply index acquisition command to the machine room automatic maintenance robot. The robot moves to the faulty server according to the fault location, photographs the working state of the faulty power supply, collects indicators such as video and odor, and transmits the collected data to the data center machine room control module. The machine room manager reviews the feedback, gives final confirmation of the maintenance requirement and issues a formal maintenance instruction. The robot then moves to the faulty power supply and uses its mechanical arm to unplug and plug the power cord and the power supply, completing automatic replacement of the faulty power supply and powering it on again.
The scheme can also realize automatic monitoring, early warning and maintenance of the power supply reliability of the server in the machine room.
While the foregoing description of the embodiments of the present invention has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the invention, but rather, it is intended to cover all modifications or variations within the scope of the invention as defined by the claims of the present invention.

Claims (4)

1. The early warning method for the power supply reliability of the server is characterized by comprising the following steps of:
Monitoring state information of the power supply, characteristic parameter information of each part of the power supply and index data of power supply operation respectively;
The state information comprises temperature information of a power supply, a power supply output overcurrent signal and an overvoltage signal;
The monitoring of the state information specifically comprises the following steps:
Responding to the state information abnormal alarm reported by the baseboard management controller, calling a complex programmable logic device to acquire the state information of an alarm item corresponding to the current abnormal alarm, comparing the state information with the state information acquired by the baseboard management controller, and forming abnormal information if the comparison result is consistent;
The power supply component for monitoring the characteristic parameter information comprises a power factor correction feedback circuit, a diode circuit, a communication optocoupler, driving chips of each path and a standby control circuit;
the monitoring of the characteristic parameter information specifically comprises the following steps:
The complex programmable logic device polls the characteristic parameter information in a server power supply register, and the characteristic parameter information is collected in real time through a sensor;
comparing the characteristic parameter information with a preset value, recording the occurrence times of abnormal information, marking the times according to a preset rule, and taking a marked value as an abnormal value;
The CPLD polls, in real time, the characteristic parameter index data of each power supply component, which a power supply sensor collects into a register of the server power supply; the CPLD records the parameter data of each key component of the power supply, analyzes the key parameter data, and compares them with a specification interval; if the polled data fall within the standard specification interval, the power state output is 00; if the polled data fall outside the standard specification interval, the power state output for each abnormal component of the power supply is 01; and if the polled data for the same component or the same characteristic parameter fall outside the standard specification interval 3 times, the power state output for each abnormal component of the power supply is 10;
The index data comprise the output power consumption, the output current and the voltage value of an output signal of the power supply;
the monitoring of the index data specifically comprises the following steps:
The complex programmable logic device polls index data in a power register of the server, and the index data is acquired and/or calculated in real time through a power chip;
Comparing the index data with a preset value, recording the occurrence times of the abnormal information, marking the occurrence times according to a preset rule, and taking the marked value as an abnormal value;
The CPLD monitors the index data of the server power supply in real time, the index data being obtained through a power register: a power chip collects and/or calculates the index data in real time and stores them in the power register, and the CPLD polls and records each item of parameter data and compares it with a specification value; to ensure power redundancy of the whole machine, the output power consumption of the machine room server power supply is less than 50% of the rated power of the power supply; to ensure power stability of the whole machine, the output currents of the 2 power supplies of the machine room server meet the current sharing requirement, the standard specification being a current imbalance of less than 10% below 20% of load and less than 5% at 20% of load or above; the quality of the 12V, vingood, alert, and PG output signals of the server power supply is within the specification range; if the polled data are within the standard specification range, the power state output is 00; if the polled data exceed the standard specification range, the power state output for each abnormal item of the power supply is 01; if the polled data exceed the standard specification range 3 times, the power state output is 10; and the power state online monitoring module summarizes the power state output and transmits the power state information to the power supply reliability information summarizing module;
Comparing the monitored abnormal value with a preset abnormal value to obtain a corresponding risk level, wherein the method specifically comprises the following steps:
receiving and summarizing the alarm information and the power state output values of each power supply of the server, as transmitted by the server power state online monitoring module; generating a master table of the power reliability states of the servers in the machine room; and, based on the master table, classifying server power supplies whose power state output values are 00 into a low risk area;
generating a low-risk power supply list for the power supplies in the low risk area, and, for the power supplies in the medium risk area, generating a medium-risk power supply list and sending it with the corresponding alarm information to the data center machine room control module for power supply reliability identification and analysis;
and issuing an early warning prompt for the server power supply based on a preset coping strategy corresponding to the risk level, wherein the coping strategy comprises on-site inspection, the early warning prompt is issued by combining the on-site inspection result with the risk level, and the on-site inspection is used for acquiring external environment information of the server.
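The state-output encoding and risk classification recited in claim 1 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation. Several points are assumptions: the claim explicitly ties only output 00 to the low risk area, so mapping 01 to medium and 10 to high is inferred; the "3 times" rule is read as a cumulative count per parameter; a power supply is given the worst code seen across its parameters; and the parameter name `pfc_feedback_v` and its range are hypothetical.

```python
from collections import defaultdict

ESCALATION_COUNT = 3  # claim 1: out of specification 3 times -> "10"

# Assumed mapping; the claim only states that "00" means low risk.
RISK_BY_CODE = {"00": "low", "01": "medium", "10": "high"}


class ParameterMonitor:
    """Tracks out-of-specification polls per parameter."""

    def __init__(self, spec_ranges):
        # spec_ranges maps a parameter name to its (low, high) interval.
        self.spec_ranges = spec_ranges
        self.violations = defaultdict(int)

    def poll(self, name, value):
        """Return the two-bit state output for one polled value."""
        low, high = self.spec_ranges[name]
        if low <= value <= high:
            return "00"
        # Assumption: violations accumulate and are never reset.
        self.violations[name] += 1
        if self.violations[name] >= ESCALATION_COUNT:
            return "10"
        return "01"


def classify(codes):
    """Map one power supply's state outputs to a risk level.

    Assumption: the worst code wins. The codes sort correctly as
    strings because "00" < "01" < "10".
    """
    return RISK_BY_CODE[max(codes)]
```

A monitor built with `ParameterMonitor({"pfc_feedback_v": (1.0, 2.0)})` returns "00" for an in-range reading, "01" for the first two out-of-range readings, and "10" from the third onward.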
2. The method for early warning of server power reliability according to claim 1, characterized in that the method further comprises the steps of:
locating and repairing the fault point, by means of a fault overhaul robot, for the server power supply for which the early warning prompt was issued.
3. A server power reliability pre-warning system for implementing the method of claim 1, the system comprising:
The power supply online monitoring module is used for respectively monitoring state information of the power supply, characteristic parameter information of each part of the power supply and index data of power supply operation;
the reliability early warning module is used for comparing the monitored abnormal information with a preset abnormal value to obtain a corresponding risk level;
The data center machine room control module is used for issuing an early warning prompt for the server power supply based on a preset coping strategy corresponding to the risk level and in combination with an on-site inspection result, wherein the coping strategy comprises on-site inspection, and the on-site inspection is used for acquiring external environment information of the server.
4. The server power reliability pre-warning system of claim 3, further comprising a server power overhaul module, wherein, for the server power supply for which the early warning prompt was issued, the server power overhaul module locates the fault point by means of a fault overhaul robot and repairs it according to a preset strategy.
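The index-data checks recited in claim 1 (power redundancy and current sharing between the 2 power supplies) can be sketched as follows. This is a hedged illustration only. The claim does not define how current imbalance is calculated, so the formula below (difference over sum) is an assumption, as are the function names. The sketch also assumes that the 5% limit applies at 20% of load or above, which the translated claim text does not state unambiguously.

```python
def power_redundancy_ok(output_power_w, rated_power_w):
    """Claimed rule: output power is below 50% of rated power."""
    return output_power_w < 0.5 * rated_power_w


def current_imbalance(i1_a, i2_a):
    """Imbalance between two supply currents, as a fraction.

    Assumed definition: |I1 - I2| / (I1 + I2). The patent text
    does not give a formula.
    """
    total = i1_a + i2_a
    if total == 0:
        return 0.0
    return abs(i1_a - i2_a) / total


def current_sharing_ok(i1_a, i2_a, load_fraction):
    """Apply the load-dependent imbalance limit.

    Below 20% load the limit is 10%; at 20% load or above it is
    assumed to be 5%.
    """
    limit = 0.10 if load_fraction < 0.20 else 0.05
    return current_imbalance(i1_a, i2_a) < limit
```

Under these assumptions, two supplies delivering 10.0 A and 11.0 A pass at 15% load (imbalance of about 4.8% against a 10% limit), while 10.0 A and 11.5 A fail at 50% load (about 7.0% against a 5% limit).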

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210902093.1A CN115237719B (en) 2022-07-28 2022-07-28 A method and system for early warning of server power supply reliability

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210902093.1A CN115237719B (en) 2022-07-28 2022-07-28 A method and system for early warning of server power supply reliability

Publications (2)

Publication Number Publication Date
CN115237719A CN115237719A (en) 2022-10-25
CN115237719B true CN115237719B (en) 2025-12-12

Family

ID=83677338

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210902093.1A Active CN115237719B (en) 2022-07-28 2022-07-28 A method and system for early warning of server power supply reliability

Country Status (1)

Country Link
CN (1) CN115237719B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115914282A (en) * 2022-11-01 2023-04-04 北部湾大学 Multi-dimensional monitoring of buried natural gas pipeline leakage monitoring system
CN116578461B (en) * 2023-04-21 2024-04-05 广东云下汇金科技有限公司 A warning device and method for abnormal status of a data center server
CN117032186B (en) * 2023-08-31 2024-11-01 北京东土科技股份有限公司 Controller system with power supply abnormality early warning protection function
CN118112454A (en) * 2024-03-01 2024-05-31 东莞市嘉田电子科技有限公司 A device for diagnosing potential power failure of a server and a method for diagnosing the same
CN118487365B (en) * 2024-05-13 2025-02-11 郑州兴科电子技术有限公司 A power backup control system for equipment power failure

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113704049A (en) * 2021-04-13 2021-11-26 腾讯科技(深圳)有限公司 Server power failure monitoring method and device and electronic equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9857865B2 (en) * 2015-12-10 2018-01-02 Aspeed Technology Inc. Balancing of servers based on sampled utilization ratio and corresponding power consumption
CN109254895A (en) * 2018-08-21 2019-01-22 山东超越数控电子股份有限公司 A kind of high-performance server accident analysis prediction technique based on BMC
CN109683696A (en) * 2018-12-25 2019-04-26 浪潮电子信息产业股份有限公司 Fault of server power supply detection system, method, apparatus, equipment and medium
CN110488961A (en) * 2019-07-19 2019-11-22 苏州浪潮智能科技有限公司 A kind of server power supply test method and system


Also Published As

Publication number Publication date
CN115237719A (en) 2022-10-25

Similar Documents

Publication Publication Date Title
CN115237719B (en) A method and system for early warning of server power supply reliability
CN102005736B (en) On-line monitoring method of state of relay protection equipment
CN117977816B (en) A power safety intelligent power supply system
CN106980562A (en) A kind of hard disk monitoring method and device
CN118609339A (en) Fire safety management system for electrical circuits
CN104063987B (en) Master control room of nuclear power station back-up disk alarm method and system thereof
CN205692017U (en) Power integrated monitoring system
CN117221145A (en) Equipment fault predictive maintenance system based on Internet of things platform
CN112987696A (en) Regional power distribution network equipment management platform and operation method thereof
CN205506103U (en) Communication engine room equipment on -line monitoring device
CN116126772A (en) UART serial port management system and method applied to ARM server
CN114137302A (en) Monitoring system for whole verification process of electric energy metering device
CN119690732B (en) Positioning and troubleshooting system based on distributed architecture
CN110750427A (en) Data center equipment inspection method and system
CN116094175A (en) Safety early warning system and method for power distribution cabinet
CN118503060A (en) Server detection fault alarm system
CN115277353B (en) Remote fault active and passive early warning method for intelligent cabinet
CN111338891A (en) A kind of fan stability testing method and device
CN119356916A (en) A fault prediction method based on BMC health management module
CN118536043A (en) Abnormality detection method, device, electronic device and storage medium for substation equipment
CN110737256B (en) Method and device for controlling variable-frequency transmission system
CN118470916A (en) Intelligent early warning treatment method and device for equipment, electronic equipment and storage medium
CN118626780A (en) LCD display screen fault warning system and method
CN118131682A (en) Method for monitoring running state of PLC
CN115576736A (en) Refined intelligent monitoring method for data center

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: Building 9, No.1, guanpu Road, Guoxiang street, Wuzhong Economic Development Zone, Wuzhong District, Suzhou City, Jiangsu Province

Applicant after: Suzhou Yuannao Intelligent Technology Co.,Ltd.

Address before: Building 9, No.1, guanpu Road, Guoxiang street, Wuzhong Economic Development Zone, Wuzhong District, Suzhou City, Jiangsu Province

Applicant before: SUZHOU LANGCHAO INTELLIGENT TECHNOLOGY Co.,Ltd.

Country or region before: China

GR01 Patent grant