CN111984471A - Cabinet power BMC redundancy management system and method
- Publication number: CN111984471A
- Application number: CN202010820535.9A
- Authority: CN (China)
- Prior art keywords: BMC, communication bus, bus controller, programmable logic, logic device
- Legal status: Granted
Classifications
- G06F11/1423: Reconfiguring to eliminate the error by reconfiguration of paths (under G06F11/142: Reconfiguring to eliminate the error)
- G06F11/1438: Restarting or rejuvenating
- Shared hierarchy for both codes: G: Physics; G06: Computing, calculating or counting; G06F: Electric digital data processing; G06F11/00: Error detection, error correction, monitoring; G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance; G06F11/14: Error detection or correction of the data by redundancy in operation; G06F11/1402: Saving, restoring, recovering or retrying; G06F11/1415: Saving, restoring, recovering or retrying at system level
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Power Sources (AREA)
Abstract
The invention provides a cabinet power BMC redundancy management system comprising a first BMC and a second BMC, both connected to a complex programmable logic device (CPLD). The CPLD is connected to the output end of a communication bus controller. The input end of the communication bus controller is connected to both the first BMC and the second BMC, and the communication bus controller is connected to a slave device. The invention resolves system crashes and I2C (IIC) faults of the cabinet power supply system during BMC startup and operation without restarting or powering down the system. It shortens BMC startup time after an abnormal start and significantly improves the stability and reliability of the power supply system.
Description
Technical Field
The invention belongs to the technical field of whole-cabinet (rack) management, and in particular relates to a cabinet power BMC redundancy management system and method.
Background
With the rapid growth of industries such as big data and cloud networking, demand for servers keeps rising. Rack servers are reaching ever wider markets thanks to advantages such as high power efficiency and high space utilization. The power supply section of a cabinet provides DC power to the compute, storage, switch and other nodes of the whole cabinet. It is characterized by high power consumption, high heat-dissipation requirements and heavy load, which places higher demands on the stability and reliability of power supply system monitoring.
In existing cabinet power supply monitoring systems, the I2C (IIC) buses of the BMC are connected to the power supplies, fan control chips and other components such as power supply units (PSUs). Through these buses the BMC monitors system temperature, voltage, current and the status of key devices in real time, controls the output power and performs fault diagnosis.
Only one BMC (baseboard management controller) in the power monitoring system performs control and status monitoring over the I2C (inter-integrated circuit) bus. If that BMC's I2C bus develops a problem, or the BMC itself fails while the system is running, the power monitoring system cannot handle the problem in time. The resulting power supply problem can cause functional failures across the whole cabinet.

In common cabinet power monitoring and management methods, a BMC that fails to start can boot from the standby flash only after the system is restarted, which makes startup slow. During normal operation of the power supply system, a failure of the BMC's I2C bus or of the BMC system requires a system restart after the relevant fault data have been uploaded through the RJ45 port. Some such problems can only be cleared by a manual power-off restart. This can power down the whole cabinet, so system stability is insufficient.
Disclosure of Invention
In view of the above shortcomings of the prior art, the invention provides a cabinet power BMC redundancy management system and method to solve the technical problems described above.
The invention provides a cabinet power BMC redundancy management system, comprising:
a first BMC and a second BMC, both connected to a complex programmable logic device; the complex programmable logic device is connected to the output end of a communication bus controller; the input end of the communication bus controller is connected to the first BMC and to the second BMC; and the communication bus controller is connected to a slave device.
Further, in the system:
the input/output interfaces of the complex programmable logic device are connected to the first BMC and the second BMC, respectively;
the complex programmable logic device is connected to the first BMC and the second BMC through two I2C communication bus ports, respectively.
Further, the system comprises:
a first, a second, a third, a fourth and a fifth communication bus controller. The first and second communication bus controllers are each connected to the sensor processing unit through an I2C communication bus. The third and fourth communication bus controllers are each connected to the power supply module through an I2C communication bus. The fifth communication bus controller is connected to the fan management module.
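The wiring described above is essentially a static table. There are five controllers, each fed by both BMCs and grouped by the slave device it serves. The following is a minimal Python sketch of that table. The class names `Bmc` and `BusController`, the `TOPOLOGY` constant and the `controllers_for` helper are hypothetical. The chip names are borrowed from Example 1.

```python
from dataclasses import dataclass
from enum import Enum


class Bmc(Enum):
    """The two redundant BMCs: BMC0 is the first BMC, BMC1 the second."""
    BMC0 = 0
    BMC1 = 1


@dataclass(frozen=True)
class BusController:
    """One communication bus controller: two upstream inputs, one downstream slave group."""
    name: str
    slave_group: str
    inputs: tuple = (Bmc.BMC0, Bmc.BMC1)  # every controller is fed by both BMCs


# Chip names follow Example 1; the slave-group strings are illustrative labels.
TOPOLOGY = (
    BusController("chip1", "sensor processing unit"),
    BusController("chip2", "sensor processing unit"),
    BusController("chip3", "power supply module"),
    BusController("chip4", "power supply module"),
    BusController("chip5", "fan management module"),
)


def controllers_for(slave_group: str) -> list:
    """Names of the controllers that reach a given downstream slave group."""
    return [c.name for c in TOPOLOGY if c.slave_group == slave_group]


print(controllers_for("power supply module"))  # ['chip3', 'chip4']
```

Keeping the topology as immutable data lets the switching logic iterate over "all communication bus controllers" without hard-coding the count.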
The invention further provides a cabinet power BMC redundancy management method, comprising the following steps:
when the communication bus controller detects that the communication bus to the first BMC is abnormal, it switches its input end to the second BMC;
the complex programmable logic device detects the input-end switching action of the communication bus controller. It then issues to all communication bus controllers an instruction to switch their input ends to the second BMC, and issues a restart instruction to the first BMC.
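The two steps of the method can be modeled as a local switch in one controller followed by a broadcast from the complex programmable logic device. The Python sketch below is a behavioral simulation, not firmware. `BusController`, `Cpld`, `on_bus_fault` and `on_switch_detected` are hypothetical names, and the 0/1 encoding of the selected BMC is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BusController:
    """Hypothetical model of one communication bus controller."""
    name: str
    active_input: int = 0  # 0 = first BMC, 1 = second BMC

    def on_bus_fault(self) -> bool:
        """Local reaction to an abnormal bus toward the first BMC.

        Switches the input to the second BMC and returns True if the input
        actually changed, i.e. if there is a switch for the CPLD to observe."""
        if self.active_input == 0:
            self.active_input = 1
            return True
        return False


@dataclass
class Cpld:
    """Hypothetical CPLD reaction to one controller's input switch."""
    controllers: List[BusController]
    restart_requests: List[str] = field(default_factory=list)
    trigger: Optional[str] = None

    def on_switch_detected(self, source: BusController) -> None:
        self.trigger = source.name
        for c in self.controllers:            # broadcast: every controller -> second BMC
            c.active_input = 1
        self.restart_requests.append("BMC0")  # then restart the first BMC


controllers = [BusController(f"chip{i}") for i in range(1, 6)]
cpld = Cpld(controllers)
if controllers[2].on_bus_fault():  # e.g. chip3 loses its bus to the first BMC
    cpld.on_switch_detected(controllers[2])
print([c.active_input for c in controllers])  # [1, 1, 1, 1, 1]
```

The broadcast matters because one failed path switches only its own controller. The CPLD's instruction brings every other controller onto the second BMC as well, so a single BMC owns all buses while the first BMC restarts.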
Further, the method comprises:
the complex programmable logic device monitors the states of the first BMC and the second BMC;
the first BMC is set in the complex programmable logic device as the default active BMC;
if the complex programmable logic device detects that the currently active first BMC has failed, it issues an input-end switching instruction to all communication bus controllers. This switches their input ends to the second BMC, which has already started.
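The failover rule in these steps reduces to a small selection function. In this sketch, `choose_active_bmc` and its `healthy` and `started` dictionaries are hypothetical. How the complex programmable logic device actually derives health and startup status (for example from watchdog signals) is left abstract. The `None` result for "no usable BMC" is an assumption, since the description does not cover that case.

```python
from typing import Dict, Optional

FIRST_BMC, SECOND_BMC = 0, 1
DEFAULT_BMC = FIRST_BMC  # the first BMC is the default active BMC in the CPLD


def choose_active_bmc(current: int,
                      healthy: Dict[int, bool],
                      started: Dict[int, bool]) -> Optional[int]:
    """Pick the BMC that all communication bus controllers should select.

    Keeps `current` while it is healthy; otherwise fails over to the other
    BMC, but only if that BMC is healthy and has finished starting.
    Returns None when neither BMC is usable (not addressed in the description)."""
    if healthy.get(current, False):
        return current
    other = SECOND_BMC if current == FIRST_BMC else FIRST_BMC
    if healthy.get(other, False) and started.get(other, False):
        return other
    return None


print(choose_active_bmc(DEFAULT_BMC, {0: False, 1: True}, {0: True, 1: True}))  # 1
```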
Further, the complex programmable logic device monitors the states of the first BMC and the second BMC as follows:
the complex programmable logic device obtains the running states of the first BMC and the second BMC by sending watchdog signals to them.
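One common way to derive a running state from a watchdog line is a heartbeat timeout. The line must keep toggling, and the BMC counts as alive only while toggles arrive within a time limit. The sketch below models that check on the CPLD side in Python. It follows the direction used in Example 2, where the CPLD monitors watchdog signals output by BMC0 and BMC1. The class `WatchdogMonitor`, the millisecond timestamps and the 1 s timeout are assumptions.

```python
from typing import Optional


class WatchdogMonitor:
    """CPLD-side liveness check for one BMC's watchdog (WDT) line.

    The BMC is treated as alive if the line has changed level within
    `timeout_ms` of the time being asked about."""

    def __init__(self, timeout_ms: int) -> None:
        self.timeout_ms = timeout_ms
        self.last_level: Optional[int] = None
        self.last_toggle_ms: Optional[int] = None

    def sample(self, now_ms: int, level: int) -> None:
        """Record one sample of the WDT pin; any level change counts as a kick."""
        if self.last_level is None or level != self.last_level:
            self.last_toggle_ms = now_ms
        self.last_level = level

    def alive(self, now_ms: int) -> bool:
        return (self.last_toggle_ms is not None
                and now_ms - self.last_toggle_ms <= self.timeout_ms)


wd = WatchdogMonitor(timeout_ms=1000)   # hypothetical 1 s timeout
for t, level in [(0, 0), (200, 1), (400, 0)]:
    wd.sample(t, level)                 # healthy BMC: line toggling
print(wd.alive(1000))                   # True: last toggle 600 ms ago
for t in (600, 800, 1000, 1200, 1400):
    wd.sample(t, 0)                     # line stuck low: BMC hung
print(wd.alive(1500))                   # False: no toggle for 1100 ms
```

Treating any edge as a kick, rather than a specific level, means a BMC that hangs with the line stuck either high or low is detected the same way.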
Further, the complex programmable logic device monitors the input-end switching action of the communication bus controller as follows:
if the complex programmable logic device receives a level-change signal from a communication bus controller, it determines that this controller has performed an input-end switching action.
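Detecting a level-change signal amounts to edge detection on the GPIO lines that the controllers drive into the complex programmable logic device. In real hardware this would be written in an HDL. The Python sketch below only illustrates the logic. Packing the five lines into one bit mask and the name `switched_controllers` are assumptions.

```python
from typing import List

NUM_CONTROLLERS = 5  # chips 1-5 in Example 1


def switched_controllers(prev_levels: int, curr_levels: int,
                         n: int = NUM_CONTROLLERS) -> List[int]:
    """Return 1-based numbers of the controllers whose GPIO level changed.

    Bit i of each mask is the level reported by controller i + 1.
    A change in either direction is treated as an input-end switch."""
    changed = prev_levels ^ curr_levels
    return [i + 1 for i in range(n) if (changed >> i) & 1]


print(switched_controllers(0b00000, 0b00100))  # [3]
```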
Further, the method comprises:
if the complex programmable logic device detects that the first BMC has restarted successfully, it issues an input-end switching instruction to all communication bus controllers. This switches their input ends from the second BMC back to the first BMC.
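Together with the failover steps, this switch-back step turns the device's behavior into a two-state cycle. Below is a minimal sketch. The names `Mode`, `FailoverPolicy`, `first_bmc_fault` and `first_bmc_restarted` are hypothetical.

```python
from enum import Enum, auto
from typing import List


class Mode(Enum):
    ON_FIRST = auto()              # all controllers select the first BMC
    ON_SECOND_RESTARTING = auto()  # second BMC in control, first BMC restarting


class FailoverPolicy:
    """Hypothetical CPLD policy: fail over on a first-BMC fault,
    fail back once the first BMC reports a successful restart."""

    def __init__(self, n_controllers: int = 5) -> None:
        self.mode = Mode.ON_FIRST
        self.inputs: List[int] = [0] * n_controllers  # 0 = first BMC, 1 = second

    def first_bmc_fault(self) -> None:
        if self.mode is Mode.ON_FIRST:
            self.inputs = [1] * len(self.inputs)
            self.mode = Mode.ON_SECOND_RESTARTING

    def first_bmc_restarted(self) -> None:
        if self.mode is Mode.ON_SECOND_RESTARTING:
            self.inputs = [0] * len(self.inputs)
            self.mode = Mode.ON_FIRST


policy = FailoverPolicy()
policy.first_bmc_fault()
print(policy.inputs)      # [1, 1, 1, 1, 1]
policy.first_bmc_restarted()
print(policy.inputs)      # [0, 0, 0, 0, 0]
```

Ignoring events that do not match the current mode, such as a restart notice while already on the first BMC, keeps the policy idempotent if a signal is seen twice.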
The beneficial effects of the invention are as follows.
The system and method add a BMC monitoring module and an I2C switching circuit. This solves the problem of a slow restart, caused by reloading the image file, when the BMC startup process fails. In addition, while the power supply system is running, BMC system crashes and I2C faults are resolved without restarting the system, powering it off or similar operations. The invention therefore resolves crashes and I2C faults of the cabinet power supply system during BMC startup and operation without restarting or powering down, shortens BMC startup time after an abnormal start, and significantly improves the stability and reliability of the power supply system.
In addition, the invention has a reliable design principle and a simple structure, and has very broad application prospects.
Drawings
The drawings used in describing the embodiments and the prior art are briefly introduced below, to explain the technical solutions more clearly. Those skilled in the art can obviously derive other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of a cabinet power BMC redundancy management system according to an embodiment of the present application;
FIG. 2 is an exemplary flowchart of a cabinet power BMC redundancy management method according to an embodiment of the present application (BMC startup, Example 2);
FIG. 3 is an exemplary flowchart of a cabinet power BMC redundancy management method according to an embodiment of the present application (power supply system operation, Example 3).
Detailed Description
To help those skilled in the art better understand the invention, the technical solutions in its embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments that a person skilled in the art obtains from these embodiments without creative effort fall within the protection scope of the invention.
It should be noted that, where no conflict arises, the embodiments and their features may be combined with each other.
In the description of the invention, the terms "first", "second" and the like are used for descriptive purposes only. They are not to be construed as indicating or implying relative importance, or as implicitly indicating the number of the technical features concerned. Thus, a feature defined as "first", "second", etc. may explicitly or implicitly include one or more such features. In the description of the invention, "a plurality" means two or more unless otherwise specified.
In the description of the invention, unless otherwise explicitly specified or limited, the terms "mounted", "linked" and "connected" are to be construed broadly. For example, a connection may be fixed, removable or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediate medium, or an internal connection between two elements. Those of ordinary skill in the art can understand the specific meanings of these terms in the invention according to the circumstances.
The invention is described in detail below through embodiments, with reference to the accompanying drawings.
Example 1
This embodiment provides a cabinet power BMC redundancy management system which, as shown in FIG. 1, has the following structure:
The system comprises a first BMC (BMC0) and a second BMC (BMC1). BMC0 and BMC1 are connected to a complex programmable logic device (CPLD) through I2C buses. The WDT (watchdog) and RESET interfaces of BMC0 and BMC1 are connected to GPIO interfaces of the CPLD. The system further comprises five communication bus controllers: a first (chip 1), a second (chip 2), a third (chip 3), a fourth (chip 4) and a fifth (chip 5). Chips 1 and 2 are each connected through an I2C communication bus to the sensor processing unit, which manages the various sensor signals. Chips 3 and 4 are each connected to the power supply module through an I2C communication bus, and chip 5 is connected to the fan management module. Chips 1 to 5 are all I2C multi-master selector chips.
Chips 1 to 5 are all connected to GPIO (I/O) interfaces of the CPLD. Taking chip 1 as an example, its input end is connected to one I2C interface of BMC0 (I2C1) and one I2C interface of BMC1 (I2C2). The input ends of the other chips are connected in the same way, each to an I2C interface of BMC0 and an I2C interface of BMC1.
After power-on, the I2C links of BMC0 are connected to the downstream I2C slave devices. Suppose one I2C path of BMC0 stops working properly or is removed from the system. The input end of the I2C multi-master selector chip on that path then switches to the corresponding I2C link of BMC1, and the switch is reported to the CPLD through a change in GPIO level.
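Each selector switches on its own and then reports the switch; this per-path behavior can be sketched as follows. The code is a behavioral model in Python, not a driver for any particular selector part. `MasterSelector`, `check_link` and `gpio_mask` are hypothetical names. Reporting by toggling a single level is one reading of "change in GPIO level".

```python
from typing import List


class MasterSelector:
    """Illustrative model of one I2C multi-master selector (chips 1 to 5)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.upstream = "BMC0"  # after power-on, BMC0's I2C link is connected
        self.gpio_level = 0     # level the CPLD sees; toggled on each switch

    def check_link(self, bmc0_link_ok: bool) -> None:
        """If BMC0's path stops working or is removed, switch to BMC1 and
        signal the switch to the CPLD by changing the GPIO level."""
        if not bmc0_link_ok and self.upstream == "BMC0":
            self.upstream = "BMC1"
            self.gpio_level ^= 1


def gpio_mask(chips: List[MasterSelector]) -> int:
    """Pack the chips' GPIO levels into one word (bit i = chip i + 1)."""
    return sum(c.gpio_level << i for i, c in enumerate(chips))


chips = [MasterSelector(f"chip{i}") for i in range(1, 6)]
chips[3].check_link(bmc0_link_ok=False)  # chip4's BMC0 path fails
print(chips[3].upstream, bin(gpio_mask(chips)))  # BMC1 0b1000
```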
Example 2
This embodiment provides a cabinet power BMC redundancy management method which, as shown in FIG. 2, comprises the following steps:
S1. At system startup, both BMC systems are loaded and run. The added CPLD chip collects the startup states of the two BMC systems by monitoring the watchdog signals output by BMC0 and BMC1.
S2. During startup, the system is first controlled by BMC0. If BMC0 starts abnormally or times out, the CPLD reads the startup state of BMC1 over I2C. If BMC1 has started normally, the system is controlled by BMC1. The CPLD then has BMC0 restart from the backup flash. While BMC0 is restarting, the five I2C multi-master selector chips switch their I2C input ends to the corresponding I2C links of BMC1 for management and control.
S3. After BMC0 has restarted, the CPLD sends a control signal to the I2C multi-master selector chips through GPIO. The input ends of all five I2C groups are then switched from BMC1 back to BMC0. The system therefore does not have to wait for BMC0 to restart, which mitigates the slow startup caused by an abnormal system start.
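Steps S1 to S3 form a short decision sequence. The Python function below traces it for given boot outcomes and is meant only for reasoning about the flow. The function name and its boolean inputs are assumptions. So is the "none" outcome when neither BMC starts, a case the embodiment does not describe.

```python
from typing import List, Tuple


def startup_sequence(bmc0_ok: bool, bmc1_ok: bool,
                     bmc0_backup_ok: bool) -> Tuple[str, List[str]]:
    """Walk through steps S1-S3 and return (controlling BMC, event log).

    bmc0_ok:        BMC0 finished its normal start within the timeout
    bmc1_ok:        BMC1 started normally (read by the CPLD over I2C)
    bmc0_backup_ok: BMC0's restart from the backup flash succeeded"""
    log = ["S1: both BMCs loading; CPLD watching WDT outputs"]
    if bmc0_ok:
        log.append("S2: BMC0 started; BMC0 keeps control")
        return "BMC0", log
    if not bmc1_ok:
        # Not described in the embodiment; flagged here as an open case.
        log.append("S2: neither BMC started")
        return "none", log
    log.append("S2: BMC0 failed or timed out; BMC1 takes control")
    log.append("S2: 5 selectors switched to BMC1; BMC0 restarting from backup flash")
    if not bmc0_backup_ok:
        return "BMC1", log
    log.append("S3: BMC0 up; 5 selectors switched back to BMC0")
    return "BMC0", log


owner, events = startup_sequence(bmc0_ok=False, bmc1_ok=True, bmc0_backup_ok=True)
print(owner)          # BMC0
print(len(events))    # 4
```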
Example 3
This embodiment provides a cabinet power BMC redundancy management method which, as shown in FIG. 3, comprises the following steps:
S1. While the power monitoring system is running, the CPLD chip monitors the states of the five I2C multi-master selector chips. A GPIO level change received from any one of them indicates that the corresponding I2C path of BMC0 is abnormal.
S2. The CPLD reads and records the relevant I2C register values and BMC status register values through the I2C interface connected to BMC0. Once recording is complete, it signals BMC0 through GPIO to restart. Because BMC0 is restarting, the five I2C multi-master selector chips switch their I2C input ends to the corresponding I2C links of BMC1 for management and control.
S3. Once BMC0 has restarted, the CPLD signals the I2C multi-master selector chips through GPIO, and the input ends of all five I2C groups are switched from BMC1 back to BMC0. An I2C fault in the cabinet power supply system therefore no longer forces a system restart that interrupts the normal power supply, which improves system stability.
S4. While the power monitoring system is running, the CPLD chip also monitors the BMC0 watchdog. Suppose it detects that BMC0 is in an abnormal state, such as a crash that requires a restart. The CPLD then signals BMC0 through GPIO to restart, following the same restart process as for an I2C fault. Thus, when BMC0 crashes or encounters similar problems, they can be resolved without restarting or powering off the system.
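Steps S1 to S4 share one recovery path. The CPLD snapshots diagnostics, restarts BMC0 while BMC1 serves the buses, and then hands control back. The sketch below expresses that path with injected callbacks so that it can run without hardware. `handle_runtime_fault`, `read_registers`, `restart_bmc0` and the register name `i2c_status` are hypothetical stand-ins for the CPLD's I2C reads and its GPIO reset line.

```python
from typing import Callable, Dict, List


def handle_runtime_fault(read_registers: Callable[[], Dict[str, int]],
                         restart_bmc0: Callable[[], bool],
                         selectors: List[int],
                         log: List[Dict[str, int]]) -> bool:
    """Runtime recovery for an I2C fault (S1-S3) or a watchdog fault (S4).

    selectors: current input of each selector chip (0 = BMC0, 1 = BMC1),
               updated in place.
    log:       receives the register snapshot taken before the restart.
    Returns True if BMC0 came back and control was handed back to it."""
    log.append(read_registers())             # S2: record registers before restarting
    selectors[:] = [1] * len(selectors)      # BMC1 serves all buses during the restart
    if restart_bmc0():                       # CPLD restarts BMC0 through GPIO
        selectors[:] = [0] * len(selectors)  # S3: switch all inputs back to BMC0
        return True
    return False


sel = [0, 0, 1, 0, 0]   # chip3 has already switched to BMC1 (S1)
snapshots: List[Dict[str, int]] = []
ok = handle_runtime_fault(lambda: {"i2c_status": 0x10}, lambda: True, sel, snapshots)
print(ok, sel, snapshots)  # True [0, 0, 0, 0, 0] [{'i2c_status': 16}]
```

Taking the snapshot before the restart matters because a BMC reset would clear the register state that explains the fault.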
Although the invention has been described in detail with reference to the drawings and preferred embodiments, it is not limited to them. Those skilled in the art can make various equivalent modifications or substitutions to the embodiments without departing from the spirit and scope of the invention. Such modifications or substitutions fall within the scope of the invention, as do any changes or substitutions that a person skilled in the art could readily conceive within the technical scope disclosed here. The protection scope of the invention shall therefore be subject to the protection scope of the claims.
Claims (8)
1. A cabinet power BMC redundancy management system, comprising:
a first BMC and a second BMC, both connected to a complex programmable logic device; the complex programmable logic device is connected to the output end of a communication bus controller; the input end of the communication bus controller is connected to the first BMC and to the second BMC; and the communication bus controller is connected to a slave device.
2. The system of claim 1, further comprising:
the input/output interfaces of the complex programmable logic device are connected to the first BMC and the second BMC, respectively; and
the complex programmable logic device is connected to the first BMC and the second BMC through two I2C communication bus ports, respectively.
3. The system of claim 1, further comprising:
a first, a second, a third, a fourth and a fifth communication bus controller; the first and second communication bus controllers are each connected to the sensor processing unit through an I2C communication bus; the third and fourth communication bus controllers are each connected to the power supply module through an I2C communication bus; and the fifth communication bus controller is connected to the fan management module.
4. A cabinet power BMC redundancy management method, comprising the following steps:
when the communication bus controller detects that the communication bus to the first BMC is abnormal, it switches its input end to the second BMC; and
the complex programmable logic device detects the input-end switching action of the communication bus controller, issues to all communication bus controllers an instruction to switch their input ends to the second BMC, and issues a restart instruction to the first BMC.
5. The method of claim 4, further comprising:
the complex programmable logic device monitors the states of the first BMC and the second BMC;
the first BMC is set in the complex programmable logic device as the default active BMC; and
if the complex programmable logic device detects that the currently active first BMC has failed, it issues an input-end switching instruction to all communication bus controllers, switching their input ends to the second BMC, which has already started.
6. The method of claim 5, wherein the complex programmable logic device monitors the states of the first BMC and the second BMC as follows:
the complex programmable logic device obtains the running states of the first BMC and the second BMC by sending watchdog signals to them.
7. The method of claim 4, wherein the complex programmable logic device monitors the input-end switching action of the communication bus controller as follows:
if the complex programmable logic device receives a level-change signal from the communication bus controller, it determines that the communication bus controller has performed an input-end switching action.
8. The method of claim 5, further comprising:
if the complex programmable logic device detects that the first BMC has restarted successfully, it issues an input-end switching instruction to all communication bus controllers, switching their input ends from the second BMC back to the first BMC.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010820535.9A CN111984471B (en) | 2020-08-14 | 2020-08-14 | Cabinet power BMC redundancy management system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111984471A | 2020-11-24 |
CN111984471B | 2022-11-25 |
Family ID: 73435314
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104169905A (en) * | 2012-03-28 | 2014-11-26 | 英特尔公司 | Configurable and fault-tolerant baseboard management controller arrangement |
CN105867572A (en) * | 2016-04-26 | 2016-08-17 | 浪潮(北京)电子信息产业有限公司 | Power supply managing method for rack server and rack server |
CN106971586A (en) * | 2017-05-05 | 2017-07-21 | 深圳市哈工大交通电子技术有限公司 | The whistle control system of principal and subordinate's automated back-up switching |
CN109471770A (en) * | 2018-09-11 | 2019-03-15 | 华为技术有限公司 | A kind of method for managing system and device |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113127302A (en) * | 2021-04-16 | 2021-07-16 | 山东英信计算机技术有限公司 | Method and device for monitoring GPIO (general purpose input/output) of board card |
CN113127302B (en) * | 2021-04-16 | 2023-05-26 | 山东英信计算机技术有限公司 | Board GPIO monitoring method and device |
CN113778930A (en) * | 2021-11-12 | 2021-12-10 | 苏州浪潮智能科技有限公司 | AVS (Audio video Standard) adjusting system, method, device and equipment |
WO2023082531A1 (en) * | 2021-11-12 | 2023-05-19 | 苏州浪潮智能科技有限公司 | AVS adjustment system, method and apparatus, and device and storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |