CN111984471A - Cabinet power BMC redundancy management system and method - Google Patents

Info

Publication number
CN111984471A
Authority
CN
China
Prior art keywords
bmc
communication bus
bus controller
programmable logic
logic device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010820535.9A
Other languages
Chinese (zh)
Other versions
CN111984471B (en)
Inventor
张瑜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010820535.9A
Publication of CN111984471A
Application granted
Publication of CN111984471B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/142 Reconfiguring to eliminate the error
    • G06F11/1423 Reconfiguring to eliminate the error by reconfiguration of paths
    • G06F11/1438 Restarting or rejuvenating

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Power Sources (AREA)

Abstract

The invention provides a cabinet power supply BMC redundancy management system comprising a first BMC and a second BMC, both connected to a complex programmable logic device (CPLD). The CPLD is connected to the output end of a communication bus controller; the input end of the communication bus controller is connected to the first BMC and to the second BMC respectively; and the communication bus controller is connected to a slave device. The invention resolves BMC crashes and IIC faults that occur while the BMC of the cabinet power supply system is starting or running, without restarting or powering down the system. It shortens BMC startup time when startup is abnormal and improves the stability and reliability of the power supply system.

Description

Cabinet power BMC redundancy management system and method
Technical Field
The invention belongs to the technical field of whole-cabinet management, and in particular relates to a cabinet power supply BMC redundancy management system and method.
Background
With the rapid development of industries such as big data and cloud networking, demand for servers keeps growing, and rack servers are finding more and more applications thanks to advantages such as high power utilization efficiency and high space utilization. The power supply section of the cabinet supplies direct current to the compute, storage, and switch nodes of the whole cabinet. It is characterized by high power consumption, demanding heat dissipation requirements, and heavy load, which places higher requirements on the stability and reliability of power supply system monitoring.
In existing cabinet power supply monitoring systems, the IIC bus of the BMC is connected to the corresponding components, such as the power supply unit (PSU) and the fan control chip, in order to monitor system temperature, voltage, current, and the state of key devices in real time, to control the power output, and to diagnose faults.
Because a single BMC in the power monitoring system performs all control and state monitoring over the IIC bus, a fault on the BMC's IIC bus or in the BMC itself during operation leaves the power monitoring system unable to handle problems in time, and a power supply problem in the cabinet can then cause functional failures. In a common cabinet power supply monitoring and management method, when the BMC fails to start, it can boot from the backup FLASH only after the system has been restarted, which makes startup slow. When the power supply system is running normally and the IIC bus of the BMC or the BMC system develops a problem, the related fault data must first be uploaded through the RJ45 port and the system then restarted. In some cases the problem can be cleared only by a manual power cycle, which may cut power to the whole cabinet. The stability of such a system is therefore insufficient.
Disclosure of Invention
In view of the above deficiencies of the prior art, the present invention provides a cabinet power supply BMC redundancy management system and method to solve the above technical problems.
The invention provides a cabinet power supply BMC redundancy management system, which comprises:
a first BMC and a second BMC, both connected to a complex programmable logic device; the complex programmable logic device is connected to the output end of a communication bus controller; the input end of the communication bus controller is connected to the first BMC and to the second BMC respectively; and the communication bus controller is connected to a slave device.
Further, in the system:
the input/output interfaces of the complex programmable logic device are connected to the first BMC and to the second BMC respectively;
the complex programmable logic device is connected to the first BMC and to the second BMC through two I2C communication bus ports.
Further, the system comprises:
a first communication bus controller, a second communication bus controller, a third communication bus controller, a fourth communication bus controller, and a fifth communication bus controller; the first and second communication bus controllers are both connected to the sensor processing unit through an I2C communication bus; the third and fourth communication bus controllers are both connected to the power supply module through an I2C communication bus; and the fifth communication bus controller is connected to the fan management module.
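The wiring described above can be written down as a small lookup table. The following is a minimal sketch only: the class name, the `ctrl1` to `ctrl5` labels, and the `controllers_for` helper are hypothetical names chosen for illustration and do not appear in the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusController:
    """One communication bus controller and the slave group behind it."""
    name: str   # hypothetical label, not taken from the patent
    slave: str  # device group reached through this controller
    upstream: tuple = ("first BMC", "second BMC")  # both BMCs feed every controller


# Five controllers as listed in the text: two for sensors, two for power, one for fans.
TOPOLOGY = [
    BusController("ctrl1", "sensor processing unit"),
    BusController("ctrl2", "sensor processing unit"),
    BusController("ctrl3", "power supply module"),
    BusController("ctrl4", "power supply module"),
    BusController("ctrl5", "fan management module"),
]


def controllers_for(slave):
    """Return the names of the controllers that reach the given slave group."""
    return [c.name for c in TOPOLOGY if c.slave == slave]
```

Calling `controllers_for("power supply module")` lists the third and fourth controllers, which matches the pairing given in the text.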
The invention also provides a cabinet power supply BMC redundancy management method, which comprises the following steps:
when a communication bus controller detects that the communication bus to the first BMC is abnormal, it switches its input end to the second BMC;
the complex programmable logic device detects the input-end switching action of the communication bus controller, issues an instruction to all communication bus controllers to switch their input ends to the second BMC, and issues a restart instruction to the first BMC.
Further, the method further comprises:
the complex programmable logic device monitors the states of the first BMC and the second BMC;
the first BMC is set in the complex programmable logic device as the default active BMC;
if the complex programmable logic device detects that the currently active first BMC has a fault, it issues an input-end switching instruction to all communication bus controllers and switches the input ends of all communication bus controllers to the second BMC, which is in the started state.
Further, the method for monitoring the states of the first BMC and the second BMC by the complex programmable logic device includes:
the complex programmable logic device acquires the running states of the first BMC and the second BMC by sending watchdog signals to the first BMC and the second BMC.
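Watchdog-based state monitoring of this kind is commonly implemented as a heartbeat timeout. The sketch below is illustrative only and follows the reading used in the embodiments, where each BMC outputs a watchdog signal that the complex programmable logic device observes. The class name, the method names, and the timeout value are assumptions; the patent does not specify a timeout. Timestamps are passed in explicitly so that the behaviour is deterministic.

```python
class WatchdogMonitor:
    """Tracks the last watchdog pulse seen from each BMC (hypothetical model)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s  # assumed value; not given in the patent
        self.last_seen = {}         # BMC name -> time of last watchdog pulse

    def kick(self, bmc, now):
        """Record a watchdog pulse from the given BMC at time `now` (seconds)."""
        self.last_seen[bmc] = now

    def is_alive(self, bmc, now):
        """A BMC counts as running if it pulsed within the timeout window."""
        last = self.last_seen.get(bmc)
        return last is not None and (now - last) <= self.timeout_s
```

A BMC that has never pulsed is reported as not running, which also covers the case of a BMC that never finished starting.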
Further, the method by which the complex programmable logic device monitors the input-end switching action of the communication bus controller comprises:
if the complex programmable logic device receives a level-change signal from a communication bus controller, it determines that the communication bus controller has performed an input-end switching action.
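Detecting a switching action from a level change amounts to comparing two successive samples of the controller status lines. The following is a minimal sketch; the function name and the dictionary representation of the sampled levels are assumptions made for illustration, and real CPLD logic would be written in a hardware description language rather than Python.

```python
def detect_switches(previous, current):
    """Return the controllers whose status level changed between two samples.

    `previous` and `current` map a controller name to its sampled level (0 or 1).
    A controller with no earlier sample is treated as unchanged.
    """
    return sorted(
        name
        for name, level in current.items()
        if previous.get(name, level) != level
    )
```

Any name returned by `detect_switches` corresponds to a controller that is judged to have switched its input end.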
Further, the method further comprises:
if the complex programmable logic device detects that the first BMC has restarted successfully, it issues an input-end switching instruction to all communication bus controllers and switches the input ends of all communication bus controllers from the second BMC back to the first BMC.
The beneficial effects of the invention are as follows.
By adding a BMC monitoring module and an IIC switching circuit, the cabinet power supply BMC redundancy management system and method avoid the slow restart and image reload that otherwise follow a failed BMC startup. While the power supply system is running, BMC system crashes and IIC faults are likewise resolved without restarting or powering down the system. The invention thus shortens BMC startup time when startup is abnormal and improves the stability and reliability of the power supply system.
In addition, the design principle of the invention is reliable, its structure is simple, and it has broad application prospects.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. It will be apparent to those skilled in the art that other drawings can be derived from these drawings without creative effort.
FIG. 1 is a schematic diagram of a cabinet power supply BMC redundancy management system according to an embodiment of the present application.
FIG. 2 is an exemplary flowchart of a cabinet power supply BMC redundancy management method according to an embodiment of the present application.
FIG. 3 is an exemplary flowchart of a cabinet power supply BMC redundancy management method according to an embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the drawings in the embodiment of the present invention, and it is obvious that the described embodiment is only a part of the embodiment of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
In the description of the present invention, it is to be understood that the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first," "second," etc. may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless otherwise specified.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art through specific situations.
The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
Example 1
This embodiment provides a cabinet power supply BMC redundancy management system which, as shown in FIG. 1, has the following structure:
The system comprises a first BMC (BMC0) and a second BMC (BMC1). BMC0 and BMC1 are each connected to a complex programmable logic device (CPLD) through an I2C bus, and the WDT and RESET interfaces of BMC0 and BMC1 are connected to GPIO interfaces of the CPLD. The system further comprises a first communication bus controller (chip 1), a second communication bus controller (chip 2), a third communication bus controller (chip 3), a fourth communication bus controller (chip 4), and a fifth communication bus controller (chip 5). Chip 1 and chip 2 are both connected to the sensor processing unit (which manages the various sensor signals) through an I2C communication bus; chip 3 and chip 4 are both connected to the power supply module through an I2C communication bus; and chip 5 is connected to the fan management module. Chips 1 to 5 are all IIC multi-master selector chips.
Chips 1 to 5 are all connected to GPIO interfaces (I/O interfaces) of the CPLD. Taking chip 1 as an example, its input end is connected to an I2C interface of BMC0 (I2C1) and to an I2C interface of BMC1 (I2C2). The input ends of the other chips are wired in the same way as chip 1, each to one I2C interface of BMC0 and one I2C interface of BMC1.
After power-on, the IIC links of BMC0 are connected to the back-end IIC slave devices. When one IIC link of BMC0 stops working normally or is removed from the system, the input end of the IIC multi-master selector chip on that link switches to the corresponding IIC link of BMC1, and the switch is reported to the CPLD as a GPIO level change.
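The behaviour of a single selector chip can be modelled in a few lines. This is a rough sketch under stated assumptions: the class and method names are hypothetical, and the encoding of the GPIO level (0 while BMC0 is selected, 1 after switching to BMC1) is an assumption, since the text only says that the level changes.

```python
class MultiMasterSelector:
    """Toy model of one IIC multi-master selector chip (names are hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.selected = "BMC0"  # BMC0 drives the back-end slaves after power-on
        self.gpio_level = 0     # assumed encoding: 0 = BMC0 selected, 1 = BMC1 selected

    def report_upstream_fault(self, bmc):
        """Switch to BMC1 if the currently selected BMC0 link has failed.

        Returns the GPIO level that the CPLD would observe afterwards.
        """
        if bmc == "BMC0" and self.selected == "BMC0":
            self.selected = "BMC1"
            self.gpio_level = 1
        return self.gpio_level
```

Reporting the same fault twice leaves the selector on BMC1, so the CPLD sees exactly one level change per failover.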
Example 2
This embodiment provides a cabinet power supply BMC redundancy management method which, as shown in FIG. 2, includes the following steps:
S1: When the system starts, both BMC systems are loaded and run, and the added CPLD chip collects the startup states of the two BMC systems by monitoring the watchdog signals output by BMC0 and BMC1.
S2: During startup, the system is initially controlled by BMC0. If BMC0 starts abnormally or times out, the CPLD reads the startup state of BMC1 over IIC, and if BMC1 has started normally, control of the system passes to BMC1. The CPLD then makes BMC0 restart from the backup FLASH. While BMC0 restarts, the five IIC multi-master selector chips switch their input ends to the corresponding IIC links of BMC1, which takes over management and control.
S3: After BMC0 has restarted, the CPLD sends a set signal through GPIO to the IIC multi-master selector chips, and the input ends of the five IIC groups are switched back from BMC1 to BMC0. The system therefore does not have to wait for BMC0 to restart, which mitigates the slow startup caused by an abnormal system start.
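The startup decision in steps S1 to S3 can be summarised as a small pure function. The sketch below is illustrative: the function name and the action strings are hypothetical, and the case in which both BMCs fail to start is not described in the source, so returning no active controller there is an assumption.

```python
def choose_boot_controller(bmc0_ok, bmc1_ok):
    """Return (active_bmc, actions) for the startup flow.

    bmc0_ok / bmc1_ok: whether each BMC started normally, as judged from
    its watchdog signal (and, for BMC1, its state read over IIC).
    """
    if bmc0_ok:
        # Normal case: BMC0 keeps control and nothing else happens.
        return "BMC0", []
    # BMC0 failed or timed out: it is restarted from the backup FLASH.
    actions = ["restart BMC0 from backup FLASH"]
    if bmc1_ok:
        # BMC1 takes over while BMC0 restarts.
        return "BMC1", ["switch all selectors to BMC1"] + actions
    # Assumption: the source does not cover both BMCs failing.
    return None, actions
```

Once BMC0 reports a successful restart, step S3 switches the selectors back, which is the reverse of the `switch all selectors to BMC1` action.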
Example 3
This embodiment provides a cabinet power supply BMC redundancy management method which, as shown in FIG. 3, includes the following steps:
S1: While the power supply monitoring system is running, the CPLD chip monitors the states of the five IIC multi-master selector chips. A GPIO level change received from any one of the five chips indicates that the corresponding IIC link of BMC0 is abnormal.
S2: The CPLD reads and records the values of the IIC register and the BMC state register through the IIC interface connected to BMC0, and once recording is finished it sends a signal through GPIO to restart BMC0. Because BMC0 is restarting, the five IIC multi-master selector chips switch their input ends to the corresponding IIC links of BMC1 for management and control.
S3: After BMC0 has restarted, the CPLD sends a signal through GPIO to the IIC multi-master selector chips, and the input ends of the five IIC groups are switched back from BMC1 to BMC0. This avoids the situation in which an IIC fault in the cabinet power supply system forces a system restart and interrupts normal power delivery, and it improves system stability.
S4: While the power supply monitoring system is running, the CPLD chip also monitors the BMC0 watchdog. If it detects an abnormal state, BMC0 has suffered a problem such as a crash and needs to be restarted. The CPLD sends a signal through GPIO to restart BMC0, and the restart procedure is the same as for an IIC fault. When BMC0 crashes, the system can thus recover without a system restart or power-down.
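The runtime flow of steps S1 to S4 can be sketched as a tiny event-driven coordinator. This is only a model for reading the sequence: the class name, method names, and log strings are hypothetical, and the actual logic would live in the CPLD rather than in software.

```python
class CpldFailover:
    """Toy model of the CPLD's runtime failover sequence (names are hypothetical)."""

    def __init__(self, n_selectors=5):
        # Every selector starts on BMC0, the default controller.
        self.selectors = ["BMC0"] * n_selectors
        self.log = []

    def on_fault(self, reason):
        """Handle an IIC fault (S1, S2) or a watchdog fault (S4) on BMC0."""
        self.log.append(f"record BMC0 registers ({reason})")
        self.log.append("reset BMC0")
        # While BMC0 restarts, all selectors hand control to BMC1.
        self.selectors = ["BMC1"] * len(self.selectors)

    def on_bmc0_restarted(self):
        """Switch every selector back to BMC0 once it has restarted (S3)."""
        self.selectors = ["BMC0"] * len(self.selectors)
        self.log.append("switch selectors back to BMC0")
```

Both fault types go through the same `on_fault` path, mirroring the statement that the watchdog restart procedure is the same as the IIC one.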
Although the present invention has been described in detail with reference to the drawings and in connection with the preferred embodiments, the present invention is not limited thereto. Those skilled in the art can make various equivalent modifications or substitutions to the embodiments of the present invention without departing from its spirit and scope, and any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed herein falls within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A cabinet power supply BMC redundancy management system, comprising:
a first BMC and a second BMC, both connected to a complex programmable logic device; the complex programmable logic device is connected to the output end of a communication bus controller; the input end of the communication bus controller is connected to the first BMC and to the second BMC respectively; and the communication bus controller is connected to a slave device.
2. The system of claim 1, further comprising:
the input/output interface of the complex programmable logic device is respectively connected with the first BMC and the second BMC;
the complex programmable logic device is respectively connected with the first BMC and the second BMC through two I2C communication bus ports.
3. The system of claim 1, further comprising:
the first communication bus controller, the second communication bus controller, the third communication bus controller, the fourth communication bus controller and the fifth communication bus controller; the first communication bus controller and the second communication bus controller are both connected with the sensor processing unit through an I2C communication bus; the third communication bus controller and the fourth communication bus controller are both connected with the power supply module through an I2C communication bus; the fifth communication bus controller is connected with the fan management module.
4. A cabinet power supply BMC redundancy management method, characterized by comprising the following steps:
when a communication bus controller detects that the communication bus to the first BMC is abnormal, it switches its input end to the second BMC;
the complex programmable logic device detects the input-end switching action of the communication bus controller, issues an instruction to all communication bus controllers to switch their input ends to the second BMC, and issues a restart instruction to the first BMC.
5. The method of claim 4, further comprising:
the complex programmable logic device monitors the states of the first BMC and the second BMC;
the first BMC is set in the complex programmable logic device as the default active BMC;
if the complex programmable logic device detects that the currently active first BMC has a fault, it issues an input-end switching instruction to all communication bus controllers and switches the input ends of all communication bus controllers to the second BMC, which is in the started state.
6. The method of claim 5, wherein the method for the complex programmable logic device to monitor the status of the first BMC and the second BMC comprises:
the complex programmable logic device acquires the running states of the first BMC and the second BMC by sending watchdog signals to the first BMC and the second BMC.
7. The method of claim 4, wherein the method by which the complex programmable logic device monitors the input-end switching action of the communication bus controller comprises:
if the complex programmable logic device receives a level-change signal from a communication bus controller, it determines that the communication bus controller has performed an input-end switching action.
8. The method of claim 5, further comprising:
if the complex programmable logic device detects that the first BMC has restarted successfully, it issues an input-end switching instruction to all communication bus controllers and switches the input ends of all communication bus controllers from the second BMC back to the first BMC.
CN202010820535.9A 2020-08-14 2020-08-14 Cabinet power BMC redundancy management system and method Active CN111984471B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010820535.9A CN111984471B (en) 2020-08-14 2020-08-14 Cabinet power BMC redundancy management system and method

Publications (2)

Publication Number Publication Date
CN111984471A (en) 2020-11-24
CN111984471B (en) 2022-11-25

Family

ID=73435314

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010820535.9A Active CN111984471B (en) 2020-08-14 2020-08-14 Cabinet power BMC redundancy management system and method

Country Status (1)

Country Link
CN (1) CN111984471B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104169905A (en) * 2012-03-28 2014-11-26 英特尔公司 Configurable and fault-tolerant baseboard management controller arrangement
CN105867572A (en) * 2016-04-26 2016-08-17 浪潮(北京)电子信息产业有限公司 Power supply managing method for rack server and rack server
CN106971586A (en) * 2017-05-05 2017-07-21 深圳市哈工大交通电子技术有限公司 The whistle control system of principal and subordinate's automated back-up switching
CN109471770A (en) * 2018-09-11 2019-03-15 华为技术有限公司 A kind of method for managing system and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113127302A (en) * 2021-04-16 2021-07-16 山东英信计算机技术有限公司 Method and device for monitoring GPIO (general purpose input/output) of board card
CN113127302B (en) * 2021-04-16 2023-05-26 山东英信计算机技术有限公司 Board GPIO monitoring method and device
CN113778930A (en) * 2021-11-12 2021-12-10 苏州浪潮智能科技有限公司 AVS (Audio video Standard) adjusting system, method, device and equipment
WO2023082531A1 (en) * 2021-11-12 2023-05-19 苏州浪潮智能科技有限公司 Avs adjustment system, method and apparatus, and device and storage medium

Also Published As

Publication number Publication date
CN111984471B (en) 2022-11-25

Similar Documents

Publication Publication Date Title
US6957353B2 (en) System and method for providing minimal power-consuming redundant computing hardware for distributed services
CN111767244B (en) Dual-redundancy computer equipment based on domestic Loongson platform
US20070220301A1 (en) Remote access control management module
US20080281475A1 (en) Fan control scheme
US8990632B2 (en) System for monitoring state information in a multiplex system
CN111984471B (en) Cabinet power BMC redundancy management system and method
CN104050061A (en) Multi-main-control-panel redundant backup system based on PCIe bus
CN111831488B (en) TCMS-MPU control unit with safety level design
CN109236710B (en) Server fan control system and control method thereof
CN212541329U (en) Dual-redundancy computer equipment based on domestic Loongson platform
CN117992270B (en) Memory resource management system, method, device, equipment and storage medium
US20120159241A1 (en) Information processing system
CN116319618A (en) Switch operation control method, device, system, equipment and storage medium
CN112019455B (en) Switch monitoring device and method based on programmable logic device
CN111628944B (en) Switch and switch system
CN111880999B (en) High-availability monitoring management device for high-density blade server and redundancy switching method
CN218824636U (en) Power supply detection device for server hard disk backboard
CN116610430A (en) Method for realizing electrified operation and maintenance of processor and server system
CN111158963A (en) Server firmware redundancy starting method and server
CN115237684A (en) Power supply system and data center of multi-node server
JP2009237758A (en) Server system, server management method, and program therefor
CN113535472A (en) Cluster server
JP6953710B2 (en) Computer system
CN112068991A (en) High-reliability dual-management system based on master-slave synchronization
CN117666746B (en) Multi-node server, method, device and medium applied to multi-node server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant