CN113886307A - Thermal maintenance method and system for BMC module, server mainboard and BMC module - Google Patents
- Publication number
- CN113886307A CN202111161915.7A
- Authority
- CN
- China
- Prior art keywords
- bmc module
- bmc
- module
- server
- isolation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/40—Bus structure
- G06F13/4063—Device-to-bus coupling
- G06F13/4068—Electrical coupling
- G06F13/4081—Live connection to bus, e.g. hot-plugging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0751—Error or fault detection not based on redundancy
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0766—Error or fault reporting or storing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Computer Hardware Design (AREA)
- Debugging And Monitoring (AREA)
Abstract
An embodiment of the present specification provides a server management system comprising a server motherboard and a baseboard management controller (BMC) module that is provided independently of the server motherboard. The BMC module and the server motherboard are pluggably connected through a first interface of the BMC module and a second interface of the server motherboard. The BMC module is provided with a power soft-start circuit that provides power protection when the BMC module is hot-plugged relative to the server motherboard, and the server motherboard is provided with a signal isolation circuit that provides signal isolation during such hot-plugging.
Description
Technical Field
One or more embodiments of this specification relate to the field of computer technology, and in particular to a baseboard management controller (BMC) module, a server motherboard, a server management system, and a hot maintenance method and system for a BMC module.
Background
The baseboard management controller (BMC) monitors and manages the presence status of server fans, power supplies, and other devices in a server system, and exchanges data with the CPU.
At present, if a BMC fails, the service must be stopped and the entire server taken offline for maintenance, which results in a long maintenance cycle and high cost. To address this, the embodiments of this specification propose an online hot maintenance scheme for the BMC that does not affect the service.
Disclosure of Invention
One or more embodiments of this specification describe a BMC module, a server motherboard, a server management system, and an online hot maintenance method and system for a BMC module. The BMC is modularized so that the BMC module and the server motherboard are provided independently and connected in a hot-pluggable manner, which enables online hot maintenance of the BMC module without affecting services.
According to a first aspect, a BMC module is provided that is independent of a server motherboard. The BMC module comprises: a first interface for pluggable connection to the server motherboard; and a power soft-start circuit for providing power protection when the BMC module is hot-plugged relative to the server motherboard.
According to a second aspect, a server motherboard is provided, comprising: a second interface for pluggable connection to an independently provided BMC module; and a signal isolation circuit for providing signal isolation when the BMC module is hot-plugged relative to the server motherboard.
According to a third aspect, a server management system is provided, comprising a server motherboard and a BMC module that is provided independently of the server motherboard. The BMC module and the server motherboard are pluggably connected through a first interface of the BMC module and a second interface of the server motherboard. The BMC module is provided with a power soft-start circuit for providing power protection when the BMC module is hot-plugged relative to the server motherboard, and the server motherboard is provided with a signal isolation circuit for providing signal isolation during such hot-plugging.
In one embodiment, the system further comprises a board-to-board connector that connects the BMC module and the server motherboard by connecting to the first interface and the second interface, respectively.
According to a fourth aspect, an online hot maintenance method for a BMC module is provided, where the BMC module is pluggably connected to a server motherboard. The method comprises: after detecting that the BMC module is operating abnormally, a complex programmable logic device (CPLD) sends a first notification to a basic input/output system (BIOS); the BIOS system records an abnormal event of the BMC module according to the first notification and, after performing fault isolation and/or alarm masking for the BMC module, sends an isolation completion flag to the CPLD device; and the CPLD device, according to the isolation completion flag, prompts maintenance personnel that the BMC module can be removed.
In one embodiment, sending the first notification to the BIOS system after detecting that the BMC module is operating abnormally comprises: after detecting that the BMC module is operating abnormally, the CPLD device resets the BMC module; and if the BMC module is still operating abnormally after the reset, the CPLD device sends the first notification to the BIOS system.
In one embodiment, a signal isolation circuit is provided in the server motherboard, and performing fault isolation and/or alarm masking for the BMC module comprises: disconnecting the signal circuit in the server motherboard from the signal circuit in the BMC module by controlling the signal isolation circuit.
In one embodiment, the BIOS system further updates the status of the BMC module in the in-band management system to faulty and/or removable according to the first notification.
In one embodiment, prompting maintenance personnel, according to the isolation completion flag, that the BMC module can be removed comprises: the CPLD device issues the prompt by lighting an indicator lamp according to the isolation completion flag.
In one embodiment, after the CPLD device prompts maintenance personnel that the BMC module can be removed, the method further comprises: after detecting that the BMC module has resumed normal operation, the CPLD device sends a second notification to the BIOS system; and the BIOS system removes the fault isolation and/or alarm masking according to the second notification.
In a specific embodiment, a signal isolation circuit is provided in the server motherboard, and removing the fault isolation and/or alarm masking comprises: restoring the connection between the signal circuit in the server motherboard and the signal circuit in the BMC module by controlling the signal isolation circuit.
In one embodiment, the CPLD device determines whether the BMC module is operating normally by monitoring a heartbeat signal of the BMC module.
According to a fifth aspect, a hot maintenance system for a BMC module is provided, where the BMC module is pluggably connected to a server motherboard. The system comprises: a complex programmable logic device (CPLD), configured to send a first notification to a basic input/output system (BIOS) after detecting that the BMC module is operating abnormally; and the BIOS system, configured to record an abnormal event of the BMC module according to the first notification and, after performing fault isolation and/or alarm masking for the BMC module, send an isolation completion flag to the CPLD device. The CPLD device is further configured to prompt maintenance personnel, according to the isolation completion flag, that the BMC module can be removed.
According to a sixth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method provided by the fourth aspect described above.
According to a seventh aspect, there is provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method provided by the fourth aspect above.
In summary, for a hardware fault that cannot be resolved by simply resetting the BMC, the combination of hardware and software disclosed in the embodiments of this specification implements the full hot-plug procedure for the BMC module: the fault is isolated, the BMC module can be replaced quickly without powering down the server, normal operation of the BMC module resumes automatically after replacement, and service operation is not affected.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 illustrates a block diagram of a BMC module according to one embodiment;
FIG. 2 illustrates a schematic diagram of a server motherboard, according to one embodiment;
FIG. 3 shows a schematic structural diagram of a server management system according to one embodiment;
FIG. 4 illustrates a multi-party interaction diagram for implementing online hot maintenance of a BMC module, according to one embodiment;
FIG. 5 illustrates a block diagram of a hot maintenance system for the BMC module, according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
At present, the BMC is integrated into the server motherboard. When the BMC subsystem fails, the BMC can be reset by long-pressing the UID (unit identification light) button.
Based on this, the embodiments of this specification propose an online hot maintenance scheme for the BMC that does not affect the service. The scheme comprises a hardware implementation part and a software implementation part. In the hardware implementation part, the BMC is modularized so that the BMC and the server motherboard are provided independently and can be connected in a hot-pluggable manner. The hardware implementation part is introduced below from three perspectives: the BMC module, the server motherboard, and the server management system that includes both.
Fig. 1 is a schematic diagram illustrating a configuration of a BMC module according to an embodiment, where the BMC module is provided independently of a server motherboard. It should be noted that the BMC module disclosed in the embodiments of the present disclosure may refer to various management units for managing a server out of band, and the names of the management units include, but are not limited to, a server management board, a server management module, and a server management unit.
As shown in fig. 1, the BMC module 100 includes a first interface 110 for connecting with a server motherboard in a pluggable manner; it is to be understood that this connection may be a direct connection or an indirect connection via a connector; also, the number of the first interfaces 110 may be one or more; pluggable means that the BMC module is detachably connected with the server mainboard, and the BMC module can be inserted into or pulled out of the server mainboard.
The BMC module 100 further includes a power soft-start circuit 120, which provides power protection when the BMC module 100 is hot-plugged relative to the server motherboard. Specifically, when the BMC module 100 is inserted into or removed from the server motherboard, the power soft-start circuit 120 limits the transient surge current on the server power bus to a low level without pulling down the supply voltage of the server as a whole. This avoids damage to the server power supply during insertion and removal and thus makes hot-plugging of the power connection possible.
In one embodiment, the power soft-start circuit 120 is implemented as a voltage-slope type. In another embodiment, it is implemented as a current-slope type. A voltage-slope soft-start circuit has a simple structure, but its output current varies strongly with the load impedance; the output current of a current-slope soft-start circuit is not affected by the load, but the circuit is more complex. The power soft-start circuit 120 can therefore be implemented as either type according to actual requirements. In addition, a soft-start circuit can be built around a MOS (metal oxide semiconductor) transistor, which has a low on-resistance Rds and is simple to drive, so only a few surrounding components are needed. In general, a PMOS transistor is used for a positive supply and an NMOS transistor for a negative supply.
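As a rough illustration of why a voltage-slope soft start limits surge current: the charging current into a capacitive load is approximately I = C * dV/dt, so a slower voltage ramp gives a proportionally lower current. The sketch below only evaluates that relation. The function name and all numeric values (12 V rail, 470 µF load capacitance, ramp times) are hypothetical and are not taken from this specification.

```python
def inrush_current(capacitance_f: float, rail_voltage_v: float, ramp_time_s: float) -> float:
    """Approximate charging current (A) of a capacitive load for a linear
    voltage ramp from 0 V to rail_voltage_v: I = C * dV/dt."""
    if ramp_time_s <= 0:
        raise ValueError("ramp_time_s must be positive")
    return capacitance_f * rail_voltage_v / ramp_time_s


# Hypothetical example values, chosen only for illustration.
C_LOAD = 470e-6   # load capacitance on the BMC module, in farads
V_RAIL = 12.0     # supply rail voltage, in volts

fast = inrush_current(C_LOAD, V_RAIL, 100e-6)  # abrupt 100 microsecond ramp
slow = inrush_current(C_LOAD, V_RAIL, 10e-3)   # soft-started 10 millisecond ramp
print(f"fast ramp: {fast:.2f} A, slow ramp: {slow:.3f} A")
```

Stretching the ramp by a factor of 100 reduces the estimated charging current by the same factor, which is the effect a voltage-slope soft-start circuit relies on.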
It is to be understood that the power soft start circuit 120 is connected to the BMC module power supply circuit, and the BMC module further includes other conventional circuits such as a signal circuit.
Above, a description is given of an independent modular BMC.
Fig. 2 is a schematic structural diagram of a server motherboard according to an embodiment, and as shown in fig. 2, the server motherboard 200 includes a second interface 210 for pluggable connection with the independently-located BMC module 100; it should be noted that the connection may be a direct connection or an indirect connection via a connector; also, the number of the second interfaces 210 may be one or more.
Server motherboard 200 further includes a signal isolation circuit 220 that provides signal isolation when BMC module 100 is hot-plugged relative to server motherboard 200. Specifically, when BMC module 100 is plugged into or unplugged from server motherboard 200, signal isolation circuit 220 limits the transient voltage on the signal lines of the motherboard signal circuit to a reasonable level, so that normal server communication is maintained. Signal isolation circuit 220 must be designed to suit the type of signal (for example, high-speed or low-speed) carried in the motherboard signal circuit of server motherboard 200. It may be implemented, for example, by connecting a buffer in series in the signal circuit of server motherboard 200.
It should be understood that the server motherboard also includes other conventional circuits such as motherboard power supply circuits.
The BMC module may further include other conventional circuits such as a signal circuit.
In the above, a server motherboard that can be connected to an independently installed BMC module in a pluggable manner is introduced.
Fig. 3 is a schematic structural diagram of a server management system according to an embodiment, and as shown in fig. 3, the server management system 300 includes the BMC module 100 and the server motherboard 200.
The BMC module 100 and the server motherboard 200 are pluggably connected to each other via the first interface 110 and the second interface 210 (not shown in fig. 3; see fig. 1 and fig. 2). In one embodiment, the server management system 300 further includes a board-to-board connector 310, which connects the BMC module 100 and the server motherboard 200 by connecting to the first interface and the second interface, respectively. In a specific embodiment, the board-to-board connector 310 includes a third interface and a fourth interface (not shown in fig. 3); the board-to-board connector 310 and the BMC module 100 are pluggably connected through the first interface 110 and the third interface, and the board-to-board connector 310 and the server motherboard 200 are pluggably connected through the second interface 210 and the fourth interface. In this way, the board-to-board connector 310 provides the pluggable connection between the BMC module 100 and the server motherboard 200.
The BMC module 100 is provided with the power soft-start circuit 120, which can connect the BMC module power supply circuit to the motherboard power supply circuit, thereby providing power protection when the BMC module 100 is hot-plugged relative to the server motherboard 200.
Server motherboard 200 is provided with signal isolation circuit 220, and signal isolation circuit 220 may control the connection and disconnection of signals between server motherboard 200 and BMC module 100, thereby implementing signal isolation when BMC module 100 is hot-plugged with respect to server motherboard 200.
It should be noted that, for the description of the power soft start circuit 120 and the signal isolation circuit 220, reference may also be made to the related descriptions in the foregoing embodiments, which are not described herein again.
The hardware implementation part of the scheme has been described above. By making the BMC an independent module, the hardware circuits responsible for out-of-band management are integrated onto a single board as a modular design, and the server motherboard is adapted accordingly, so that a failed BMC can be replaced without opening the chassis or powering down the server. It should also be understood that fig. 1, fig. 2 and fig. 3 show the BMC module, the server motherboard, and the server management system including both only schematically; the shape and style of the server motherboard and the BMC module in practical applications are not limited.
The software part of the scheme can be implemented on the basis of the hardware design. Fig. 4 shows a schematic diagram of the multi-party interaction for implementing online hot maintenance of a BMC module according to an embodiment. The parties include a BMC module provided independently of the server motherboard, a BIOS (Basic Input Output System) stored as firmware in a chip (typically a ROM chip) on the server motherboard, and a CPLD (Complex Programmable Logic Device). It should be noted that the CPLD device includes related devices integrated in the BMC module and the server motherboard.
As shown in FIG. 4, the multi-party interaction includes the following steps:
In step S410, the CPLD device detects that the BMC module is operating abnormally. In one embodiment, the CPLD device determines whether the BMC module is abnormal by monitoring a heartbeat signal of the BMC module. As long as the heartbeat signal that the BMC module sends at a preset time interval (for example, 1 s) continues to be received, the BMC module is judged to be operating normally; if no heartbeat signal is received from the BMC module after the preset time interval has elapsed, the BMC module is judged to be operating abnormally.
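The heartbeat check in step S410 can be sketched as a simple timeout test. This is a minimal software model under stated assumptions, not the CPLD logic itself: the class name `HeartbeatMonitor`, its methods, and the use of explicit timestamps in seconds are all hypothetical. Only the preset interval (1 s as the example value) comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatMonitor:
    """Hypothetical model of the CPLD-side heartbeat check."""
    interval_s: float = 1.0               # preset heartbeat interval (example: 1 s)
    last_beat_s: Optional[float] = None   # time of the most recent heartbeat

    def beat(self, now_s: float) -> None:
        """Record a heartbeat received from the BMC module at time now_s."""
        self.last_beat_s = now_s

    def is_normal(self, now_s: float) -> bool:
        """Normal only if a heartbeat arrived within the preset interval."""
        if self.last_beat_s is None:
            return False
        return (now_s - self.last_beat_s) <= self.interval_s


monitor = HeartbeatMonitor(interval_s=1.0)
monitor.beat(now_s=10.0)
print(monitor.is_normal(now_s=10.8))  # heartbeat still fresh
print(monitor.is_normal(now_s=11.5))  # interval exceeded, judged abnormal
```

Passing the current time in explicitly keeps the sketch deterministic; a real monitor would be driven by a hardware timer.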
In step S420, the CPLD device sends a first notification to the BIOS.
Note that, in one embodiment, the interaction may further include step S412 between steps S410 and S420, in which the CPLD device resets the BMC module. If, after the reset, the CPLD device finds in step S414 that the BMC module is still operating abnormally, step S420 is executed; otherwise, the CPLD device continues to monitor the heartbeat signal of the BMC module.
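The reset-then-escalate sequence of steps S412, S414 and S420 can be sketched as follows. The function name, the callback parameters, and the notification string `"BMC_ABNORMAL"` are hypothetical placeholders used only to show the control flow; the specification does not define a software interface for the CPLD device.

```python
from typing import Callable


def handle_abnormal_bmc(
    reset_bmc: Callable[[], None],
    bmc_is_normal: Callable[[], bool],
    notify_bios: Callable[[str], None],
) -> bool:
    """Run after an abnormal BMC has been detected (S410).

    Returns True if the fault was escalated to the BIOS system.
    """
    reset_bmc()                      # S412: try a reset first
    if bmc_is_normal():              # S414: re-check after the reset
        return False                 # recovered; keep monitoring the heartbeat
    notify_bios("BMC_ABNORMAL")      # S420: send the first notification
    return True


# Usage with stand-in callbacks: the reset does not help, so the BIOS is notified.
log = []
escalated = handle_abnormal_bmc(
    reset_bmc=lambda: log.append("reset"),
    bmc_is_normal=lambda: False,
    notify_bios=log.append,
)
print(escalated, log)
```

Attempting a reset before escalating means that only faults a reset cannot clear reach the isolation and replacement path.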
The first notification indicates that the BMC module is operating abnormally. Accordingly, in step S430, the BIOS system records an abnormal event of the BMC module according to the first notification. The BMC abnormal event may be classified as an exception record and stored in a log file of the BIOS system for later retrieval and analysis.
In this step, the BIOS system also performs fault isolation and alarm masking for the BMC module according to the first notification. In one embodiment, the BIOS system stops signal collection for the BMC module and filters out fault signals and alarm signals relating to the BMC module, thereby implementing fault isolation and alarm masking. In one embodiment, the signal isolation circuit is provided in the server motherboard, and the BIOS system controls it to disconnect the signal circuit in the server motherboard from the BMC signal circuit. This prevents the later removal of the BMC module from producing a transient voltage in the signal circuit that would disturb normal system communication.
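The alarm-masking idea in this step, filtering out fault and alarm signals that relate to the isolated BMC module, can be sketched as a filter keyed on the alarm source. The `AlarmMask` class, the `(source, message)` alarm format, and the source names are hypothetical; the specification does not describe how the BIOS system represents alarms.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Alarm = Tuple[str, str]  # (source, message); hypothetical format


@dataclass
class AlarmMask:
    """Hypothetical model of alarm masking for isolated components."""
    masked_sources: Set[str] = field(default_factory=set)

    def mask(self, source: str) -> None:
        """Start suppressing alarms from the given source."""
        self.masked_sources.add(source)

    def unmask(self, source: str) -> None:
        """Stop suppressing alarms from the given source."""
        self.masked_sources.discard(source)

    def filter(self, alarms: List[Alarm]) -> List[Alarm]:
        """Return only the alarms whose source is not masked."""
        return [a for a in alarms if a[0] not in self.masked_sources]


mask = AlarmMask()
mask.mask("BMC")  # applied when the first notification is handled
pending = [("BMC", "heartbeat lost"), ("FAN1", "speed low")]
print(mask.filter(pending))
```

Alarms from other sources pass through unchanged, so masking the BMC does not hide unrelated faults.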
In addition, in one embodiment, this step may further include: the BIOS system updates the state of the BMC module in the in-band management system to faulty and/or removable. In-band server management refers to managing the server device from within the service-level operating system (OS); the updated state of the BMC module can then be queried by related services.
Further, in one embodiment, after performing fault isolation and/or alarm masking for the BMC module, the BIOS system sends an isolation completion flag to the CPLD device in step S440. In step S450, the CPLD device prompts maintenance personnel, according to the isolation completion flag, that the BMC module can be removed. In one embodiment, the CPLD device issues the prompt by lighting an indicator lamp, which also helps maintenance personnel locate the corresponding BMC module. In another embodiment, the CPLD device may issue the prompt by voice. Maintenance personnel can then replace the failed BMC module according to the prompt.
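Steps S440 and S450 amount to a handshake: the indicator lamp is lit only after the isolation completion flag arrives. A minimal sketch of that gating follows. The `CpldPrompt` class, the `Led` enum, and the boolean flag are hypothetical stand-ins for the CPLD logic and the flag format, which the text does not specify.

```python
from enum import Enum


class Led(Enum):
    OFF = 0
    ON = 1


class CpldPrompt:
    """Hypothetical model of the CPLD-side removal prompt."""

    def __init__(self) -> None:
        self.removal_led = Led.OFF  # lamp stays off until isolation is complete

    def on_isolation_flag(self, isolation_complete: bool) -> None:
        """S450: light the indicator lamp once the BIOS reports completion."""
        if isolation_complete:
            self.removal_led = Led.ON


cpld = CpldPrompt()
cpld.on_isolation_flag(False)   # isolation not finished: lamp stays off
print(cpld.removal_led)
cpld.on_isolation_flag(True)    # S440 flag received: safe to remove
print(cpld.removal_led)
```

Tying the lamp to the flag, rather than to fault detection itself, keeps maintenance personnel from pulling the module before the signal path has been isolated.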
In another embodiment, the BIOS system updates the state of the BMC module in the in-band management system to removable; correspondingly, the CPLD device prompts maintenance personnel that the BMC module can be removed after an in-band query shows that the state of the BMC module is removable.
According to another embodiment, after step S450 the interaction may further include step S460, in which the CPLD device detects that the BMC module is operating normally. In one embodiment, once the BMC is detected as present, the CPLD device monitors the heartbeat signal and, when the heartbeat signal has returned to normal, judges that the BMC module is operating normally.
Further, in step S470 the CPLD device sends a second notification to the BIOS system, indicating that the BMC module has resumed normal operation, so that the BIOS system removes the fault isolation and/or alarm masking in step S480. In one embodiment, the BIOS system resumes signal collection for the BMC module, thereby removing the fault isolation and alarm masking. In one embodiment, the signal isolation circuit is provided in the server motherboard, and the BIOS system controls it to restore the connection between the signal circuit in the server motherboard and the BMC signal circuit, thereby restoring normal communication with the BMC module.
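The isolate-and-recover cycle driven by the first and second notifications can be summarized as a two-state model. This is a sketch under stated assumptions: the class `HotMaintenanceFlow`, its attribute names, and the boolean return value standing in for the isolation completion flag are all hypothetical and only mirror the sequence of steps S420 to S480.

```python
from enum import Enum, auto


class BmcState(Enum):
    NORMAL = auto()
    ISOLATED = auto()


class HotMaintenanceFlow:
    """Hypothetical model of the BIOS-side isolation state."""

    def __init__(self) -> None:
        self.state = BmcState.NORMAL
        self.signal_path_connected = True   # controlled via the signal isolation circuit
        self.alarms_masked = False

    def first_notification(self) -> bool:
        """S430: isolate the BMC; the return value models the S440 flag."""
        self.signal_path_connected = False
        self.alarms_masked = True
        self.state = BmcState.ISOLATED
        return True

    def second_notification(self) -> None:
        """S480: remove isolation once the BMC is reported normal again."""
        if self.state is not BmcState.ISOLATED:
            return  # nothing to undo
        self.signal_path_connected = True
        self.alarms_masked = False
        self.state = BmcState.NORMAL


flow = HotMaintenanceFlow()
flag = flow.first_notification()     # BMC reported abnormal
flow.second_notification()           # replacement BMC reported normal
print(flag, flow.state, flow.signal_path_connected, flow.alarms_masked)
```

The second notification is ignored when the BMC is not isolated, so a stray recovery message cannot change the state of a healthy system.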
The software implementation part of the scheme has been described above. The CPLD device monitors the operating state of the BMC module. When the BMC module is abnormal, the CPLD device notifies the BIOS system to perform fault isolation and, once isolation is complete, notifies maintenance personnel that the BMC module can be replaced; after replacement, the system automatically resumes operation.
In summary, for a hardware fault that cannot be resolved by simply resetting the BMC, the combination of hardware and software disclosed in the embodiments of this specification implements the full hot-plug procedure for the BMC module: the fault is isolated, the BMC module can be replaced quickly without powering down the server, normal operation of the BMC module resumes automatically after replacement, and service operation is not affected.
Corresponding to the multi-party interaction for online hot maintenance, the embodiments of this specification also disclose an online hot maintenance system. Fig. 5 shows a schematic structural diagram of a hot maintenance system for a BMC module according to an embodiment, where the BMC module is pluggably connected to a server motherboard. As shown in fig. 5, the system 500 includes:
a complex programmable logic device (CPLD), configured to send a first notification to a basic input/output system (BIOS) after detecting that the BMC module is operating abnormally; and the BIOS system, configured to record an abnormal event of the BMC module according to the first notification and, after performing fault isolation and/or alarm masking for the BMC module, send an isolation completion flag to the CPLD device. The CPLD device is further configured to prompt maintenance personnel, according to the isolation completion flag, that the BMC module can be removed.
In one embodiment, the CPLD device is specifically configured to: reset the BMC module after detecting that the BMC module is operating abnormally; and send the first notification to the BIOS system if the BMC module is found to be still operating abnormally.
In one embodiment, a signal isolation circuit is provided in the server motherboard, and the BIOS system performs fault isolation and/or alarm masking for the BMC module specifically by: disconnecting the signal circuit in the server motherboard from the signal circuit in the BMC module by controlling the signal isolation circuit.
In one embodiment, the BIOS system is further configured to: update the state of the BMC module in the in-band management system to faulty and/or removable according to the first notification.
In one embodiment, the CPLD device is specifically configured to: prompt maintenance personnel that the BMC module can be removed by lighting an indicator lamp according to the isolation completion flag.
In one embodiment, the CPLD device is further configured to: send a second notification to the BIOS system after detecting that the BMC module has resumed normal operation. The BIOS system is further configured to: remove the fault isolation and/or alarm masking according to the second notification.
Further, in a specific embodiment, a signal isolation circuit is provided in the server motherboard, and the BIOS system removes the fault isolation and/or alarm masking specifically by: restoring the connection between the signal circuit in the server motherboard and the signal circuit in the BMC module by controlling the signal isolation circuit.
In one embodiment, the CPLD device is specifically configured to: determine whether the BMC module is operating normally by monitoring the heartbeat signal of the BMC module.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 4.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory having stored therein executable code, and a processor that, when executing the executable code, implements the method described in connection with fig. 4.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The specific embodiments above further describe the objects, technical solutions, and advantages of the present invention in detail. It should be understood that they are only specific embodiments of the present invention and are not intended to limit its scope; any modification, equivalent substitution, or improvement made on the basis of the technical solutions of the present invention shall fall within the scope of the present invention.
Claims (13)
1. A baseboard management controller (BMC) module, arranged independently of a server mainboard, the BMC module comprising:
a first interface for pluggable connection to the server mainboard; and
a power supply slow-start circuit for providing power supply protection when the BMC module is hot-plugged relative to the server mainboard.
2. A server mainboard, comprising:
a second interface for pluggable connection to an independently arranged baseboard management controller (BMC) module; and
a signal isolation circuit for providing signal isolation when the BMC module is hot-plugged relative to the server mainboard.
3. A server management system, comprising a server mainboard and a baseboard management controller (BMC) module arranged independently of the server mainboard, wherein:
the BMC module and the server mainboard are pluggably connected through a first interface of the BMC module and a second interface of the server mainboard, respectively;
the BMC module is provided with a power supply slow-start circuit for providing power supply protection when the BMC module is hot-plugged relative to the server mainboard; and
the server mainboard is provided with a signal isolation circuit for providing signal isolation when the BMC module is hot-plugged relative to the server mainboard.
4. The system of claim 3, further comprising:
an inter-board connector for connecting the BMC module and the server mainboard by connecting to the first interface and the second interface, respectively.
5. A thermal maintenance method for a baseboard management controller (BMC) module, wherein the BMC module is pluggably connected to a server mainboard, the method comprising:
a complex programmable logic device (CPLD) sending a first notification to a basic input/output system (BIOS) after detecting that the BMC module is operating abnormally;
the BIOS system recording an abnormal event of the BMC module according to the first notification, and sending an isolation completion flag to the CPLD device after performing fault isolation and/or alarm shielding for the BMC module; and
the CPLD device prompting maintenance personnel, according to the isolation completion flag, that the BMC module can be unplugged.
6. The method of claim 5, wherein the CPLD device sending the first notification to the BIOS system after detecting that the BMC module is operating abnormally comprises:
the CPLD device resetting the BMC module after detecting that the BMC module is operating abnormally; and
the CPLD device sending the first notification to the BIOS system if the BMC module is still operating abnormally.
7. The method of claim 5, wherein a signal isolation circuit is provided in the server mainboard, and performing fault isolation and/or alarm shielding for the BMC module comprises:
disconnecting the signal circuit in the server mainboard from the signal circuit in the BMC module by controlling the signal isolation circuit.
8. The method of claim 5, wherein the BIOS system further updates, according to the first notification, the state of the BMC module in the in-band management system to failed and/or unpluggable.
9. The method of claim 5, wherein the CPLD device prompting maintenance personnel, according to the isolation completion flag, that the BMC module can be unplugged comprises:
the CPLD device giving the prompt by lighting an indicator lamp according to the isolation completion flag.
10. The method of claim 5, wherein after the CPLD device prompts maintenance personnel, according to the isolation completion flag, that the BMC module can be unplugged, the method further comprises:
the CPLD device sending a second notification to the BIOS system after detecting that the BMC module has resumed normal operation; and
the BIOS system removing the fault isolation and/or the alarm shielding according to the second notification.
11. The method of claim 10, wherein a signal isolation circuit is provided in the server mainboard, and removing the fault isolation and/or the alarm shielding comprises:
restoring the connection between the signal circuit in the server mainboard and the signal circuit in the BMC module by controlling the signal isolation circuit.
12. The method of claim 5, wherein the CPLD device determines whether the BMC module is operating properly by monitoring a heartbeat signal of the BMC module.
13. A thermal maintenance system for a baseboard management controller (BMC) module that is pluggably connected to a server mainboard, the system comprising:
a complex programmable logic device (CPLD), configured to send a first notification to a basic input/output system (BIOS) after detecting that the BMC module is operating abnormally; and
the BIOS system, configured to record an abnormal event of the BMC module according to the first notification, and to send an isolation completion flag to the CPLD device after performing fault isolation and/or alarm shielding for the BMC module;
wherein the CPLD device is further configured to prompt maintenance personnel, according to the isolation completion flag, that the BMC module can be unplugged.
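The interaction recited in claims 5, 6 and 9 (reset first, notify the BIOS only if the fault persists, then light an indicator once isolation is complete) can be sketched in Python as follows. All names here (`BiosModel`, `CpldModel`, `handle_first_notification`, `on_bmc_abnormal`) are hypothetical, and the hardware signalling between CPLD and BIOS is reduced to plain method calls. This is an illustrative sketch under those assumptions, not the implementation of the claimed system.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BiosModel:
    """Hypothetical BIOS side: logs the event, isolates, returns the completion flag."""

    event_log: List[str] = field(default_factory=list)
    isolated: bool = False

    def handle_first_notification(self) -> bool:
        self.event_log.append("BMC abnormal")  # record the abnormal event
        self.isolated = True                   # fault isolation and/or alarm shielding
        return True                            # isolation completion flag


@dataclass
class CpldModel:
    """Hypothetical CPLD side: resets the BMC, escalates, then prompts maintainers."""

    bios: BiosModel
    indicator_on: bool = False
    reset_count: int = 0

    def on_bmc_abnormal(self, bmc_ok_after_reset: Callable[[], bool]) -> None:
        self.reset_count += 1                  # claim 6: try a reset first
        if bmc_ok_after_reset():
            return                             # recovered, no notification needed
        if self.bios.handle_first_notification():
            self.indicator_on = True           # claim 9: lamp means "safe to unplug"


bios = BiosModel()
cpld = CpldModel(bios)
cpld.on_bmc_abnormal(lambda: False)            # fault persists after the reset
print(cpld.indicator_on, bios.isolated, bios.event_log)
```

The callback `bmc_ok_after_reset` stands in for whatever check the CPLD performs after the reset, such as the heartbeat monitoring of claim 12.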
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111161915.7A CN113886307A (en) | 2021-09-30 | 2021-09-30 | Thermal maintenance method and system for BMC module, server mainboard and BMC module |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111161915.7A CN113886307A (en) | 2021-09-30 | 2021-09-30 | Thermal maintenance method and system for BMC module, server mainboard and BMC module |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113886307A (en) | 2022-01-04 |
Family
ID=79004910
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111161915.7A Pending CN113886307A (en) | 2021-09-30 | 2021-09-30 | Thermal maintenance method and system for BMC module, server mainboard and BMC module |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113886307A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030182483A1 (en) * | 2002-03-08 | 2003-09-25 | Hawkins Peter A. | System management controller negotiation protocol |
CN101963949A (en) * | 2010-10-11 | 2011-02-02 | 北京星网锐捷网络技术有限公司 | Hot plug realization method, hot plug system and board card |
CN201804320U (en) * | 2010-08-20 | 2011-04-20 | 环达电脑(上海)有限公司 | Hot plug type BMC upgrade module |
CN102325081A (en) * | 2011-07-15 | 2012-01-18 | 福建星网锐捷网络有限公司 | Hot-pluggable isolation method, device and master control board |
US20140344431A1 (en) * | 2013-05-16 | 2014-11-20 | Aspeed Technology Inc. | Baseboard management system architecture |
CN104169905A (en) * | 2012-03-28 | 2014-11-26 | 英特尔公司 | Configurable and fault-tolerant baseboard management controller arrangement |
CN109117404A (en) * | 2018-07-17 | 2019-01-01 | 深圳市同泰怡信息技术有限公司 | A kind of hot-swappable server BBU device |
CN109471770A (en) * | 2018-09-11 | 2019-03-15 | 华为技术有限公司 | A kind of method for managing system and device |
CN113204466A (en) * | 2021-04-29 | 2021-08-03 | 山东英信计算机技术有限公司 | Over-temperature protection method and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
USRE39855E1 (en) | Power management strategy to support hot swapping of system blades during run time | |
EP0373773B1 (en) | Disengaging electrical circuit boards from power-supply units | |
US20160073541A1 (en) | Separated server back plane | |
CN111399879A (en) | Firmware upgrading system and method of CPLD | |
CN115686935A (en) | Data backup method, computer device and storage medium | |
US7490252B2 (en) | Abnormal power interruption internal circuitry protection method and system for computer platform | |
CN113886307A (en) | Thermal maintenance method and system for BMC module, server mainboard and BMC module | |
CN218824636U (en) | Power supply detection device for server hard disk backboard | |
BRPI0613779A2 (en) | modular fieldbus segment protector | |
CN116540856A (en) | Device, method and server for correcting state after power supply module fault recovery | |
CN111984471A (en) | Cabinet power BMC redundancy management system and method | |
CN111858148A (en) | PCIE Switch chip configuration file recovery system and method | |
US6801973B2 (en) | Hot swap circuit module | |
CN115098294A (en) | Abnormal event processing method, electronic equipment and management terminal | |
US6415391B1 (en) | Control method and system for resetting backup data | |
US20070204088A1 (en) | Modularized circuit board bus connection control method and system | |
CN111209143B (en) | Recovery method and device of embedded system, embedded device and storage medium | |
CN111708426A (en) | Server and power supply protection circuit thereof | |
CN112463707A (en) | I2C link management system and method | |
US7263569B1 (en) | Method and system for distributing power in a computer system | |
CN114116315B (en) | USB failure recovery method and system applied to industrial information security mainboard | |
CN214151684U (en) | Mainboard assembly with monitoring function and system thereof | |
CN211148841U (en) | DC Cycle testing arrangement | |
JPH11175206A (en) | Peripheral equipment connector | |
CN117666746B (en) | Multi-node server, method, device and medium applied to multi-node server |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40065675; Country of ref document: HK |