CN113886307A - Thermal maintenance method and system for BMC module, server mainboard and BMC module

Thermal maintenance method and system for BMC module, server mainboard and BMC module

Info

Publication number
CN113886307A
Authority
CN
China
Prior art keywords
bmc module
bmc
module
server
isolation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111161915.7A
Other languages
Chinese (zh)
Inventor
郑龙
张胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd
Priority to CN202111161915.7A
Publication of CN113886307A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4063 Device-to-bus coupling
    • G06F13/4068 Electrical coupling
    • G06F13/4081 Live connection to bus, e.g. hot-plugging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Debugging And Monitoring (AREA)

Abstract

An embodiment of the present specification provides a server management system comprising a server motherboard and a baseboard management controller (BMC) module arranged independently of the server motherboard. The BMC module and the server motherboard are pluggably connected through a first interface on the BMC module and a second interface on the server motherboard. The BMC module is provided with a power supply slow-start circuit that supports power protection when the BMC module is hot-plugged relative to the server motherboard; the server motherboard is provided with a signal isolation circuit that supports signal isolation when the BMC module is hot-plugged relative to the server motherboard.

Description

Thermal maintenance method and system for BMC module, server mainboard and BMC module
Technical Field
One or more embodiments of the present disclosure relate to the field of computer technologies, and in particular to a baseboard management controller (BMC) module, a server motherboard, a server management system, and a hot maintenance method and system for a BMC module.
Background
The baseboard management controller (BMC) monitors and manages the components of a server system, such as the presence status of the fans, power supplies, and other devices, and exchanges data with the CPU.
At present, if a BMC fails, the service must be stopped and the entire server taken offline for maintenance, which results in a long maintenance period and high cost. To address this, the embodiments of the present specification propose an online hot maintenance scheme for the BMC that does not affect the service.
Disclosure of Invention
One or more embodiments of the present disclosure describe a BMC module, a server motherboard, a server management system, and an online hot maintenance method and system for a BMC module. The BMC is modularized so that the BMC module and the server motherboard are arranged independently and connected in a hot-pluggable manner, which enables online hot maintenance of the BMC module without affecting services.
According to a first aspect, a BMC module is provided that is arranged independently of a server motherboard. The BMC module comprises: a first interface for pluggable connection to the server motherboard; and a power supply slow-start circuit for supporting power protection when the BMC module is hot-plugged relative to the server motherboard.
According to a second aspect, a server motherboard is provided, comprising: a second interface for pluggable connection to an independently arranged BMC module; and a signal isolation circuit for supporting signal isolation when the BMC module is hot-plugged relative to the server motherboard.
According to a third aspect, a server management system is provided, comprising a server motherboard and a BMC module arranged independently of the server motherboard. The BMC module and the server motherboard are pluggably connected through a first interface of the BMC module and a second interface of the server motherboard. The BMC module is provided with a power supply slow-start circuit for supporting power protection when the BMC module is hot-plugged relative to the server motherboard; the server motherboard is provided with a signal isolation circuit for supporting signal isolation when the BMC module is hot-plugged relative to the server motherboard.
In one embodiment, the system further comprises an inter-board connector that connects the BMC module and the server motherboard by connecting to the first interface and the second interface respectively.
According to a fourth aspect, an online hot maintenance method for a BMC module is provided, where the BMC module is pluggably connected to a server motherboard. The method comprises the following steps: after detecting that the BMC module is operating abnormally, a complex programmable logic device (CPLD) sends a first notification to the basic input/output system (BIOS); the BIOS system records an abnormal event of the BMC module according to the first notification and, after performing fault isolation and/or alarm shielding for the BMC module, sends an isolation completion flag to the CPLD device; and the CPLD device, according to the isolation completion flag, prompts maintenance personnel that the BMC module can be pulled out.
In one embodiment, sending the first notification to the BIOS system after detecting that the BMC module is operating abnormally includes: after detecting that the BMC module is operating abnormally, the CPLD device resets the BMC module; and the CPLD device sends the first notification to the BIOS system if the BMC module is still operating abnormally after the reset.
In one embodiment, a signal isolation circuit is arranged in the server motherboard, and performing fault isolation and/or alarm shielding for the BMC module includes: disconnecting the signal circuit in the server motherboard from the signal circuit in the BMC module by controlling the signal isolation circuit.
In one embodiment, the BIOS system further updates the status of the BMC module in the in-band management system to faulty and/or removable according to the first notification.
In one embodiment, prompting maintenance personnel that the BMC module can be pulled out according to the isolation completion flag includes: the CPLD device gives the prompt by lighting an indicator lamp according to the isolation completion flag.
In one embodiment, after the CPLD device prompts maintenance personnel that the BMC module can be pulled out, the method further includes: after detecting that the BMC module has resumed normal operation, the CPLD device sends a second notification to the BIOS system; and the BIOS system removes the fault isolation and/or alarm shielding according to the second notification.
In a specific embodiment, a signal isolation circuit is arranged in the server motherboard, and removing the fault isolation and/or alarm shielding includes: restoring the connection between the signal circuit in the server motherboard and the signal circuit in the BMC module by controlling the signal isolation circuit.
In one embodiment, the CPLD device determines whether the BMC module is operating normally by monitoring a heartbeat signal of the BMC module.
According to a fifth aspect, a hot maintenance system for a BMC module is provided, where the BMC module is pluggably connected to a server motherboard. The system comprises: a complex programmable logic device (CPLD) configured to send a first notification to the basic input/output system (BIOS) after detecting that the BMC module is operating abnormally; and the BIOS system, configured to record an abnormal event of the BMC module according to the first notification and, after performing fault isolation and/or alarm shielding for the BMC module, to send an isolation completion flag to the CPLD device. The CPLD device is further configured to prompt maintenance personnel, according to the isolation completion flag, that the BMC module can be pulled out.
According to a sixth aspect, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method provided by the fourth aspect described above.
According to a seventh aspect, there is provided a computing device comprising a memory and a processor, the memory having stored therein executable code, and the processor, when executing the executable code, implementing the method provided by the fourth aspect above.
In summary, for hardware faults that cannot be resolved by simply resetting the BMC, the combination of software and hardware disclosed in the embodiments of this specification implements the full hot-plug procedure for the BMC module and achieves fault isolation. The BMC module can be replaced quickly without powering down the server, normal operation of the BMC module resumes automatically after replacement, and service operation is not affected.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 illustrates a block diagram of a BMC module according to one embodiment;
FIG. 2 illustrates a schematic diagram of a server motherboard, according to one embodiment;
FIG. 3 shows a schematic structural diagram of a server management system according to one embodiment;
FIG. 4 illustrates a multi-party interaction diagram for implementing online hot maintenance of a BMC module, according to one embodiment;
FIG. 5 illustrates a block diagram of a hot maintenance system for the BMC module, according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
At present, the BMC is integrated into the server motherboard, and after the BMC subsystem fails, the BMC can be reset by long-pressing the UID (unit identification) button.
Based on this, the embodiments of the present specification propose an online hot maintenance scheme for the BMC that does not affect the service. The scheme comprises a hardware implementation part and a software implementation part. In the hardware implementation part, the BMC is modularized so that the BMC and the server motherboard are arranged independently and can be connected in a hot-pluggable manner. The hardware implementation part is introduced below from three perspectives: the BMC module, the server motherboard, and the server management system that includes both.
Fig. 1 is a schematic structural diagram of a BMC module according to an embodiment, in which the BMC module is arranged independently of the server motherboard. Note that the BMC module disclosed in the embodiments of this specification may refer to any management unit that manages a server out of band; its names include, but are not limited to, server management board, server management module, and server management unit.
As shown in Fig. 1, the BMC module 100 includes a first interface 110 for pluggable connection to a server motherboard. This connection may be direct or indirect via a connector, and there may be one or more first interfaces 110. Pluggable means that the BMC module is detachably connected to the server motherboard: it can be inserted into or pulled out of the server motherboard.
The BMC module 100 further includes a power supply slow-start circuit 120, which supports power protection when the BMC module 100 is hot-plugged relative to a server motherboard. Specifically, when the BMC module 100 is inserted into or removed from the server motherboard, the power supply slow-start circuit 120 limits the transient surge current on the server power bus to a low level without pulling down the supply voltage of the whole server. This avoids damage to the server power supply during plugging and unplugging and thereby makes the power connection hot-pluggable.
In one embodiment, the power supply slow-start circuit 120 is implemented as a voltage-slope type. In another embodiment, it is implemented as a current-slope type. A voltage-slope slow-start circuit has a simple structure, but its output current depends strongly on the load impedance; the output current of a current-slope slow-start circuit is not affected by the load, but the circuit is more complex. The power supply slow-start circuit 120 can therefore be implemented as either type according to actual requirements. In addition, a slow-start circuit can be built around a MOS (metal oxide semiconductor) transistor, which has a low on-resistance Rds and is simple to drive, so that only a small number of surrounding components are needed. In general, a PMOS is used for a positive supply and an NMOS for a negative supply.
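As a rough illustration of why a voltage-slope slow-start circuit limits surge current, the charging current into a capacitive load follows I = C * dV/dt. The sketch below assumes an ideal linear voltage ramp and a purely capacitive load; the capacitance, rail voltage, and ramp times are hypothetical values that do not appear in the specification.

```python
def inrush_current(load_capacitance_f: float, rail_voltage_v: float, ramp_time_s: float) -> float:
    """Charging current (A) into a capacitive load for an ideal linear ramp: I = C * dV/dt."""
    if ramp_time_s <= 0:
        raise ValueError("ramp_time_s must be positive")
    return load_capacitance_f * rail_voltage_v / ramp_time_s


# Hypothetical values: 470 uF of bulk capacitance on a 12 V rail.
C_LOAD_F = 470e-6
V_RAIL_V = 12.0

# Without slow start the rail rises almost instantly (10 us assumed here).
uncontrolled_a = inrush_current(C_LOAD_F, V_RAIL_V, 10e-6)

# With a 10 ms controlled ramp the same charge is delivered far more gently.
controlled_a = inrush_current(C_LOAD_F, V_RAIL_V, 10e-3)
```

Under these assumed numbers, stretching the ramp by a factor of 1000 reduces the charging current by the same factor, which is the effect the slow-start circuit relies on.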
It is to be understood that the power supply slow-start circuit 120 is connected to the power supply circuit of the BMC module, and that the BMC module also includes other conventional circuits, such as a signal circuit.
The above describes the independent, modular BMC.
Fig. 2 is a schematic structural diagram of a server motherboard according to an embodiment. As shown in Fig. 2, the server motherboard 200 includes a second interface 210 for pluggable connection to the independently arranged BMC module 100. This connection may be direct or indirect via a connector, and there may be one or more second interfaces 210.
The server motherboard 200 further includes a signal isolation circuit 220, which supports signal isolation when the BMC module 100 is hot-plugged relative to the server motherboard 200. Specifically, when the BMC module 100 is plugged into or unplugged from the server motherboard 200, the signal isolation circuit 220 limits the transient voltage on the signal lines of the motherboard signal circuit to a reasonable level, so that normal communication of the server is maintained. The signal isolation circuit 220 must be designed to suit the type of signal (for example, high-speed or low-speed) carried by the motherboard signal circuit of the server motherboard 200. It may be implemented, for example, by placing a buffer in series in the signal circuit of the server motherboard 200.
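The series buffer can be pictured as a gate with an enable control. The toy model below is only a sketch: the specification says a buffer may be placed in series, and the output-enable behaviour, the class name, and the use of `None` for a high-impedance output are assumptions made for illustration.

```python
from typing import Optional


class IsolationBuffer:
    """Toy model of a series buffer with an output-enable control (hypothetical)."""

    def __init__(self) -> None:
        self.output_enabled = True  # signals pass through by default

    def isolate(self) -> None:
        """Disable the output so BMC-side transients cannot reach the motherboard."""
        self.output_enabled = False

    def reconnect(self) -> None:
        """Re-enable the output after a healthy BMC module is back in place."""
        self.output_enabled = True

    def drive(self, bmc_side_level: int) -> Optional[int]:
        """Level seen on the motherboard side; None models a high-impedance output."""
        return bmc_side_level if self.output_enabled else None


buffer_220 = IsolationBuffer()
before = buffer_220.drive(1)       # passes through
buffer_220.isolate()
during = buffer_220.drive(1)       # blocked while isolated
buffer_220.reconnect()
after = buffer_220.drive(0)        # passes through again
```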
It should be understood that the server motherboard also includes other conventional circuits such as motherboard power supply circuits.
The BMC module may further include other conventional circuits such as a signal circuit.
In the above, a server motherboard that can be connected to an independently installed BMC module in a pluggable manner is introduced.
Fig. 3 is a schematic structural diagram of a server management system according to an embodiment. As shown in Fig. 3, the server management system 300 includes the BMC module 100 and the server motherboard 200.
The BMC module 100 and the server motherboard 200 are pluggably connected to each other via the first interface 110 and the second interface 210 (not shown in Fig. 3; see Fig. 1 and Fig. 2). In one embodiment, the server management system 300 further includes an inter-board connector 310, which connects the BMC module 100 and the server motherboard 200 by connecting to the first interface and the second interface respectively. In a specific embodiment, the inter-board connector 310 includes a third interface and a fourth interface (not shown in Fig. 3); the inter-board connector 310 and the BMC module 100 are pluggably connected through the first interface 110 and the third interface, and the inter-board connector 310 and the server motherboard 200 are pluggably connected through the second interface 210 and the fourth interface. In this way, the inter-board connector 310 provides the pluggable connection between the BMC module 100 and the server motherboard 200.
The BMC module 100 is provided with the power supply slow-start circuit 120, which connects the power supply circuit of the BMC module to the power supply circuit of the motherboard and thereby provides power protection when the BMC module 100 is hot-plugged relative to the server motherboard 200.
The server motherboard 200 is provided with the signal isolation circuit 220, which controls the connection and disconnection of signals between the server motherboard 200 and the BMC module 100 and thereby provides signal isolation when the BMC module 100 is hot-plugged relative to the server motherboard 200.
For further description of the power supply slow-start circuit 120 and the signal isolation circuit 220, refer to the related descriptions in the foregoing embodiments, which are not repeated here.
The above describes the hardware implementation part of the scheme. By making the BMC an independent module, the hardware circuits responsible for out-of-band management are integrated onto a single board, and the server motherboard is adapted accordingly, so that after a fault the BMC can be replaced without opening the chassis or powering down. It should also be understood that Fig. 1, Fig. 2, and Fig. 3 show the server motherboard, the BMC module, and the server management system containing both only schematically; they do not limit the shape or style of the server motherboard and the BMC module in practical applications.
The software part of the scheme can be implemented on top of the hardware design. Fig. 4 shows a multi-party interaction diagram for implementing online hot maintenance of a BMC module according to an embodiment. The parties are the BMC module arranged independently of the server motherboard, the BIOS (Basic Input Output System) stored in firmware on a chip (typically a ROM chip) in the server motherboard, and a CPLD (Complex Programmable Logic Device). Note that the CPLD device includes the related devices integrated in the BMC module and in the server motherboard.
As shown in Fig. 4, the multi-party interaction includes the following steps:
In step S410, the CPLD device detects that the BMC module is operating abnormally. In one embodiment, the CPLD device determines whether the BMC module is abnormal by monitoring its heartbeat signal. Specifically, as long as heartbeat signals sent by the BMC module at a preset time interval (for example, 1 s) continue to be received, the BMC module is judged to be operating normally; conversely, if no heartbeat signal is received from the BMC module after the preset time interval has elapsed, the BMC module is judged to be operating abnormally.
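The heartbeat check in step S410 can be sketched as a simple timeout test. A real CPLD would implement this in hardware description logic; the Python model below only illustrates the decision rule, and the class name, method names, and the choice of the monitor's start time as the initial reference point are assumptions.

```python
class HeartbeatMonitor:
    """Flags the BMC module as abnormal when no heartbeat arrives within the preset interval."""

    def __init__(self, interval_s: float, start_s: float = 0.0) -> None:
        self.interval_s = interval_s
        self.last_beat_s = start_s  # treat the monitor start as the first reference point

    def on_heartbeat(self, now_s: float) -> None:
        """Record the arrival time of a heartbeat from the BMC module."""
        self.last_beat_s = now_s

    def is_abnormal(self, now_s: float) -> bool:
        """True once more than one preset interval has passed since the last heartbeat."""
        return (now_s - self.last_beat_s) > self.interval_s


monitor = HeartbeatMonitor(interval_s=1.0)   # 1 s interval, as in the example above
monitor.on_heartbeat(1.0)
within_interval = monitor.is_abnormal(1.5)   # heartbeat is recent
past_interval = monitor.is_abnormal(2.5)     # heartbeat is overdue
```

In practice some tolerance beyond the nominal interval would likely be added so that small timing jitter does not trigger a false alarm; the specification does not say how much.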
In step S420, the CPLD device sends a first notification to the BIOS system.
Note that, in one embodiment, the interaction may further include step S412 after step S410 and before step S420, in which the CPLD device resets the BMC module. If, after the reset, the CPLD device detects in step S414 that the BMC module is still operating abnormally, step S420 is executed; otherwise, the CPLD device continues to monitor the heartbeat signal of the BMC module.
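The reset-then-escalate logic of steps S412, S414, and S420 can be sketched as follows. The three callables stand in for hardware actions and are hypothetical; the specification does not define such an interface.

```python
from typing import Callable, List


def handle_abnormal(
    reset_bmc: Callable[[], None],
    still_abnormal: Callable[[], bool],
    notify_bios: Callable[[], None],
) -> str:
    """Try a reset first; notify the BIOS only if the BMC module stays abnormal."""
    reset_bmc()                 # S412: reset the BMC module
    if still_abnormal():        # S414: re-check after the reset
        notify_bios()           # S420: send the first notification
        return "notified"
    return "monitoring"         # the reset helped; keep watching the heartbeat


# Example run with stub actions that record what happened.
actions: List[str] = []
outcome = handle_abnormal(
    reset_bmc=lambda: actions.append("reset"),
    still_abnormal=lambda: True,   # assume the reset did not help
    notify_bios=lambda: actions.append("first_notification"),
)
```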
The first notification indicates that the BMC module is operating abnormally. Accordingly, in step S430, the BIOS system records an abnormal event of the BMC module according to the first notification. The BMC abnormal event may be classified as an exception record and stored in a log file of the BIOS system for later retrieval and analysis.
In this step, the BIOS system also performs fault isolation and alarm shielding for the BMC module according to the first notification. In one embodiment, the BIOS system stops collecting signals from the BMC module and filters out fault signals and alarm signals relating to the BMC module, thereby implementing fault isolation and alarm shielding. In one embodiment, the signal isolation circuit is arranged in the server motherboard, and the BIOS system controls it to disconnect the signal circuit in the server motherboard from the BMC signal circuit. This prevents the subsequent removal of the BMC module from producing a transient voltage in the signal circuit that could disturb normal system communication.
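Alarm shielding as described above amounts to filtering by source. The sketch below is illustrative only: the `Signal` record, the source names, and the `shield` function are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Signal:
    source: str  # hypothetical source names, e.g. "bmc" or "psu"
    kind: str    # "fault", "alarm", or "info"


def shield(signals: Iterable[Signal], isolated: Set[str]) -> List[Signal]:
    """Drop fault and alarm signals that originate from isolated sources."""
    return [
        s for s in signals
        if not (s.source in isolated and s.kind in {"fault", "alarm"})
    ]


incoming = [Signal("bmc", "alarm"), Signal("psu", "alarm"), Signal("bmc", "fault")]
visible = shield(incoming, isolated={"bmc"})
```

With the BMC module isolated, only the alarm from the unrelated source remains visible, so a failed BMC does not flood the alarm path while it waits to be replaced.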
In addition, in one embodiment, this step may further include the BIOS system updating the state of the BMC module in the in-band management system to faulty and/or removable. In-band management of a server refers to managing the server device from within the service-level OS (operating system); the updated state of the BMC module is then available for related services to query.
Further, in an embodiment, after the BIOS system has performed fault isolation and/or alarm shielding for the BMC module, it sends an isolation completion flag to the CPLD device in step S440. In step S450, the CPLD device prompts maintenance personnel, according to the isolation completion flag, that the BMC module can be removed. In one embodiment, the CPLD device gives this prompt by lighting an indicator lamp, which also helps locate the corresponding BMC module. In another embodiment, the CPLD device may give the prompt by voice. Maintenance personnel can then replace the failed BMC module according to the prompt.
In another embodiment, the BIOS system updates the state of the BMC module in the in-band management system to removable, and the CPLD device prompts maintenance personnel that the BMC module can be removed after querying in-band and finding that the state of the BMC module is removable.
According to another embodiment, after step S450, the interaction may further include step S460, in which the CPLD device detects that the BMC module is operating normally. In one embodiment, once the CPLD device detects that a BMC module is present, it checks the heartbeat signal; when the heartbeat signal has returned to normal, the BMC module is judged to be operating normally.
Further, in step S470, the CPLD device sends a second notification to the BIOS system to indicate that the BMC module has resumed normal operation, so that the BIOS system removes the fault isolation and/or alarm shielding in step S480. In one embodiment, the BIOS system resumes signal collection from the BMC module, thereby removing fault isolation and alarm shielding. In one embodiment, the signal isolation circuit is arranged in the server motherboard, and the BIOS system controls it to restore the connection between the signal circuit in the server motherboard and the BMC signal circuit, thereby restoring normal communication with the BMC module.
The above describes the software implementation part of the scheme. The CPLD device monitors the operating state of the BMC module; when the BMC module is abnormal, the CPLD device notifies the BIOS system to perform fault isolation and, once isolation is complete, prompts maintenance personnel to replace the BMC module. After replacement, the system resumes operation automatically.
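Taken together, steps S410 to S480 behave like a small state machine. The sketch below is one possible way to model the flow; the state names, event names, and transition table are assumptions chosen for illustration and are not defined in the specification.

```python
from enum import Enum, auto
from typing import Dict, Iterable, Tuple


class State(Enum):
    MONITORING = auto()       # heartbeat is normal
    RESETTING = auto()        # S412: a reset has been attempted
    ISOLATING = auto()        # S420-S430: BIOS notified, isolation in progress
    READY_TO_REMOVE = auto()  # S440-S450: isolation flag received, indicator lit


class Event(Enum):
    HEARTBEAT_LOST = auto()
    HEARTBEAT_OK = auto()
    ISOLATION_DONE = auto()


TRANSITIONS: Dict[Tuple[State, Event], State] = {
    (State.MONITORING, Event.HEARTBEAT_LOST): State.RESETTING,
    (State.RESETTING, Event.HEARTBEAT_OK): State.MONITORING,
    (State.RESETTING, Event.HEARTBEAT_LOST): State.ISOLATING,
    (State.ISOLATING, Event.ISOLATION_DONE): State.READY_TO_REMOVE,
    # S460-S480: a healthy module is detected and isolation is removed.
    (State.READY_TO_REMOVE, Event.HEARTBEAT_OK): State.MONITORING,
}


def run(events: Iterable[Event], state: State = State.MONITORING) -> State:
    """Apply events in order; events with no defined transition leave the state unchanged."""
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state


# A fault that survives the reset, followed by isolation and a successful replacement.
after_isolation = run([Event.HEARTBEAT_LOST, Event.HEARTBEAT_LOST, Event.ISOLATION_DONE])
after_replacement = run([Event.HEARTBEAT_OK], state=after_isolation)
```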
In summary, for hardware faults that cannot be resolved by simply resetting the BMC, the combination of software and hardware disclosed in the embodiments of this specification implements the full hot-plug procedure for the BMC module and achieves fault isolation. The BMC module can be replaced quickly without powering down the server, normal operation of the BMC module resumes automatically after replacement, and service operation is not affected.
Corresponding to the multi-party interaction for online hot maintenance, the embodiments of this specification also disclose an online hot maintenance system. Fig. 5 shows a schematic structural diagram of a hot maintenance system for a BMC module according to an embodiment, where the BMC module is pluggably connected to a server motherboard. As shown in Fig. 5, the system 500 includes:
a complex programmable logic device (CPLD) configured to send a first notification to the basic input/output system (BIOS) after detecting that the BMC module is operating abnormally; and the BIOS system, configured to record an abnormal event of the BMC module according to the first notification and, after performing fault isolation and/or alarm shielding for the BMC module, to send an isolation completion flag to the CPLD device. The CPLD device is further configured to prompt maintenance personnel, according to the isolation completion flag, that the BMC module can be pulled out.
In one embodiment, the CPLD device is specifically configured to: reset the BMC module after detecting that the BMC module is operating abnormally; and send the first notification to the BIOS system if the BMC module is still operating abnormally after the reset.
In one embodiment, a signal isolation circuit is arranged in the server motherboard, and the BIOS system, in performing fault isolation and/or alarm shielding for the BMC module, is specifically configured to: disconnect the signal circuit in the server motherboard from the signal circuit in the BMC module by controlling the signal isolation circuit.
In one embodiment, the BIOS system is further configured to: update the state of the BMC module in the in-band management system to faulty and/or removable according to the first notification.
In one embodiment, the CPLD device is specifically configured to: prompt maintenance personnel that the BMC module can be pulled out by lighting an indicator lamp according to the isolation completion flag.
In one embodiment, the CPLD device is further configured to: send a second notification to the BIOS system after detecting that the BMC module has resumed normal operation; and the BIOS system is further configured to: remove the fault isolation and/or alarm shielding according to the second notification.
Further, in a specific embodiment, a signal isolation circuit is arranged in the server motherboard, and the BIOS system, in removing the fault isolation and/or alarm shielding, is specifically configured to: restore the connection between the signal circuit in the server motherboard and the signal circuit in the BMC module by controlling the signal isolation circuit.
In one embodiment, the CPLD device is specifically configured to: determine whether the BMC module is operating normally by monitoring the heartbeat signal of the BMC module.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 4.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory having stored therein executable code, and a processor that, when executing the executable code, implements the method described in connection with fig. 4.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on, or transmitted as one or more instructions or code on, a computer-readable medium.
The specific embodiments above further describe in detail the objects, technical solutions, and advantages of the present invention. It should be understood that they are only exemplary embodiments of the present invention and are not intended to limit its scope; any modification, equivalent substitution, or improvement made on the basis of the technical solutions of the present invention shall fall within the scope of the present invention.

Claims (13)

1. A Baseboard Management Controller (BMC) module, arranged independently of a server motherboard, the BMC module comprising:
a first interface for pluggable connection with the server motherboard;
and a power supply slow-start circuit for supporting power supply protection when the BMC module is hot-plugged relative to the server motherboard.
2. A server motherboard, comprising:
a second interface for pluggable connection with an independently arranged Baseboard Management Controller (BMC) module;
and a signal isolation circuit for supporting signal isolation when the BMC module is hot-plugged relative to the server motherboard.
3. A server management system, comprising: a server motherboard and a Baseboard Management Controller (BMC) module arranged independently of the server motherboard;
the BMC module and the server motherboard are pluggably connected through a first interface of the BMC module and a second interface of the server motherboard;
the BMC module is provided with a power supply slow-start circuit for supporting power supply protection when the BMC module is hot-plugged relative to the server motherboard;
the server motherboard is provided with a signal isolation circuit for supporting signal isolation when the BMC module is hot-plugged relative to the server motherboard.
4. The system of claim 3, further comprising:
an inter-board connector for connecting the BMC module and the server motherboard by connecting to the first interface and the second interface, respectively.
5. A thermal maintenance method for a Baseboard Management Controller (BMC) module, wherein the BMC module is pluggably connected to a server motherboard, the method comprising:
a complex programmable logic device (CPLD) sends a first notification to a basic input/output system (BIOS) after detecting that the BMC module is operating abnormally;
the BIOS system records an abnormal event of the BMC module according to the first notification and, after performing fault isolation and/or alarm shielding for the BMC module, sends an isolation completion flag to the CPLD device;
and the CPLD device prompts maintenance personnel, according to the isolation completion flag, that the BMC module can be unplugged.
6. The method of claim 5, wherein the CPLD device sending the first notification to the BIOS system after detecting that the BMC module is operating abnormally comprises:
the CPLD device resets the BMC module after detecting that the BMC module is operating abnormally;
and the CPLD device sends the first notification to the BIOS system if the BMC module is still operating abnormally.
7. The method of claim 5, wherein a signal isolation circuit is provided in the server motherboard, and performing fault isolation and/or alarm shielding for the BMC module comprises:
disconnecting the signal circuit in the server motherboard from the signal circuit in the BMC module by controlling the signal isolation circuit.
8. The method of claim 5, wherein the BIOS system further updates, based on the first notification, the status of the BMC module in the in-band management system to failed and/or ready to be unplugged.
9. The method of claim 5, wherein the CPLD device prompting maintenance personnel, according to the isolation completion flag, that the BMC module can be unplugged comprises:
the CPLD device gives the prompt by lighting an indicator lamp according to the isolation completion flag.
10. The method of claim 5, wherein after the CPLD device prompts maintenance personnel, according to the isolation completion flag, that the BMC module can be unplugged, the method further comprises:
the CPLD device sends a second notification to the BIOS system after detecting that the BMC module has resumed normal operation;
and the BIOS system releases the fault isolation and/or the alarm shielding according to the second notification.
11. The method of claim 10, wherein a signal isolation circuit is provided in the server motherboard, and releasing the fault isolation and/or the alarm shielding comprises:
restoring the connection between the signal circuit in the server motherboard and the signal circuit in the BMC module by controlling the signal isolation circuit.
12. The method of claim 5, wherein the CPLD device determines whether the BMC module is operating properly by monitoring a heartbeat signal of the BMC module.
13. A thermal maintenance system for a Baseboard Management Controller (BMC) module that is pluggably connected to a server motherboard, the system comprising:
a complex programmable logic device (CPLD) configured to send a first notification to a basic input/output system (BIOS) after detecting that the BMC module is operating abnormally;
the BIOS system, configured to record an abnormal event of the BMC module according to the first notification, and to send an isolation completion flag to the CPLD device after performing fault isolation and/or alarm shielding for the BMC module;
the CPLD device being further configured to prompt maintenance personnel, according to the isolation completion flag, that the BMC module can be unplugged.
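
By way of a non-limiting illustration, the CPLD-side sequence recited in claims 5, 6 and 9 (detect abnormality, reset, notify the BIOS system if still abnormal, then light an indicator on receipt of the isolation completion flag) might be sketched as below. The function `handle_bmc_fault`, its callback parameters, and the outcome labels are assumptions made for this sketch and do not appear in the source.

```python
from typing import Callable


def handle_bmc_fault(
    bmc_is_normal: Callable[[], bool],
    reset_bmc: Callable[[], None],
    notify_bios: Callable[[], bool],
    light_indicator: Callable[[], None],
) -> str:
    """Hypothetical CPLD-side handling of one abnormality check.

    Returns a label for the outcome: "normal", "recovered_by_reset",
    "ready_to_unplug", or "isolation_pending".
    """
    if bmc_is_normal():
        return "normal"
    reset_bmc()              # claim 6: attempt a reset first
    if bmc_is_normal():
        return "recovered_by_reset"
    if notify_bios():        # first notification; True models the isolation completion flag
        light_indicator()    # claim 9: prompt maintenance personnel via an indicator lamp
        return "ready_to_unplug"
    return "isolation_pending"
```

Passing the hardware actions in as callbacks keeps the ordering of the steps visible and lets the sequence be exercised without any hardware.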
CN202111161915.7A 2021-09-30 2021-09-30 Thermal maintenance method and system for BMC module, server mainboard and BMC module Pending CN113886307A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111161915.7A CN113886307A (en) 2021-09-30 2021-09-30 Thermal maintenance method and system for BMC module, server mainboard and BMC module

Publications (1)

Publication Number Publication Date
CN113886307A true CN113886307A (en) 2022-01-04

Family

ID=79004910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111161915.7A Pending CN113886307A (en) 2021-09-30 2021-09-30 Thermal maintenance method and system for BMC module, server mainboard and BMC module

Country Status (1)

Country Link
CN (1) CN113886307A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030182483A1 (en) * 2002-03-08 2003-09-25 Hawkins Peter A. System management controller negotiation protocol
CN101963949A (en) * 2010-10-11 2011-02-02 北京星网锐捷网络技术有限公司 Hot plug realization method, hot plug system and board card
CN201804320U (en) * 2010-08-20 2011-04-20 环达电脑(上海)有限公司 Hot plug type BMC upgrade module
CN102325081A (en) * 2011-07-15 2012-01-18 福建星网锐捷网络有限公司 Hot-pluggable isolation method, device and master control board
US20140344431A1 (en) * 2013-05-16 2014-11-20 Aspeed Technology Inc. Baseboard management system architecture
CN104169905A (en) * 2012-03-28 2014-11-26 英特尔公司 Configurable and fault-tolerant baseboard management controller arrangement
CN109117404A (en) * 2018-07-17 2019-01-01 深圳市同泰怡信息技术有限公司 Hot-swappable server BBU device
CN109471770A (en) * 2018-09-11 2019-03-15 华为技术有限公司 System management method and device
CN113204466A (en) * 2021-04-29 2021-08-03 山东英信计算机技术有限公司 Over-temperature protection method and electronic equipment

Similar Documents

Publication Publication Date Title
USRE39855E1 (en) Power management strategy to support hot swapping of system blades during run time
EP0373773B1 (en) Disengaging electrical circuit boards from power-supply units
US20160073541A1 (en) Separated server back plane
CN111399879A (en) Firmware upgrading system and method of CPLD
CN115686935A (en) Data backup method, computer device and storage medium
US7490252B2 (en) Abnormal power interruption internal circuitry protection method and system for computer platform
CN113886307A (en) Thermal maintenance method and system for BMC module, server mainboard and BMC module
CN218824636U (en) Power supply detection device for server hard disk backboard
BRPI0613779A2 (en) modular fieldbus segment protector
CN116540856A (en) Device, method and server for correcting state after power supply module fault recovery
CN111984471A (en) Cabinet power BMC redundancy management system and method
CN111858148A (en) PCIE Switch chip configuration file recovery system and method
US6801973B2 (en) Hot swap circuit module
CN115098294A (en) Abnormal event processing method, electronic equipment and management terminal
US6415391B1 (en) Control method and system for resetting backup data
US20070204088A1 (en) Modularized circuit board bus connection control method and system
CN111209143B (en) Recovery method and device of embedded system, embedded device and storage medium
CN111708426A (en) Server and power supply protection circuit thereof
CN112463707A (en) I2C link management system and method
US7263569B1 (en) Method and system for distributing power in a computer system
CN114116315B (en) USB failure recovery method and system applied to industrial information security mainboard
CN214151684U (en) Mainboard assembly with monitoring function and system thereof
CN211148841U (en) DC Cycle testing arrangement
JPH11175206A (en) Peripheral equipment connector
CN117666746B (en) Multi-node server, method, device and medium applied to multi-node server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40065675; Country of ref document: HK)