CN113590203A - Failure processing method and system for substrate management controller, storage medium and single chip microcomputer - Google Patents

Publication number
CN113590203A
CN113590203A CN202110801103.8A CN202110801103A CN113590203A CN 113590203 A CN113590203 A CN 113590203A CN 202110801103 A CN202110801103 A CN 202110801103A CN 113590203 A CN113590203 A CN 113590203A
Authority
CN
China
Prior art keywords
management controller
substrate management
substrate
baseboard management
single chip
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110801103.8A
Other languages
Chinese (zh)
Inventor
赵杰
孟崴
周义
叶小令
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI HI-TECH CONTROL SYSTEM CO LTD
Original Assignee
SHANGHAI HI-TECH CONTROL SYSTEM CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-07-15
Filing date
2021-07-15
Publication date
2021-11-02
Application filed by SHANGHAI HI-TECH CONTROL SYSTEM CO LTD
Priority to CN202110801103.8A
Publication of CN113590203A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/4401 Bootstrapping

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention provides a failure processing method and system for a baseboard management controller, a storage medium and a single chip microcomputer. The method comprises the following steps: collecting working state information of a first baseboard management controller serving as the working baseboard management controller; and, when the first baseboard management controller is judged to have failed according to the working state information, selecting a second baseboard management controller as the working baseboard management controller, and refreshing and restarting the first baseboard management controller. The failure processing method and system, the storage medium and the single chip microcomputer achieve quick recovery after a baseboard management controller failure through a dual-backup arrangement of baseboard management controllers.

Description

Failure processing method and system for substrate management controller, storage medium and single chip microcomputer
Technical Field
The present invention relates to the technical field of Baseboard Management Controllers (BMCs), and in particular, to a method and a system for processing failure of a Baseboard Management Controller, a storage medium, and a single chip microcomputer.
Background
The baseboard management controller can perform operations such as firmware upgrading, machine equipment checking and the like on the server in a state that the server is not started, so that functions such as local and remote diagnosis, console support, configuration management, hardware management, fault removal and the like are realized.
In the prior art, a baseboard management controller chip may fail during operation, so that the functions of the baseboard management controller are lost. When the chip fails, parameters such as sensor state and temperature can no longer be monitored and controlled, so normal operation of the server system cannot be guaranteed. If the temperature becomes too high, the server may even be damaged, creating a dangerous situation.
Disclosure of Invention
In view of the above drawbacks of the prior art, an object of the present invention is to provide a failure processing method and system for a baseboard management controller, a storage medium, and a single chip microcomputer, which achieve fast recovery after a baseboard management controller failure by means of a dual-backup arrangement of baseboard management controllers.
To achieve the above and other related objects, the present invention provides a failure processing method for a baseboard management controller, comprising: collecting working state information of a first baseboard management controller serving as the working baseboard management controller; and, when the first baseboard management controller is judged to have failed according to the working state information, selecting a second baseboard management controller as the working baseboard management controller, and refreshing and restarting the first baseboard management controller.
In an embodiment of the invention, the working state information is a pulse signal output by the first baseboard management controller.
In an embodiment of the present invention, switching from the first baseboard management controller to the second baseboard management controller is realized by controlling a gate.
The invention provides a failure processing system for a baseboard management controller, which comprises an acquisition module and a processing module;
the acquisition module is used for collecting the working state information of a first baseboard management controller serving as the working baseboard management controller;
the processing module is used for selecting a second baseboard management controller as the working baseboard management controller when the first baseboard management controller is judged to have failed according to the working state information, and for refreshing and restarting the first baseboard management controller.
In an embodiment of the invention, the working state information is a pulse signal output by the first baseboard management controller.
In an embodiment of the invention, the processing module controls a gate to switch from the first baseboard management controller to the second baseboard management controller.
The invention provides a storage medium storing a computer program which, when executed by a processor, implements the above failure processing method for a baseboard management controller.
The invention provides a single chip microcomputer, which comprises: a processor and a memory;
the memory is used for storing a computer program;
the processor is used for executing the computer program stored in the memory so as to enable the single chip microcomputer to execute the failure processing method of the baseboard management controller.
The invention provides a failure processing system for a baseboard management controller, which comprises the above single chip microcomputer, a first baseboard management controller, a second baseboard management controller and a gate;
the first baseboard management controller is connected with the single chip microcomputer, serves as the working baseboard management controller, and outputs working state information to the single chip microcomputer;
the second baseboard management controller is connected with the single chip microcomputer and serves as the working baseboard management controller under the control of the single chip microcomputer;
the gate is connected with the single chip microcomputer, the first baseboard management controller and the second baseboard management controller, and is used for selecting the first baseboard management controller or the second baseboard management controller as the working baseboard management controller under the control of the single chip microcomputer.
In an embodiment of the invention, the gate is a two-to-one gate.
As described above, the failure processing method and system for a baseboard management controller, the storage medium, and the single chip microcomputer according to the present invention have the following advantages:
(1) fast recovery after a baseboard management controller failure is achieved through a dual-backup arrangement of baseboard management controllers;
(2) normal operation of each sensor is guaranteed, and server abnormalities caused by the absence of a working baseboard management controller are avoided.
Drawings
FIG. 1 is a flow chart illustrating a baseboard management controller failure processing method according to an embodiment of the invention;
FIG. 2 is a schematic diagram of an embodiment of a BMC failure handling system of the invention;
FIG. 3 is a schematic structural diagram of a single-chip microcomputer according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a baseboard management controller failure processing system according to another embodiment of the invention.
Description of the element reference numerals
21 acquisition module
22 processing module
31 processor
32 memory
41 single chip microcomputer
42 first baseboard management controller
43 second baseboard management controller
44 gate
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
By providing two baseboard management controllers, the failure processing method and system for a baseboard management controller, the storage medium and the single chip microcomputer of the present invention can start the standby baseboard management controller immediately after the working baseboard management controller fails. This avoids server abnormalities caused by the absence of a working baseboard management controller, ensures the stability and reliability of the system, and is highly practical.
As shown in fig. 1, in an embodiment, the baseboard management controller failure processing method of the present invention includes the following steps:
Step S1: collecting the working state information of the first baseboard management controller serving as the working baseboard management controller.
Specifically, two baseboard management controllers, namely a first baseboard management controller and a second baseboard management controller, are provided in the present invention. The first baseboard management controller is set as the current working baseboard management controller, and the second baseboard management controller is set as the standby baseboard management controller. The first baseboard management controller and the second baseboard management controller can be used for controlling the rotating speed of the fan, and when the fan is in a normal working state, the fan can output periodic pulse signals.
Step S2: when the first baseboard management controller is judged to have failed according to the working state information, selecting the second baseboard management controller as the working baseboard management controller, and refreshing and restarting the first baseboard management controller.
Specifically, the single chip microcomputer connected to the first baseboard management controller monitors the working state information of the first baseboard management controller, that is, the pulse signal, to determine whether the first baseboard management controller is in a normal working state. When the monitored pulse signal is periodic, the first baseboard management controller is judged to be in a normal working state; when the monitored pulse signal is abnormal, that is, it is non-periodic or cannot be detected, the first baseboard management controller is judged to have failed.
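The periodicity judgment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function names, the representation of the pulse signal as a list of edge timestamps, and the `expected_period` and `tolerance` parameters are all hypothetical, since the text does not specify how periodicity is measured.

```python
from typing import Sequence


def pulse_is_periodic(edge_times: Sequence[float],
                      expected_period: float,
                      tolerance: float) -> bool:
    """Return True when consecutive pulse edges are spaced by about expected_period."""
    if len(edge_times) < 2:
        return False  # assumption: too few edges to establish a period
    intervals = [later - earlier
                 for earlier, later in zip(edge_times, edge_times[1:])]
    return all(abs(dt - expected_period) <= tolerance for dt in intervals)


def bmc_has_failed(edge_times: Sequence[float],
                   now: float,
                   expected_period: float,
                   tolerance: float) -> bool:
    """Judge failure: the signal cannot be detected, or it is not periodic."""
    # No pulse at all, or the most recent pulse is overdue: signal not detectable.
    if not edge_times or now - edge_times[-1] > expected_period + tolerance:
        return True
    # Pulses are present but their spacing is irregular: non-periodic signal.
    return not pulse_is_periodic(edge_times, expected_period, tolerance)
```

In this sketch a missing signal and an irregular signal lead to the same judgment, which mirrors the two abnormal cases named in the text.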
When the first baseboard management controller is judged to have failed, the single chip microcomputer outputs high and low levels to the two-to-one gate to switch between the first baseboard management controller and the second baseboard management controller, so that the second baseboard management controller is selected as the working baseboard management controller and the server can continue to work normally. Meanwhile, the failed first baseboard management controller is refreshed and restarted to serve as the standby baseboard management controller. In this way, by monitoring, judging and switching between the two baseboard management controllers, the working baseboard management controller can always be kept in a normal working state.
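The switching step can be illustrated with a small state holder. This is only a sketch: the class name `FailoverController`, the three callbacks, and the mapping of a low level to the first controller and a high level to the second are assumptions, and what "refresh" concretely does is left to the callback because the text does not define it. Letting the roles swap back on a later failure is also an interpretation of "switching between the two" controllers.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FailoverController:
    """Tracks which BMC is working and performs the switch on failure."""

    set_gate_level: Callable[[int], None]  # hypothetical: drives the gate select pin
    refresh: Callable[[int], None]         # hypothetical: refreshes the given BMC
    restart: Callable[[int], None]         # hypothetical: restarts the given BMC
    working: int = 0                       # 0 = first BMC, 1 = second BMC

    def on_judgment(self, working_bmc_failed: bool) -> int:
        """Apply one failure judgment and return the index of the working BMC."""
        if not working_bmc_failed:
            return self.working
        failed = self.working
        self.working = 1 - failed
        # Assumed mapping: low level (0) selects the first BMC, high (1) the second.
        self.set_gate_level(self.working)
        # The failed controller is refreshed and restarted so it can act as standby.
        self.refresh(failed)
        self.restart(failed)
        return self.working
```

A caller would invoke `on_judgment` once per monitoring cycle with the result of the failure judgment; nothing happens while the working controller stays healthy.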
As shown in fig. 2, in an embodiment, the baseboard management controller failure processing system of the present invention includes an acquisition module 21 and a processing module 22.
The acquisition module 21 is configured to collect the working state information of a first baseboard management controller serving as the working baseboard management controller.
Specifically, two baseboard management controllers, namely a first baseboard management controller and a second baseboard management controller, are provided in the present invention. The first baseboard management controller is set as the current working baseboard management controller, and the second baseboard management controller is set as the standby baseboard management controller. The first baseboard management controller and the second baseboard management controller can be used for controlling the rotating speed of the fan, and when the fan is in a normal working state, the fan can output periodic pulse signals.
The processing module 22 is connected to the acquisition module 21, and is configured to select the second baseboard management controller as the working baseboard management controller when the first baseboard management controller is judged to have failed according to the working state information, and to refresh and restart the first baseboard management controller.
Specifically, the single chip microcomputer connected to the first baseboard management controller monitors the working state information of the first baseboard management controller, that is, the pulse signal, to determine whether the first baseboard management controller is in a normal working state. When the monitored pulse signal is periodic, the first baseboard management controller is judged to be in a normal working state; when the monitored pulse signal is abnormal, that is, it is non-periodic or cannot be detected, the first baseboard management controller is judged to have failed.
When the first baseboard management controller is judged to have failed, the single chip microcomputer outputs high and low levels to the two-to-one gate to switch between the first baseboard management controller and the second baseboard management controller, so that the second baseboard management controller is selected as the working baseboard management controller and the server can continue to work normally. Meanwhile, the failed first baseboard management controller is refreshed and restarted to serve as the standby baseboard management controller. In this way, by monitoring, judging and switching between the two baseboard management controllers, the working baseboard management controller can always be kept in a normal working state.
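The division into an acquisition module and a processing module might be expressed in software roughly as below. The class and method names, the sampling hook, and the action strings are hypothetical and chosen only for illustration; the failure criterion is passed in as a function so that the sketch does not commit to any particular judgment rule.

```python
from typing import Callable, List, Sequence


class AcquisitionModule:
    """Module 21: collects working state information of the working BMC."""

    def __init__(self, read_pulse_times: Callable[[], Sequence[float]]) -> None:
        # Hypothetical hook returning the pulse timestamps sampled so far.
        self._read_pulse_times = read_pulse_times

    def collect(self) -> List[float]:
        return list(self._read_pulse_times())


class ProcessingModule:
    """Module 22: judges failure and decides which actions to take."""

    def __init__(self, acquisition: AcquisitionModule,
                 judge_failed: Callable[[Sequence[float]], bool]) -> None:
        self._acquisition = acquisition
        self._judge_failed = judge_failed  # hypothetical failure criterion

    def step(self) -> List[str]:
        """Run one monitoring cycle and return the actions to perform."""
        status = self._acquisition.collect()
        if self._judge_failed(status):
            return ["select_second_bmc", "refresh_first_bmc", "restart_first_bmc"]
        return []
```

Keeping the two modules separate in this way matches the logical division in fig. 2, while still allowing both to run on the same single chip microcomputer.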
It should be noted that the division of the modules of the above apparatus is only a logical division, and the actual implementation may be wholly or partially integrated into one physical entity, or may be physically separated. And these modules can be realized in the form of software called by processing element; or may be implemented entirely in hardware; and part of the modules can be realized in the form of calling software by the processing element, and part of the modules can be realized in the form of hardware. For example, the x module may be a processing element that is set up separately, or may be implemented by being integrated in a chip of the apparatus, or may be stored in a memory of the apparatus in the form of program code, and the function of the x module may be called and executed by a processing element of the apparatus. Other modules are implemented similarly. In addition, all or part of the modules can be integrated together or can be independently realized. The processing element described herein may be an integrated circuit having signal processing capabilities. In implementation, each step of the above method or each module above may be implemented by an integrated logic circuit of hardware in a processor element or an instruction in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs). For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. For another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
The storage medium of the present invention stores a computer program that implements the above-described baseboard management controller failure processing method when executed by a processor. The storage medium includes various media that can store program code, such as ROM, RAM, a magnetic disk, a USB flash drive, a memory card, or an optical disk.
As shown in fig. 3, in an embodiment, the single chip microcomputer of the present invention includes a processor 31 and a memory 32.
The memory 32 is used for storing computer programs.
The memory 32 includes various media that can store program code, such as ROM, RAM, a magnetic disk, a USB flash drive, a memory card, or an optical disk.
The processor 31 is connected to the memory 32, and is configured to execute the computer program stored in the memory 32, so that the single chip microcomputer executes the above-mentioned baseboard management controller failure processing method.
Preferably, the processor 31 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
As shown in fig. 4, in an embodiment, the baseboard management controller failure processing system of the present invention includes the above-mentioned single chip microcomputer 41, a first baseboard management controller 42, a second baseboard management controller 43, and a gate 44.
The first baseboard management controller 42 is connected to the single chip microcomputer 41, and is configured to serve as the working baseboard management controller and output working state information to the single chip microcomputer 41.
The second baseboard management controller 43 is connected to the single chip microcomputer 41, and serves as the working baseboard management controller under the control of the single chip microcomputer 41.
The gate 44 is connected to the single chip microcomputer 41, the first baseboard management controller 42 and the second baseboard management controller 43, and is configured to select the first baseboard management controller 42 or the second baseboard management controller 43 as the working baseboard management controller under the control of the single chip microcomputer 41.
In an embodiment of the present invention, the gate 44 is a two-to-one gate.
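The behaviour of a two-to-one gate can be modelled in a few lines. The function name and the choice that a low select level passes the first controller's line and a high level passes the second controller's line are assumptions made for this sketch; the text only states that the single chip microcomputer controls the selection.

```python
def two_to_one_gate(select_level: int,
                    first_bmc_line: int,
                    second_bmc_line: int) -> int:
    """Pass through one of two input lines according to the select level."""
    if select_level not in (0, 1):
        raise ValueError("select_level must be 0 (low) or 1 (high)")
    # Assumed mapping: low selects the first BMC, high selects the second.
    return second_bmc_line if select_level == 1 else first_bmc_line
```

With this mapping, `two_to_one_gate(0, a, b)` follows `a` regardless of `b`, so a failed controller on the unselected input cannot disturb the server-side signal.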
In summary, the failure processing method and system for a baseboard management controller, the storage medium and the single chip microcomputer of the present invention achieve fast recovery after a baseboard management controller failure through a dual-backup arrangement of baseboard management controllers; normal operation of each sensor is guaranteed, and server abnormalities caused by the absence of a working baseboard management controller are avoided. Therefore, the invention effectively overcomes various defects in the prior art and has high industrial utilization value.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims (10)

1. A failure processing method for a baseboard management controller, characterized by comprising the following steps:
collecting working state information of a first baseboard management controller serving as the working baseboard management controller;
and, when the first baseboard management controller is judged to have failed according to the working state information, selecting a second baseboard management controller as the working baseboard management controller, and refreshing and restarting the first baseboard management controller.
2. The baseboard management controller failure processing method of claim 1, wherein: the working state information is a pulse signal output by the first baseboard management controller.
3. The baseboard management controller failure processing method of claim 1, wherein: switching from the first baseboard management controller to the second baseboard management controller is realized by controlling a gate.
4. A baseboard management controller failure processing system, characterized by comprising an acquisition module and a processing module;
the acquisition module is used for collecting the working state information of a first baseboard management controller serving as the working baseboard management controller;
the processing module is used for selecting a second baseboard management controller as the working baseboard management controller when the first baseboard management controller is judged to have failed according to the working state information, and for refreshing and restarting the first baseboard management controller.
5. The baseboard management controller failure processing system of claim 4, wherein: the working state information is a pulse signal output by the first baseboard management controller.
6. The baseboard management controller failure processing system of claim 4, wherein: the processing module realizes switching from the first baseboard management controller to the second baseboard management controller by controlling a gate.
7. A storage medium having a computer program stored thereon, characterized in that: the program, when executed by a processor, implements the baseboard management controller failure processing method of any of claims 1 to 3.
8. A single chip microcomputer, characterized by comprising: a processor and a memory;
the memory is used for storing a computer program;
the processor is used for executing the computer program stored in the memory so as to enable the single chip microcomputer to execute the failure processing method of the baseboard management controller as set forth in any one of claims 1 to 3.
9. A baseboard management controller failure processing system, characterized by comprising the single chip microcomputer of claim 8, a first baseboard management controller, a second baseboard management controller and a gate;
the first baseboard management controller is connected with the single chip microcomputer, serves as the working baseboard management controller, and outputs working state information to the single chip microcomputer;
the second baseboard management controller is connected with the single chip microcomputer and serves as the working baseboard management controller under the control of the single chip microcomputer;
the gate is connected with the single chip microcomputer, the first baseboard management controller and the second baseboard management controller, and is used for selecting the first baseboard management controller or the second baseboard management controller as the working baseboard management controller under the control of the single chip microcomputer.
10. The baseboard management controller failure processing system of claim 9, wherein: the gate is a two-to-one gate.
CN202110801103.8A 2021-07-15 2021-07-15 Failure processing method and system for substrate management controller, storage medium and single chip microcomputer Pending CN113590203A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110801103.8A CN113590203A (en) 2021-07-15 2021-07-15 Failure processing method and system for substrate management controller, storage medium and single chip microcomputer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110801103.8A CN113590203A (en) 2021-07-15 2021-07-15 Failure processing method and system for substrate management controller, storage medium and single chip microcomputer

Publications (1)

Publication Number Publication Date
CN113590203A (en) 2021-11-02

Family

ID=78247708

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110801103.8A Pending CN113590203A (en) 2021-07-15 2021-07-15 Failure processing method and system for substrate management controller, storage medium and single chip microcomputer

Country Status (1)

Country Link
CN (1) CN113590203A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104424054A (en) * 2013-09-03 2015-03-18 纬创资通股份有限公司 Server system and backup management method thereof
TWI633416B (en) * 2017-06-30 2018-08-21 神雲科技股份有限公司 Server fan control system and control method
CN109236710A (en) * 2017-07-10 2019-01-18 佛山市顺德区顺达电脑厂有限公司 Server fan control system and its control method
CN111737037A (en) * 2020-06-12 2020-10-02 浪潮(北京)电子信息产业有限公司 Substrate management control method, master-slave heterogeneous BMC control system and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113835770A (en) * 2021-11-30 2021-12-24 四川华鲲振宇智能科技有限责任公司 Online replacement method and system for server management module
CN113835770B (en) * 2021-11-30 2022-02-18 四川华鲲振宇智能科技有限责任公司 Online replacement method and system for server management module

Similar Documents

Publication Publication Date Title
WO2022198972A1 (en) Method, system and apparatus for fault positioning in starting process of server
US10055296B2 (en) System and method for selective BIOS restoration
WO2015169199A1 (en) Anomaly recovery method for virtual machine in distributed environment
CN106789306B (en) Method and system for detecting, collecting and recovering software fault of communication equipment
CN111324192A (en) System board power supply detection method, device, equipment and storage medium
EP3591485B1 (en) Method and device for monitoring for equipment failure
CN103092746A (en) Positioning method and system for thread anomaly
CN112286709B (en) Diagnosis method, diagnosis device and diagnosis equipment for server hardware faults
CN114328102B (en) Equipment state monitoring method, equipment state monitoring device, equipment and computer readable storage medium
CN104320308A (en) Method and device for detecting anomalies of server
CN104636221A (en) Method and device for processing computer system fault
TWI668567B (en) Server and method for restoring a baseboard management controller automatically
CN111752776A (en) Cyclic power-on and power-off test method and system for server
CN111858122A (en) Fault detection method, device, equipment and storage medium of storage link
CN113590203A (en) Failure processing method and system for substrate management controller, storage medium and single chip microcomputer
CN113672306B (en) Server component self-checking abnormity recovery method, device, system and medium
CN109358982B (en) Hard disk self-healing device and method and hard disk
JP2018180982A (en) Information processing device and log recording method
CN110471800B (en) Server and method for automatically overhauling substrate management controller
CN107179911B (en) Method and equipment for restarting management engine
CN115728665A (en) Power failure detection circuit, method and system
CN115080132A (en) Information processing method, information processing apparatus, server, and storage medium
CN110399258B (en) Stability testing method, system and device for server system
TW202242655A (en) Method, computer system and computer program product for storing state data of finite state machine
CN111208889A (en) Server temperature control method and system and substrate management controller

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination