CN111190632A - Method and device for realizing dual activities of BMC (baseboard management controller) - Google Patents

Info

Publication number
CN111190632A
Authority
CN
China
Prior art keywords
bmc
version
working
logic partition
partition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911388465.8A
Other languages
Chinese (zh)
Other versions
CN111190632B (en)
Inventor
邢科钰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN201911388465.8A
Publication of CN111190632A
Application granted
Publication of CN111190632B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G06F8/65 Updates
    • G06F8/654 Updates using techniques specially adapted for alterable solid state memories, e.g. for EEPROM or flash memories
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Quality & Reliability (AREA)
  • Stored Programmes (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for realizing a dual-active BMC (baseboard management controller). A first logical partition and a second logical partition are set in the EEPROM chip of the BMC, and the working state of the BMC is monitored in real time. If the BMC is found to be working abnormally, the second logical partition receives a command, replaces the BMC working version in the first logical partition, and enters the working state. This addresses the problem that, because a BMC chip holds only a single version, the server stops working once the BMC fails, and it improves the working and management efficiency of the server. In addition, a third logical partition is set in the EEPROM of the BMC chip to store a BMC version image file, so that once the BMC works abnormally the original BMC working version can be refreshed and replaced, which helps keep the server running efficiently and reliably.

Description

Method and device for realizing dual activities of BMC (baseboard management controller)
Technical Field
The invention relates to the field of server BMC design, in particular to a method and a device for realizing dual activities of server BMC.
Background
As server products gain more functions, the BMC, as the management center, plays an increasingly important role in the day-to-day management of server functions, remote debugging of the system, and analysis of system logs. This convenience comes with the risk of BMC damage: once the BMC fails, the server is cut off from the control platform.
In the debugging stage of a server, the BMC function can fail because a manually triggered refresh is interrupted or power is suddenly lost; the whole machine then fails its self-check, cannot start, and stops working. In the operation stage, a failure of the BMC function causes the whole server to lose its management connection, so the machine room management system cannot obtain the operating state of the server, which interrupts service and defeats management.
At present, BMC chips are all single-version: one chip can contain only one version, and there is no function for restoring that version or activating a replacement when it fails.
Disclosure of Invention
The invention aims to solve these problems in the prior art. It provides a method and a device for realizing a dual-active server BMC, which address the problem that the server stops working once the BMC fails because the BMC chip holds a single version, and which improve the working and management efficiency of the server.
The first aspect of the present invention provides a method for implementing BMC dual activity of a server, including:
setting a first logical partition and a second logical partition in an EEPROM chip of the BMC, wherein the first logical partition holds a BMC working version and the second logical partition holds a BMC activated version;
monitoring the working state of the BMC in real time and, if the BMC is found to be working abnormally, having the second logical partition receive a command, replace the BMC working version in the first logical partition, and enter the working state.
With reference to the first aspect, in a first possible implementation manner of the first aspect, the method further includes: setting a third logical partition in the EEPROM chip of the BMC, wherein the third logical partition stores a BMC version image file, and when the working state of the BMC is found to be abnormal, the BMC version image file is decompressed and refreshed to reactivate the BMC version.
Further, the BMC working version automatically backs up its configuration information while working and stores it in a configuration file.
Further, after the BMC version image file has been decompressed and refreshed and the BMC version has been reactivated, the configuration file is imported.
With reference to the first aspect, in a second possible implementation manner of the first aspect, the real-time monitoring of the BMC working state is specifically: a watchdog in the BMC monitors the status register value of the BMC working version in real time.
The second aspect of the present invention provides a device for implementing BMC dual-activity of a server, including:
a first setting module, configured to set a first logical partition and a second logical partition in an EEPROM chip of the BMC, wherein the first logical partition holds a BMC working version and the second logical partition holds a BMC activated version;
a monitoring module, configured to monitor the BMC working state in real time, wherein if the BMC is found to be working abnormally, the second logical partition receives a command, replaces the BMC working version in the first logical partition, and enters the working state.
With reference to the second aspect, in a first possible implementation manner of the second aspect, the device further includes: a second setting module, configured to set a third logical partition in the EEPROM chip of the BMC, wherein the third logical partition stores a BMC version image file, and when the working state of the BMC is found to be abnormal, the BMC version image file is decompressed and refreshed to reactivate the BMC version.
Further, the BMC working version automatically backs up its configuration information while working and stores it in a configuration file.
Further, after the BMC version image file has been decompressed and refreshed and the BMC version has been reactivated, the configuration file is imported.
With reference to the second aspect, in a second possible implementation manner of the second aspect, the real-time monitoring of the BMC working state in the monitoring module is specifically: a watchdog in the BMC monitors the status register value of the BMC working version in real time.
The technical scheme adopted by the invention comprises the following technical effects:
1. The invention addresses the problem that, because the BMC chip holds a single version, the server stops working once the BMC fails, and it improves the working and management efficiency of the server.
2. In the invention, a third logical partition is set in the EEPROM of the BMC chip and stores a BMC version image file. When the working state of the BMC is found to be abnormal, the image file is decompressed and refreshed, so that the BMC working version and the BMC activated version in the chip are kept usable at all times. Once an abnormality is detected, the original BMC working version can be refreshed and replaced, which helps ensure efficient and reliable server operation.
3. While working, the BMC working version automatically backs up its configuration information and stores it in a configuration file. After the BMC version image file has been decompressed and refreshed and the BMC version has been reactivated, the configuration file is imported, which restores the BMC's management connection to the server and keeps server management from being disconnected.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
To illustrate the embodiments of the present invention and the prior-art technical solutions more clearly, the drawings used in their description are briefly introduced below. Those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic flow diagram of the method according to the first embodiment of the present invention;
FIG. 2 is a block diagram of the BMC logical partitions according to the first embodiment of the present invention;
FIG. 3 is a schematic flow diagram of the method according to the second embodiment of the present invention;
FIG. 4 is a schematic structural diagram of the device according to the third embodiment of the present invention;
FIG. 5 is a schematic structural diagram of the device according to the fourth embodiment of the present invention.
Detailed Description
In order to clearly explain the technical features of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different features of the invention. To simplify the disclosure of the present invention, the components and arrangements of specific examples are described below. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and procedures are omitted so as to not unnecessarily limit the invention.
Example one
As shown in FIGS. 1 and 2, the present invention provides a method for realizing a dual-active server BMC, including:
S1, setting a first logical partition and a second logical partition in an EEPROM chip of the BMC, wherein the first logical partition holds a BMC working version and the second logical partition holds a BMC activated version;
S2, monitoring the BMC working state in real time and judging whether the BMC is working abnormally; if so, executing step S3, and if not, repeating step S2;
S3, the second logical partition receives a command, replaces the BMC working version in the first logical partition, and enters the working state.
In step S1, the BMC version in the first logical partition is the working version and the BMC version in the second logical partition is the activated version. The two versions are identical, and at any time either one can replace the other or be replaced by it.
In step S2, the real-time monitoring of the BMC working state is specifically: a watchdog in the BMC monitors the status register value of the BMC working version in real time. The state of each BMC version can be written into the status register, and the state of the BMC working version is determined by reading that value. For example, 00 may denote the working state, 01 the failed state, and 10 the activated state.
Two pointers are set in the status register: the first points to the state of the BMC working version in the first logical partition, and the second points to the state of the BMC activated version in the second logical partition. The states of both versions are determined from the values stored in the status register.
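The status register described above can be sketched as two 2-bit fields packed into one integer. This is only an illustration: the bit positions, the names `PartitionState`, `pack_status` and `unpack_status`, and the reading of the codes as 00 = working, 01 = failed, 10 = activated are assumptions, not details given in the patent.

```python
from enum import IntEnum
from typing import Tuple


class PartitionState(IntEnum):
    # Assumed reading of the example codes in the text.
    WORKING = 0b00    # image is currently running
    FAILED = 0b01     # image is unusable
    ACTIVATED = 0b10  # image is ready to take over


# Hypothetical layout: bits 1..0 describe the first partition,
# bits 3..2 describe the second partition.
FIRST_SHIFT = 0
SECOND_SHIFT = 2
FIELD_MASK = 0b11


def pack_status(first: PartitionState, second: PartitionState) -> int:
    """Combine both partition states into one register value."""
    return (int(first) << FIRST_SHIFT) | (int(second) << SECOND_SHIFT)


def unpack_status(register: int) -> Tuple[PartitionState, PartitionState]:
    """Read both partition states back out of the register value."""
    first = PartitionState((register >> FIRST_SHIFT) & FIELD_MASK)
    second = PartitionState((register >> SECOND_SHIFT) & FIELD_MASK)
    return first, second


register = pack_status(PartitionState.WORKING, PartitionState.ACTIVATED)
first_state, second_state = unpack_status(register)
```

With this layout a watchdog would only need to read one value to learn the state of both versions; the unused code 11 raises `ValueError` when decoded.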
In step S3, the second logical partition receiving the command, replacing the BMC working version in the first logical partition, and entering the working state is specifically: when the watchdog detects that the BMC is working abnormally, it sends a command to the second logical partition; after receiving the command, the second logical partition replaces the BMC working version of the first logical partition and enters the working state. The command may be implemented as an IPMI (Intelligent Platform Management Interface) command.
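As a rough model of steps S2 and S3, the following sketch simulates one watchdog poll and the resulting takeover. The class and method names (`DualBmc`, `watchdog_tick`) are hypothetical, and the takeover command, which the text says may be an IPMI command, is modeled here as a plain state change rather than any real IPMI call.

```python
from dataclasses import dataclass
from typing import Optional

WORKING, FAILED, ACTIVATED = "working", "failed", "activated"


@dataclass
class Partition:
    name: str
    state: str


class DualBmc:
    """Toy model of a BMC with a working and an activated firmware copy."""

    def __init__(self) -> None:
        self.first = Partition("first", WORKING)
        self.second = Partition("second", ACTIVATED)

    def working_partition(self) -> Optional[Partition]:
        for part in (self.first, self.second):
            if part.state == WORKING:
                return part
        return None

    def watchdog_tick(self, healthy: bool) -> str:
        """One poll (S2); on abnormality the other copy takes over (S3).

        Returns the name of the partition that is working afterwards,
        or "none" if no usable copy is left.
        """
        current = self.working_partition()
        if current is None:
            return "none"
        if healthy:
            return current.name
        standby = self.second if current is self.first else self.first
        current.state = FAILED
        if standby.state != ACTIVATED:
            return "none"
        standby.state = WORKING  # stands in for the takeover command
        return standby.name
```

For example, `DualBmc().watchdog_tick(False)` reports that the second partition has taken over, while a second abnormality with no refreshed copy available leaves no working version, which is the gap the third partition in the second embodiment is meant to close.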
The invention effectively solves the problem that once the BMC fails, the server stops working due to the single version of the BMC chip, and effectively improves the working and management efficiency of the server.
Example two
As shown in FIG. 3, the present invention provides a method for realizing a dual-active server BMC, including:
S1, setting a first logical partition and a second logical partition in an EEPROM chip of the BMC, wherein the first logical partition holds a BMC working version and the second logical partition holds a BMC activated version;
S2, monitoring the BMC working state in real time and judging whether the BMC is working abnormally; if so, executing step S3, and if not, repeating step S2;
S3, the second logical partition receives a command, replaces the BMC working version in the first logical partition, and enters the working state;
S4, setting a third logical partition in the EEPROM chip of the BMC, wherein the third logical partition stores a BMC version image file, and when the working state of the BMC is found to be abnormal, the BMC version image file is decompressed and refreshed to reactivate the BMC version.
In step S4, the BMC version image file may be either a stable BMC version or the latest BMC version. A stable version is preferred for management efficiency, because the refreshed BMC working version or BMC activated version is then more stable. The third logical partition may occupy half the space of the second logical partition. The BMC version image file is a compressed package in zip format: the decompression tool built into the BMC first decompresses the zip file, which by default yields a bin file, and a BMC writing tool then writes the decompressed bin file into whichever partition is to be refreshed (the first or the second logical partition). The first and second logical partitions can be told apart by assigning each a code.
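The decompress-then-write sequence can be sketched with Python's standard `zipfile` module, using a `bytearray` as a stand-in for the target partition. The helper names, the member name `bmc.bin`, and the 0xFF padding (a common erased-flash value) are assumptions for illustration; the patent does not specify the built-in decompression or writing tools.

```python
import io
import zipfile


def build_image_zip(firmware: bytes, member: str = "bmc.bin") -> bytes:
    """Create a zip-format image file like the one kept in the third partition."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(member, firmware)
    return buf.getvalue()


def refresh_partition(image_zip: bytes, partition: bytearray) -> int:
    """Decompress the image and write the .bin payload into the partition.

    Returns the number of firmware bytes written.
    """
    with zipfile.ZipFile(io.BytesIO(image_zip)) as zf:
        bin_names = [n for n in zf.namelist() if n.endswith(".bin")]
        if len(bin_names) != 1:
            raise ValueError("expected exactly one .bin file in the image")
        firmware = zf.read(bin_names[0])
    if len(firmware) > len(partition):
        raise ValueError("firmware does not fit in the target partition")
    size = len(firmware)
    partition[:size] = firmware
    # Pad the remainder with 0xFF so no stale bytes survive the refresh.
    partition[size:] = b"\xff" * (len(partition) - size)
    return size
```

Because the stored image is compressed, keeping it in a partition smaller than the ones it restores, as the text suggests, is plausible as long as the firmware compresses well enough.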
Similarly, suppose the BMC activated version in the second logical partition is now in the working state and the BMC version in the first logical partition has been refreshed. If the BMC again works abnormally, the BMC version in the first logical partition (originally the working version, now in the activated state) replaces the BMC version in the second logical partition (originally the activated version, which was working and has now failed), and the failed version in the second logical partition is refreshed at the same time.
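The alternating roles described above can be traced with a small state table. This is a sketch under assumed names (`fail_over`, `refresh_failed`) and assumed state labels; it only shows that two failures, each followed by a refresh, return the partitions to their initial roles.

```python
def fail_over(states: dict) -> dict:
    """The working copy fails and the activated copy takes over."""
    working = next(name for name, s in states.items() if s == "working")
    standby = "second" if working == "first" else "first"
    if states[standby] != "activated":
        raise RuntimeError("no activated version available to take over")
    return {working: "failed", standby: "working"}


def refresh_failed(states: dict) -> dict:
    """Re-flash any failed copy from the stored image, making it activated."""
    return {name: ("activated" if s == "failed" else s)
            for name, s in states.items()}


start = {"first": "working", "second": "activated"}
after_first_failure = refresh_failed(fail_over(start))
after_second_failure = refresh_failed(fail_over(after_first_failure))
```

After the first abnormality the second partition is working and the first is activated again; after the second abnormality the roles are back where they started.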
While working, the BMC working version automatically backs up its configuration information and stores it in a configuration file; that is, the BMC configuration can be saved to a config file through the BMC web interface. After the BMC version image file has been decompressed and refreshed and the BMC version has been reactivated, the configuration file is imported, which restores the BMC's management connection to the server and keeps server management from being disconnected.
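The backup-and-import step can be illustrated as a round trip through a serialized configuration file. JSON, the function names, and the example keys and values are all assumptions; the patent does not state the format of the config file or which settings it holds.

```python
import json


def export_config(settings: dict) -> str:
    """Serialize the running configuration for backup."""
    return json.dumps(settings, sort_keys=True, indent=2)


def import_config(text: str, defaults: dict) -> dict:
    """Overlay a backed-up configuration on a freshly refreshed image's defaults."""
    merged = dict(defaults)
    merged.update(json.loads(text))
    return merged


# Hypothetical settings of the running BMC and of a freshly flashed image.
running = {"ip": "192.0.2.10", "hostname": "bmc-01"}
factory = {"ip": "0.0.0.0", "hostname": "bmc", "ntp": "off"}

backup = export_config(running)
restored = import_config(backup, factory)
```

Here the refreshed image comes up with factory defaults, and importing the backup restores the network identity that the management system uses to reach the server.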
In the invention, a third logical partition is set in the EEPROM of the BMC chip and stores a BMC version image file. When the working state of the BMC is found to be abnormal, the image file is decompressed and refreshed, so that the BMC working version and the BMC activated version in the chip are kept usable at all times. Once an abnormality is detected, the original BMC working version can be refreshed and replaced, which helps ensure efficient and reliable server operation.
Example three
As shown in FIG. 4, the technical solution of the present invention further provides a device for realizing a dual-active server BMC, including:
a first setting module 101, configured to set a first logical partition and a second logical partition in an EEPROM chip of the BMC, wherein the first logical partition holds a BMC working version and the second logical partition holds a BMC activated version;
a monitoring module 102, configured to monitor the BMC working state in real time, wherein if the BMC is found to be working abnormally, the second logical partition receives a command, replaces the BMC working version in the first logical partition, and enters the working state.
The real-time monitoring of the BMC working state in the monitoring module 102 is specifically: a watchdog in the BMC monitors the status register value of the BMC working version in real time.
The invention effectively solves the problem that once the BMC fails, the server stops working due to the single version of the BMC chip, and effectively improves the working and management efficiency of the server.
Example four
As shown in FIG. 5, the technical solution of the present invention further provides a device for realizing a dual-active server BMC, including:
a first setting module 101, configured to set a first logical partition and a second logical partition in an EEPROM chip of the BMC, wherein the first logical partition holds a BMC working version and the second logical partition holds a BMC activated version;
a monitoring module 102, configured to monitor the BMC working state in real time, wherein if the BMC is found to be working abnormally, the second logical partition receives a command, replaces the BMC working version in the first logical partition, and enters the working state;
a second setting module 103, configured to set a third logical partition in the EEPROM chip of the BMC, wherein the third logical partition stores a BMC version image file, and when the working state of the BMC is found to be abnormal, the BMC version image file is decompressed and refreshed to reactivate the BMC version.
In the invention, a third logical partition is set in the EEPROM of the BMC chip and stores a BMC version image file. When the working state of the BMC is found to be abnormal, the image file is decompressed and refreshed, so that the BMC working version and the BMC activated version in the chip are kept usable at all times. Once an abnormality is detected, the original BMC working version can be refreshed and replaced, which helps ensure efficient and reliable server operation.
While working, the BMC working version automatically backs up its configuration information and stores it in a configuration file. After the BMC version image file has been decompressed and refreshed and the BMC version has been reactivated, the configuration file is imported, which restores the BMC's management connection to the server and keeps server management from being disconnected.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they are not intended to limit the scope of the present invention. Those skilled in the art should understand that various modifications and variations can be made on the basis of the technical solution of the present invention without inventive effort.

Claims (10)

1. A method for realizing a dual-active server BMC, characterized by comprising:
setting a first logical partition and a second logical partition in an EEPROM chip of the BMC, wherein the first logical partition holds a BMC working version and the second logical partition holds a BMC activated version;
monitoring the working state of the BMC in real time and, if the BMC is found to be working abnormally, the second logical partition receiving a command, replacing the BMC working version in the first logical partition, and entering the working state.
2. The method of claim 1, further comprising: setting a third logical partition in the EEPROM chip of the BMC, wherein the third logical partition stores a BMC version image file, and when the working state of the BMC is found to be abnormal, the BMC version image file is decompressed and refreshed to reactivate the BMC version.
3. The method of claim 2, wherein the BMC working version automatically backs up its configuration information while working and stores it in a configuration file.
4. The method of claim 3, wherein the configuration file is imported after the BMC version image file has been decompressed and refreshed and the BMC version has been reactivated.
5. The method of claim 1, wherein the real-time monitoring of the BMC working state is specifically: a watchdog in the BMC monitors the status register value of the BMC working version in real time.
6. A device for realizing a dual-active server BMC (baseboard management controller), characterized by comprising:
a first setting module, configured to set a first logical partition and a second logical partition in an EEPROM chip of the BMC, wherein the first logical partition holds a BMC working version and the second logical partition holds a BMC activated version;
a monitoring module, configured to monitor the BMC working state in real time, wherein if the BMC is found to be working abnormally, the second logical partition receives a command, replaces the BMC working version in the first logical partition, and enters the working state.
7. The device of claim 6, further comprising: a second setting module, configured to set a third logical partition in the EEPROM chip of the BMC, wherein the third logical partition stores a BMC version image file, and when the working state of the BMC is found to be abnormal, the BMC version image file is decompressed and refreshed to reactivate the BMC version.
8. The device of claim 7, wherein the BMC working version automatically backs up its configuration information while working and stores it in a configuration file.
9. The device of claim 8, wherein the configuration file is imported after the BMC version image file has been decompressed and refreshed and the BMC version has been reactivated.
10. The device of claim 6, wherein the real-time monitoring of the BMC working state in the monitoring module is specifically: a watchdog in the BMC monitors the status register value of the BMC working version in real time.
CN201911388465.8A 2019-12-30 2019-12-30 Method and device for realizing server BMC dual-activity Active CN111190632B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911388465.8A CN111190632B (en) 2019-12-30 2019-12-30 Method and device for realizing server BMC dual-activity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911388465.8A CN111190632B (en) 2019-12-30 2019-12-30 Method and device for realizing server BMC dual-activity

Publications (2)

Publication Number Publication Date
CN111190632A (en) 2020-05-22
CN111190632B (en) 2024-02-27

Family

ID=70705916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911388465.8A Active CN111190632B (en) 2019-12-30 2019-12-30 Method and device for realizing server BMC dual-activity

Country Status (1)

Country Link
CN (1) CN111190632B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1885270A (en) * 2005-06-24 2006-12-27 株式会社东芝 Information processing apparatus, storage medium, and data rescue method
CN105279042A (en) * 2014-07-15 2016-01-27 华耀(中国)科技有限公司 Redundant backup system and method for BSD system
CN110399152A (en) * 2019-07-22 2019-11-01 浙江鸿泉车联网有限公司 A kind of device systems double copies upgrade method and device

Also Published As

Publication number Publication date
CN111190632B (en) 2024-02-27

Similar Documents

Publication Publication Date Title
CN106933843B (en) Database heartbeat detection method and device
CN103415840A (en) Error management across hardware and software layers
CN105095001A (en) Virtual machine exception recovery method under distributed environment
US8347142B2 (en) Non-disruptive I/O adapter diagnostic testing
CN104320308A (en) Method and device for detecting anomalies of server
CN107070747B (en) Device, system and method for automatically testing network card network connection stability in network card binding mode
CN114116280B (en) Interactive BMC self-recovery method, system, terminal and storage medium
CN110109782B (en) Method, device and system for replacing fault PCIe (peripheral component interconnect express) equipment
CN105259863A (en) PLC warm backup redundancy method and system
CN107368384A (en) A kind of Linux server abnormal information dump system and method
CN110704228A (en) Solid state disk exception handling method and system
CN115617550A (en) Processing device, control unit, electronic device, method, and computer program
US20210382536A1 (en) Systems, devices, and methods for controller devices handling fault events
CN111190632A (en) Method and device for realizing dual activities of BMC (baseboard management controller)
CN111880992A (en) Monitoring and maintaining method for controller state in storage device
CN111176878A (en) Server BBU (building base band Unit) standby power diagnosis method, system, terminal and storage medium
JP2008152552A (en) Computer system and failure information management method
CN111338456B (en) BBU power failure protection implementation method and system
CN110618891B (en) Solid state disk fault online processing method and solid state disk
CN108037942B (en) Adaptive data recovery and update method and device for embedded equipment
CN115639969B (en) Storage disk main-standby switching method and device and computer equipment
CN106599046B (en) Writing method and device of distributed file system
CN111142945A (en) Dynamic switching method for master channel and slave channel of dual-redundancy computer
JP6194496B2 (en) Information processing apparatus, information processing method, and program
CN112015600A (en) Log information processing system, log information processing method and device and switch

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant