CN110943855A - Method for realizing state recovery after shutdown of server through BMC - Google Patents
Method for realizing state recovery after shutdown of server through BMC
- Publication number
- CN110943855A (application CN201911131359.1A)
- Authority
- CN
- China
- Prior art keywords
- bmc
- service
- server
- downtime
- agent
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/04—Network management architectures or arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1415—Saving, restoring, recovering or retrying at system level
- G06F11/1438—Restarting or rejuvenating
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/28—Restricting access to network management systems or functions, e.g. using authorisation function to access network configuration
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Theoretical Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Hardware Redundancy (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention provides a method for recovering the state of a server after downtime through a BMC (baseboard management controller). The BMC monitors the running state of each service on the server system in real time and stores that state in the BMC. After detecting that the server has gone down, the BMC restarts the server, and an Agent starts the services of the main system according to the service information the BMC stored before the downtime, so that the running state of the server is consistent with its state before the downtime. Because the method separates network management data from service data, it can improve the efficiency and reliability of data recovery and also helps improve its security.
Description
Technical Field
The invention relates to a method for recovering the state of a server after downtime, in particular to a method for recovering the state of a server after downtime through a BMC (baseboard management controller), and belongs to the technical field of server control.
Background
The BMC is software that starts running as soon as the server is connected to AC power, and it runs on a separate chip on the server. The BMC is mainly used to detect the health of the server's components (such as the CPU, memory, hard disks, fans and chassis). Its operation is independent of the server's main system, which makes it an out-of-band management mechanism.
Out-of-band network management means that the network is managed through a dedicated management channel: network management data is separated from service data and travels over its own independent channel. Only management data, statistics, billing information and the like are transmitted over this channel. Separating network management data from service data can improve the efficiency and reliability of network management and also the security of the management data. In-band management means that the network's management and control information and the user network's bearer service information are transmitted through the same logical channel, so the network management system must manage devices through the service network. If a managed object cannot be reached through the service network, in-band management fails. Existing BMC out-of-band management software in the industry offers various solutions, and communication between the in-band and out-of-band sides can be implemented over LPC or over a network, so that the BMC can obtain information from the main system.
At present, when the BMC restarts a server, it does not take into account the state the server was in before the downtime. If the hardware shows no abnormality, the server system is simply restarted, and the server's services are then started through in-band management.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method for recovering the state of a server after downtime through a BMC (baseboard management controller), so that the server automatically returns to its pre-downtime state after being restarted.
To solve this problem, the invention adopts the following technical solution: a method for recovering the state of a server after downtime through a BMC, comprising the following steps:
S01) establishing communication between the BMC and the server system;
S02) the BMC monitors the running state of each service on the server system in real time and stores that state in the BMC;
S03) after detecting that the server system has gone down, the BMC restarts the server and, according to the pre-downtime service information stored in the BMC, starts the services that were running before the downtime or stops the services that were not running before the downtime, so that the server system is consistent with its state before the downtime.
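Step S01 leaves the message format between the BMC and the server system open. The sketch below shows one possible encoding, assuming line-delimited JSON; the four message type names and the `encode_message`/`decode_message` helpers are hypothetical and are not taken from the patent.

```python
import json
from typing import Any, Dict

# Hypothetical message types. The source describes what is exchanged
# (service reports, start/stop instructions, a consistency confirmation)
# but not how it is encoded.
MESSAGE_TYPES = {"service_report", "start_service", "stop_service", "state_consistent"}


def encode_message(msg_type: str, **payload: Any) -> bytes:
    """Serialize one message as a single line of UTF-8 JSON."""
    if msg_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {msg_type}")
    line = json.dumps({"type": msg_type, "payload": payload}, sort_keys=True)
    return (line + "\n").encode("utf-8")


def decode_message(raw: bytes) -> Dict[str, Any]:
    """Parse one line produced by encode_message and validate its type."""
    msg = json.loads(raw.decode("utf-8"))
    if msg.get("type") not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {msg.get('type')}")
    return msg


# Agent -> BMC: current list of running services (names are illustrative).
report = encode_message("service_report", running=["nginx", "mysql"])
# BMC -> Agent: instruction to start one service.
command = encode_message("start_service", name="mysql")
decoded = decode_message(report)
```

The same bytes could travel over either of the transports the Background mentions (LPC or a network); the transport itself is outside this sketch.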
Further, the BMC monitors and restarts the server system through an Agent program, whose functions include: acquiring service information on the server system in real time, sending that service information to the BMC, receiving server control instructions sent by the BMC, and starting and stopping the services of the server system according to those instructions.
Further, the Agent program runs on the server system and is configured to start automatically at boot, after all other programs in the startup sequence.
Further, during normal operation of the system, the Agent program acquires the service information of the main system in real time and sends it to the BMC, and the BMC stores and updates the system's service information in real time.
Further, if the Agent program finds that the system is down, the system is restarted through the BMC, and the Agent program is the last program to start during the restart. The BMC then obtains the post-restart service running state through the Agent and compares it with the service state recorded before the downtime. If a service that was recorded as running is not running after the restart, the BMC sends a service start instruction to the Agent; if the server system is running a service that is not in the pre-downtime service running list, the BMC sends a service stop instruction to the Agent. This continues until the system's service running state is consistent with the state before the downtime, at which point the BMC sends the Agent a confirmation that the service state is consistent and no longer controls the main system.
Further, after the running state of the server system is consistent with the state before the downtime, the Agent keeps running and switches to a service detection state. If the Agent finds that the service running state has changed, it notifies the BMC, and the BMC updates the service running state list with the information received.
The advantages of the invention are as follows. The server automatically returns to its pre-downtime state after a restart by means of out-of-band network management. Out-of-band network management separates network management data from service data and establishes an independent channel for the management data, which can improve the efficiency and reliability of network management and helps improve the security of the management data. The method monitors the running state of each service on the server system in real time through the Agent program and, after the server goes down, performs restart control through the Agent program. It can therefore recover the state of the server without affecting the normal work of the server system, preserving the working efficiency of the server system.
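The comparison described above amounts to two set differences. A minimal sketch, assuming services are identified by name; the function name `plan_recovery` and the service names are illustrative only.

```python
from typing import Iterable, Set, Tuple


def plan_recovery(recorded: Iterable[str], current: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Compare the pre-downtime service list with the post-restart one.

    Returns (to_start, to_stop):
      to_start: recorded as running before the downtime but not running now
      to_stop:  running now but absent from the pre-downtime list
    """
    recorded_set, current_set = set(recorded), set(current)
    return recorded_set - current_set, current_set - recorded_set


# Hypothetical service names, for illustration only.
before_downtime = ["nginx", "mysql", "redis"]
after_restart = ["nginx", "cups"]

to_start, to_stop = plan_recovery(before_downtime, after_restart)
print(sorted(to_start))  # ['mysql', 'redis']
print(sorted(to_stop))   # ['cups']
```

When both sets are empty, the two states already match and the BMC can send its confirmation.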
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
FIG. 1 is a schematic diagram of the method in use;
FIG. 2 is a flow chart of the operation of the Agent program.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
This embodiment discloses a method for recovering the state of a server after downtime through a BMC (baseboard management controller). After detecting that the system has gone down, the BMC restarts the server and, according to the pre-downtime service information stored in the BMC, starts or stops the services of the main system through an Agent, so that the service state of the main system is consistent with the state before the downtime.
Specifically, the method comprises the following steps:
S01) establishing a communication protocol between the BMC and the server system, so that at least one means of communication exists between the server's main system and the BMC;
S02) the BMC monitors the running state of each service on the server system in real time and stores that state in the BMC;
S03) after detecting that the server system has gone down, the BMC restarts the server and, according to the pre-downtime service information stored in the BMC, starts the services that were running before the downtime or stops the services that were not running before the downtime, so that the server system is consistent with its state before the downtime.
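Step S03 does not say how the BMC decides that the server system is down. One common approach, shown here purely as an assumption, is to treat a missing Agent report as downtime once a timeout has elapsed. The class name `DowntimeDetector` and the 30-second timeout are hypothetical.

```python
from typing import Optional


class DowntimeDetector:
    """Treats the server as down when no Agent report arrives within `timeout` seconds.

    This timeout rule is an assumption; the source does not specify the
    detection mechanism.
    """

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_report: Optional[float] = None

    def on_report(self, now: float) -> None:
        """Record the time at which the latest Agent report was received."""
        self.last_report = now

    def is_down(self, now: float) -> bool:
        if self.last_report is None:
            return False  # nothing received yet, so there is no baseline
        return now - self.last_report > self.timeout


detector = DowntimeDetector(timeout=30.0)
detector.on_report(now=100.0)
print(detector.is_down(now=120.0))  # False
print(detector.is_down(now=131.0))  # True
```

Timestamps are passed in explicitly so the logic can be exercised without waiting; a real deployment would likely read a monotonic clock.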
The method is applied as shown in FIG. 1. Once a means of communication between the BMC and the server system is in place, an Agent program is written first; the BMC monitors and restarts the server system through this Agent program, whose functions include: acquiring service information on the server system in real time, sending that service information to the BMC, receiving server control instructions sent by the BMC, and starting and stopping the services of the server system according to those instructions.
In this embodiment, the Agent program runs on the server system and is configured to start automatically at boot, after all other programs in the startup sequence. As shown in FIG. 2, the Agent's workflow is as follows: during normal operation of the system, the Agent program acquires the service information of the main system in real time and sends it to the BMC, and the BMC stores and updates the system's service information in real time.
If the Agent program finds that the system is down, the system is restarted through the BMC, and the Agent program is the last program to start during the restart. The BMC then obtains the post-restart service running state through the Agent and compares it with the service state recorded before the downtime. If a service that was recorded as running is not running after the restart, the BMC sends a service start instruction to the Agent; if the server system is running a service that is not in the pre-downtime service running list, the BMC sends a service stop instruction to the Agent. This continues until the system's service running state is consistent with the state before the downtime, at which point the BMC sends the Agent a confirmation that the service state is consistent and no longer controls the main system.
After the running state of the server system is consistent with the state before the downtime, the Agent keeps running and switches to a service detection state. If the Agent finds that the service running state has changed, it notifies the BMC, and the BMC updates the service running state list with the information received.
The invention enables the server to return automatically to its pre-downtime state after a restart by means of out-of-band network management. Out-of-band network management separates network management data from service data and establishes an independent channel for the management data, which can improve the efficiency and reliability of network management and helps improve the security of the management data. The method monitors the running state of each service on the server system in real time through the Agent program and, after the server goes down, performs restart control through the Agent program. It can therefore recover the state of the server without affecting the normal work of the server system, preserving the working efficiency of the server system.
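The Agent functions listed above (collect, report, receive, start/stop) could be organized roughly as follows. This is a sketch only: `FakeServiceManager` is a hypothetical in-memory stand-in for whatever service manager the host actually uses, and the instruction dictionary layout is an assumption.

```python
from typing import Dict, List


class FakeServiceManager:
    """In-memory stand-in for the host's real service manager (hypothetical)."""

    def __init__(self, running: List[str]) -> None:
        self._running = set(running)

    def list_running(self) -> List[str]:
        return sorted(self._running)

    def start(self, name: str) -> None:
        self._running.add(name)

    def stop(self, name: str) -> None:
        self._running.discard(name)


class Agent:
    """Collects service information and applies BMC control instructions."""

    def __init__(self, services: FakeServiceManager) -> None:
        self.services = services

    def collect(self) -> Dict[str, List[str]]:
        """Build the service report that would be sent to the BMC."""
        return {"running": self.services.list_running()}

    def handle(self, instruction: Dict[str, str]) -> None:
        """Apply one control instruction received from the BMC."""
        action, name = instruction["action"], instruction["service"]
        if action == "start":
            self.services.start(name)
        elif action == "stop":
            self.services.stop(name)
        else:
            raise ValueError(f"unsupported action: {action}")


agent = Agent(FakeServiceManager(["nginx"]))
agent.handle({"action": "start", "service": "mysql"})
agent.handle({"action": "stop", "service": "nginx"})
print(agent.collect())  # {'running': ['mysql']}
```

Keeping the service manager behind a small interface means the Agent logic can be tested without touching real services.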
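On the BMC side, "stores and updates in real time" can be read as keeping only the most recent report. The sketch below assumes the BMC persists that report as a JSON file; the class name `ServiceStateStore`, the file name and the write-then-rename strategy are illustrative choices, not details from the patent.

```python
import json
import os
import tempfile
from typing import List


class ServiceStateStore:
    """Keeps the latest list of running services in a JSON file (assumed layout)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def update(self, running: List[str]) -> None:
        # Write to a temporary file first, then rename it over the old record,
        # so an interruption mid-write never leaves a half-written file.
        directory = os.path.dirname(self.path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(sorted(set(running)), f)
        os.replace(tmp_path, self.path)

    def load(self) -> List[str]:
        """Return the last stored list, or an empty list if nothing was stored."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return json.load(f)


workdir = tempfile.mkdtemp()
store = ServiceStateStore(os.path.join(workdir, "services.json"))
store.update(["nginx", "mysql"])
store.update(["nginx", "mysql", "redis"])  # a later report replaces the earlier one
print(store.load())  # ['mysql', 'nginx', 'redis']
```

Because the record lives on the BMC rather than on the host, it remains readable while the main system is down, which is what the recovery step relies on.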
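The "repeat until consistent, then confirm" behavior can be sketched as a loop on the BMC side. `StubAgent` simulates an Agent in which some services need a second start attempt, to show why more than one round may be needed. The `max_rounds` limit is an added safeguard that the source does not mention, and all names are hypothetical.

```python
from typing import Set


class StubAgent:
    """Simulated Agent; services in `flaky` ignore the first start attempt."""

    def __init__(self, running: Set[str], flaky: Set[str]) -> None:
        self.running = set(running)
        self._flaky = set(flaky)
        self.confirmed = False

    def report(self) -> Set[str]:
        return set(self.running)

    def start(self, name: str) -> None:
        if name in self._flaky:
            self._flaky.discard(name)  # first attempt has no effect
            return
        self.running.add(name)

    def stop(self, name: str) -> None:
        self.running.discard(name)

    def confirm(self) -> None:
        self.confirmed = True


def reconcile(recorded: Set[str], agent: StubAgent, max_rounds: int = 5) -> int:
    """Send start/stop instructions until the Agent's report matches `recorded`.

    Returns the number of instruction rounds used. Raises RuntimeError if the
    state has not converged after `max_rounds` rounds (an assumed safeguard).
    """
    rounds = 0
    while True:
        current = agent.report()
        if current == recorded:
            agent.confirm()  # "service state is consistent" confirmation
            return rounds
        if rounds == max_rounds:
            raise RuntimeError("service state did not converge")
        for name in sorted(recorded - current):
            agent.start(name)
        for name in sorted(current - recorded):
            agent.stop(name)
        rounds += 1


agent = StubAgent(running={"nginx", "cups"}, flaky={"mysql"})
rounds_used = reconcile({"nginx", "mysql"}, agent)
print(rounds_used, agent.confirmed)  # 2 True
```

After the confirmation, the BMC side stops issuing instructions, matching the statement that it no longer controls the main system.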
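In the service detection state, the Agent only needs to contact the BMC when something has changed. A minimal sketch, assuming the Agent periodically samples the set of running services; `ChangeNotifier` and the `sent` list (which stands in for messages delivered to the BMC) are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, List, Optional


class ChangeNotifier:
    """Calls `notify` only when the observed set of running services changes."""

    def __init__(self, notify: Callable[[FrozenSet[str]], None]) -> None:
        self._notify = notify
        self._last: Optional[FrozenSet[str]] = None

    def observe(self, running: Iterable[str]) -> bool:
        """Record one sample; return True if a notification was sent."""
        current = frozenset(running)
        if current == self._last:
            return False
        self._last = current
        self._notify(current)
        return True


sent: List[List[str]] = []  # stands in for updates delivered to the BMC


def update_bmc(running: FrozenSet[str]) -> None:
    sent.append(sorted(running))


notifier = ChangeNotifier(update_bmc)
notifier.observe(["nginx", "mysql"])  # first sample: reported
notifier.observe(["mysql", "nginx"])  # same set in another order: nothing sent
notifier.observe(["nginx"])           # mysql stopped: reported
print(sent)  # [['mysql', 'nginx'], ['nginx']]
```

Comparing sets rather than lists keeps a mere reordering of the sample from triggering a spurious update.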
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (6)
1. A method for recovering the state of a server after downtime through a BMC, characterized in that the method comprises the following steps:
S01) establishing communication between the BMC and the server system;
S02) the BMC monitors the running state of each service on the server system in real time and stores that state in the BMC;
S03) after detecting that the server system has gone down, the BMC restarts the server and, according to the pre-downtime service information stored in the BMC, starts the services that were running before the downtime or stops the services that were not running before the downtime, so that the server system is consistent with its state before the downtime.
2. The method for recovering the state of a server after downtime through a BMC according to claim 1, characterized in that: the BMC monitors and restarts the server system through an Agent program, whose functions include: acquiring service information on the server system in real time, sending that service information to the BMC, receiving server control instructions sent by the BMC, and starting and stopping the services of the server system according to those instructions.
3. The method for recovering the state of a server after downtime through a BMC according to claim 2, characterized in that: the Agent program runs on the server system and is configured to start automatically at boot, after all other programs in the startup sequence.
4. The method for recovering the state of a server after downtime through a BMC according to claim 2, characterized in that: during normal operation of the system, the Agent program acquires the service information of the main system in real time and sends it to the BMC, and the BMC stores and updates the system's service information in real time.
5. The method for recovering the state of a server after downtime through a BMC according to claim 4, characterized in that: if the Agent program finds that the system is down, the system is restarted through the BMC, and the Agent program is the last program to start during the restart; the BMC then obtains the post-restart service running state through the Agent and compares it with the service state recorded before the downtime; if a service that was recorded as running is not running after the restart, the BMC sends a service start instruction to the Agent; if the server system is running a service that is not in the pre-downtime service running list, the BMC sends a service stop instruction to the Agent; this continues until the system's service running state is consistent with the state before the downtime, at which point the BMC sends the Agent a confirmation that the service state is consistent and no longer controls the main system.
6. The method for recovering the state of a server after downtime through a BMC according to claim 5, characterized in that: after the running state of the server system is consistent with the state before the downtime, the Agent keeps running and switches to a service detection state; if the Agent finds that the service running state has changed, it notifies the BMC, and the BMC updates the service running state list with the information received.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911131359.1A CN110943855A (en) | 2019-11-19 | 2019-11-19 | Method for realizing state recovery after shutdown of server through BMC |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911131359.1A CN110943855A (en) | 2019-11-19 | 2019-11-19 | Method for realizing state recovery after shutdown of server through BMC |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110943855A (en) | 2020-03-31 |
Family
ID=69907846
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911131359.1A (Pending), CN110943855A (en) | 2019-11-19 | 2019-11-19 | Method for realizing state recovery after shutdown of server through BMC |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110943855A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111459134A (en) * | 2020-04-01 | 2020-07-28 | 海信集团有限公司 | Household appliance |
CN111865685A (en) * | 2020-07-17 | 2020-10-30 | 浪潮商用机器有限公司 | Network service recovery method, device, equipment and readable storage medium |
CN113641556A (en) * | 2021-08-24 | 2021-11-12 | 东风电子科技股份有限公司 | System, method, device, processor and computer readable storage medium for guaranteeing stable operation of automobile instrument |
CN117149229A (en) * | 2023-10-27 | 2023-12-01 | 江苏华鲲振宇智能科技有限责任公司 | Automatic restoration method and system for server management software |
CN113641556B (en) * | 2021-08-24 | 2024-05-17 | 东风电子科技股份有限公司 | System, method, device, processor and computer readable storage medium for ensuring stable operation of automobile instrument |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080270827A1 (en) * | 2007-04-26 | 2008-10-30 | International Business Machines Corporation | Recovering diagnostic data after out-of-band data capture failure |
CN102360323A (en) * | 2011-10-28 | 2012-02-22 | 东莞市正欣科技有限公司 | Method and system for self-repairing down of network server |
CN103209214A (en) * | 2013-04-03 | 2013-07-17 | 蓝盾信息安全技术股份有限公司 | Not only structured query language (NoSQL)-based method for realizing message-oriented middleware |
WO2016091033A1 (en) * | 2014-12-11 | 2016-06-16 | 华为技术有限公司 | Method and server for presenting initialization degree of hardware in server |
CN106528143A (en) * | 2016-10-27 | 2017-03-22 | 杭州昆海信息技术有限公司 | Configuration management method and device |
CN108280012A (en) * | 2018-01-25 | 2018-07-13 | 郑州云海信息技术有限公司 | A kind of method and device of monitoring server system process |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111459134A (en) * | 2020-04-01 | 2020-07-28 | 海信集团有限公司 | Household appliance |
CN111865685A (en) * | 2020-07-17 | 2020-10-30 | 浪潮商用机器有限公司 | Network service recovery method, device, equipment and readable storage medium |
CN113641556A (en) * | 2021-08-24 | 2021-11-12 | 东风电子科技股份有限公司 | System, method, device, processor and computer readable storage medium for guaranteeing stable operation of automobile instrument |
CN113641556B (en) * | 2021-08-24 | 2024-05-17 | 东风电子科技股份有限公司 | System, method, device, processor and computer readable storage medium for ensuring stable operation of automobile instrument |
CN117149229A (en) * | 2023-10-27 | 2023-12-01 | 江苏华鲲振宇智能科技有限责任公司 | Automatic restoration method and system for server management software |
CN117149229B (en) * | 2023-10-27 | 2024-03-12 | 江苏华鲲振宇智能科技有限责任公司 | Automatic restoration method and system for server management software |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200331 |