CN110943855A - Method for realizing state recovery after shutdown of server through BMC

Info

Publication number
CN110943855A
Authority
CN
China
Prior art keywords
bmc
service
server
downtime
agent
Legal status
Pending
Application number
CN201911131359.1A
Other languages
Chinese (zh)
Inventor
王朝晖
陈亮甫
郭坤
Current Assignee
Shandong Chaoyue CNC Electronics Co Ltd
Original Assignee
Shandong Chaoyue CNC Electronics Co Ltd
Priority date
2019-11-19
Filing date
2019-11-19
Publication date
2020-03-31
2019-11-19: Application filed by Shandong Chaoyue CNC Electronics Co Ltd
2019-11-19: Priority to CN201911131359.1A
2020-03-31: Publication of CN110943855A
Status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/04 Network management architectures or arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1438 Restarting or rejuvenating
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/28 Restricting access to network management systems or functions, e.g. using authorisation function to access network configuration

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a method for realizing state recovery after server downtime through a BMC (baseboard management controller). The BMC monitors the running state of each service on the server system in real time and stores these states. After detecting that the server has gone down, the BMC restarts it, and an Agent then starts the main system's services according to the service information the BMC stored before the downtime, so that the server's running state matches its state before the downtime. Because the method separates network management data from service data, it can improve the efficiency and reliability of data recovery and also helps make data recovery more secure.

Description

Method for realizing state recovery after shutdown of server through BMC
Technical Field
The invention relates to a method for recovering a server from downtime, and in particular to a method for restoring the state of a server after downtime through a BMC (baseboard management controller). It belongs to the technical field of server control.
Background
The BMC is software that starts running as soon as the server is connected to AC power, and it runs on a separate chip on the server. The BMC is mainly used to monitor the health of the server's components (such as the CPU, memory, hard disks, fans, and chassis). Its operation is independent of the server's main system, which makes the BMC an out-of-band management mechanism.
Out-of-band network management means that the network is managed through a dedicated management channel: network management data is separated from service data, and an independent channel is established for the management data. Only management data, statistics, billing information and the like are transmitted over this channel. Separating management data from service data improves the efficiency and reliability of network management and the security of the management data. In-band management, by contrast, means that the network's management and control information and the user network's service traffic are carried over the same logical channel, so the network management system must reach devices through the service network; if a managed object cannot be reached through the service network, in-band management fails. The industry offers various BMC out-of-band management solutions, and communication between the BMC and the main system can be carried over LPC or over the network, so that the BMC can obtain information from the main system.
At present, when the BMC restarts a server, the state the server was in before the downtime is not taken into account: if there is no hardware fault, the server system is simply restarted, and the server's services are then started through in-band management.
Disclosure of Invention
The technical problem solved by the invention is to provide a method for realizing state recovery after server downtime through a BMC (baseboard management controller), so that after a restart the server automatically returns to the state it was in before the downtime.
To solve this problem, the invention adopts the following technical solution: a method for realizing state recovery after server downtime through a BMC, comprising the following steps:
S01) establishing communication between the BMC and the server system;
S02) the BMC monitors the running state of each service on the server system in real time and stores these running states in the BMC;
S03) after detecting that the server system has gone down, the BMC restarts the server and, according to the pre-downtime service information it has stored, starts the services that were running before the downtime and stops the services that were not, so that the server system returns to its pre-downtime state.
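Steps S02 and S03 can be illustrated with a minimal BMC-side sketch. The text does not say how the BMC detects that the server system is down, so this sketch assumes a heartbeat timeout: if no service report has arrived from the main system for `heartbeat_timeout_s` seconds, the host is treated as down and restarted. `BmcServiceRecord`, `bmc_tick`, the timeout value, and the `restart_host` callback (which on real hardware might be a chassis power-cycle) are all hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class BmcServiceRecord:
    """Hypothetical BMC-side store for steps S02 and S03."""
    heartbeat_timeout_s: float = 30.0                         # assumed threshold
    services: Dict[str, bool] = field(default_factory=dict)   # name -> running?
    last_report: Optional[float] = None                       # time of last report

    def store_report(self, snapshot: Dict[str, bool], now: float) -> None:
        # S02: replace the stored running states with the latest report.
        self.services = dict(snapshot)
        self.last_report = now

    def is_down(self, now: float) -> bool:
        # Assumption: the host is treated as down once no report has arrived
        # for longer than heartbeat_timeout_s.
        if self.last_report is None:
            return False
        return now - self.last_report > self.heartbeat_timeout_s


def bmc_tick(record: BmcServiceRecord, now: float,
             restart_host: Callable[[], None]) -> bool:
    """S03 trigger: restart the host if it looks down; return True if restarted."""
    if not record.is_down(now):
        return False
    restart_host()
    # Keep record.services (needed for recovery) but start a new grace period
    # so the rebooting host is not restarted again immediately.
    record.last_report = now
    return True
```

The stored `services` map is deliberately left untouched by the restart, since it is exactly the pre-downtime information that S03 needs afterwards.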
Further, the BMC monitors and restarts the server system through an Agent program, whose functions include: acquiring service information on the server system in real time, sending that service information to the BMC, receiving server control instructions sent by the BMC, and starting and stopping services on the server system according to those instructions.
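The four Agent functions listed above map naturally onto a small class. The sketch below is an illustration only: the host's service manager (for example systemd) and the BMC link (LPC or network) are injected as plain callables, and `Agent`, `ServiceCommand`, and the `"start"`/`"stop"` action names are hypothetical, not a message format defined by the source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Literal


@dataclass(frozen=True)
class ServiceCommand:
    """Hypothetical BMC -> Agent control instruction."""
    action: Literal["start", "stop"]
    service: str


class Agent:
    """Sketch of the four Agent functions listed in the text."""

    def __init__(self, query: Callable[[], Dict[str, bool]],
                 start: Callable[[str], None], stop: Callable[[str], None],
                 send_to_bmc: Callable[[Dict[str, bool]], None]) -> None:
        # query/start/stop stand in for the host's service manager;
        # send_to_bmc stands in for the LPC or network link to the BMC.
        self._query, self._start, self._stop = query, start, stop
        self._send = send_to_bmc

    def collect(self) -> Dict[str, bool]:
        # (1) acquire service information on the server system
        return self._query()

    def report(self) -> None:
        # (2) send the service information to the BMC
        self._send(self.collect())

    def handle(self, cmd: ServiceCommand) -> None:
        # (3) receive a control instruction and (4) start or stop the service
        if cmd.action == "start":
            self._start(cmd.service)
        elif cmd.action == "stop":
            self._stop(cmd.service)
        else:
            raise ValueError(f"unknown action: {cmd.action}")
```

Injecting the backends keeps the Agent logic independent of any particular init system or transport, which also makes it easy to exercise with an in-memory fake host.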
Furthermore, the Agent program runs on the server system, is configured to start automatically at boot, and is placed last in the startup order, after all other programs.
Further, during normal operation of the system, the Agent program acquires the service information of the main system in real time and sends it to the BMC, which stores and updates the system's service information in real time.
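Normal-operation reporting can be sketched as a periodic push of the full service snapshot, with the BMC overwriting its stored copy on every report. The text only says this happens "in real time", so the polling interval is an assumption; `report_loop`, `BmcStore`, and the `iterations` bound are hypothetical. On a systemd-based host, `collect` might, for instance, wrap `systemctl is-active <unit>`, but that choice is likewise an assumption.

```python
import time
from typing import Callable, Dict


class BmcStore:
    """BMC side: keeps the most recent service snapshot from the Agent."""

    def __init__(self) -> None:
        self.services: Dict[str, bool] = {}
        self.updates = 0

    def receive(self, snapshot: Dict[str, bool]) -> None:
        self.services = dict(snapshot)   # store and update in real time
        self.updates += 1


def report_loop(collect: Callable[[], Dict[str, bool]],
                send_to_bmc: Callable[[Dict[str, bool]], None],
                interval_s: float = 5.0,
                iterations: int = -1,
                sleep: Callable[[float], None] = time.sleep) -> int:
    """Agent side: push a full snapshot every interval_s seconds.

    iterations < 0 runs forever; a positive value bounds the loop (useful in
    tests). Returns the number of reports sent.
    """
    sent = 0
    while iterations < 0 or sent < iterations:
        send_to_bmc(collect())
        sent += 1
        if iterations < 0 or sent < iterations:
            sleep(interval_s)   # no sleep after the final bounded report
    return sent
```

Sending the whole snapshot each time keeps the BMC's copy self-correcting: a lost report is simply superseded by the next one.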
Further, if the system is found to be down, it is restarted through the BMC. Because the Agent program is last in the startup order, it starts running at the end of the restart. The BMC then obtains the post-restart service running state through the Agent and compares it with the service state recorded before the downtime. If a service that was running before the downtime is not running after the restart, the BMC sends the Agent a service start instruction; if the server system is running a service that is not in the pre-downtime service running list, the BMC sends the Agent a service stop instruction. This continues until the system's service running state matches the state before the downtime, at which point the BMC sends the Agent a confirmation that the service states are consistent and stops controlling the main system.
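The comparison described above amounts to a reconciliation loop: compute the difference between the recorded and current running sets, issue start/stop commands, and re-check until nothing is left to do. The following is a minimal sketch under the assumption that service states are plain name-to-boolean maps; `plan`, `recover`, and the `max_rounds` safeguard are illustrative additions (the text simply loops until the states are consistent).

```python
from typing import Callable, Dict, List, Tuple

Command = Tuple[str, str]   # ("start" | "stop", service name)


def plan(recorded: Dict[str, bool], current: Dict[str, bool]) -> List[Command]:
    """Commands that bring `current` back to the pre-downtime `recorded` state."""
    cmds: List[Command] = []
    for name, was_running in sorted(recorded.items()):
        if was_running and not current.get(name, False):
            cmds.append(("start", name))   # ran before downtime, not running now
    for name, is_running in sorted(current.items()):
        if is_running and not recorded.get(name, False):
            cmds.append(("stop", name))    # running now, not in pre-downtime list
    return cmds


def recover(recorded: Dict[str, bool],
            query: Callable[[], Dict[str, bool]],
            send_command: Callable[[Command], None],
            confirm: Callable[[], None],
            max_rounds: int = 5) -> bool:
    """BMC side: compare, command, re-check until consistent, then confirm.

    max_rounds is an assumed safeguard against a service that never settles.
    """
    for attempt in range(max_rounds + 1):
        cmds = plan(recorded, query())
        if not cmds:
            confirm()          # states match: BMC releases control of the host
            return True
        if attempt == max_rounds:
            break              # command budget used up; give up
        for cmd in cmds:
            send_command(cmd)
    return False
```

Re-querying after every round, rather than trusting that each command succeeded, mirrors the text's "until the system service running state is consistent" condition.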
Further, once the running state of the server system matches its pre-downtime state, the Agent keeps running and switches to a service detection state. If the Agent detects a change in the service running state, it notifies the BMC, and the BMC updates its service running state list with the information received.
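In the service detection state the Agent only needs to report what changed. A minimal sketch, assuming the same name-to-boolean maps as above and treating a service that disappears from the listing as stopped (an assumption; the text does not address removed services). `changed_services`, `detect_once`, and `apply_update` are hypothetical names.

```python
from typing import Callable, Dict


def changed_services(previous: Dict[str, bool],
                     current: Dict[str, bool]) -> Dict[str, bool]:
    """Entries whose running state differs; a missing service counts as stopped."""
    delta: Dict[str, bool] = {}
    for name in previous.keys() | current.keys():
        before = previous.get(name, False)
        after = current.get(name, False)
        if before != after:
            delta[name] = after
    return delta


def detect_once(previous: Dict[str, bool],
                query: Callable[[], Dict[str, bool]],
                notify_bmc: Callable[[Dict[str, bool]], None]) -> Dict[str, bool]:
    """One pass of the Agent's service detection state; returns the new baseline."""
    current = query()
    delta = changed_services(previous, current)
    if delta:
        notify_bmc(delta)      # only changes are sent, not the full snapshot
    return current


def apply_update(bmc_list: Dict[str, bool], delta: Dict[str, bool]) -> None:
    """BMC side: merge a change notification into the stored running-state list."""
    bmc_list.update(delta)
```

Sending deltas keeps traffic on the management channel small once the host is stable; the trade-off is that a lost notification leaves the BMC's list stale until the next change.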
Advantages of the invention: by means of out-of-band network management, the server automatically returns to its pre-downtime state after a restart. Out-of-band network management separates network management data from service data and establishes an independent channel for the management data, which improves the efficiency and reliability of network management and helps improve the security of the management data. The method monitors the running state of each service on the server system in real time through the Agent program and, after the server goes down, performs restart control through the Agent program, so the server's state can be recovered without affecting the normal work of the server system or its working efficiency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
FIG. 1 is a schematic diagram of the method in use;
FIG. 2 is a flow chart of the operation of the Agent program.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
This embodiment discloses a method for realizing state recovery after server downtime through a BMC (baseboard management controller). After detecting that the system has gone down, the BMC restarts the server and, according to the pre-downtime service information it has stored, starts or stops services of the main system through an Agent, so that the main system's service state matches its state before the downtime.
Specifically, the method comprises the following steps:
S01) establishing a communication protocol between the BMC and the server system, so that at least one communication path exists between the server's main system and the BMC;
S02) the BMC monitors the running state of each service on the server system in real time and stores these running states in the BMC;
S03) after detecting that the server system has gone down, the BMC restarts the server and, according to the pre-downtime service information it has stored, starts the services that were running before the downtime and stops the services that were not, so that the server system returns to its pre-downtime state.
The method is applied as shown in FIG. 1. Given that a communication path exists between the BMC and the server system, an Agent program is first written; the BMC monitors and restarts the server system through this Agent program, whose functions include: acquiring service information on the server system in real time, sending that service information to the BMC, receiving server control instructions sent by the BMC, and starting and stopping services on the server system according to those instructions.
In this embodiment, the Agent program runs on the server system, is configured to start automatically at boot, and is placed last in the startup order, after all other programs. The workflow of the Agent is shown in FIG. 2: during normal operation of the system, the Agent program acquires the service information of the main system in real time and sends it to the BMC, which stores and updates the system's service information in real time.
If the system is found to be down, it is restarted through the BMC, and the Agent program, being last in the startup order, starts running at the end of the restart. The BMC obtains the post-restart service running state through the Agent and compares it with the service state recorded before the downtime. It sends the Agent a service start instruction if a service is found not running after the restart, and a service stop instruction if the server system is running a service that is not in the pre-downtime service running list. Once the system's service running state matches the pre-downtime state, the BMC sends the Agent a confirmation that the service states are consistent, and from then on the BMC no longer controls the main system.
Once the running state of the server system matches its pre-downtime state, the Agent keeps running and switches to a service detection state. If the Agent detects a change in the service running state, it notifies the BMC to update the service running state list, and the BMC updates the list with the information received.
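FIG. 2 itself is not reproduced here, so the phase model below is reconstructed only from the textual workflow in the preceding paragraphs and should be read as an interpretation, not as the figure. It names three Agent phases (normal reporting, BMC-driven recovery, service detection); the phase names and the event strings are hypothetical labels, and unknown events leave the phase unchanged.

```python
from enum import Enum, auto


class AgentPhase(Enum):
    REPORTING = auto()   # normal operation: push service info to the BMC
    RECOVERY = auto()    # after a BMC-driven restart: obey start/stop commands
    DETECTION = auto()   # after the BMC confirms consistency: report changes


# Transition table reconstructed from the text; event names are hypothetical.
_TRANSITIONS = {
    (AgentPhase.REPORTING, "booted_after_downtime"): AgentPhase.RECOVERY,
    (AgentPhase.RECOVERY, "bmc_confirmed_consistent"): AgentPhase.DETECTION,
    (AgentPhase.DETECTION, "booted_after_downtime"): AgentPhase.RECOVERY,
}


def next_phase(phase: AgentPhase, event: str) -> AgentPhase:
    """Return the phase after `event`; unrecognised events keep the phase."""
    return _TRANSITIONS.get((phase, event), phase)
```

A table-driven transition function keeps the workflow auditable: every allowed phase change is listed in one place, and anything not listed is a no-op.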
By means of out-of-band network management, the invention enables the server to automatically return to its pre-downtime state after a restart. Out-of-band network management separates network management data from service data and establishes an independent channel for the management data, which improves the efficiency and reliability of network management and helps improve the security of the management data. The method monitors the running state of each service on the server system in real time through the Agent program and, after the server goes down, performs restart control through the Agent program, so the server's state can be recovered without affecting the normal work of the server system or its working efficiency.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. A method for realizing state recovery after server downtime through a BMC, characterized in that the method comprises the following steps:
S01) establishing communication between the BMC and the server system;
S02) the BMC monitors the running state of each service on the server system in real time and stores these running states in the BMC;
S03) after detecting that the server system has gone down, the BMC restarts the server and, according to the pre-downtime service information it has stored, starts the services that were running before the downtime and stops the services that were not, so that the server system returns to its pre-downtime state.
2. The method for realizing state recovery after server downtime through a BMC according to claim 1, wherein the BMC monitors and restarts the server system through an Agent program, whose functions include: acquiring service information on the server system in real time, sending that service information to the BMC, receiving server control instructions sent by the BMC, and starting and stopping services on the server system according to those instructions.
3. The method for realizing state recovery after server downtime through a BMC according to claim 2, wherein the Agent program runs on the server system, is configured to start automatically at boot, and is placed last in the startup order, after all other programs.
4. The method for realizing state recovery after server downtime through a BMC according to claim 2, wherein, during normal operation of the system, the Agent program acquires the service information of the main system in real time and sends it to the BMC, and the BMC stores and updates the system's service information in real time.
5. The method for realizing state recovery after server downtime through a BMC according to claim 4, wherein, if the system is found to be down, it is restarted through the BMC and the Agent program starts running at the end of the restart; the BMC obtains, through the Agent, the service running state after the restart and compares it with the service state recorded before the downtime; the BMC sends a service start instruction to the Agent if a service is found not running after the restart, and a service stop instruction to the Agent if the server system is running a service that is not in the pre-downtime service running list; and once the system's service running state matches the pre-downtime service running state, the BMC sends the Agent a confirmation that the service states are consistent and no longer controls the main system.
6. The method for realizing state recovery after server downtime through a BMC according to claim 5, wherein, once the running state of the server system matches its pre-downtime state, the Agent keeps running and switches to a service detection state; if the Agent detects a change in the service running state, it notifies the BMC to update the service running state list, and the BMC updates the list with the information received.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911131359.1A (CN110943855A) 2019-11-19 2019-11-19 Method for realizing state recovery after shutdown of server through BMC

Publications (1)

Publication Number Publication Date
CN110943855A 2020-03-31

Family

ID=69907846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911131359.1A (CN110943855A, Pending) 2019-11-19 2019-11-19 Method for realizing state recovery after shutdown of server through BMC

Country Status (1)

Country Link
CN (1) CN110943855A

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080270827A1 (en) * 2007-04-26 2008-10-30 International Business Machines Corporation Recovering diagnostic data after out-of-band data capture failure
CN102360323A (en) * 2011-10-28 2012-02-22 东莞市正欣科技有限公司 Method and system for self-repairing down of network server
CN103209214A (en) * 2013-04-03 2013-07-17 蓝盾信息安全技术股份有限公司 Not only structured query language (NoSQL)-based method for realizing message-oriented middleware
WO2016091033A1 (en) * 2014-12-11 2016-06-16 华为技术有限公司 Method and server for presenting initialization degree of hardware in server
CN106528143A (en) * 2016-10-27 2017-03-22 杭州昆海信息技术有限公司 Configuration management method and device
CN108280012A (en) * 2018-01-25 2018-07-13 郑州云海信息技术有限公司 A kind of method and device of monitoring server system process

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111459134A (en) * 2020-04-01 2020-07-28 海信集团有限公司 Household appliance
CN111865685A (en) * 2020-07-17 2020-10-30 浪潮商用机器有限公司 Network service recovery method, device, equipment and readable storage medium
CN113641556A (en) * 2021-08-24 2021-11-12 东风电子科技股份有限公司 System, method, device, processor and computer readable storage medium for guaranteeing stable operation of automobile instrument
CN113641556B (en) * 2021-08-24 2024-05-17 东风电子科技股份有限公司 System, method, device, processor and computer readable storage medium for ensuring stable operation of automobile instrument
CN117149229A (en) * 2023-10-27 2023-12-01 江苏华鲲振宇智能科技有限责任公司 Automatic restoration method and system for server management software
CN117149229B (en) * 2023-10-27 2024-03-12 江苏华鲲振宇智能科技有限责任公司 Automatic restoration method and system for server management software

Similar Documents

Publication Publication Date Title
CN110943855A Method for realizing state recovery after shutdown of server through BMC
CN106201844B A kind of log collecting method and device
CN109286529B Method and system for recovering RabbitMQ network partition
JP2001101033A Fault monitoring method for operating system and application program
CN112506702B Disaster recovery method, device, equipment and storage medium for data center
CN105354113A Server, and system and method for managing server
CN110618864A Interrupt task recovery method and device
CN111988162B Method and device for managing port configuration mode
CN113656175A Method, apparatus and program product for training models based on distributed systems
EP3076694A1 Multiple connection management for bluetooth low energy devices
CN108733454B Virtual machine fault processing method and device
CN104503861A Abnormality handling method and system, agency device and control device
CN112860408A Process keep-alive method, device and system in cloud reality machine and storage medium
CN108762886A The fault detect restoration methods and system of virtual machine
CN109558209B Monitoring method for virtual machine
CN102073523A Method and device for implementing software version synchronization
CN106411643B BMC detection method and device
CN111614702B Edge calculation method and edge calculation system
CN112084159A File synchronization system and method based on Bluetooth communication
CN107896176B Processing method of computing node, intelligent terminal and storage medium
CN113867815B Method for monitoring server suspension and automatically restarting and server applying same
CN111176893A Computer remote control method, device, system and storage medium
CN109947576B Method for managing internal agent program of virtual machine
CN105072185A TR069 (Technical Report 069) remote monitoring method and system, and communication equipment
CN103546582A Method, device and system for backup of application services of server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200331