CN111124728A

CN111124728A - Automatic service recovery method, system, readable storage medium and server

Info

Publication number: CN111124728A
Application number: CN201911275677.5A
Authority: CN
Inventors: 李晓龙; 陈吉宝; 袁迎春
Original assignee: Celestica Technology Consultancy Shanghai Co Ltd
Current assignee: Celestica Technology Consultancy Shanghai Co Ltd
Priority date: 2019-12-12
Filing date: 2019-12-12
Publication date: 2020-05-08
Anticipated expiration: 2039-12-12
Also published as: CN111124728B

Abstract

The invention provides an automatic service recovery method, a system, a readable storage medium and a server. The automatic service recovery method comprises the following steps: after the server is started, guiding the running system of the server to enter a first operating system; calling a baseboard management control module to monitor in real time whether the working state of the first operating system is normal; if so, continuing to call the baseboard management control module to monitor the working state of the first operating system; if not, calling the baseboard management control module to restart the first operating system, and judging whether the restart of the first operating system is a normal restart; if so, calling a basic input and output system to restart the first operating system; if not, calling the basic input and output system to start a second operating system so that the service is recovered automatically. Based on the server firmware, the invention realizes automatic service recovery when the operating system has completely crashed, thereby greatly shortening the time required to recover the running service, reducing customer losses and reducing manual maintenance costs.

Description

Automatic service recovery method, system, readable storage medium and server

Technical Field

The invention belongs to the field of computing networks, relates to a recovery method and a recovery system, and particularly relates to an automatic service recovery method, an automatic service recovery system, a readable storage medium and a server.

Background

The operating system is a computer program that manages computer hardware and software resources, and is also the core and foundation of the computer system. The operating system handles basic tasks such as managing and configuring memory, prioritizing system resources, controlling input devices and output devices, operating the network, and managing the file system.

In an edge computing network environment, adequate redundant backup nodes are often lacking. When a client runs its workload on a single-node host and conditions such as unexpected power failure, external impact or software crash occur, the operating system may crash, the running service is interrupted, and the service cannot recover automatically. Manual maintenance consumes a great deal of time.

Therefore, what is needed is a method, a system, a readable storage medium, and a server for automatically recovering a service, so as to solve the problems of the prior art that when a host fails, the operation service is interrupted and cannot be automatically recovered.

Disclosure of Invention

In view of the above drawbacks of the prior art, an object of the present invention is to provide a method, a system, a readable storage medium, and a server for automatically recovering a service, which are used to solve the problem that the operation service is interrupted and cannot be automatically recovered when a host fails in the prior art.

To achieve the above and other related objects, one aspect of the present invention provides an automatic service recovery method applied to a server, where the server includes an operation module and a baseboard management control module connected to the operation module; the operation module is provided with a first operating system, a second operating system and a basic input and output system; the automatic service recovery method comprises the following steps: after the server is started, guiding the running system of the server to enter the first operating system; calling the baseboard management control module to monitor in real time whether the working state of the first operating system is normal; if so, continuing to call the baseboard management control module to monitor the working state of the first operating system; if not, executing the next step: calling the baseboard management control module to restart the first operating system, and judging whether the restart of the first operating system is a normal restart; if so, calling the basic input and output system to restart the first operating system; if not, calling the basic input and output system to start the second operating system so that the service is recovered automatically.

In an embodiment of the present invention, the operation module is communicatively connected to a network configuration module; application programs are deployed on the first operating system and the second operating system.

In an embodiment of the present invention, after entering the first operating system, the automatic service recovery method further includes: the first operating system acquires an application configuration file matched with the application program from the network configuration module.

In an embodiment of the present invention, the automatic service recovery method further includes: after the second operating system is started, the second operating system acquires an application configuration file matched with the application program from the network configuration module.

In an embodiment of the present invention, in the process of calling the baseboard management control module to restart the first operating system, the automatic service recovery method further includes: handing over the system management right of the server to the basic input and output system, and initiating, by the basic input and output system, a restart query to the baseboard management control module so as to judge whether the restart of the first operating system is a normal restart.

In an embodiment of the invention, the operation module and the baseboard management control module are disposed on a motherboard of the server.

In an embodiment of the invention, the basic input and output system is BIOS firmware having an operating system recovery function and solidified on the motherboard.

Another aspect of the invention provides an automatic service recovery system applied to a server, wherein the server comprises an operation module and a baseboard management control module connected with the operation module; the operation module is provided with a first operating system, a second operating system and a basic input and output system; the automatic service recovery system comprises: a guiding unit, used for guiding the running system of the server to enter the first operating system after the server is started; a calling unit, used for calling the baseboard management control module to monitor in real time whether the working state of the first operating system is normal; if so, continuing to call the baseboard management control module to monitor the working state of the first operating system; if not, calling the baseboard management control module to restart the first operating system, and judging, through a judging unit, whether the restart of the first operating system is a normal restart; if so, calling the basic input and output system through the calling unit to restart the first operating system; if not, calling the basic input and output system through the calling unit to start the second operating system so that the service is recovered automatically.

The present invention further provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the automatic service recovery method.

A final aspect of the present invention provides a server comprising: a processor and a memory; the memory is used for storing a computer program, and the processor is used for executing the computer program stored in the memory so that the server executes the automatic service recovery method; the processor comprises an operation module and a baseboard management control module connected with the operation module; the operation module is provided with a first operating system, a second operating system and a basic input and output system.

As described above, the method, system, readable storage medium and server for automatically recovering services according to the present invention have the following advantages:

Based on the server firmware, the method, the system, the readable storage medium and the server realize automatic service recovery when the operating system has completely crashed, thereby greatly shortening the time required to recover the running service, reducing customer losses and reducing manual maintenance costs.

Drawings

Fig. 1 is a system architecture diagram of a server according to the present invention.

Fig. 2 is a flowchart illustrating an automatic service recovery method according to an embodiment of the present invention.

Fig. 3 is a schematic structural diagram of an embodiment of the service automatic recovery system according to the present invention.

Description of the element reference numerals

1	Server
11	Operation module
12	Baseboard management control module
111	First operating system
112	Second operating system
13	Network configuration module
14	Main board
S21 to S27	Steps

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.

It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.

The technical principles of the method, the system, the readable storage medium and the server for automatically recovering the service are as follows:

Two operating systems that serve as redundant backups for each other are provided for an edge computing server host, and a user application program is deployed in each operating system. The configuration information of the user application program is stored in a network configuration module. During actual operation, the server firmware BMC actively monitors the health of the operating system in combination with the application program. Once the operating system crashes, the BMC automatically restarts the server and notifies the server boot firmware (BIOS) to switch the boot entry, so that the server boots into the backup operating system. The application program in the backup operating system loads the pre-crash configuration from the network configuration module and continues running, which improves the reliability of the whole edge computing node.
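The health monitoring described above can be illustrated with a minimal Python sketch. The specification does not state how the BMC detects a crash, so the heartbeat mechanism here is an assumption, and the names `HeartbeatWatchdog`, `beat`, `os_is_healthy` and `monitor` are hypothetical. Time is passed in explicitly so that the example runs without real delays.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class HeartbeatWatchdog:
    """Hypothetical BMC-side watchdog (assumption: the OS reports periodic heartbeats)."""

    timeout_s: float          # longest allowed gap between heartbeats
    last_beat_s: float = 0.0  # time of the most recent heartbeat

    def beat(self, now_s: float) -> None:
        """Called on behalf of the running operating system to signal it is alive."""
        self.last_beat_s = now_s

    def os_is_healthy(self, now_s: float) -> bool:
        """The OS is considered healthy while the heartbeat gap stays within the timeout."""
        return (now_s - self.last_beat_s) <= self.timeout_s


def monitor(watchdog: HeartbeatWatchdog, sample_times_s: Iterable[float]) -> Optional[float]:
    """Return the first sample time at which the OS is judged crashed, or None."""
    for now_s in sample_times_s:
        if not watchdog.os_is_healthy(now_s):
            return now_s  # this is where the BMC would trigger the restart
    return None
```

For instance, with a 5 second timeout and a last heartbeat at t = 10, sampling at t = 12, 16 and 20 reports a crash at t = 16. A real BMC would more likely rely on a hardware watchdog timer than on polling.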

Example one

This embodiment provides an automatic service recovery method applied to a server, wherein the server comprises an operation module and a baseboard management control module connected with the operation module; the operation module is provided with a first operating system, a second operating system and a basic input and output system; the automatic service recovery method comprises the following steps:

after the server is started, guiding the running system of the server to enter the first operating system;

calling the baseboard management control module to monitor in real time whether the working state of the first operating system is normal; if so, continuing to call the baseboard management control module to monitor the working state of the first operating system; if not, executing the next step:

calling the baseboard management control module to restart the first operating system, and judging whether the restart of the first operating system is a normal restart; if so, calling the basic input and output system to restart the first operating system; if not, calling the basic input and output system to start the second operating system so that the service is recovered automatically.

The automatic service recovery method provided by the present embodiment will be described in detail below with reference to the drawings. The automatic service recovery method is applied to a server. Please refer to fig. 1, which is a schematic diagram of a system architecture of a server. As shown in fig. 1, the server 1 includes an operation module 11 and a baseboard management control module 12 (BMC module) connected to the operation module 11 (specifically, through a KCS interface). The operation module 11 is communicatively connected to a network configuration module 13.

The operation module 11 and the baseboard management control module 12 are disposed on a main board 14 of the server 1. A first operating system 111, a second operating system 112 and a basic input/output system are provided for the operation module 11, and application programs are deployed on the first operating system 111 and the second operating system 112. The application program can automatically obtain the application configuration file matched with the application program from the network configuration module 13.

The application program is a general edge computing (MEC) application program designed and deployed by the user who purchases the server. For example, the application program may perform automatic driving, AI computation or data acceleration. The application configuration file of the application program is likewise defined according to the characteristics of the application program.
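The role of the network configuration module can be illustrated with a minimal Python sketch. The specification does not define a file format or a transfer protocol, so the JSON encoding, the in-memory store and the names `NetworkConfigModule`, `store`, `fetch` and `start_application` are all assumptions made for illustration. The point shown is that both operating systems obtain the same application configuration because it is kept outside either of them.

```python
import json
from typing import Dict


class NetworkConfigModule:
    """Hypothetical stand-in for the network configuration module 13."""

    def __init__(self) -> None:
        # Assumption: one JSON document per application, keyed by application name.
        self._files: Dict[str, str] = {}

    def store(self, app_name: str, config: dict) -> None:
        """Save the application configuration file for an application."""
        self._files[app_name] = json.dumps(config, sort_keys=True)

    def fetch(self, app_name: str) -> dict:
        """Return the configuration matched with the named application."""
        return json.loads(self._files[app_name])


def start_application(os_name: str, app_name: str, ncm: NetworkConfigModule) -> dict:
    """Simulate an operating system starting its deployed application."""
    config = ncm.fetch(app_name)  # the same source is used by either operating system
    return {"os": os_name, "app": app_name, "config": config}
```

Starting the same application from `"first_os"` and from `"second_os"` yields identical `config` values, which is what lets the backup operating system resume the service with the pre-crash settings.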

In this embodiment, the first operating system 111 and the second operating system 112 are redundant operating systems.

In this embodiment, the basic input and output system is BIOS firmware that has an operating system recovery function and is solidified on the main board. During startup, the BIOS firmware interacts with the baseboard management control module 12 to query the operating state of the running system.
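The startup interaction between the BIOS firmware and the BMC module can be sketched in Python as follows. This is only an illustration under stated assumptions: the specification does not define the query message or the set of restart causes, so the `RestartCause` and `BootEntry` enumerations and the `Bmc` class with its `record_restart` and `query_restart_cause` methods are hypothetical.

```python
from enum import Enum


class RestartCause(Enum):
    """Hypothetical restart causes recorded by the BMC."""
    OS_CRASH = "os_crash"  # restart triggered because the first OS crashed
    OTHER = "other"        # any restart not caused by an OS crash


class BootEntry(Enum):
    """The two boot targets available to the BIOS firmware."""
    FIRST_OS = "first_os"
    SECOND_OS = "second_os"


class Bmc:
    """Minimal stand-in for the baseboard management control module 12."""

    def __init__(self) -> None:
        self._last_cause = RestartCause.OTHER

    def record_restart(self, cause: RestartCause) -> None:
        """Remember why the most recent restart happened."""
        self._last_cause = cause

    def query_restart_cause(self) -> RestartCause:
        """Answer the restart query issued by the BIOS firmware."""
        return self._last_cause


def select_boot_entry(bmc: Bmc) -> BootEntry:
    """BIOS-side decision: boot the backup OS only after a crash-induced restart."""
    if bmc.query_restart_cause() is RestartCause.OS_CRASH:
        return BootEntry.SECOND_OS
    return BootEntry.FIRST_OS
```

Keeping the restart cause in the BMC rather than in the operating system matters because the BMC keeps running when the operating system has crashed.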

Please refer to fig. 2, which is a flowchart illustrating an exemplary embodiment of an automatic service recovery method. As shown in fig. 2, the method for automatically recovering a service specifically includes the following steps:

S21, after the server is started, guiding the running system of the server to enter the first operating system.

S22, after entering the first operating system, the first operating system obtains an application configuration file matched with the application program deployed on the first operating system from the network configuration module, so as to complete service operation.

S23, calling the baseboard management control module to monitor whether the working state of the first operating system is normal or not in real time; if yes, returning to the step S23, namely continuing to call the baseboard management control module to monitor the working state of the first operating system; if not, go to S24.

S24, the baseboard management control module restarts the first operating system and hands over the system management right of the server to the basic input and output system (BIOS firmware); the basic input and output system initiates a restart query to the baseboard management control module to determine whether the restart of the first operating system is a normal restart; if yes, S25 is executed; if not, S26 is executed.

In this embodiment, the basic input and output system initiates a restart query to the baseboard management control module to learn whether the restart of the first operating system was caused by a crash of the first operating system. If so, the restart of the first operating system is an abnormal restart. If not, the restart of the first operating system is a normal restart.

In practical applications, when a server encounters conditions such as unexpected power failure, external impact, software crash, and the like, the operating system crashes.

S25, calling the basic input output system to restart the first operating system, and guiding the running system of the server to enter the first operating system.

S26, calling the basic input output system to start the second operating system, and guiding the running system of the server to enter the second operating system.

S27, after entering the second operating system, the second operating system acquires, from the network configuration module, an application configuration file matched with the application program deployed on the second operating system. In this embodiment, the application program in the second operating system, which serves as the backup, loads from the network configuration module the application configuration that was in use before the crash and continues running, which improves the reliability of the whole edge computing node.

Based on the server firmware, the automatic service recovery method of this embodiment realizes automatic service recovery when the operating system has completely crashed, thereby greatly shortening the time required to recover the running service, reducing customer losses and reducing manual maintenance costs.
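The control flow of steps S21 to S27 can be traced with a minimal Python sketch. It models only the branching described in this embodiment and is not firmware code: the function name `recovery_trace` and its two parameters are hypothetical, health readings are supplied as a list of booleans, and the outcome of the restart query is supplied as a single flag.

```python
from typing import Iterable, List


def recovery_trace(health_samples: Iterable[bool], restart_caused_by_crash: bool) -> List[str]:
    """Return the sequence of step labels executed for the given inputs.

    health_samples: successive S23 readings (True means the first OS is normal).
    restart_caused_by_crash: answer to the BIOS restart query made in S24.
    """
    trace = ["S21", "S22"]        # boot the first OS, then fetch its application config
    for healthy in health_samples:
        trace.append("S23")       # BMC checks the working state of the first OS
        if healthy:
            continue              # normal: return to S23 and keep monitoring
        trace.append("S24")       # BMC restarts the first OS; BIOS queries the BMC
        if restart_caused_by_crash:
            trace += ["S26", "S27"]  # abnormal restart: boot the second OS, fetch config
        else:
            trace.append("S25")      # normal restart: boot the first OS again
        break
    return trace
```

Calling `recovery_trace([True, True, False], True)` gives `["S21", "S22", "S23", "S23", "S23", "S24", "S26", "S27"]`, that is, two healthy checks followed by a crash and a switch to the second operating system.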

The present embodiment also provides a readable storage medium (also referred to as a computer-readable storage medium) having a computer program stored thereon, which when executed by a processor implements the automatic service recovery method.

One of ordinary skill in the art will appreciate that all or part of the steps for implementing the above method embodiments may be performed by hardware associated with a computer program. The computer program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments described above; the storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.

Example two

This embodiment provides an automatic service recovery system applied to a server, wherein the server comprises an operation module and a baseboard management control module connected with the operation module; the operation module is provided with a first operating system, a second operating system and a basic input and output system; the automatic service recovery system comprises:

a guiding unit, used for guiding the running system of the server to enter the first operating system after the server is started;

a calling unit, used for calling the baseboard management control module to monitor in real time whether the working state of the first operating system is normal; if so, continuing to call the baseboard management control module to monitor the working state of the first operating system; if not, calling the baseboard management control module to restart the first operating system, and judging, through a judging unit, whether the restart of the first operating system is a normal restart; if so, calling the basic input and output system through the calling unit to restart the first operating system; if not, calling the basic input and output system through the calling unit to start the second operating system so that the service is recovered automatically.

The service automatic recovery system provided by the present embodiment will be described in detail with reference to the drawings. The automatic service recovery system according to this embodiment is applied to a server shown in fig. 1. Please refer to fig. 3, which is a schematic structural diagram of an embodiment of an automatic service recovery system. As shown in fig. 3, the service automatic recovery system 3 includes a guiding unit 31, a calling unit 32, and a determining unit 33.

The guiding unit 31 is configured to guide the running system of the server to enter the first operating system after the server is started.

In this embodiment, after the guiding unit 31 guides the running system to enter the first operating system, the first operating system obtains, from the network configuration module, an application configuration file matched with an application program deployed on the first operating system, so as to complete service operation.

The calling unit 32, coupled to the guiding unit 31, is configured to call the baseboard management control module to monitor in real time whether the working state of the first operating system is normal; if so, it continues to call the baseboard management control module to monitor the working state of the first operating system; if not, the calling unit 32 calls the baseboard management control module to restart the first operating system, so that the baseboard management control module hands over the system management right of the server to the basic input/output system (BIOS firmware), and calls the basic input/output system to initiate a restart query to the baseboard management control module.

The judging unit 33, coupled to the calling unit 32, is configured to judge whether the restart of the first operating system is a normal restart; if so, the basic input and output system is called to restart the first operating system and guide the running system of the server to enter the first operating system; if not, the basic input and output system is called to start the second operating system and guide the running system of the server to enter the second operating system.

In an embodiment, after entering the second operating system, the second operating system obtains, from the network configuration module, an application configuration file matched with an application program deployed on the second operating system. In this embodiment, the application program in the second operating system, which serves as the backup, loads from the network configuration module the application configuration that was in use before the crash and continues running, which improves the reliability of the whole edge computing node.
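The division of work among the guiding unit 31, the calling unit 32 and the judging unit 33 can be illustrated with a minimal Python sketch. The units are modelled as small classes wired to stand-in firmware objects; `FakeBmc`, `FakeBios`, the `crash_logged` flag and all method names are hypothetical and serve only to make the example runnable.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeBios:
    """Stand-in for the BIOS firmware; records each operating system it boots."""
    booted: List[str] = field(default_factory=list)

    def boot(self, os_name: str) -> None:
        self.booted.append(os_name)


@dataclass
class FakeBmc:
    """Stand-in for the BMC module; state is set directly by the example."""
    os_healthy: bool = True
    crash_logged: bool = False   # assumption: the BMC notes whether the OS crashed
    restart_requested: bool = False

    def restart_os(self) -> None:
        self.restart_requested = True


class GuidingUnit:
    def __init__(self, bios: FakeBios) -> None:
        self.bios = bios

    def guide(self) -> None:
        self.bios.boot("first_os")  # after server start, enter the first OS


class JudgingUnit:
    def __init__(self, bmc: FakeBmc) -> None:
        self.bmc = bmc

    def restart_is_normal(self) -> bool:
        return not self.bmc.crash_logged  # a crash-induced restart is abnormal


class CallingUnit:
    def __init__(self, bmc: FakeBmc, bios: FakeBios, judge: JudgingUnit) -> None:
        self.bmc, self.bios, self.judge = bmc, bios, judge

    def check_once(self) -> str:
        """One monitoring pass; returns 'monitoring' or the OS that was booted."""
        if self.bmc.os_healthy:
            return "monitoring"
        self.bmc.restart_os()
        target = "first_os" if self.judge.restart_is_normal() else "second_os"
        self.bios.boot(target)
        return target
```

With `FakeBmc(os_healthy=False, crash_logged=True)`, a single `check_once()` call requests a restart and boots `"second_os"`; with `crash_logged=False` it boots `"first_os"` again.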

It should be noted that the division of the above system into modules is only a logical division; in an actual implementation the modules may be wholly or partially integrated into one physical entity, or may be physically separate. The modules may all be implemented as software called by a processing element, may all be implemented in hardware, or some may be implemented as software called by a processing element and others in hardware. For example, the guiding unit may be a separately established processing element, or may be integrated in a chip of the system. The guiding unit may also be stored in the memory of the system in the form of program code and be called by a processing element of the system to execute its functions. The other modules are implemented similarly. All or part of the modules may be integrated together or implemented independently. The processing element described herein may be an integrated circuit having signal processing capabilities. In implementation, each step of the above method, or each of the above modules, may be completed by an integrated logic circuit of hardware in a processor element or by instructions in the form of software. The above modules may be one or more integrated circuits configured to implement the above method, such as one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), one or more Field Programmable Gate Arrays (FPGAs), and the like. When a module is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. These modules may be integrated together and implemented in the form of a System-on-a-Chip (SoC).

Example three

This embodiment provides a server, including: a processor, a memory, a transceiver, a communication interface, and/or a system bus. The memory and the communication interface are connected with the processor and the transceiver through the system bus and communicate with one another; the memory is used for storing the computer program, the communication interface is used for communicating with other equipment, and the processor and the transceiver are used for running the computer program so that the server executes the steps of the automatic service recovery method according to Example one. In this embodiment, the processor includes an operation module, a baseboard management control module connected to the operation module, and a network configuration module connected to the operation module. The operation module is provided with a first operating system, a second operating system and a basic input and output system (BIOS firmware).

The above-mentioned system bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus. The communication interface is used for realizing communication between the database access device and other equipment (such as a client, a read-write library and a read-only library). The Memory may include a Random Access Memory (RAM), and may further include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

The protection scope of the method for automatically recovering a service according to the present invention is not limited to the execution sequence of the steps listed in this embodiment, and all the schemes of adding, subtracting, and replacing steps in the prior art according to the principles of the present invention are included in the protection scope of the present invention.

The present invention also provides an automatic service recovery system, which can implement the automatic service recovery method of the present invention, but the implementation apparatus of the automatic service recovery method of the present invention includes, but is not limited to, the structure of the automatic service recovery system described in this embodiment, and all structural modifications and substitutions in the prior art made according to the principle of the present invention are included in the scope of the present invention.

In summary, the method, the system, the readable storage medium and the server for automatically recovering the service of the present invention implement the automatic service recovery function when the operating system is completely crashed based on the server firmware, thereby greatly reducing the time required for recovering the operation service, reducing the customer loss, and reducing the labor maintenance cost. The invention effectively overcomes various defects in the prior art and has high industrial utilization value.

The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims

1. An automatic service recovery method, characterized in that the method is applied to a server, and the server comprises an operation module and a baseboard management control module connected with the operation module; the operation module is provided with a first operating system, a second operating system and a basic input and output system; the automatic service recovery method comprises the following steps:

2. The automatic service recovery method according to claim 1,

the operation module is in communication connection with a network configuration module;

the first operating system and the second operating system deploy application programs.

3. The method according to claim 2, wherein after entering the first operating system, the method further comprises:

and the first operating system acquires an application configuration file matched with the application program from the network configuration module.

4. The method of claim 2, further comprising:

and after the second operating system is started, the second operating system acquires an application configuration file matched with the application program from the network configuration module.

5. The method according to claim 1, wherein in the process of invoking the baseboard management control module to restart the first operating system, the method further comprises:

and handing over the system management right of the server to the basic input and output system, and initiating a restart query to the baseboard management control module by the basic input and output system so as to judge whether the restart of the first operating system is normal.

6. The method according to claim 1, wherein the operation module and the baseboard management control module are disposed on a motherboard of the server.

7. The method according to claim 6, wherein the basic input and output system is BIOS firmware having an operating system recovery function and solidified on the motherboard.

8. An automatic service recovery system, characterized in that the system is applied to a server, wherein the server comprises an operation module and a baseboard management control module connected with the operation module; the operation module is provided with a first operating system, a second operating system and a basic input and output system; the automatic service recovery system comprises:

9. A readable storage medium having stored thereon a computer program, characterized in that the computer program, when being executed by a processor, implements the method for automatic service restoration according to any one of claims 1 to 7.

10. A server, comprising: a processor and a memory;

the memory is used for storing a computer program, and the processor is used for executing the computer program stored by the memory to cause the server to execute the automatic service recovery method according to any one of claims 1 to 7;

the processor comprises an operation module and a baseboard management control module connected with the operation module; the operation module is provided with a first operating system, a second operating system and a basic input and output system.