CN114443452A

CN114443452A - Host operating system crash detection method, host and storage medium

Info

Publication number: CN114443452A
Application number: CN202210153102.1A
Authority: CN
Inventors: 孙鸣
Original assignee: Shanghai Xinxi Information Technology Co ltd
Current assignee: Shanghai Xinxi Information Technology Co ltd
Priority date: 2022-02-18
Filing date: 2022-02-18
Publication date: 2022-05-06

Abstract

The present invention relates to the field of electronic devices, and in particular, to a host operating system crash detection method, a host, and a storage medium. The host operating system crash detection method is applied to a host comprising a central processing unit and an out-of-band management device, and comprises the following steps: detecting working parameters of a host operating system running on the central processing unit; judging whether the host operating system has crashed according to the working parameters to obtain a judgment result; and, if the judgment result indicates that the host operating system has crashed, sending the working parameters to the out-of-band management device. Compared with the prior art, the host operating system crash detection method, the host and the storage medium provided by the embodiments of the invention record the crash information of the operating system completely and make the crash information easier to read.

Description

Host operating system crash detection method, host and storage medium

Technical Field

The present invention relates to the field of electronic devices, and in particular, to a method for detecting a crash of a host operating system, a host, and a storage medium.

Background

The Windows operating system is widely used across industries and has become indispensable to enterprises, public institutions and individuals. The safe and stable operation of the Windows operating system is therefore very important, and for this reason various security products have emerged, in particular system recovery products such as Ghost. The Windows operating system has a blue-screen recording function that can record the cause of a crash on servers, switches and other hosts.

However, the inventor of the present invention found that, on a host using the Linux operating system, although the Linux operating system also provides a Kdump (kernel dump) function that can record crash information on a central processing unit running the Linux operating system when the operating system crashes, the Kdump function does not record the crash information completely, and the recorded information is difficult to read.

Disclosure of Invention

An object of embodiments of the present invention is to provide a method for detecting a crash of a host operating system, a host and a storage medium, which can reduce difficulty in reading crash information while completely recording crash information of the operating system.

In order to solve the above technical problem, an embodiment of the present invention provides a method for detecting a crash of a host operating system, which is applied to a host including a central processing unit and an out-of-band management device, and includes: detecting working parameters of a host operating system running on the central processing unit; judging whether the host operating system crashes or not according to the working parameters to obtain a judgment result; and if the judgment result indicates that the host operating system crashes, the working parameters are sent to an out-of-band management device.

An embodiment of the present invention further provides a host, including: a central processing unit and an out-of-band management device; the central processing unit is used for detecting working parameters of a host operating system running in the central processing unit, judging whether the host operating system crashes or not according to the working parameters to obtain a judgment result, and if the judgment result indicates that the host operating system crashes, sending the working parameters to an out-of-band management device.

The embodiment of the invention also provides a storage medium, which stores a computer program, and the computer program realizes the crash detection method of the host operating system when being executed by a processor.

Compared with the prior art, in the embodiments of the invention the working parameters are sent to the out-of-band management device when the host operating system is judged, according to its working parameters, to have crashed. Because the out-of-band management device does not have to store the runtime data of the operating system, it has ample free storage space and can therefore better store the working parameters at the time of the crash (that is, the crash information). In addition, once the crash information has been sent to the out-of-band management device, it can be read directly from the out-of-band management device whenever it needs to be queried, without requiring permission to log in to the host operating system, so that the crash information can be read more conveniently.

In addition, after judging whether the host operating system crashes according to the working parameters, the method further includes: if the judgment result indicates that the host operating system has crashed, running a standby microkernel to collect system parameters of the host operating system until the restart of the host operating system is completed; and adding the collected system parameters to the working parameters. For a Linux operating system without the Kdump function, a user can load crash information detection software into the standby microkernel in advance. When the judgment result indicates that the host operating system has crashed, the standby microkernel runs and collects system parameters of the host operating system until the host operating system has restarted, so that the crash information can still be detected and recorded.

Additionally, the system parameters collected for the host operating system include any one or more of: central processing unit information parameters, message text information parameters, Dmesg information parameters, process information parameters of the main microkernel, central processing unit temperature information parameters, central processing unit voltage information parameters, fan rotating speed, and the like.

In addition, after sending the working parameters to the out-of-band management device, the method further includes: receiving an access request sent by a cloud, wherein the access request includes at least a parameter type; and uploading to the cloud those working parameters that belong to the parameter type contained in the access request. Uploading the working parameters to the cloud makes it easier for the user to access the crash information.

In addition, after judging whether the host operating system crashes according to the working parameters, the method further includes: if the judgment result indicates that the host operating system has crashed, sending an IPMI command to the out-of-band management device.

In addition, sending the working parameters to the out-of-band management device includes: transmitting the working parameters to the out-of-band management device through a RESTful interface and/or a Redfish interface.

In addition, the central processing unit is further configured, when the judgment result indicates that the host operating system has crashed, to run a standby microkernel to collect system parameters of the host operating system until the host operating system has restarted, and to add the collected system parameters to the working parameters.

In addition, the central processor and the out-of-band management device are connected through a RESTful interface and/or a Redfish interface.

Drawings

FIG. 1 is a flowchart illustrating a method for detecting a crash of a host operating system according to a first embodiment of the present invention;

FIG. 2 is a flowchart illustrating a host operating system crash detection method according to a second embodiment of the present invention;

FIG. 3 is a flowchart illustrating a host operating system crash detection method according to a third embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a host according to a fourth embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the various embodiments in order to provide a better understanding of the present application. Nevertheless, the technical solution claimed in the present application can be implemented without these technical details and with various changes and modifications based on the following embodiments.

A first embodiment of the present invention relates to a method for detecting a crash of a host operating system, which is applied to a host including a central processing unit and an out-of-band management device. The specific flow is shown in FIG. 1, and the method includes:

Step S101: detect the working parameters of a host operating system running on a central processing unit.

Specifically, in this step, the host operating system running on the central processing unit is monitored in real time to obtain the relevant parameters of the host operating system while it is working, such as central processing unit information, display message (Dmesg) information, and process information of the microkernel.

Step S102: judge whether the host operating system has crashed according to the working parameters; if so, execute step S103; if not, execute step S101 again.

Specifically, in this step, the relevant parameters of the host operating system obtained in step S101 may be compared with a preset parameter range, and whether the host operating system has crashed may be determined from the comparison result; step S103 is performed when the comparison result indicates that the host operating system has crashed. It should be understood that this is only one example of how the present embodiment may determine whether the host operating system has crashed, and the method is not limited thereto. For example, the disk format may be parsed to locate the crash file, the crash file may then be analyzed, and if crash data is recorded in the crash file, the host operating system is judged to have crashed.
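The range comparison described for step S102 can be illustrated with a minimal Python sketch. The parameter names (`cpu_temperature_c`, `cpu_voltage_v`, `heartbeat_age_s`) and the numeric ranges are hypothetical values chosen for illustration; the source does not specify which parameters are compared or what the preset ranges are.

```python
from typing import Dict, Tuple

# Hypothetical preset ranges: parameter name -> (minimum, maximum).
# These names and limits are illustrative assumptions, not values from the source.
PRESET_RANGES: Dict[str, Tuple[float, float]] = {
    "cpu_temperature_c": (0.0, 95.0),
    "cpu_voltage_v": (0.8, 1.4),
    "heartbeat_age_s": (0.0, 10.0),
}


def out_of_range(params: Dict[str, float],
                 ranges: Dict[str, Tuple[float, float]] = PRESET_RANGES) -> Dict[str, float]:
    """Return the monitored parameters whose values fall outside their preset range."""
    violations = {}
    for name, (low, high) in ranges.items():
        if name in params and not (low <= params[name] <= high):
            violations[name] = params[name]
    return violations


def is_crashed(params: Dict[str, float],
               ranges: Dict[str, Tuple[float, float]] = PRESET_RANGES) -> bool:
    """Judge a crash when at least one monitored parameter is out of range."""
    return bool(out_of_range(params, ranges))
```

Treating any single out-of-range parameter as a crash is only one possible policy; a real implementation might require several consecutive violations before judging that the operating system has crashed.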

Step S103: send the working parameters to the out-of-band management device.

Specifically, in this step, after it is determined that the host operating system has crashed, the currently recorded working parameters, such as central processor information, display message (Dmesg) information, and process information of the microkernel, may be sent to an out-of-band management device connected to the central processor. A BMC (Baseboard Management Controller), a CPLD (Complex Programmable Logic Device), an FPGA (Field Programmable Gate Array), or a similar component connected to the central processor may serve as the out-of-band management device.

Specifically, in this step, the working parameters may be sent to the out-of-band management device via a RESTful interface and/or a Redfish interface connecting the central processor and the out-of-band management device. It should be understood that sending the working parameters via the RESTful interface and/or the Redfish interface is only a specific example in this embodiment and is not limiting. In other embodiments of the present invention, the working parameters may also be transmitted via other types of interfaces, for example an SPI (Serial Peripheral Interface), and the choice may be made flexibly according to actual needs.

A RESTful interface is a network service based on HTTP (Hypertext Transfer Protocol) or HTTPS (Hypertext Transfer Protocol over Secure Sockets Layer). REST stands for Representational State Transfer. It is an architectural style for web services: a resource-oriented architecture that uses widely adopted standards and protocols such as HTTP, URI (Uniform Resource Identifier) and XML (Extensible Markup Language).

Redfish is a management standard based on HTTP services that implements device management through a RESTful interface. Following the Redfish protocol makes it possible to define unified, industry-wide interface specifications. On this basis, the invention can define corresponding information resources and network interfaces, such as the crash cause, the detailed crash information and the storage target, to record the information about an operating system crash.
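As a rough illustration of sending the working parameters over a RESTful, Redfish-style interface, the following Python sketch builds (but does not send) an HTTP POST request. The resource path `/redfish/v1/Oem/Example/CrashRecords` and the JSON field names `CrashReason` and `WorkingParameters` are hypothetical; the source does not define the resource layout, and a real out-of-band management device would expose its own schema and require authentication.

```python
import json
import urllib.request
from typing import Any, Dict

# Hypothetical resource path, invented for illustration only.
HYPOTHETICAL_CRASH_PATH = "/redfish/v1/Oem/Example/CrashRecords"


def build_crash_report_request(bmc_host: str,
                               crash_reason: str,
                               working_params: Dict[str, Any],
                               path: str = HYPOTHETICAL_CRASH_PATH) -> urllib.request.Request:
    """Build an HTTPS POST request carrying the crash information as JSON."""
    payload = {
        "CrashReason": crash_reason,          # hypothetical field name
        "WorkingParameters": working_params,  # hypothetical field name
    }
    return urllib.request.Request(
        url=f"https://{bmc_host}{path}",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

The returned request object could then be passed to `urllib.request.urlopen` to perform the transfer. Building the request separately from sending it keeps the sketch testable without a real out-of-band management device.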

Compared with the prior art, the host operating system crash detection method provided by the first embodiment of the invention sends the working parameters to the out-of-band management device when the host operating system is judged, according to the working parameters, to have crashed. Because the out-of-band management device does not have to store the runtime data of the operating system, it has more free storage space and can better store the working parameters at the time of the crash (that is, the crash information). In addition, once the crash information has been sent to the out-of-band management device, it can be read directly from the out-of-band management device whenever it needs to be queried, without requiring permission to log in to the host operating system, so that the crash information can be read more conveniently.
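The overall flow of the first embodiment (steps S101 to S103) can be summarized in a short Python sketch. The function name `monitor`, its callable arguments and the `max_cycles` bound are assumptions introduced for illustration; the source describes the loop only at the level of the flowchart.

```python
from typing import Any, Callable, Dict, Optional

Params = Dict[str, Any]


def monitor(detect: Callable[[], Params],
            judge: Callable[[Params], bool],
            send_to_oob: Callable[[Params], None],
            max_cycles: int) -> Optional[Params]:
    """Repeat S101 and S102 until a crash is judged, then perform S103 once.

    Returns the working parameters that were sent, or None if no crash
    was judged within max_cycles iterations.
    """
    for _ in range(max_cycles):
        params = detect()            # S101: detect the working parameters
        if judge(params):            # S102: judge whether the OS has crashed
            send_to_oob(params)      # S103: send them to the out-of-band device
            return params
    return None
```

Passing the detection, judgment and transmission steps in as callables keeps the loop independent of any particular parameter source or transport; the `max_cycles` bound exists only so that the sketch terminates.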

A second embodiment of the invention relates to a host operating system crash detection method. The second embodiment is substantially the same as the first embodiment, and includes, as shown in fig. 2:

Step S201: detect the working parameters of a host operating system running on a central processing unit.

Step S202: judge whether the host operating system has crashed according to the working parameters; if so, execute step S203; if not, execute step S201 again.

Step S203: send the working parameters to the out-of-band management device.

It is to be understood that steps S201 to S203 in the present embodiment are substantially the same as steps S101 to S103 in the first embodiment, and specific reference may be made to the detailed description of the foregoing embodiments, which is not repeated herein.

Step S204: receive an access request sent by the cloud, wherein the access request includes at least a parameter type.

Specifically, in this step, a communication connection may be established between the out-of-band management device and the cloud so that the out-of-band management device can receive an access request sent by the cloud. The access request includes at least the parameter type that the cloud needs to obtain. For example, when the cloud needs to access the central processor information recorded at the time of the crash, the access request contains at least a label for the central processor information; that is, the label indicates that the cloud needs the central processor information from when the host operating system crashed.

Step S205: upload to the cloud those working parameters that belong to the parameter type contained in the access request.

Specifically, in this step, after receiving the access request sent by the cloud, the out-of-band management device sends to the cloud the parameters of the parameter type named in the access request, thereby completing the cloud's access to the crash data.
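Steps S204 and S205 amount to filtering the stored working parameters by the parameter types named in the access request. A minimal Python sketch follows; the request key `parameter_types` and the example type labels are hypothetical, since the source states only that the request contains at least a parameter type.

```python
from typing import Any, Dict, Iterable


def select_parameters(stored: Dict[str, Any],
                      requested_types: Iterable[str]) -> Dict[str, Any]:
    """Return only the stored crash parameters whose type was requested."""
    wanted = set(requested_types)
    return {ptype: value for ptype, value in stored.items() if ptype in wanted}


def handle_access_request(stored: Dict[str, Any],
                          access_request: Dict[str, Any]) -> Dict[str, Any]:
    """Validate the request (S204) and build the payload to upload (S205)."""
    # "parameter_types" is a hypothetical key name chosen for this sketch.
    types = access_request.get("parameter_types", [])
    if not types:
        raise ValueError("access request must contain at least one parameter type")
    return select_parameters(stored, types)
```

Requested types that were never stored are silently omitted from the result in this sketch; an implementation could equally report them as missing.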

Compared with the prior art, in the host operating system crash detection method provided by the second embodiment of the invention, the out-of-band management device can send the crash data of the parameter type contained in the access request to the cloud after receiving the access request, so that the crash data of the host operating system can be accessed directly from the cloud, which improves the convenience and the range of application of access to the host's crash data. In addition, the present embodiment can also achieve the technical effects provided by the first embodiment, and details are not repeated here.

A third embodiment of the invention relates to a host operating system crash detection method. The third embodiment is substantially the same as the first embodiment, and includes, as shown in fig. 3:

Step S301: detect the working parameters of a host operating system running on a central processing unit.

Step S302: judge whether the host operating system has crashed according to the working parameters; if so, execute step S303; if not, execute step S301 again.

It is to be understood that step S301 and step S302 in this embodiment are substantially the same as step S101 and step S102 in the first embodiment, and specific reference may be made to the detailed description of the foregoing embodiments, which is not repeated herein.

Step S303: run the standby microkernel to collect the system parameters of the host operating system until the restart of the host operating system is completed.

Specifically, in this step, after the host operating system crashes, the standby microkernel is run to collect the system parameters of the host operating system during the time period from the crash until the restart of the host operating system is completed.

The system parameters collected by the standby microkernel can be any one or more of central processing unit information parameters, message text information parameters, Dmesg information parameters, process information parameters of the main microkernel, central processing unit temperature information parameters, central processing unit voltage information parameters, fan rotating speed and the like.

Specifically, in this step, after the judgment result indicates that the host operating system has crashed, preset script software may send an IPMI (Intelligent Platform Management Interface) command to the out-of-band management device. IPMI commands generally include commands for reading the system log, reading the system state, reading the network configuration, and the like. Sending the IPMI command to the out-of-band management device can start the standby microkernel to collect the system parameters.
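The three kinds of IPMI command mentioned above can be illustrated with a Python sketch that builds command lines for the `ipmitool` utility. Using `ipmitool` is an assumption of this sketch: the source does not name a tool, and it does not specify which IPMI command starts the standby microkernel, so that part is not shown. The action names are hypothetical labels.

```python
from typing import Dict, List

# Mapping from the command kinds named in the text to ipmitool subcommands.
# The action names on the left are hypothetical labels for this sketch.
IPMITOOL_SUBCOMMANDS: Dict[str, List[str]] = {
    "read_system_log": ["sel", "list"],          # System Event Log entries
    "read_system_state": ["chassis", "status"],  # chassis power and fault state
    "read_network_config": ["lan", "print"],     # LAN channel configuration
}


def build_ipmi_argv(action: str) -> List[str]:
    """Return the ipmitool argument vector for a supported action."""
    try:
        return ["ipmitool", *IPMITOOL_SUBCOMMANDS[action]]
    except KeyError:
        raise ValueError(f"unsupported action: {action}") from None
```

On a host where `ipmitool` is installed, the returned list could be passed to `subprocess.run`; the sketch deliberately stops at building the argument vector so that it does not depend on IPMI hardware being present.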

Step S304: add the collected system parameters to the working parameters.

It should be understood that adding the collected system parameters to the working parameters and sending them together to the out-of-band management device is only a specific example in this embodiment and is not limiting. In other embodiments of the present invention, the collected system parameters may instead be transmitted separately to the out-of-band management device for storage, and the choice may be made flexibly according to actual needs.
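Steps S303 and S304 can be sketched in Python as a sampling loop that runs until the restart completes, followed by a merge into the working parameters. The function names, the `post_crash_samples` key and the `max_samples` bound are assumptions made for illustration; the source does not describe how the standby microkernel samples or how the merged record is structured.

```python
from typing import Any, Callable, Dict, List

Params = Dict[str, Any]


def collect_until_restart(sample: Callable[[], Params],
                          restart_done: Callable[[], bool],
                          max_samples: int) -> List[Params]:
    """S303: keep sampling system parameters until the restart has completed."""
    collected: List[Params] = []
    for _ in range(max_samples):
        if restart_done():
            break
        collected.append(sample())
    return collected


def merge_into_working_params(working: Params,
                              system_samples: List[Params]) -> Params:
    """S304: return a copy of the working parameters with the samples added."""
    merged = dict(working)
    # "post_crash_samples" is a hypothetical key name chosen for this sketch.
    merged["post_crash_samples"] = list(system_samples)
    return merged
```

Keeping the samples under a separate key preserves the parameters recorded at the moment of the crash unchanged, which also makes it straightforward to send the collected samples separately, as the alternative embodiment allows.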

Step S305: send the working parameters to the out-of-band management device.

Compared with the prior art, in the host operating system crash detection method provided by the third embodiment of the present invention, the standby microkernel collects the system parameters of the host operating system during the time period from the crash of the host operating system to its restart, so that more crash data can be recorded and both the cause of the crash and the behavior of the system after the crash can be traced conveniently. In addition, the present embodiment can also achieve the technical effects provided by the first embodiment, and details are not repeated here.

The steps of the above methods are divided for clarity of description, and may be combined into one step or split into multiple steps during implementation, and all steps are within the scope of the present patent as long as they contain the same logical relationship; it is within the scope of the patent to add insignificant modifications to the algorithms or processes or to introduce insignificant design changes to the core design without changing the algorithms or processes.

A fourth embodiment of the present invention relates to a host which, as shown in FIG. 4, includes a central processing unit 10 and an out-of-band management device 20. The central processing unit 10 is configured to detect the working parameters of a host operating system running on the central processing unit 10, judge whether the host operating system has crashed according to the working parameters to obtain a judgment result, and send the working parameters to the out-of-band management device 20 if the judgment result indicates that the host operating system has crashed. The out-of-band management device 20 is connected to the central processing unit 10 and is configured to receive and store the working parameters sent by the central processing unit 10.

Specifically, in the present embodiment, the central processing unit 10 and the out-of-band management device 20 may be connected via a RESTful interface and/or a Redfish interface. It should be understood that this is merely an example of a connection interface between the central processing unit 10 and the out-of-band management device 20 in this embodiment and is not limiting. In other embodiments of the present invention, the connection may be made via another type of interface, such as an SPI (Serial Peripheral Interface), and the choice may be made flexibly according to actual needs.

In addition, in an embodiment of the present invention, the central processing unit 10 may be further configured to, when the determination result indicates that the host operating system crashes, run the standby microkernel to collect system parameters of the host operating system until the host operating system is restarted, and add the collected system parameters to the working parameters and send the working parameters to the out-of-band management device for storage.

It should be understood that this embodiment is an apparatus embodiment corresponding to the method embodiment described above, and that this embodiment can be implemented in cooperation with the method embodiment described above. The related technical details and technical effects mentioned in the foregoing method embodiments are still valid in this embodiment, and are not described herein again in order to reduce repetition. Accordingly, the related art details mentioned in the present embodiment can also be applied to the foregoing method embodiments.

A fifth embodiment of the present invention relates to a computer-readable storage medium storing a computer program. The computer program realizes the above-described method embodiments when executed by a processor.

That is, as can be understood by those skilled in the art, all or part of the steps in the method for implementing the embodiments described above may be implemented by a program instructing related hardware, where the program is stored in a storage medium and includes several instructions to enable a device (which may be a single chip, a chip, or the like) or a processor (processor) to execute all or part of the steps of the method described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.

Claims

1. A crash detection method for a host operating system is applied to a host comprising a central processing unit and an out-of-band management device, and is characterized by comprising the following steps:

detecting working parameters of a host operating system running on the central processing unit;

judging whether the host operating system crashes or not according to the working parameters to obtain a judgment result;

and if the judgment result indicates that the host operating system crashes, the working parameters are sent to an out-of-band management device.

2. The method of claim 1, wherein after determining whether the host operating system crashes according to the working parameters, the method further comprises:

if the judgment result indicates that the host operating system crashes, running a standby micro-kernel to collect system parameters of the host operating system until the restart of the host operating system is completed;

and adding the collected system parameters into the working parameters.

3. The host operating system crash detection method of claim 2, wherein collecting system parameters of the host operating system comprises:

any one or more of central processing unit information parameters, message text information parameters, Dmesg information parameters, process information parameters of the main microkernel, central processing unit temperature information parameters, central processing unit voltage information parameters, fan rotating speed and the like.

4. The method of claim 1, wherein, after sending the working parameters to the out-of-band management device, the method further comprises:

receiving an access request sent by a cloud, wherein the access request at least comprises a parameter type;

and uploading the parameters belonging to the parameter types contained in the access request in the working parameters to the cloud.

5. The method of claim 1, wherein after determining whether the host operating system crashes according to the working parameters, the method further comprises:

if the judgment result indicates that the host operating system crashes, sending an IPMI command to the out-of-band management device.

6. The method of claim 1, wherein sending the working parameters to the out-of-band management device comprises:

transmitting the working parameters to the out-of-band management device through a RESTful interface and/or a Redfish interface.

7. A host, comprising:

a central processing unit and an out-of-band management device;

the central processing unit is used for detecting working parameters of a host operating system running in the central processing unit, judging whether the host operating system crashes or not according to the working parameters to obtain a judgment result, and if the judgment result indicates that the host operating system crashes, sending the working parameters to an out-of-band management device.

8. The host of claim 7, wherein the central processing unit is further configured, when the judgment result indicates that the host operating system crashes, to run a standby microkernel to collect system parameters of the host operating system until the restart of the host operating system is completed, and to add the collected system parameters to the working parameters.

9. The host of claim 7, wherein the central processing unit is connected to the out-of-band management device via a RESTful interface and/or a Redfish interface.

10. A storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the host operating system crash detection method of any one of claims 1 to 6.