CN117873771B - System downtime processing method, device, equipment, storage medium and server - Google Patents


Info

Publication number
CN117873771B
Authority
CN
China
Prior art keywords
information
downtime
operating system
system downtime
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410269824.2A
Other languages
Chinese (zh)
Other versions
CN117873771A (en)
Inventor
杨磊
张旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Computer Technology Co Ltd
Original Assignee
Inspur Computer Technology Co Ltd
Application filed by Inspur Computer Technology Co Ltd
Priority to CN202410269824.2A
Publication of CN117873771A
Application granted
Publication of CN117873771B
Legal status: Active

Abstract

The invention relates to the technical field of servers, and in particular to a system downtime processing method, device, equipment, storage medium and server. The method is implemented on a baseboard management controller that includes a built-in static random access memory connected with an operating system processor through an Advanced High-performance Bus, and establishes, in the built-in static random access memory, a shared storage space between the operating system processor and the baseboard management controller.

Description

System downtime processing method, device, equipment, storage medium and server
Technical Field
The present invention relates to the field of server technologies, and in particular, to a method, an apparatus, a device, a storage medium, and a server for processing a system downtime.
Background
System downtime events are unavoidable during the operation of a server, so a scheme is required to collect the key data left by the system when it goes down, for subsequent analysis and maintenance. System downtime information is typically collected by having the operating system running on the central processing unit (Central Processing Unit, CPU) interact with the baseboard management controller (Baseboard Management Controller, BMC) to transfer the data.
In a server of the x86 platform (hereinafter referred to as an x86 server), at the moment the operating system (Operating System, OS) goes down, the central processing unit uses a non-maskable interrupt (Non-Maskable Interrupt, NMI) to trigger the baseboard management controller to collect the system downtime information and generate a system downtime log. For a server that does not support non-maskable interrupts, such as a server of the Advanced RISC Machines (ARM) platform (hereinafter referred to as an ARM server), system downtime information cannot be collected effectively in this manner.
How to collect system downtime information on a server that does not support non-maskable interrupts is a technical problem that those skilled in the art need to solve.
Disclosure of Invention
The invention aims to provide a system downtime processing method, device, equipment, storage medium and server, which are used for solving the problem of system downtime information collection of a server which does not support non-maskable interrupt.
To solve the above technical problem, the present invention provides a system downtime processing method, which is applied to a baseboard management controller including a built-in static random access memory, wherein the built-in static random access memory is connected with an operating system processor of the device in which it is located through an Advanced High-performance Bus, and the system downtime processing method comprises:
deploying a customized intelligent platform management interface command by which the operating system processor, using the basic input output system of the device, handshakes with the baseboard management controller, so as to establish a memory mapping from the operating system processor to the built-in static random access memory and obtain a shared storage space between the operating system processor and the baseboard management controller;
when a write signal is identified indicating that the operating system processor has written system downtime information into the shared storage space through the intelligent platform management interface command, reading the system downtime information from the shared storage space;
and generating a system downtime log according to the system downtime information.
In some implementations, the generating a system downtime log from the system downtime information includes:
when the write signal is identified, accessing, through a serial peripheral interface bus, a crash dump file system located in an electrically erasable programmable read-only memory to create a crash dump file;
and copying the system downtime information to the crash dump file to obtain the system downtime log.
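The two steps above (create a crash dump file, then copy the downtime information into it) can be sketched as follows. This is only an illustrative sketch: a temporary directory stands in for the crash dump file system on the electrically erasable programmable read-only memory, and the function name `write_crash_dump` and the file name are hypothetical, not taken from the patent.

```python
import tempfile
from pathlib import Path


def write_crash_dump(dump_dir: Path, downtime_info: bytes,
                     name: str = "crashdump.bin") -> Path:
    """Create a crash dump file and copy the downtime information into it."""
    dump_dir.mkdir(parents=True, exist_ok=True)
    dump_file = dump_dir / name            # hypothetical file name
    dump_file.write_bytes(downtime_info)   # copy step: shared-space contents -> file
    return dump_file


# A temporary directory stands in for the EEPROM-backed crash dump file system.
with tempfile.TemporaryDirectory() as tmp:
    path = write_crash_dump(Path(tmp) / "crashdump", b"panic: example record")
    saved = path.read_bytes()
```

On real hardware the write would go through the serial peripheral interface driver rather than an ordinary file system call; that part is deliberately left out here.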
In some implementations, the deploying of the customized intelligent platform management interface command by which the operating system processor, using the basic input output system of the device, handshakes with the baseboard management controller, so as to establish a memory mapping from the operating system processor to the built-in static random access memory and obtain a shared storage space between the operating system processor and the baseboard management controller, includes:
determining the address range of the required shared storage space according to the data size of the system downtime information to be recorded at a time;
initializing the intelligent platform management interface command to map the address range of the shared storage space to a physical memory address of the operating system processor and a physical memory address of the baseboard management controller, so that the operating system processor can access the shared storage space by using the basic input output system to send the intelligent platform management interface command.
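As a rough illustration of sizing the region and mapping it on both sides, the sketch below rounds the record size up to an alignment and translates an offset in the shared space into an address in each side's physical address space. The class `SharedRegion`, the helper `plan_region`, the 4-byte alignment and both base addresses are assumptions made for the example; the patent does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedRegion:
    size: int       # bytes reserved for one downtime record
    os_base: int    # base physical address seen by the OS processor (assumed)
    bmc_base: int   # base physical address seen by the BMC (assumed)

    def _check(self, offset: int) -> int:
        if not 0 <= offset < self.size:
            raise ValueError(f"offset {offset} outside region of {self.size} bytes")
        return offset

    def os_address(self, offset: int) -> int:
        return self.os_base + self._check(offset)

    def bmc_address(self, offset: int) -> int:
        return self.bmc_base + self._check(offset)


def plan_region(record_size: int, os_base: int, bmc_base: int,
                align: int = 4) -> SharedRegion:
    """Round the record size up to the alignment and build the two-sided mapping."""
    size = (record_size + align - 1) // align * align
    return SharedRegion(size=size, os_base=os_base, bmc_base=bmc_base)


region = plan_region(record_size=1021, os_base=0x4000_0000, bmc_base=0x1000_0000)
```

The same offset resolves to different physical addresses on the two sides, which is the point of keeping both mappings in one description of the region.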
In some implementations, the reading of the system downtime information from the shared storage space, when a write signal is identified indicating that the operating system processor has written system downtime information into the shared storage space through the intelligent platform management interface command, includes:
polling the shared storage space, and reading the system downtime information from the shared storage space when the system downtime information is found by polling.
In some implementations, the reading of the system downtime information from the shared storage space when the system downtime information is found by polling includes:
when the system downtime information is found by polling, determining that a system downtime event has occurred;
after a write-complete flag bit is identified in the system downtime information, determining that the operating system processor has finished writing the system downtime information, and reading the system downtime information from the shared storage space.
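A minimal sketch of this two-stage polling follows, assuming a record layout that the patent does not define: a 12-byte header holding a marker, the payload length and a write-complete flag, followed by the payload. The names `HEADER`, `MAGIC`, `FLAG_DONE` and `poll_once` are hypothetical, and a `bytearray` stands in for the shared storage space.

```python
import struct

HEADER = struct.Struct("<4sIB3x")  # marker, payload length, write-complete flag, padding
MAGIC = b"DOWN"                    # hypothetical marker for a downtime record
FLAG_DONE = 1


def poll_once(shared: bytearray):
    """Return (event_seen, payload); payload stays None until the flag is set."""
    magic, length, flag = HEADER.unpack_from(shared, 0)
    if magic != MAGIC:
        return False, None              # nothing written yet
    if flag != FLAG_DONE:
        return True, None               # downtime event seen, writer not finished
    start = HEADER.size
    return True, bytes(shared[start:start + length])


shared = bytearray(64)                  # stands in for the shared SRAM region
idle = poll_once(shared)

payload = b"kernel panic"
HEADER.pack_into(shared, 0, MAGIC, len(payload), 0)
shared[HEADER.size:HEADER.size + len(payload)] = payload
in_progress = poll_once(shared)

HEADER.pack_into(shared, 0, MAGIC, len(payload), FLAG_DONE)
done = poll_once(shared)
```

Setting the flag last lets the reader distinguish "a downtime event has occurred" from "the record is safe to read".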
In some implementations, the generating a system downtime log from the system downtime information includes:
generating a first check value according to the system downtime information by using a check algorithm;
identifying a second check value in the system downtime information;
if the first check value is consistent with the second check value, determining that the system downtime information is complete, and generating the system downtime log according to the system downtime information;
if the first check value is inconsistent with the second check value, determining that the system downtime information is incomplete.
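The comparison of the two check values can be illustrated as below. The patent does not name the check algorithm; CRC32 from Python's `zlib` module is used here purely as an assumption, and storing the second check value in the last four bytes of the record is likewise a hypothetical layout.

```python
import struct
import zlib


def seal(payload: bytes) -> bytes:
    """Writer side: append the CRC32 of the payload (the second check value)."""
    return payload + struct.pack("<I", zlib.crc32(payload))


def verify(record: bytes):
    """Reader side: recompute the CRC32 (the first check value) and compare."""
    if len(record) < 4:
        return None
    payload = record[:-4]
    stored = struct.unpack("<I", record[-4:])[0]
    return payload if zlib.crc32(payload) == stored else None


record = seal(b"oops: null pointer dereference")
intact = verify(record)                   # check values match
corrupted = verify(b"X" + record[1:])     # first byte altered, check values differ
```

`verify` returns the payload only when the record is complete, so the log-generation step can simply skip a `None` result.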
In some implementations, the reading of the system downtime information from the shared storage space, when a write signal is identified indicating that the operating system processor has written system downtime information into the shared storage space through the intelligent platform management interface command, includes:
when it is detected through a heartbeat signal that a system downtime event has occurred on the operating system processor, reading the system downtime information from the shared storage space.
In some implementations, detecting through the heartbeat signal that a system downtime event has occurred on the operating system processor includes:
determining that a system downtime event has occurred on the operating system processor when no heartbeat signal sent by the operating system processor has been received after the heartbeat interval time of the operating system has elapsed, and/or when a heartbeat signal carrying system downtime alarm information sent by the operating system processor is received.
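The heartbeat rule reduces to a small predicate. In the sketch below the function name `downtime_detected`, its parameters and the use of seconds as the time unit are assumptions for illustration only.

```python
def downtime_detected(last_beat_time: float, alarm: bool,
                      now: float, interval: float) -> bool:
    """True if the last heartbeat carried an alarm or the interval has been exceeded."""
    if alarm:
        return True                         # heartbeat carried downtime alarm information
    return now - last_beat_time > interval  # no heartbeat within the interval


healthy = downtime_detected(last_beat_time=10.0, alarm=False, now=12.0, interval=5.0)
missed = downtime_detected(last_beat_time=10.0, alarm=False, now=16.0, interval=5.0)
alarmed = downtime_detected(last_beat_time=10.0, alarm=True, now=10.5, interval=5.0)
```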
In some implementations, the reading of the system downtime information from the shared storage space, when a write signal is identified indicating that the operating system processor has written system downtime information into the shared storage space through the intelligent platform management interface command, includes:
polling a system downtime detection pin agreed upon with the operating system processor;
when a state change of the system downtime detection pin is detected, determining that the write signal has been received, and reading the system downtime information from the shared storage space.
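Pin polling amounts to detecting a level change relative to the level first observed. The following sketch simulates the pin with a list of levels; `wait_for_pin_change` and `fake_pin` are hypothetical helpers, and a real implementation would read the pin through the platform's GPIO driver instead.

```python
from typing import Callable, Iterable


def wait_for_pin_change(read_pin: Callable[[], int], max_polls: int) -> int:
    """Poll the pin; return the poll index at which its level changed, or -1."""
    initial = read_pin()
    for i in range(1, max_polls + 1):
        if read_pin() != initial:
            return i
    return -1


def fake_pin(levels: Iterable[int]) -> Callable[[], int]:
    """Simulated pin: yields the given levels, then repeats the last one."""
    it = iter(levels)
    last = [0]

    def read() -> int:
        last[0] = next(it, last[0])
        return last[0]

    return read


changed_at = wait_for_pin_change(fake_pin([0, 0, 0, 1, 1]), max_polls=10)
never = wait_for_pin_change(fake_pin([1, 1]), max_polls=5)
```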
In some implementations, the method further comprises:
analyzing the system downtime log to obtain system fault information;
generating system downtime fault diagnosis information according to the system fault information;
querying, according to the system downtime fault diagnosis information, to obtain corresponding system maintenance suggestions;
and outputting the system downtime log, the system downtime fault diagnosis information and the system maintenance suggestions.
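One simple way to picture the analyze, diagnose and query chain is a signature lookup over the log text. The rule table below is entirely hypothetical (the patent does not list fault signatures or maintenance suggestions), and `diagnose` is an illustrative name.

```python
# Hypothetical table: log signature -> (diagnosis, maintenance suggestion).
RULES = {
    "out of memory": ("memory exhaustion", "check memory usage and swap configuration"),
    "machine check": ("hardware error", "inspect CPU and DIMM health records"),
}


def diagnose(log_text: str):
    """Return one finding per log line that matches a known signature."""
    findings = []
    for line in log_text.splitlines():
        lowered = line.lower()
        for signature, (diagnosis, advice) in RULES.items():
            if signature in lowered:
                findings.append(
                    {"fault": line.strip(), "diagnosis": diagnosis, "advice": advice}
                )
    return findings


log = "boot ok\nOut of memory: killed process 4242\n"
report = diagnose(log)
```

Each finding carries the three outputs named above: the fault line from the log, the diagnosis, and the suggestion.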
In some implementations, the method further comprises:
when it is identified that the operating system processor has written system running state information and/or system asset information into the shared storage space, reading the system running state information and/or the system asset information from the shared storage space;
and generating a system operation log according to the system running state information and/or the system asset information.
In some implementations, the deploying of the customized intelligent platform management interface command by which the operating system processor, using the basic input output system of the device, handshakes with the baseboard management controller, so as to establish a memory mapping from the operating system processor to the built-in static random access memory and obtain a shared storage space between the operating system processor and the baseboard management controller, includes:
determining the address range of the required shared storage space according to the data size of the system downtime information to be recorded at a time;
checking the access right of the baseboard management controller to the built-in static random access memory;
if the baseboard management controller has access right to the built-in static random access memory, starting the driver of the built-in static random access memory;
if the driver of the built-in static random access memory is started successfully, initializing the memory mapping to complete the initialization of the intelligent platform management interface command; the memory mapping includes a mapping of the address range of the shared storage space to a physical memory address of the operating system processor and a mapping of the address range of the shared storage space to a physical memory address of the baseboard management controller;
the reading of the system downtime information from the shared storage space, when a write signal is identified indicating that the operating system processor has written system downtime information into the shared storage space through the intelligent platform management interface command, includes:
monitoring for the write signal in at least one of the following ways: polling the data of the shared storage space, polling a system downtime detection pin agreed upon with the operating system processor, and monitoring for a system downtime event of the operating system processor through a heartbeat signal;
when at least one of the following conditions is met, determining that a system downtime event has occurred on the operating system processor and reading the system downtime information from the shared storage space: the system downtime information is found by polling the shared storage space; a state change of the system downtime detection pin is detected; no heartbeat signal sent by the operating system processor has been received after the heartbeat interval time of the operating system has elapsed; or a heartbeat signal carrying system downtime alarm information sent by the operating system processor is received;
the generating of a system downtime log according to the system downtime information includes:
when the write signal is identified, accessing, through a serial peripheral interface bus, a crash dump file system located in an electrically erasable programmable read-only memory to create a crash dump file;
and copying the system downtime information to the crash dump file to obtain the system downtime log.
To solve the above technical problem, the present invention further provides a server, including: a baseboard management controller and an operating system processor, wherein a built-in static random access memory of the baseboard management controller is connected with the operating system processor through an Advanced High-performance Bus;
the baseboard management controller is configured to deploy a customized intelligent platform management interface command by which the operating system processor, using the basic input output system of the device, handshakes with the baseboard management controller, so as to establish a memory mapping from the operating system processor to the built-in static random access memory and obtain a shared storage space between the operating system processor and the baseboard management controller; to read the system downtime information from the shared storage space when a write signal is identified indicating that the operating system processor has written system downtime information into the shared storage space through the intelligent platform management interface command; and to generate a system downtime log according to the system downtime information.
To solve the above technical problem, the present invention also provides a system downtime processing apparatus, which is applied to a baseboard management controller including a built-in static random access memory, wherein the built-in static random access memory is connected with an operating system processor of the device in which it is located through an Advanced High-performance Bus, and the system downtime processing apparatus includes:
a deployment unit, configured to deploy a customized intelligent platform management interface command by which the operating system processor, using the basic input output system of the device, handshakes with the baseboard management controller, so as to establish a memory mapping from the operating system processor to the built-in static random access memory and obtain a shared storage space between the operating system processor and the baseboard management controller;
a first monitoring unit, configured to read the system downtime information from the shared storage space when a write signal is identified indicating that the operating system processor has written system downtime information into the shared storage space through the intelligent platform management interface command;
and a first generation unit, configured to generate a system downtime log according to the system downtime information.
To solve the above technical problem, the invention also provides a system downtime processing device, which comprises:
a memory for storing a computer program;
and a processor for executing the computer program, wherein the computer program, when executed by the processor, implements the steps of the system downtime processing method according to any one of the above.
To solve the above technical problem, the present invention further provides a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the system downtime processing method according to any one of the above.
The system downtime processing method provided by the invention is implemented on a baseboard management controller that includes a built-in static random access memory. After the built-in static random access memory is connected with the operating system processor of the device in which it is located, a shared storage space between the operating system processor and the baseboard management controller is established in it. Specifically, a customized intelligent platform management interface command, by which the operating system processor handshakes with the baseboard management controller using the basic input output system of the device, is deployed to establish a memory mapping from the operating system processor to the built-in static random access memory, so that the operating system processor can write system downtime information into the shared storage space through the intelligent platform management interface command. After identifying the write signal, the baseboard management controller reads the system downtime information from the shared storage space and generates a system downtime log according to it. This solves the problem that a server that does not support non-maskable interrupts cannot collect system downtime information in time by means of a non-maskable interrupt, and removes the hidden risk that servers such as those on the ARM platform cannot collect system downtime information through non-maskable interrupts.
The invention also provides a system downtime processing apparatus, device, storage medium and server, which have the above beneficial effects; these are not repeated here.
Drawings
For a clearer description of embodiments of the invention or of the prior art, the drawings that are used in the description of the embodiments or of the prior art will be briefly described, it being apparent that the drawings in the description below are only some embodiments of the invention, and that other drawings can be obtained from them without inventive effort for a person skilled in the art.
FIG. 1 is a hardware architecture diagram of a server according to an embodiment of the present invention;
FIG. 2 is a flowchart of a first system downtime processing method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a second system downtime processing method according to an embodiment of the present invention;
FIG. 4 is a flowchart of a third system downtime processing method according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a system downtime processing apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a system downtime processing apparatus according to an embodiment of the present invention.
Detailed Description
The invention provides a system downtime processing method, device, equipment, storage medium and server, which are used for solving the problem of system downtime information collection of a server which does not support non-maskable interrupt.
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The first embodiment of the present invention is described below.
Fig. 1 is a hardware architecture diagram of a server according to an embodiment of the present invention.
For easy understanding, the server provided by the embodiment of the present invention will be described first.
The embodiments of the present invention may be applied to any server having a baseboard management controller, such as an x86 server or an ARM server. In the server provided by the embodiment of the invention, when a system downtime event occurs in the operating system, the baseboard management controller generates a system downtime log according to the system downtime information. The baseboard management controller serves as the remote management controller of the server; it implements functions such as control and information monitoring of the server, and is a platform that presents server information intuitively. A server client may access the baseboard management controller of the server through a web interface, an intelligent platform management interface management tool (ipmitool), a Simple Network Management Protocol (SNMP) tool, etc. to obtain information about the server. In the baseboard management controller, a crash dump (Crash Dump) folder is generally used to save log files generated when in-band system errors and program errors occur. While the operating system is running, if it behaves abnormally, a fault information receiving mechanism of the operating system is triggered: the operating system generates a signal and sends it to the corresponding process, and in subsequent processing some key error log files are recorded and transmitted to the baseboard management controller in some manner. The baseboard management controller organizes them uniformly into the crash dump folder, so that this information can be obtained through the one-click log command of the baseboard management controller and the kind of abnormality that occurred on the server can be observed indirectly, which facilitates subsequent maintenance and analysis.
Because downtime of the server's operating system is a sudden event, the operating system processor on which the operating system runs must interact with the baseboard management controller promptly to transfer the system downtime information. Servers are classified into x86 servers and ARM servers according to the type of operating system processor employed. In an x86 server, the operating system processor and the baseboard management controller can currently interact through a Universal Serial Bus (USB) to transfer the system downtime information, and when the operating system goes down the operating system processor triggers the baseboard management controller through a non-maskable interrupt to generate a crash dump log. Once a non-maskable interrupt is triggered, the processor must respond to a valid non-maskable interrupt regardless of the IF bit of the status register, so the trigger signal used by the crash dump service of the x86 server is the non-maskable interrupt.
However, the operating system processor of an ARM server does not support non-maskable interrupts. The baseboard management controller can only monitor the information of the operating system processor in one direction and, in order to use its own resources reasonably, performs this monitoring by polling, which cannot achieve real-time monitoring. That is, the baseboard management controller side cannot obtain the system downtime information and the trigger for the file transmission state in time, and therefore cannot collect crash dump files in the way an x86 server does. Further, in the ARM server, the operating system processor and the baseboard management controller interact through a two-wire serial bus (Inter-Integrated Circuit, I2C), whose transmission rate is far lower than that of the Universal Serial Bus, so at the moment of a system downtime event the system downtime information cannot be transmitted to the baseboard management controller over this path in time. A server that does not support non-maskable interrupts, such as an ARM server, therefore lacks the crash dump log download function, which causes great trouble for routine maintenance of the machine and affects user experience.
Therefore, the embodiment of the invention provides a server in which transmission of the system downtime information is achieved by designating a shared storage space that both the baseboard management controller and the operating system processor can access. When the system goes down, the operating system processor can write the system downtime information into the shared storage space through the basic input output system (Basic Input Output System, BIOS), and the baseboard management controller can read the shared storage space to obtain the system downtime information and generate a system downtime log.
As shown in FIG. 1, a server provided in an embodiment of the present invention includes: a baseboard management controller 101 and an operating system processor 102, wherein a built-in static random access memory of the baseboard management controller 101 is connected with the operating system processor 102 through an Advanced High-performance Bus;
the baseboard management controller 101 is configured to deploy a customized intelligent platform management interface command by which the operating system processor 102, using the basic input output system of the device, handshakes with the baseboard management controller 101, so as to establish a memory mapping from the operating system processor 102 to the built-in static random access memory and obtain a shared storage space between the operating system processor 102 and the baseboard management controller 101; to read the system downtime information from the shared storage space when a write signal is identified indicating that the operating system processor 102 has written system downtime information into the shared storage space through the intelligent platform management interface command; and to generate a system downtime log according to the system downtime information.
In the server provided in the embodiments of the present invention, the operating system processor 102 may be a central processing unit running an operating system, or another processor running an operating system, that provides the system downtime information. The shared storage space between the operating system processor 102 and the baseboard management controller 101 is created in the built-in static random access memory (Static Random-Access Memory, SRAM; originally named Internal SRAM) of the baseboard management controller 101. In the baseboard management controller 101, the processor chip of the baseboard management controller 101 can access the built-in static random access memory through the Advanced High-performance Bus (AHB), and can also access the electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM) 103 over the serial peripheral interface bus through the serial peripheral interface (Serial Peripheral Interface, SPI) of the baseboard management controller 101.
The baseboard management controller 101 with built-in static random access memory may be, but is not limited to, an Aspeed 2600 chip, an Aspeed 2700 chip, a ham 2 chip, etc. The built-in static random access memory provides internal static random access memory block buffers (SRAM blocks) of several sizes. Taking Aspeed chips as an example, the internal static random access memory block buffer has three data blocks (blocks) of 64KB, 24KB, and 1KB; parity check is available on them at power-up (and can be used for subsequent data checking), and they are configured for use by the processor chip of the baseboard management controller 101. To establish a shared storage space between the baseboard management controller 101 and the operating system processor 102, the operating system processor 102 is connected to the built-in static random access memory through the Advanced High-performance Bus, which provides the hardware basis for the operating system processor 102 to operate the built-in static random access memory directly. On this basis, by customizing an Intelligent Platform Management Interface (IPMI) command between the operating system processor 102 and the baseboard management controller 101 for accessing the shared storage space, the software basis for the operating system processor 102 to operate the built-in static random access memory directly is provided, and accesses to the built-in static random access memory by the operating system processor 102 and by the processor chip of the baseboard management controller 101 are kept from conflicting.
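To make the block sizes concrete, the sketch below picks the smallest of the three block sizes named above that can hold one downtime record. Treating the blocks as individually selectable for the shared storage space is an assumption made for the example, and `pick_block` is a hypothetical helper.

```python
BLOCK_SIZES_KB = (64, 24, 1)   # block sizes named in the text


def pick_block(record_bytes: int):
    """Return the smallest block size (in KB) that fits the record, or None."""
    fitting = [kb for kb in BLOCK_SIZES_KB if kb * 1024 >= record_bytes]
    return min(fitting) if fitting else None


small = pick_block(800)        # fits in the 1KB block
medium = pick_block(2_000)     # needs the 24KB block
large = pick_block(30_000)     # exceeds 24KB (24576 bytes), needs the 64KB block
too_big = pick_block(70_000)   # exceeds 64KB (65536 bytes)
```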
Because the static random access memory has a high read-write speed, dividing the shared storage space between the baseboard management controller 101 and the operating system processor 102 out of the built-in static random access memory of the baseboard management controller 101 allows system downtime information to be exchanged quickly at the moment of a system downtime event; the read-write speed of the static random access memory is higher than the transmission speed of the Universal Serial Bus.
Alternatively, for a server whose baseboard management controller 101 has no built-in static random access memory, the shared storage space between the baseboard management controller 101 and the operating system processor 102 may be divided out of a shared memory outside the baseboard management controller 101, so that the operating system processor 102 writes system downtime information into the shared storage space, and the baseboard management controller 101 reads the system downtime information from the shared storage space and then generates a system downtime log according to it.
After the baseboard management controller 101 reads the system downtime information, it can write the information into the electrically erasable programmable read-only memory 103 over the serial peripheral interface bus through the serial peripheral interface and generate the system downtime log in the electrically erasable programmable read-only memory 103, so that the system downtime information is not lost after power failure.
On the basis of the above architecture, the system downtime processing method provided by the embodiment of the invention is described below with reference to the accompanying drawings.
The second embodiment of the present invention will be described below.
Fig. 2 is a flowchart of a first system downtime processing method according to an embodiment of the present invention.
As shown in FIG. 2, the system downtime processing method is applied to a baseboard management controller including a built-in static random access memory, wherein the built-in static random access memory is connected with an operating system processor of the device in which it is located through an Advanced High-performance Bus, and the method comprises the following steps:
S201: deploying a customized intelligent platform management interface command by which the operating system processor, using the basic input output system of the device, handshakes with the baseboard management controller, so as to establish a memory mapping from the operating system processor to the built-in static random access memory and obtain a shared storage space between the operating system processor and the baseboard management controller.
S202: when a write signal is identified indicating that the operating system processor has written system downtime information into the shared storage space through the intelligent platform management interface command, reading the system downtime information from the shared storage space.
S203: generating a system downtime log according to the system downtime information.
In a specific implementation, the size of the shared storage space can be determined according to the size of the system downtime information to be stored: enough system downtime information must be captured while occupying as little storage space as possible, so that use of the shared storage space is optimized and useful data is still available when server faults are analyzed. In different server application scenarios, shared storage spaces of different sizes may be partitioned.
S201: deploying the operating system processor to use an intelligent platform management interface command, customized through a handshake between the basic input output system of the device and the baseboard management controller, to establish a memory mapping from the operating system processor to the built-in static random access memory and obtain a shared storage space between the operating system processor and the baseboard management controller may include:
Determining the address range of the required shared storage space according to the data size of the system downtime information to be recorded at a time;
initializing the intelligent platform management interface command to map the address range of the shared memory space to the physical memory address of the operating system processor and the physical memory address of the baseboard management controller, so that the operating system processor can access the shared memory space by using the basic input output system to send the intelligent platform management interface command.
When the shared storage space between the baseboard management controller and the operating system processor is created, a memory mapping from the physical address of the baseboard management controller to the shared storage space and a memory mapping from the physical address of the operating system processor to the shared storage space must both be established, so that read and write operations on the shared storage space are possible. To avoid operation conflicts between the baseboard management controller and the operating system processor, a control right switching mechanism for the shared storage space needs to be designed so that switching is stable and efficient. The operating system processor may access the shared storage space by means of the basic input output system. Read and write routines for files in the shared storage space are constructed for the baseboard management controller and the basic input output system, and a data transmission channel and protocol are designed to support safe and rapid data transmission between the modules. A well-structured code framework helps integrate these functions in an orderly way, so that control can pass smoothly from the baseboard management controller to the basic input output system, or vice versa, under different conditions.
Depending on how much system downtime information needs to be stored, the shared storage space can be part or all of the storage space of the built-in static random access memory. Furthermore, as described in the first embodiment of the present invention, for a server whose baseboard management controller does not have a built-in static random access memory, the shared storage space may be located in a shared memory device external to the baseboard management controller and the operating system processor.
Meanwhile, a script for generating a system downtime log according to the system downtime information is deployed in the baseboard management controller, so that the system downtime log can be generated promptly after the system downtime information in the shared storage space is read.
A custom handshake protocol is defined for access to the built-in static random access memory by the basic input output system and the baseboard management controller; the two sides communicate through a designated folder on the baseboard management controller. For example, the operating system processor may create a particular file (for example, named asset.json) in the designated folder, with the agreed storage directory being /tmp/ in the file system of the baseboard management controller. Specifically, the operating system processor may use a custom intelligent platform management interface (IPMI OEM) command to create and delete the file in the shared storage space: the command, customized through the handshake with the basic input output system, is sent from the operating system, and the baseboard management controller creates the specific file after receiving it. By monitoring whether the file exists, the baseboard management controller determines whether to operate on the built-in static random access memory to acquire the useful information.
For S202, when the operating system goes down, the operating system processor writes the system downtime information into the shared storage space, and the baseboard management controller then reads the system downtime information from the shared storage space. The system downtime information may include key data at the time of the downtime, such as system state information, register information, stack information and operation logs; it may include information on components such as the central processing unit, hard disk and memory at the time the operating system went down; and it may also include the firmware version of each component.
The timing at which the operating system processor writes the system downtime information into the shared storage space may include: after the device is started and enters the operating system, the operating system processor monitors the running state of the operating system, and if abnormal running information of the operating system is detected, writes the system downtime information into the shared storage space. For example, after the device is started and the basic input output system has booted into the operating system, the running state of the operating system is monitored; if abnormal running information is detected, the operating system processor writes an asset.json file into the shared storage space of the built-in static random access memory through the basic input output system. After power-up, the baseboard management controller checks whether an asset.json file exists in the shared storage space. By default, the basic input output system has control of the built-in static random access memory, and the asset.json file is created after the basic input output system has operated on the built-in static random access memory; the baseboard management controller determines whether the system downtime information can be acquired by polling whether the asset.json file exists in the /tmp/ folder, and then performs further processing.
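The file-presence check described above can be sketched as follows. This is a minimal illustration assuming a POSIX environment on the baseboard management controller; only the /tmp/asset.json path comes from the text, while the function names `downtime_info_available` and `poll_once` and the callback type are hypothetical.

```c
#include <stddef.h>
#include <unistd.h>   /* access(), F_OK */

/* Marker file path agreed between the operating system side and the BMC
 * (taken from the description above). */
#define ASSET_FILE_PATH "/tmp/asset.json"

/* Hypothetical callback type: invoked when the marker file is found. */
typedef void (*DowntimeHandler)(const char *path);

/* Hypothetical helper: returns 1 if the marker file exists, 0 otherwise. */
int downtime_info_available(const char *path)
{
    return access(path, F_OK) == 0;
}

/* Hypothetical single polling step: if the marker file exists, call the
 * handler (when one is supplied) and return 1; otherwise return 0.
 * A real BMC service would call this periodically from its main loop. */
int poll_once(const char *path, DowntimeHandler handler)
{
    if (!downtime_info_available(path)) {
        return 0;
    }
    if (handler != NULL) {
        handler(path);
    }
    return 1;
}
```

In a periodic task the call would be `poll_once(ASSET_FILE_PATH, handler)`, where the handler goes on to read the built-in static random access memory.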
The timing at which the operating system processor writes the system downtime information into the shared storage space may further include: when the device is in an operating state, the operating system processor writes the system downtime information into the shared storage space upon detecting a system downtime signal.
In addition to the system downtime information, in order to facilitate the subsequent overhaul and maintenance of the system fault, the system downtime processing method provided by the embodiment of the invention may further include:
Reading the system running state information and/or the system asset information from the shared storage space upon identifying a signal indicating that the operating system processor has written the system running state information and/or the system asset information into the shared storage space;
and generating a system operation log according to the system operation state information and/or the system asset information.
That is, besides system downtime information, the information exchange path through the shared storage space between the operating system processor and the baseboard management controller provided by the embodiment of the invention can also carry system running state information and/or system asset information, so that the baseboard management controller can generate a system running log. It will be appreciated that the size of the shared storage space is determined by the amount of data that needs to be exchanged through it in a single interaction. For example, when the shared storage space is used to exchange system downtime information, system running state information and system asset information, different data areas of the shared storage space can be assigned to different kinds of information so as to balance use and storage. Meanwhile, to ensure the reliability of the system downtime processing method provided by the embodiment of the invention, a standby mechanism can be designed. For example, two shared storage spaces are arranged as master and slave, and the operating system processor writes system downtime information into both shared storage spaces at the same time, so that the system downtime information is still recorded when one path fails.
For S203, the system downtime log may be generated from the system downtime information by a log script pre-deployed in the baseboard management controller, for example in a crash dump file system used for recording system downtime logs. S203: generating a system downtime log according to the system downtime information may include:
When the write signal is identified, accessing a crash dump file system located in the electrically erasable programmable read-only memory through the serial peripheral interface bus to create a crash dump file;
and copying the system downtime information to a crash dump file to obtain a system downtime log.
The crash dump file system may be deployed in an electrically erasable programmable read-only memory that is connected to the baseboard management controller via a serial peripheral interface bus. After recognizing the write signal of the operating system processor to the shared storage space, the baseboard management controller creates a crash dump file in the crash dump file system and copies the system downtime information from the shared storage space into the crash dump file to form a system downtime log.
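The two steps above (create a crash dump file, then copy the system downtime information into it) can be sketched with standard C file I/O. This is only an illustration: the function name `write_crash_dump` is hypothetical, and the sketch assumes the crash dump file system is already mounted at an ordinary path on the baseboard management controller.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: create (or truncate) a crash dump file at `path`
 * and copy `len` bytes of system downtime information into it.
 * Returns 0 on success, -1 if the file cannot be created or the copy is
 * incomplete. */
int write_crash_dump(const char *path, const unsigned char *info, size_t len)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL) {
        return -1;
    }

    size_t written = fwrite(info, 1, len, fp);

    /* fclose() flushes buffered data; treat a flush failure as an error. */
    if (fclose(fp) != 0 || written != len) {
        return -1;
    }
    return 0;
}
```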
Further, the method for processing the downtime of the system provided by the embodiment of the invention can further comprise the following steps:
Analyzing a system downtime log to obtain system fault information;
Generating system downtime fault diagnosis information according to the system fault information;
inquiring according to the system downtime fault diagnosis information to obtain corresponding system maintenance suggestions;
Outputting a system downtime log, system downtime fault diagnosis information and system maintenance suggestions.
In a specific implementation, a parser and an analysis tool may be deployed in advance in the baseboard management controller, so as to extract system fault information for diagnosing a system downtime fault from the system downtime log, and analyze and generate system downtime fault diagnosis information. The corresponding relation between the system fault information keywords and the system downtime fault diagnosis information and the corresponding relation between the system downtime fault diagnosis information and the system maintenance advice can be pre-established so as to further obtain the system maintenance advice, and the system downtime log, the system downtime fault diagnosis information and the system maintenance advice are output for reference by operation and maintenance personnel when a system downtime log query command of the operation and maintenance personnel is received. If the system downtime log is a crash dump file, a parser and an analysis tool are designed for the crash dump file so as to extract useful information from the crash dump file and perform targeted fault diagnosis and debugging.
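The pre-established correspondence between fault keywords, diagnosis information and maintenance suggestions could take the form of a simple lookup table, sketched below. The keywords, diagnosis strings and suggestions are invented for illustration and do not come from the source; `DiagnosisRule` and `diagnose_log` are hypothetical names.

```c
#include <string.h>
#include <stddef.h>

/* One row of the hypothetical correspondence table. */
typedef struct {
    const char *keyword;     /* system fault information keyword  */
    const char *diagnosis;   /* downtime fault diagnosis text     */
    const char *suggestion;  /* maintenance suggestion text       */
} DiagnosisRule;

/* Illustrative rules only; a real deployment would define its own. */
static const DiagnosisRule kRules[] = {
    { "Out of memory", "memory exhaustion",
      "check memory usage and the installed memory modules" },
    { "I/O error", "storage device fault",
      "check the hard disk and its cabling" },
};

/* Returns the first rule whose keyword occurs in the log text,
 * or NULL when no keyword matches. */
const DiagnosisRule *diagnose_log(const char *log_text)
{
    for (size_t i = 0; i < sizeof(kRules) / sizeof(kRules[0]); i++) {
        if (strstr(log_text, kRules[i].keyword) != NULL) {
            return &kRules[i];
        }
    }
    return NULL;
}
```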
Further, the system downtime processing method provided by the embodiment of the invention may further include: writing baseboard management controller status information into the shared storage space, so that the operating system processor reads the baseboard management controller status information from the shared storage space upon identifying a signal indicating that the baseboard management controller has written it there. That is, through the custom protocols between the baseboard management controller and the operating system processor, the system downtime processing method provided by the embodiment of the invention can collect operating system information in a targeted way or transmit selected key information back to the operating system, which gives information transmission between the baseboard management controller and the operating system processor greater flexibility and room for customization.
The system downtime processing method provided by the embodiment of the invention is based on a baseboard management controller that includes a built-in static random access memory connected to the operating system processor of the device through an advanced high-performance bus (AHB), with a shared storage space between the operating system processor and the baseboard management controller established in the built-in static random access memory. Specifically, the operating system processor is deployed to use an intelligent platform management interface command, customized through a handshake between the basic input output system of the device and the baseboard management controller, to establish a memory mapping from the operating system processor to the built-in static random access memory, so that the operating system processor can write system downtime information into the shared storage space through the intelligent platform management interface command. After recognizing the write signal, the baseboard management controller reads the system downtime information from the shared storage space and generates a system downtime log according to it. This solves the problem that a server which does not support the non-maskable interrupt cannot collect system downtime information in time, and removes the potential safety hazard that arises when a server, such as one based on an advanced instruction set processor platform, cannot collect system downtime information through the non-maskable interrupt.
The following describes a third embodiment of the present invention.
On the basis of the above embodiments, this embodiment of the invention further describes a method by which the baseboard management controller acquires the system downtime information.
In the system downtime processing method provided by the embodiment of the invention, S202: upon identifying a write signal indicating that the operating system processor has written system downtime information into the shared storage space through the intelligent platform management interface command, reading the system downtime information from the shared storage space may include:
polling the shared storage space, and reading the system downtime information from the shared storage space when the system downtime information is polled.
That is, the baseboard management controller polls the shared storage space at regular intervals. When it identifies system downtime information, such as the asset.json file in the /tmp/ directory of the file system of the baseboard management controller described in the second embodiment of the present invention, it determines that the operating system processor has written the system downtime information and reads the system downtime information from the shared storage space.
To prevent the baseboard management controller from reading the system downtime information before the operating system processor has finished writing it, reading the system downtime information from the shared storage space when the system downtime information is polled may include:
when the system downtime information is polled, determining that a system downtime event occurs;
after a write-complete flag bit is identified in the system downtime information, determining that the operating system processor has finished writing the system downtime information, and reading the system downtime information from the shared storage space.
The write-complete flag bit is agreed between the baseboard management controller and the operating system processor, and the operating system processor writes it after writing of the system downtime information is completed. The baseboard management controller does not copy the system downtime information from the shared storage space as soon as it identifies the information; instead, it waits until the write-complete flag bit is identified and then copies the system downtime information to the crash dump file system to generate the system downtime log.
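The write-complete flag handshake can be sketched as below. The region layout, the flag value `WRITE_DONE_FLAG`, the capacity `SHARED_PAYLOAD_MAX` and the function names are assumptions made for illustration; the source does not specify them. On real hardware the writer would also need a memory barrier or cache flush before setting the flag, which this sketch omits.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

#define SHARED_PAYLOAD_MAX 256u   /* hypothetical payload capacity    */
#define WRITE_DONE_FLAG    0xA5u  /* hypothetical agreed flag value   */

/* Hypothetical layout of the shared storage space. */
typedef struct {
    volatile uint8_t write_done;          /* set last by the writer   */
    uint32_t len;                         /* number of payload bytes  */
    uint8_t payload[SHARED_PAYLOAD_MAX];  /* system downtime info     */
} SharedRegion;

/* Writer side (operating system processor): clear the flag, store the
 * payload and its length, and only then set the flag. */
int writer_store(SharedRegion *r, const uint8_t *info, uint32_t len)
{
    if (len > SHARED_PAYLOAD_MAX) {
        return -1;
    }
    r->write_done = 0;
    memcpy(r->payload, info, len);
    r->len = len;
    r->write_done = WRITE_DONE_FLAG;
    return 0;
}

/* Reader side (baseboard management controller): copy the payload only
 * once the flag is present. Returns the number of bytes copied, or -1
 * if writing is not finished or the length is not usable. */
int reader_fetch(const SharedRegion *r, uint8_t *out, uint32_t out_cap)
{
    if (r->write_done != WRITE_DONE_FLAG) {
        return -1;
    }
    if (r->len > SHARED_PAYLOAD_MAX || r->len > out_cap) {
        return -1;
    }
    memcpy(out, r->payload, r->len);
    return (int)r->len;
}
```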
In order to ensure the reliability of the system, in the system downtime processing method provided by the embodiment of the invention, the generation of the system downtime log according to the system downtime information may include:
generating a first check value according to the system downtime information by using a check algorithm;
Identifying a second check value in the system downtime information;
If the first check value is consistent with the second check value, determining that the system downtime information is complete, and generating a system downtime log according to the system downtime information;
If the first check value is inconsistent with the second check value, determining that the system downtime information is incomplete.
The check algorithm may employ a cyclic redundancy check (CRC) algorithm.
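The comparison of the first and second check values can be sketched as follows. The sketch assumes the common CRC-32 variant (reflected polynomial 0xEDB88320) and assumes that the second check value is stored as four big-endian bytes appended to the information; neither detail is stated in the source, and the function names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), an assumed choice. */
uint32_t crc32_bitwise(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u) {
                crc = (crc >> 1) ^ 0xEDB88320u;
            } else {
                crc >>= 1;
            }
        }
    }
    return ~crc;
}

/* Assumed record layout: payload bytes followed by a 4-byte big-endian
 * check value. Returns 1 when the recomputed (first) check value equals
 * the stored (second) check value, 0 otherwise. */
int downtime_info_is_complete(const uint8_t *record, size_t record_len)
{
    if (record_len < 4) {
        return 0;
    }
    size_t payload_len = record_len - 4;
    uint32_t second = ((uint32_t)record[payload_len] << 24) |
                      ((uint32_t)record[payload_len + 1] << 16) |
                      ((uint32_t)record[payload_len + 2] << 8) |
                      (uint32_t)record[payload_len + 3];
    uint32_t first = crc32_bitwise(record, payload_len);
    return first == second;
}
```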
The fourth embodiment of the present invention will be described below.
Based on the above embodiments, the embodiments of the present invention provide another method for acquiring system downtime information by using a baseboard management controller.
In the system downtime processing method provided by the embodiment of the invention, S202: upon identifying a write signal indicating that the operating system processor has written system downtime information into the shared storage space through the intelligent platform management interface command, reading the system downtime information from the shared storage space may include:
when a system downtime event of the operating system processor is detected through the heartbeat signal, reading the system downtime information from the shared storage space.
That is, the baseboard management controller exchanges state with the operating system processor through a heartbeat mechanism. When a system downtime event of the operating system processor is detected through the heartbeat signal, the baseboard management controller determines that the operating system has gone down and reads the system downtime information from the shared storage space.
The heartbeat mechanism between the baseboard management controller and the operating system processor can send heartbeat signals at fixed intervals to determine the connection state between the two, and can also carry state information in the heartbeat signals. Detecting, by the baseboard management controller through the heartbeat signal, that a system downtime event has occurred on the operating system processor may include:
determining that a system downtime event has occurred on the operating system processor when no heartbeat signal sent by the operating system processor is received within the heartbeat interval time of the operating system, and/or when a heartbeat signal carrying system downtime alarm information sent by the operating system processor is received.
That is, a heartbeat signal can be sent between the baseboard management controller and the operating system processor once per operating system heartbeat interval. If the heartbeat signal is received, the connection between the baseboard management controller and the operating system processor is determined to be normal and no system downtime event has occurred. If no heartbeat signal is received before the timeout, the connection is determined to be abnormal, a system downtime event is considered to have occurred, and the system downtime information is read from the shared storage space. Alternatively, the baseboard management controller and the operating system processor can agree that the heartbeat signal carries system downtime alarm information: when the operating system processor predicts that a system downtime event is about to occur, it writes the system downtime alarm information into the heartbeat signal and sends it to the baseboard management controller, and the baseboard management controller reads the system downtime information from the shared storage space when it recognizes the system downtime alarm information in the heartbeat signal.
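The two heartbeat conditions (timeout, or a heartbeat carrying an alarm) can be sketched as a small state tracker. The structure, field names, millisecond time base and function names are illustrative assumptions; how the baseboard management controller obtains timestamps and receives heartbeats is not shown.

```c
#include <stdint.h>

/* Hypothetical heartbeat bookkeeping kept by the BMC. */
typedef struct {
    uint64_t last_beat_ms;  /* time the last heartbeat was received   */
    uint64_t interval_ms;   /* agreed operating system heartbeat time */
    int alarm_received;     /* nonzero once a heartbeat carried an
                               agreed system downtime alarm           */
} HeartbeatState;

/* Record an incoming heartbeat and whether it carried an alarm. */
void heartbeat_received(HeartbeatState *s, uint64_t now_ms, int carries_alarm)
{
    s->last_beat_ms = now_ms;
    if (carries_alarm) {
        s->alarm_received = 1;
    }
}

/* Returns 1 if a system downtime event should be assumed: either an
 * alarm was received, or more than one interval has elapsed since the
 * last heartbeat. Returns 0 otherwise. */
int downtime_event_detected(const HeartbeatState *s, uint64_t now_ms)
{
    if (s->alarm_received) {
        return 1;
    }
    return (now_ms - s->last_beat_ms) > s->interval_ms;
}
```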
The fifth embodiment of the present invention will be described below.
Based on the above embodiments, the embodiments of the present invention provide a third method for acquiring system downtime information by using a baseboard management controller.
In the system downtime processing method provided by the embodiment of the invention, S202: upon identifying a write signal indicating that the operating system processor has written system downtime information into the shared storage space through the intelligent platform management interface command, reading the system downtime information from the shared storage space may include:
Polling a system downtime detection pin agreed with the operating system processor;
When a state change of the system downtime detection pin is detected, determining that the write signal has been received, and reading the system downtime information from the shared storage space.
In a specific implementation, the baseboard management controller and the operating system processor may agree that the operating system processor changes the state of an agreed pin of the baseboard management controller to inform the baseboard management controller that a system downtime event has occurred. For example, the operating system processor pulls the system downtime detection pin of the baseboard management controller low when the operating system goes down; the baseboard management controller periodically polls the system downtime detection pin, and when it finds the pin at low level, it determines that a system downtime event has occurred and reads the system downtime information from the shared storage space. The opposite polarity may also be used.
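Detecting a state change on the agreed pin can be sketched as comparing each sampled level with the previous one. The sketch deliberately leaves out the platform-specific pin read: `sampled_level` stands in for whatever GPIO read the baseboard management controller firmware provides, and the names `DowntimePinState` and `downtime_pin_poll` are hypothetical.

```c
/* Hypothetical state kept between polls of the detection pin. */
typedef struct {
    int last_level;  /* level seen at the previous poll (0 or 1) */
} DowntimePinState;

/* Initialise the tracker with the pin's idle level, for example 1 when
 * the operating system processor signals downtime by pulling it low. */
void downtime_pin_init(DowntimePinState *s, int idle_level)
{
    s->last_level = idle_level;
}

/* One polling step. `sampled_level` is the level just read from the
 * pin by platform code (not shown). Returns 1 when the level differs
 * from the previous poll, which the BMC treats as the write signal,
 * and 0 when the level is unchanged. */
int downtime_pin_poll(DowntimePinState *s, int sampled_level)
{
    int changed = (sampled_level != s->last_level);
    s->last_level = sampled_level;
    return changed;
}
```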
Alternatively, so that the baseboard management controller learns of the system downtime event promptly, S202: upon identifying a write signal indicating that the operating system processor has written system downtime information into the shared storage space through the intelligent platform management interface command, reading the system downtime information from the shared storage space may further include:
When an interrupt signal is received that the operating system processor triggers on the baseboard management controller upon writing system downtime information into the shared storage space, reading the system downtime information from the shared storage space.
That is, after writing the system downtime information into the shared storage space, the operating system processor may trigger an interrupt through a general-purpose input/output (GPIO) pin to inform the baseboard management controller to read the system downtime information from the shared storage space.
The sixth embodiment of the present invention will be described.
Fig. 3 is a flowchart of a second system downtime processing method according to an embodiment of the present invention.
Based on the above embodiments, this embodiment of the present invention further provides a system downtime processing method for practical implementation.
As shown in fig. 3, in the system downtime processing method provided by the embodiment of the present invention, S201: deploying an operating system processor to handshake a customized intelligent platform management interface command with a basic input output system and a baseboard management controller of a device to establish a memory mapping from the operating system processor to a built-in static random access memory to obtain a shared memory space between the operating system processor and the baseboard management controller may include:
S301: determining the address range of the required shared storage space according to the data size of the system downtime information to be recorded at a time.
S302: checking whether the baseboard management controller has access to the built-in static random access memory.
S303: if the baseboard management controller has access to the built-in static random access memory, opening the driver of the built-in static random access memory.
S304: if the driver of the built-in static random access memory is opened successfully, initializing the memory mapping to complete the initialization of the intelligent platform management interface command.
The memory map includes a map of an address range of the shared memory space to a physical memory address of the operating system processor and a map of an address range of the shared memory space to a physical memory address of the baseboard management controller.
S202: upon identifying a write signal indicating that the operating system processor has written system downtime information into the shared storage space through the intelligent platform management interface command, reading the system downtime information from the shared storage space includes:
S305: the write signal is monitored by at least one of polling data in the shared memory space, polling a system downtime detection pin agreed by the operating system processor, and monitoring a system downtime event of the operating system processor by a heartbeat signal.
S306: determining that a system downtime event has occurred on the operating system processor when at least one of the following conditions is met: the system downtime information is polled in the shared storage space; a state change of the system downtime detection pin is detected; no heartbeat signal sent by the operating system processor is received within the heartbeat interval time of the operating system; a heartbeat signal carrying system downtime alarm information sent by the operating system processor is received.
S203: generating a system downtime log according to the system downtime information may include:
S307: when the write signal is identified, creating a crash dump file by accessing a crash dump file system located in the electrically erasable programmable read-only memory through the serial peripheral interface bus.
S308: copying the system downtime information to the crash dump file to obtain a system downtime log.
In practical applications, take as an example a baseboard management controller with a built-in static random access memory such as the Aspeed 2600 chip, which provides three blocks of storage space of 64KB, 24KB and 1KB respectively. The appropriate block is selected as the shared storage space based on conventional log collection under the in-band system. For example, the key information dmesg file in the operating system ranges from 10KB to 17KB in size, about 15KB on average, so block storage space 2 of the built-in static random access memory, with a size of 24KB, can be selected as the shared storage space. Subsequently, the size of the shared storage space can be adjusted in real time according to the size of the files to be collected, to meet requirements such as collecting system downtime information and system operation information. As much system operation data as possible should be collected so that server operation can be evaluated in more detail for fault diagnosis.
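The block selection reasoning above (pick the smallest block that still holds the largest expected file) can be sketched as follows. The three block sizes come from the text; the function name `select_sram_block` and the zero-based index convention are assumptions for illustration.

```c
#include <stddef.h>

/* Block sizes of the built-in SRAM as listed in the text, in bytes. */
static const size_t kSramBlockSizes[] = {
    64u * 1024u,  /* index 0: 64KB block */
    24u * 1024u,  /* index 1: 24KB block */
    1u * 1024u    /* index 2: 1KB block  */
};

#define SRAM_BLOCK_COUNT (sizeof(kSramBlockSizes) / sizeof(kSramBlockSizes[0]))

/* Returns the zero-based index of the smallest block that can hold
 * required_bytes, or -1 if no block is large enough. */
int select_sram_block(size_t required_bytes)
{
    int best = -1;
    for (size_t i = 0; i < SRAM_BLOCK_COUNT; i++) {
        if (kSramBlockSizes[i] < required_bytes) {
            continue;  /* block too small */
        }
        if (best < 0 || kSramBlockSizes[i] < kSramBlockSizes[best]) {
            best = (int)i;
        }
    }
    return best;
}
```

With the 17KB upper bound for the dmesg file, `select_sram_block(17 * 1024)` returns index 1, the 24KB block, which matches the choice made in the text.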
The built-in static random access memory is an internal functional module of the baseboard management controller, so it must be initialized by the baseboard management controller. Initialization includes mapping between the physical address and the actual memory and defining the size and start address of the space used. After initialization, that part of the built-in static random access memory can be read and written. The related implementation logic is as follows for reference:
Built-in static random access memory related definitions:
#define H2B_MEM_SIZE (1*1024*1024) // Defines the block memory size of the built-in static random access memory used.
#define AST2600_SRAM_MEMORY_SIZE (64*1024) // A block memory size of 64KB is selected as the shared memory.
#define AST2600_SRAM_MEMORY_ADDRESS (0x10000000) // Start address of the built-in static random access memory.
// Check whether the baseboard management controller has access to the built-in static random access memory.
if (access(SRAM_ENABLE_FLAG_FILE, F_OK) == 0)
{
    // Open the built-in static random access memory driver.
    SRAMDeviceFd = sigwrap_open(SRAM_MEM_DEV_NAME, O_RDWR | O_SYNC);
    if (SRAMDeviceFd < 0)
    {
        // Handle failure to open the built-in static random access memory driver.
        return NULL;
    }
    // Initialize the memory mapping.
    SRAMHandler = mmap(0, AST2600_SRAM_MEMORY_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, SRAMDeviceFd, AST2600_SRAM_MEMORY_ADDRESS);
    if (MAP_FAILED == SRAMHandler) {
        SRAMHandler = NULL;
        sigwrap_close(SRAMDeviceFd);
        SRAMDeviceFd = -1;
        return NULL;
    }
}
The above code segment initializes the built-in static random access memory. First, macro definitions set the block memory size of the built-in static random access memory and the size and start address of block memory space 3 (64 KB size). Then, after checking whether the baseboard management controller has access rights to the built-in static random access memory, the code opens the built-in static random access memory driver and initializes the memory mapping.
For the system downtime information transmitted by the basic input output system and stored in the built-in static random access memory, the baseboard management controller acquires the data of the built-in static random access memory after the control right is switched in step S205. The related code implementation is as follows for reference:
// Read asset information from the built-in static random access memory.
int ReadAssetFromSRAM(int BMCInst)
{
    length = ntohl(sramHandler->len); // The data length read must match the defined built-in static random access memory block size.
    if (length == 0)
    {
        return READ_SRAM_LEN_ERR;
    }
    if (length > BIOS_ASSET_SIZE)
    {
        TMAINTENCE(LOG_INFO, "Failed to read Asset from SRAM.Invalid length:%d.", length);
        return READ_SRAM_LEN_ERR; // Log and reject data larger than the predetermined size (to ensure the integrity of the system downtime information read).
    }
    crcData = ntohl(sramHandler->crc);
    // Copy the data out of the shared storage space so that the basic input output system cannot overwrite the region on restart.
    data = malloc(sizeof(BIOSConfAssetInfo_T) + length + 1);
    memset(data, 0, (sizeof(BIOSConfAssetInfo_T) + length + 1));
    memcpy(data, (unsigned char *)sramHandler, (sizeof(BIOSConfAssetInfo_T) + length));
    // Calculate the 32-bit cyclic redundancy check value of the acquired data to confirm the validity of the system downtime information.
    checksum = CalculateCRC32(data + sizeof(BIOSConfAssetInfo_T), length);
    return SUCCESS;
}
unsigned long CalculateCRC32(unsigned char*Buffer,unsigned long Size) {
unsigned long i, crc32=0xFFFFFFFF;
for(i=0;i<Size;i++) {
crc32=((crc32)>>8)^CrcLookUpTable[(Buffer[i])^((crc32)&(0x000000FF))];
}
return~crc32;
}
The code above implements the mechanism by which the baseboard management controller reads the content stored in the built-in static random access memory. First, the baseboard management controller acquires access rights to the built-in static random access memory through the preceding operation. Then it reads data from the built-in static random access memory by calling the read interface. After the data is read, the code first checks whether the data length is reasonable and returns an error if it is not. Next, a 32-bit cyclic redundancy check (CRC32) value is calculated over the data that was read. CRC32 is an error-detecting code used to detect whether data has been corrupted during transmission: when data is transmitted, a short checksum is generated and sent along with it; when the data is received, the checksum is generated again and compared with the transmitted checksum, and if the two are equal the data can be judged to be undamaged.
According to the third embodiment of the invention, whether the system downtime information read by the baseboard management controller from the shared storage space is complete and reliable is determined by checking whether the first check value, calculated from the system downtime information, is consistent with the second check value carried in the system downtime information.
The process ensures that the baseboard management controller can safely and effectively read the stored data from the built-in static random access memory, and ensures the integrity and reliability of the data through a verification mechanism.
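The comparison of the first and second check values can be sketched as follows. The listing above does not show how `CrcLookUpTable` is built, so this sketch assumes the common reflected CRC-32 polynomial 0xEDB88320; the names `crc32_table_init`, `crc32_compute` and `downtime_record_is_valid` are hypothetical and are not taken from the patent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the lookup table uses the common reflected CRC-32 polynomial. */
#define CRC32_POLY 0xEDB88320u

static uint32_t crc_table[256];
static int crc_table_ready = 0;

/* Build the 256-entry lookup table once. */
static void crc32_table_init(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (CRC32_POLY ^ (c >> 1)) : (c >> 1);
        crc_table[n] = c;
    }
    crc_table_ready = 1;
}

/* Same per-byte update step as CalculateCRC32 in the listing above. */
uint32_t crc32_compute(const unsigned char *buf, size_t size)
{
    assert(buf != NULL || size == 0);
    if (!crc_table_ready)
        crc32_table_init();
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < size; i++)
        crc = (crc >> 8) ^ crc_table[(buf[i] ^ crc) & 0xFFu];
    return ~crc;
}

/* Returns 1 when the calculated (first) check value equals the check value
 * carried with the data (second check value), otherwise 0. */
int downtime_record_is_valid(const unsigned char *payload, size_t len,
                             uint32_t stored_crc)
{
    return crc32_compute(payload, len) == stored_crc;
}
```

With this polynomial, the ASCII string "123456789" yields the well-known check value 0xCBF43926, which is a convenient way to confirm that a table matches the assumed variant.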
As described in the second embodiment of the present invention, the operation of generating the system downtime log may be built on a crash dump file in the baseboard management controller. This includes the mechanism, described in the above embodiments, by which the baseboard management controller monitors system downtime events: it monitors the system status, the hardware running condition and key indicators periodically or in real time, and triggers the exception handling procedure. For example, a thread of the baseboard management controller polls the shared storage space for the system downtime information file and treats its presence as the signal that an exception has been captured.
When a system downtime event occurs, the baseboard management controller starts a crash dump file generation flow, including recording key data such as the current system state, register information, stack information, operation log and the like; a crash dump file containing the above information is generated and saved in binary or other suitable format.
The baseboard management controller stores the generated crash dump file in a suitable location, such as a local storage device, a remote server, or the like. The storage strategy can be set according to the needs, including parameters such as file naming, file segmentation, storage path and the like.
A crash dump file parser is designed in the baseboard management controller to extract key data from the crash dump file. The parser can be used to open a crash dump file and read the binary data in it, and to parse key information such as registers, the stack and the running log according to the file format and structure.
A data analysis and fault diagnosis tool is designed in the baseboard management controller to generate system downtime fault diagnosis information from the system fault information obtained by parsing the system downtime log. The tool can display the data extracted from the system downtime log in a readable form, for example the register values, stack trace and running log; perform fault diagnosis on the parsed system fault information and attempt to locate the cause of the system fault; and query system maintenance suggestions according to the resulting system downtime fault diagnosis information, providing the system downtime log, the system downtime fault diagnosis information and the system maintenance suggestions to operation and maintenance personnel for further investigation and handling.
The above steps can be realized by the following code:
#include <stdio.h>
// Function declarations:
int detect_abnormal_condition();
void generate_crashdump();
void store_crashdump();
void collect_critical_data();
void create_crashdump_file();
void parse_crashdump_file();
void analyze_crashdump();
// Monitor the system status and handle exceptions:
void monitor_system() {
    while (1) {
        if (detect_abnormal_condition())
        {
            generate_crashdump();
            store_crashdump();
        }
    }
}
// The baseboard management controller defines the crash dump file system:
void generate_crashdump() {
    collect_critical_data();
    create_crashdump_file();
}
// The baseboard management controller reads system downtime information from the shared storage space:
void collect_critical_data() {
    // The system downtime information may include registers, stacks, running logs, etc.
}
// The baseboard management controller creates a crash dump file:
void create_crashdump_file() {
    // Write the system downtime information into the crash dump file.
}
// Parse the crash dump file:
void parse_crashdump_file() {
    // Parse the crash dump file and extract the system fault information.
}
// Analyze the system fault information:
void analyze_crashdump() {
    // Display information such as register values, stack trace and running log.
    // Generate system downtime fault diagnosis information according to the system fault information.
    // Query the corresponding system maintenance suggestions according to the system downtime fault diagnosis information, and output the system downtime log, the system downtime fault diagnosis information and the system maintenance suggestions.
}
// Main function.
int main() {
    monitor_system();
    return 0;
}
According to the system downtime processing method provided by the embodiments of the present invention, the built-in static random access memory of the baseboard management controller serves as the shared storage space of the baseboard management controller and the operating system processor. This makes full use of the characteristics of the built-in static random access memory, provides an efficient way to acquire system downtime information, and solves the problem that some servers that do not support the non-maskable interrupt cannot acquire operating system running data when the system goes down. Besides the baseboard management controller described herein, the scheme may also be implemented on other baseboard management controllers that have a built-in static random access memory, such as Aspeed 2700 chips, ham chips, and the like. If the method is applied to a baseboard management controller without a built-in static random access memory, a shared memory outside the baseboard management controller may be selected and partitioned to provide the shared storage space of the baseboard management controller and the operating system processor. The method is not limited to a particular operating system processor architecture or a specific baseboard management controller platform.
The seventh embodiment of the present invention will be described.
Fig. 4 is a flowchart of a third system downtime processing method according to an embodiment of the present invention.
Based on the above embodiments, the embodiment of the present invention further provides a system downtime processing method.
As shown in fig. 4, the method for processing the downtime of the system provided by the embodiment of the invention includes:
S401: the operating system processor is powered on and proceeds to S402.
S402: the operating system processor judges whether the running state of the operating system is abnormal; if yes, go to S403; if not, return to S402.
S403: the operating system processor creates a system downtime information file in the shared storage space through the intelligent platform management interface command, and writes the system downtime information into the system downtime information file.
S404: the baseboard management controller starts up and proceeds to S405.
S405: the baseboard management controller judges whether a system downtime information file exists in the shared storage space; if yes, go to S406; if not, return to S405.
S406: the baseboard management controller reads system downtime information from the shared storage space.
S407: and the baseboard management controller generates a system downtime log according to the system downtime information.
Steps S404 to S407 and steps S401 to S403 have no fixed order relative to each other.
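The two independent sides of this flow can be sketched as a pair of step functions operating on one shared structure. This is an illustrative sketch only: `struct shared_space`, `os_step` and `bmc_step` are hypothetical names, the intelligent platform management interface command is replaced by a direct write, and clearing the file marker after the read is an assumption made so that the same event is not logged twice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define INFO_MAX 128

/* Hypothetical stand-in for the shared storage space. */
struct shared_space {
    int  file_present;     /* a system downtime information file exists */
    char info[INFO_MAX];   /* the system downtime information */
};

/* S402-S403, operating system processor side. Returns 1 if it wrote. */
int os_step(struct shared_space *s, int os_abnormal, const char *info)
{
    assert(s != NULL && info != NULL);
    if (!os_abnormal)
        return 0;                            /* remain in S402 */
    strncpy(s->info, info, INFO_MAX - 1);    /* S403: write the information */
    s->info[INFO_MAX - 1] = '\0';
    s->file_present = 1;
    return 1;
}

/* S405-S407, baseboard management controller side.
 * Returns 1 and fills `log` when a downtime information file was found. */
int bmc_step(struct shared_space *s, char *log, size_t log_size)
{
    assert(s != NULL && log != NULL && log_size > 0);
    if (!s->file_present)
        return 0;                            /* remain in S405 */
    /* S406: read the information; S407: generate the log entry. */
    snprintf(log, log_size, "DOWNTIME: %s", s->info);
    s->file_present = 0;                     /* assumption: mark as consumed */
    return 1;
}
```

Because each function only inspects the shared structure, either side can be started first and called in any interleaving, which mirrors the statement that the two step groups are not ordered.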
The specific implementation manner of the embodiment of the present invention may refer to the steps executed by the baseboard management controller and the steps executed by the operating system processor in the system downtime processing method described in the foregoing embodiments, which are not described herein again.
The invention further discloses a system downtime processing device, equipment and a storage medium corresponding to the method.
The eighth embodiment of the present invention will be described.
Fig. 5 is a schematic structural diagram of a system downtime processing apparatus according to an embodiment of the present invention.
As shown in fig. 5, the system downtime processing apparatus provided in the embodiment of the present invention includes:
A deployment unit 501, configured to deploy the operating system processor to establish a memory mapping from the operating system processor to the built-in static random access memory by using an intelligent platform management interface command customized through a handshake between the basic input output system of the device and the baseboard management controller, so as to obtain a shared storage space between the operating system processor and the baseboard management controller;
A first monitoring unit 502, configured to read the system downtime information from the shared storage space upon identifying a write signal indicating that the operating system processor has written system downtime information into the shared storage space through the intelligent platform management interface command;
the first generation unit 503 is configured to generate a system downtime log according to the system downtime information.
In some implementations, the first generation unit 503 generates a system downtime log according to the system downtime information, including:
When the write signal is identified, accessing, through a serial peripheral interface bus, a crash dump file system arranged in an electrically erasable programmable read-only memory to create a crash dump file;
and copying the system downtime information to a crash dump file to obtain a system downtime log.
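Copying the system downtime information into a crash dump file can be sketched with standard file I/O. This sketch assumes that the crash dump file system is already mounted and reachable through an ordinary path; the access over the serial peripheral bus is not modeled, and `write_crash_dump` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/* Create (or overwrite) the crash dump file at `path` and copy `len` bytes
 * of system downtime information into it.
 * Returns 0 on success and -1 if the file cannot be created or written. */
int write_crash_dump(const char *path, const unsigned char *info, size_t len)
{
    assert(path != NULL && info != NULL);

    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;

    size_t written = fwrite(info, 1, len, f);
    int close_err = fclose(f);   /* flush errors surface here */

    return (written == len && close_err == 0) ? 0 : -1;
}
```

Checking the result of `fclose` matters on small flash-backed file systems, where a write may only fail once buffered data is flushed.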
In some implementations, the deployment unit 501 deploying the operating system processor to establish a memory mapping from the operating system processor to the built-in static random access memory by using the intelligent platform management interface command customized through a handshake between the basic input output system of the device and the baseboard management controller, so as to obtain a shared storage space between the operating system processor and the baseboard management controller, includes:
Determining the address range of the required shared storage space according to the data size of the system downtime information to be recorded at a time;
initializing the intelligent platform management interface command to map the address range of the shared memory space to the physical memory address of the operating system processor and the physical memory address of the baseboard management controller, so that the operating system processor can access the shared memory space by using the basic input output system to send the intelligent platform management interface command.
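Determining the address range from the size of one downtime record can be sketched as a small calculation. The rounding to an alignment boundary and the placement at the start of the block are assumptions made for illustration; `struct addr_range` and `shared_range_for` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct addr_range {
    uint32_t start;   /* first byte of the shared storage space */
    uint32_t end;     /* one past the last byte */
};

/* Round record_size up to a multiple of align (a power of two) and place the
 * range at the start of the SRAM block.
 * Returns 0 on success, -1 if the record is empty or does not fit. */
int shared_range_for(uint32_t block_base, uint32_t block_size,
                     uint32_t record_size, uint32_t align,
                     struct addr_range *out)
{
    assert(out != NULL);
    assert(align != 0 && (align & (align - 1)) == 0);

    if (record_size == 0 || record_size > block_size)
        return -1;
    if (record_size > UINT32_MAX - (align - 1))
        return -1;                       /* rounding would overflow */

    uint32_t rounded = (record_size + align - 1) & ~(align - 1);
    if (rounded > block_size)
        return -1;

    out->start = block_base;
    out->end = block_base + rounded;
    return 0;
}
```

For example, with the 64 KB block at 0x10000000 used earlier in this description and a 4 KB alignment, a 1000-byte record would occupy the range from 0x10000000 up to, but not including, 0x10001000.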
In some implementations, the first monitoring unit 502 reads the system downtime information from the shared memory space when it recognizes that the operating system processor writes a write signal of the system downtime information in the shared memory space through the intelligent platform management interface command, including:
and polling the shared storage space, and reading the system downtime information from the shared storage space when the system downtime information is polled.
In some implementations, the first monitoring unit 502, when it polls the system downtime information, reads the system downtime information from the shared storage space, including:
when the system downtime information is polled, determining that a system downtime event occurs;
After identifying that the system downtime information carries a write-complete flag bit, determining that the operating system processor has finished writing the system downtime information, and reading the system downtime information from the shared storage space.
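The two-stage check (information present, then write-complete flag set) can be sketched as a poll function that reports one of three states. The header layout, the flag value `WRITE_DONE_FLAG` and the names below are hypothetical; the patent does not specify them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WRITE_DONE_FLAG 0x01u   /* hypothetical write-complete flag bit */

/* Hypothetical header at the start of the shared storage space. */
struct downtime_header {
    uint32_t len;     /* payload length; 0 means nothing has been written */
    uint8_t  flags;   /* WRITE_DONE_FLAG is set once writing has finished */
};

enum poll_result {
    POLL_EMPTY,     /* no system downtime information yet */
    POLL_WRITING,   /* downtime event detected, writer not finished */
    POLL_READY      /* safe to read the system downtime information */
};

/* One polling pass over the shared header. The pointer is volatile because
 * the other processor may change the memory between two passes. */
enum poll_result poll_downtime(const volatile struct downtime_header *h)
{
    assert(h != NULL);
    if (h->len == 0)
        return POLL_EMPTY;
    if ((h->flags & WRITE_DONE_FLAG) == 0)
        return POLL_WRITING;
    return POLL_READY;
}
```

Separating `POLL_WRITING` from `POLL_READY` lets the reader record that a downtime event occurred as soon as data appears, while deferring the copy until the writer has finished.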
In some implementations, the first generation unit 503 generates a system downtime log according to the system downtime information, including:
generating a first check value according to the system downtime information by using a check algorithm;
Identifying a second check value in the system downtime information;
If the first check value is consistent with the second check value, determining that the system downtime information is complete, and generating a system downtime log according to the system downtime information;
If the first check value is inconsistent with the second check value, determining that the system downtime information is incomplete.
In some implementations, the first monitoring unit 502 reads the system downtime information from the shared memory space when it recognizes that the operating system processor writes a write signal of the system downtime information in the shared memory space through the intelligent platform management interface command, including:
when the system downtime event of the operating system processor is monitored through the heartbeat signal, the system downtime information is read from the shared storage space.
In some implementations, the first monitoring unit 502 monitors the operating system processor for a system downtime event via a heartbeat signal, including:
Determining that a system downtime event has occurred on the operating system processor when no heartbeat signal from the operating system processor is received within the heartbeat interval time of the operating system, and/or when a heartbeat signal carrying system downtime alarm information is received from the operating system processor.
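The two heartbeat conditions can be sketched as a small monitor structure. The millisecond timestamps, the field names and the function names are assumptions made for illustration, and the timestamps are assumed to come from a monotonic clock so that `now_ms` is never earlier than the last recorded heartbeat.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct heartbeat_monitor {
    uint64_t last_beat_ms;   /* time at which the last heartbeat arrived */
    uint64_t interval_ms;    /* agreed heartbeat interval */
    int      alarm_seen;     /* a heartbeat carried downtime alarm information */
};

/* Record an incoming heartbeat from the operating system processor. */
void heartbeat_received(struct heartbeat_monitor *m, uint64_t now_ms,
                        int carries_alarm)
{
    assert(m != NULL);
    m->last_beat_ms = now_ms;
    if (carries_alarm)
        m->alarm_seen = 1;
}

/* Returns 1 if either condition holds: an alarm heartbeat was received,
 * or no heartbeat has arrived within the heartbeat interval. */
int downtime_detected(const struct heartbeat_monitor *m, uint64_t now_ms)
{
    assert(m != NULL);
    if (m->alarm_seen)
        return 1;
    return (now_ms - m->last_beat_ms) > m->interval_ms;
}
```

A caller would invoke `downtime_detected` from its periodic monitoring loop and, when it returns 1, go on to read the system downtime information from the shared storage space.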
In some implementations, the first monitoring unit 502 reads the system downtime information from the shared memory space when it recognizes that the operating system processor writes a write signal of the system downtime information in the shared memory space through the intelligent platform management interface command, including:
Polling a system downtime detection pin agreed with the operating system processor;
When the system downtime detection pin is monitored to generate state change, the write signal is determined to be received, and the system downtime information is read from the shared storage space.
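Detecting a state change on the agreed pin amounts to comparing each sample with the previous one. In this sketch the pin level is passed in by the caller, since the way the pin is read depends on the platform; `struct pin_watch` and `pin_changed` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Remembers the level seen at the previous poll. */
struct pin_watch {
    int last_level;
};

/* Returns 1 when the sampled level differs from the previous sample,
 * which is treated as the write signal; otherwise returns 0. */
int pin_changed(struct pin_watch *w, int level)
{
    assert(w != NULL);
    int changed = (level != w->last_level);
    w->last_level = level;
    return changed;
}
```

Initializing `last_level` to the idle level of the pin avoids reporting a spurious change on the first poll.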
In some implementations, the system downtime processing apparatus provided by the embodiment of the present invention further includes:
the analysis unit is used for analyzing the system downtime log to obtain system fault information;
The second generation unit is used for generating system downtime fault diagnosis information according to the system fault information;
The inquiring unit is used for inquiring and obtaining corresponding system maintenance suggestions according to the system downtime fault diagnosis information;
and the output unit is used for outputting the system downtime log, the system downtime fault diagnosis information and the system maintenance advice.
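The query unit can be pictured as a lookup from diagnosis information to a maintenance suggestion. The table entries and key strings below are purely illustrative placeholders and do not come from the patent; `query_suggestion` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct suggestion_entry {
    const char *diagnosis;    /* system downtime fault diagnosis key */
    const char *suggestion;   /* corresponding system maintenance suggestion */
};

/* Illustrative placeholder entries; a real table would be product specific. */
static const struct suggestion_entry suggestion_table[] = {
    { "memory_fault", "Check the memory modules reported in the log." },
    { "disk_fault",   "Check the hard disk reported in the log." },
};

static const char *const NO_SUGGESTION =
    "No suggestion found; forward the log to operation and maintenance personnel.";

/* Return the suggestion for a diagnosis key, or a fallback message. */
const char *query_suggestion(const char *diagnosis)
{
    assert(diagnosis != NULL);
    size_t n = sizeof suggestion_table / sizeof suggestion_table[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(suggestion_table[i].diagnosis, diagnosis) == 0)
            return suggestion_table[i].suggestion;
    }
    return NO_SUGGESTION;
}
```

Returning a fallback string instead of NULL keeps the output unit simple, since it can always print the result together with the log and the diagnosis information.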
In some implementations, the system downtime processing apparatus provided by the embodiment of the present invention further includes:
a recognition unit for reading the system operation state information and/or the system asset information from the shared memory space when recognizing the signal of the operating system processor writing the system operation state information and/or the signal of the system asset information in the shared memory space;
and the third generating unit is used for generating a system operation log according to the system operation state information and/or the system asset information.
In some implementations, the deployment unit 501 deploying the operating system processor to establish a memory mapping from the operating system processor to the built-in static random access memory by using the intelligent platform management interface command customized through a handshake between the basic input output system of the device and the baseboard management controller, so as to obtain a shared storage space between the operating system processor and the baseboard management controller, includes:
Determining the address range of the required shared storage space according to the data size of the system downtime information to be recorded at a time;
checking the access right of the baseboard management controller to the built-in static random access memory;
If the baseboard management controller has access right to the built-in static random access memory, starting the drive of the built-in static random access memory;
If the drive of the built-in static random access memory is successfully started, initializing the memory mapping to finish the initialization of the intelligent platform management interface command; the memory map includes a map of an address range of the shared memory space to a physical memory address of the operating system processor and a map of an address range of the shared memory space to a physical memory address of the baseboard management controller;
The first monitoring unit 502 reads the system downtime information from the shared storage space when recognizing that the operating system processor writes a write signal of the system downtime information in the shared storage space through the intelligent platform management interface command, including:
monitoring for the write signal in at least one of the following ways: polling the data of the shared storage space, polling a system downtime detection pin agreed with the operating system processor, and monitoring the operating system processor for a system downtime event through a heartbeat signal;
When at least one condition of polling the system downtime information in the shared storage space, monitoring that a state change occurs in a system downtime detection pin, not receiving a heartbeat signal sent by an operating system processor beyond the heartbeat interval time of the operating system, and receiving a heartbeat signal carrying system downtime alarm information sent by the operating system processor is met, determining that the operating system processor has a system downtime event, and reading the system downtime information from the shared storage space;
The first generating unit 503 generates a system downtime log according to the system downtime information, including:
When the write signal is identified, accessing, through a serial peripheral interface bus, a crash dump file system arranged in an electrically erasable programmable read-only memory to create a crash dump file;
and copying the system downtime information to a crash dump file to obtain a system downtime log.
Since the embodiments of the apparatus portion correspond to the embodiments of the method portion, refer to the description of the method embodiments for the apparatus embodiments; details are not repeated here.
The ninth embodiment of the present invention will be described.
Fig. 6 is a schematic structural diagram of system downtime processing equipment according to an embodiment of the present invention.
As shown in fig. 6, the system downtime processing device provided by the embodiment of the present invention includes:
a memory 610 for storing a computer program 611;
A processor 620, configured to execute a computer program 611, where the computer program 611 implements the steps of the system downtime processing method according to any one of the foregoing embodiments when executed by the processor 620.
The processor 620 may include one or more processing cores, such as a 3-core processor or an 8-core processor. The processor 620 may be implemented in at least one hardware form of a digital signal processor (DSP), a field-programmable gate array (FPGA), or a programmable logic array (PLA). The processor 620 may also include a main processor and a coprocessor: the main processor, also referred to as the central processing unit (CPU), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 620 may be integrated with a graphics processing unit (GPU), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 620 may also include an artificial intelligence (AI) processor for processing computing operations related to machine learning.
The memory 610 may include one or more storage media, which may be non-transitory. Memory 610 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In this embodiment, the memory 610 is at least configured to store a computer program 611, where the computer program 611, after being loaded and executed by the processor 620, can implement relevant steps in the system downtime processing method disclosed in any of the foregoing embodiments. In addition, the resources stored by the memory 610 may also include an operating system 612, data 613, and the like, and the storage manner may be transient storage or permanent storage. The operating system 612 may be Windows. The data 613 may include, but is not limited to, data related to the above-described method.
In some embodiments, the system downtime processing equipment may also include a display 630, a power supply 640, a communication interface 650, an input output interface 660, sensors 670, and a communication bus 680.
Those skilled in the art will appreciate that the configuration shown in FIG. 6 does not limit the system downtime processing equipment, which may include more or fewer components than shown.
The system downtime processing equipment provided by the embodiment of the invention comprises the memory and the processor, wherein the processor can realize the system downtime processing method when executing the program stored in the memory, and the effects are the same as above.
The tenth embodiment of the present invention will be described.
It should be noted that the apparatus and device embodiments described above are merely exemplary, and for example, the division of modules is merely a logic function division, and there may be other division manners in actual implementation, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms. The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules.
If the integrated modules are implemented in the form of software functional modules and sold or used as stand-alone products, they may be stored in a storage medium. Based on this understanding, the technical solution of the present invention, in essence, or all or part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and used to perform all or part of the steps of the methods according to the embodiments of the present invention.
Therefore, the embodiment of the invention also provides a storage medium, and the storage medium stores a computer program which realizes the steps of the system downtime processing method when being executed by a processor.
The storage medium may include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or various other media capable of storing program code.
The computer program included in the storage medium provided in this embodiment can implement the steps of the system downtime processing method as described above when being executed by the processor, and the effects are the same as above.
The method, the device, the equipment, the storage medium and the server for processing system downtime provided by the invention have been described in detail above. In the description, the embodiments are described in a progressive manner, each focusing on its differences from the other embodiments, so that identical or similar parts of the embodiments can be cross-referenced. The apparatus, device, storage medium and server disclosed in the embodiments correspond to the methods disclosed in the embodiments, so their description is relatively brief; for relevant details, refer to the description of the method section. It should be noted that those skilled in the art may modify and practice the present invention without departing from the spirit of the present invention.
It should also be noted that in this specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

Claims (13)

1. A system downtime processing method, characterized in that it is applied to a baseboard management controller comprising a built-in static random access memory, wherein the built-in static random access memory is connected with an operating system processor of the equipment through an advanced high-performance bus, and the system downtime processing method comprises the following steps:
deploying the operating system processor to establish a memory mapping from the operating system processor to the built-in static random access memory by utilizing a basic input output system of the equipment and an intelligent platform management interface command which is handshake customized by the baseboard management controller, so as to obtain a shared memory space between the operating system processor and the baseboard management controller;
upon identifying a write signal indicating that the operating system processor has written system downtime information into the shared storage space through the intelligent platform management interface command, reading the system downtime information from the shared storage space;
generating a system downtime log according to the system downtime information;
The system downtime information is information written into the shared storage space when the operating system processor is in downtime of an operating system, and comprises the system downtime information written into the shared storage space when the operating system processor is monitored to run abnormally after the equipment is started, or the system downtime information written into the shared storage space when the operating system processor is monitored to run abnormally in the running state of the equipment;
The system downtime information comprises the system state information, register information, stack information and operation log at the time the operating system goes down, comprises information on each component, namely the central processing unit, the hard disk and the memory, at the time the operating system goes down, and further comprises the firmware version corresponding to each component;
The deploying the operating system processor uses a basic input output system of the device and the baseboard management controller to handshake customized intelligent platform management interface commands to establish a memory mapping from the operating system processor to the built-in static random access memory, so as to obtain a shared memory space between the operating system processor and the baseboard management controller, including:
determining the address range of the required shared storage space according to the data size of the system downtime information to be recorded at a time;
Checking the access right of the baseboard management controller to the built-in static random access memory;
if the baseboard management controller has access right to the built-in static random access memory, starting the drive of the built-in static random access memory;
if the drive of the built-in static random access memory is successfully started, initializing memory mapping to finish the initialization of the intelligent platform management interface command; the memory map includes a map of an address range of the shared memory space to a physical memory address of the operating system processor and a map of an address range of the shared memory space to a physical memory address of the baseboard management controller;
The reading the system downtime information from the shared storage space upon identifying the write signal indicating that the operating system processor has written system downtime information into the shared storage space through the intelligent platform management interface command comprises:
The write signal is monitored by at least one mode of polling the data of the shared storage space, polling a system downtime detection pin agreed with the operating system processor and monitoring a system downtime event of the operating system processor through a heartbeat signal;
when at least one condition of polling the system downtime information in the shared storage space, monitoring that the state change of the system downtime detection pin occurs, not receiving a heartbeat signal sent by the operating system processor beyond the heartbeat interval time of an operating system, and receiving a heartbeat signal carrying system downtime alarm information sent by the operating system processor is met, determining that a system downtime event occurs in the operating system processor, and reading the system downtime information from the shared storage space;
The generating of the system downtime log according to the system downtime information comprises:
when the write signal is identified, accessing, through a serial peripheral interface bus, a crash dump file system provided in an electrically erasable programmable read-only memory to create a crash dump file;
and copying the system downtime information to the crash dump file to obtain the system downtime log.
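The four deployment steps recited in claim 1 (sizing the region, checking access permission, starting the driver, initializing the mapping) can be sketched as follows. This is only an illustration under stated assumptions: the addresses, the SRAM capacity, the 4 KiB alignment, and the names `SharedRegion` and `setup_shared_region` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass

SRAM_BASE = 0x1000_0000   # hypothetical BMC-side base address of the built-in SRAM
SRAM_SIZE = 64 * 1024     # hypothetical SRAM capacity in bytes
HOST_BASE = 0xFE00_0000   # hypothetical host-side physical address of the window


@dataclass
class SharedRegion:
    size: int
    bmc_addr: int    # physical address as seen by the baseboard management controller
    host_addr: int   # physical address as seen by the operating system processor


def setup_shared_region(record_size, bmc_has_access, start_driver):
    """Follow the four claimed steps in order; return None if any step fails."""
    # Step 1: size the region from the data recorded per downtime event,
    # rounded up to a 4 KiB boundary (the alignment is an assumption).
    size = (record_size + 0xFFF) & ~0xFFF
    if size > SRAM_SIZE:
        return None
    # Step 2: check the BMC's access permission to the built-in SRAM.
    if not bmc_has_access:
        return None
    # Step 3: start the SRAM driver; continue only if it reports success.
    if not start_driver():
        return None
    # Step 4: initialize the mapping of the same range on both sides.
    return SharedRegion(size=size, bmc_addr=SRAM_BASE, host_addr=HOST_BASE)
```

Each step gates the next, so a failed permission check or driver start leaves no partially initialized mapping behind.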
2. The system downtime processing method of claim 1, wherein the reading of the system downtime information from the shared storage space, upon identifying a write signal indicating that the operating system processor has written the system downtime information into the shared storage space through the intelligent platform management interface commands, comprises:
polling the shared storage space, and reading the system downtime information from the shared storage space when the system downtime information is found by polling.
3. The system downtime processing method of claim 2, wherein the reading of the system downtime information from the shared storage space when the system downtime information is found by polling comprises:
when the system downtime information is found by polling, determining that a system downtime event has occurred;
and after a write-complete flag bit is identified in the system downtime information, determining that the operating system processor has finished writing the system downtime information, and reading the system downtime information from the shared storage space.
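The two-stage polling of claims 2 and 3 (first notice that data has appeared, then wait for the write-complete flag before reading) might look like the sketch below. The record layout, one flag byte followed by a 4-byte little-endian length and the payload, and the flag values are assumptions made for illustration; the patent does not specify them.

```python
import struct

FLAG_EMPTY, FLAG_WRITING, FLAG_DONE = 0, 1, 2   # hypothetical flag values
HEADER = struct.Struct("<BI")                   # flag byte + payload length


def poll_once(region):
    """Return (event_seen, payload); payload stays None until the flag is DONE."""
    flag, length = HEADER.unpack_from(region, 0)
    if flag == FLAG_EMPTY:
        return False, None          # nothing written yet
    if flag != FLAG_DONE:
        return True, None           # downtime event seen, write still in progress
    start = HEADER.size
    return True, bytes(region[start:start + length])
```

Separating "event seen" from "payload ready" keeps the reader from copying a half-written record.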
4. The system downtime processing method of claim 3, wherein the generating of the system downtime log according to the system downtime information comprises:
generating a first check value from the system downtime information by using a check algorithm;
identifying a second check value in the system downtime information;
if the first check value is consistent with the second check value, determining that the system downtime information is complete, and generating the system downtime log according to the system downtime information;
and if the first check value is inconsistent with the second check value, determining that the system downtime information is incomplete.
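The integrity check of claim 4 can be illustrated as follows. The claim does not name the check algorithm; CRC-32 (via Python's `zlib.crc32`) and a 4-byte trailer holding the embedded check value are assumptions chosen for this sketch, and `pack_record` and `verify_record` are hypothetical names.

```python
import struct
import zlib


def pack_record(payload):
    """Writer side: append a CRC-32 of the payload as the embedded check value."""
    return payload + struct.pack("<I", zlib.crc32(payload))


def verify_record(record):
    """Reader side: return the payload if the check values match, else None."""
    if len(record) < 4:
        return None
    payload, tail = record[:-4], record[-4:]
    first = zlib.crc32(payload)             # first check value, computed by the reader
    (second,) = struct.unpack("<I", tail)   # second check value, carried in the record
    return payload if first == second else None
```

A `None` result corresponds to the "incomplete" branch of the claim, in which no downtime log is generated from the record.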
5. The system downtime processing method of claim 1, wherein the reading of the system downtime information from the shared storage space, upon identifying a write signal indicating that the operating system processor has written the system downtime information into the shared storage space through the intelligent platform management interface commands, comprises:
when a system downtime event of the operating system processor is detected through the heartbeat signal, reading the system downtime information from the shared storage space.
6. The system downtime processing method of claim 5, wherein the monitoring of a system downtime event of the operating system processor through the heartbeat signal comprises:
determining that a system downtime event has occurred in the operating system processor when no heartbeat signal from the operating system processor is received within the heartbeat interval of the operating system, and/or when a heartbeat signal carrying system downtime alarm information is received from the operating system processor.
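The heartbeat rule of claim 6 reduces to a small predicate: a downtime event is declared if the heartbeat interval has elapsed without a beat, or if the most recent beat carried an alarm. The function name and parameters below are hypothetical, and times are plain numbers in seconds.

```python
def downtime_detected(now, last_beat, interval, last_beat_alarm=False):
    """True if the last heartbeat carried an alarm or the interval has elapsed."""
    timed_out = (now - last_beat) > interval
    return last_beat_alarm or timed_out
```

For example, with a 5-second interval, a last beat at t = 9 and the current time t = 20, the predicate reports a downtime event because 11 seconds have passed without a heartbeat.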
7. The system downtime processing method of claim 1, wherein the reading of the system downtime information from the shared storage space, upon identifying a write signal indicating that the operating system processor has written the system downtime information into the shared storage space through the intelligent platform management interface commands, comprises:
polling a system downtime detection pin agreed with the operating system processor;
and when a state change of the system downtime detection pin is detected, determining that the write signal has been received, and reading the system downtime information from the shared storage space.
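Pin-based detection as in claim 7 amounts to edge detection on a polled input. In the sketch below, `read_pin` is a hypothetical callable standing in for whatever GPIO read the baseboard management controller firmware provides; the first reading is taken as the baseline level.

```python
def watch_pin(read_pin, max_polls):
    """Poll up to max_polls times after taking a baseline reading.

    Returns the 1-based poll index at which the level first differs from
    the baseline, or None if no change is seen.
    """
    baseline = read_pin()
    for i in range(1, max_polls + 1):
        if read_pin() != baseline:
            return i    # state change: treat as the write signal
    return None
```

Comparing against a baseline rather than a fixed level avoids assuming whether the agreed pin is active-high or active-low.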
8. The system downtime processing method of claim 1, further comprising:
analyzing the system downtime log to obtain system fault information;
generating system downtime fault diagnosis information according to the system fault information;
querying according to the system downtime fault diagnosis information to obtain corresponding system maintenance suggestions;
and outputting the system downtime log, the system downtime fault diagnosis information and the system maintenance suggestions.
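The analysis chain of claim 8 (log, then fault information, then diagnosis, then maintenance suggestion) can be pictured as a lookup against a knowledge base. The table contents and the simple substring matching below are invented for illustration only; the patent does not describe how the query is performed.

```python
# Hypothetical knowledge base: fault signature -> (diagnosis, maintenance suggestion).
KNOWLEDGE_BASE = {
    "machine check": ("CPU hardware error", "Inspect the CPU and update its microcode."),
    "out of memory": ("Memory exhaustion", "Check memory modules and workload limits."),
    "i/o error": ("Storage fault", "Run disk diagnostics and check the firmware version."),
}


def diagnose(log_text):
    """Return a list of (fault, diagnosis, suggestion) tuples found in the log."""
    lowered = log_text.lower()
    results = []
    for signature, (diagnosis, suggestion) in KNOWLEDGE_BASE.items():
        if signature in lowered:
            results.append((signature, diagnosis, suggestion))
    return results
```

An empty result means no known fault signature was matched, in which case only the raw downtime log would be output.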
9. The system downtime processing method of claim 1, further comprising:
reading system running state information and/or system asset information from the shared storage space upon identifying that the operating system processor has written the system running state information and/or the system asset information into the shared storage space;
and generating a system operation log according to the system running state information and/or the system asset information.
10. A server, comprising a baseboard management controller and an operating system processor, wherein a built-in static random access memory of the baseboard management controller is connected with the operating system processor through an advanced high-performance bus;
The baseboard management controller is configured to: deploy the operating system processor to establish, using customized intelligent platform management interface commands agreed by handshake between the basic input output system of the device and the baseboard management controller, a memory mapping from the operating system processor to the built-in static random access memory, so as to obtain a shared storage space between the operating system processor and the baseboard management controller; read system downtime information from the shared storage space upon identifying a write signal indicating that the operating system processor has written the system downtime information into the shared storage space through the intelligent platform management interface commands; and generate a system downtime log according to the system downtime information;
The system downtime information is information written into the shared storage space by the operating system processor when the operating system goes down, and comprises system downtime information written into the shared storage space when abnormal operation is detected after the device starts up, or system downtime information written into the shared storage space when abnormal operation is detected while the device is in its running state;
The system downtime information comprises system state information, register information, stack information and operation logs at the time of the operating system downtime; information on each component, including the central processing unit, the hard disk and the memory, at the time of the operating system downtime; and the firmware version corresponding to each component;
The deploying of the operating system processor to establish, using customized intelligent platform management interface commands agreed by handshake between the basic input output system of the device and the baseboard management controller, a memory mapping from the operating system processor to the built-in static random access memory, so as to obtain a shared storage space between the operating system processor and the baseboard management controller, comprises:
determining the address range of the required shared storage space according to the data size of the system downtime information to be recorded at a time;
checking the access permission of the baseboard management controller to the built-in static random access memory;
if the baseboard management controller has access permission to the built-in static random access memory, starting the driver of the built-in static random access memory;
if the driver of the built-in static random access memory is started successfully, initializing the memory mapping to complete the initialization of the intelligent platform management interface commands; the memory mapping comprises a mapping from the address range of the shared storage space to a physical memory address of the operating system processor and a mapping from the address range of the shared storage space to a physical memory address of the baseboard management controller;
The reading of the system downtime information from the shared storage space, upon identifying a write signal indicating that the operating system processor has written the system downtime information into the shared storage space through the intelligent platform management interface commands, comprises:
monitoring for the write signal in at least one of the following ways: polling the data of the shared storage space, polling a system downtime detection pin agreed with the operating system processor, and monitoring system downtime events of the operating system processor through a heartbeat signal;
when at least one of the following conditions is met, determining that a system downtime event has occurred in the operating system processor and reading the system downtime information from the shared storage space: the system downtime information is found in the shared storage space by polling; a state change of the system downtime detection pin is detected; no heartbeat signal from the operating system processor is received within the heartbeat interval of the operating system; or a heartbeat signal carrying system downtime alarm information is received from the operating system processor;
The generating of the system downtime log according to the system downtime information comprises:
when the write signal is identified, accessing, through a serial peripheral interface bus, a crash dump file system provided in an electrically erasable programmable read-only memory to create a crash dump file;
and copying the system downtime information to the crash dump file to obtain the system downtime log.
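The log-generation step recited above (create a crash dump file, then copy the downtime information into it) is sketched below. An ordinary directory stands in for the crash dump file system on the EEPROM, and the SPI access layer is not modeled; the file naming scheme and the function name `write_crash_dump` are assumptions made for illustration.

```python
import time
from pathlib import Path


def write_crash_dump(dump_dir, downtime_info, timestamp=None):
    """Create a crash dump file in dump_dir and copy the downtime bytes into it."""
    dump_dir = Path(dump_dir)
    dump_dir.mkdir(parents=True, exist_ok=True)
    # Name the file after the capture time (hypothetical naming scheme).
    stamp = int(time.time() if timestamp is None else timestamp)
    path = dump_dir / f"crash-{stamp}.dump"
    path.write_bytes(downtime_info)
    return path
```

Because the copy targets storage owned by the baseboard management controller, the resulting log remains readable even while the host operating system is down.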
11. A system downtime processing device, applied to a baseboard management controller comprising a built-in static random access memory, wherein the built-in static random access memory is connected with an operating system processor of a device through an advanced high-performance bus, the system downtime processing device comprising:
a deployment unit, configured to deploy the operating system processor to establish, using customized intelligent platform management interface commands agreed by handshake between the basic input output system of the device and the baseboard management controller, a memory mapping from the operating system processor to the built-in static random access memory, so as to obtain a shared storage space between the operating system processor and the baseboard management controller;
a first monitoring unit, configured to read system downtime information from the shared storage space upon identifying a write signal indicating that the operating system processor has written the system downtime information into the shared storage space through the intelligent platform management interface commands;
a first generation unit, configured to generate a system downtime log according to the system downtime information;
The system downtime information is information written into the shared storage space by the operating system processor when the operating system goes down, and comprises system downtime information written into the shared storage space when abnormal operation is detected after the device starts up, or system downtime information written into the shared storage space when abnormal operation is detected while the device is in its running state;
The system downtime information comprises system state information, register information, stack information and operation logs at the time of the operating system downtime; information on each component, including the central processing unit, the hard disk and the memory, at the time of the operating system downtime; and the firmware version corresponding to each component;
The deploying of the operating system processor to establish, using customized intelligent platform management interface commands agreed by handshake between the basic input output system of the device and the baseboard management controller, a memory mapping from the operating system processor to the built-in static random access memory, so as to obtain a shared storage space between the operating system processor and the baseboard management controller, comprises:
determining the address range of the required shared storage space according to the data size of the system downtime information to be recorded at a time;
checking the access permission of the baseboard management controller to the built-in static random access memory;
if the baseboard management controller has access permission to the built-in static random access memory, starting the driver of the built-in static random access memory;
if the driver of the built-in static random access memory is started successfully, initializing the memory mapping to complete the initialization of the intelligent platform management interface commands; the memory mapping comprises a mapping from the address range of the shared storage space to a physical memory address of the operating system processor and a mapping from the address range of the shared storage space to a physical memory address of the baseboard management controller;
The reading of the system downtime information from the shared storage space, upon identifying a write signal indicating that the operating system processor has written the system downtime information into the shared storage space through the intelligent platform management interface commands, comprises:
monitoring for the write signal in at least one of the following ways: polling the data of the shared storage space, polling a system downtime detection pin agreed with the operating system processor, and monitoring system downtime events of the operating system processor through a heartbeat signal;
when at least one of the following conditions is met, determining that a system downtime event has occurred in the operating system processor and reading the system downtime information from the shared storage space: the system downtime information is found in the shared storage space by polling; a state change of the system downtime detection pin is detected; no heartbeat signal from the operating system processor is received within the heartbeat interval of the operating system; or a heartbeat signal carrying system downtime alarm information is received from the operating system processor;
The generating of the system downtime log according to the system downtime information comprises:
when the write signal is identified, accessing, through a serial peripheral interface bus, a crash dump file system provided in an electrically erasable programmable read-only memory to create a crash dump file;
and copying the system downtime information to the crash dump file to obtain the system downtime log.
12. A system downtime processing apparatus, comprising:
a memory for storing a computer program;
a processor for executing the computer program, wherein the computer program, when executed by the processor, implements the steps of the system downtime processing method according to any one of claims 1 to 9.
13. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the system downtime processing method of any one of claims 1 to 9.
CN202410269824.2A 2024-03-11 System downtime processing method, device, equipment, storage medium and server Active CN117873771B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410269824.2A CN117873771B (en) 2024-03-11 System downtime processing method, device, equipment, storage medium and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410269824.2A CN117873771B (en) 2024-03-11 System downtime processing method, device, equipment, storage medium and server

Publications (2)

Publication Number Publication Date
CN117873771A CN117873771A (en) 2024-04-12
CN117873771B CN117873771B (en) 2024-06-07

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10489232B1 (en) * 2015-09-29 2019-11-26 Amazon Technologies, Inc. Data center diagnostic information
CN110609778A (en) * 2019-08-16 2019-12-24 苏州浪潮智能科技有限公司 Method and system for storing server downtime log
CN113742120A (en) * 2021-08-06 2021-12-03 苏州浪潮智能科技有限公司 Method, system, device and medium for kdump triggering
CN114840456A (en) * 2022-06-02 2022-08-02 三星(中国)半导体有限公司 Out-of-band management method of storage device, baseboard management controller and storage device
CN115292082A (en) * 2022-08-18 2022-11-04 苏州浪潮智能科技有限公司 Method and system for processing Assert downtime fault in BIOS starting process
CN116679992A (en) * 2023-05-31 2023-09-01 联想(北京)有限公司 Information processing method and device, electronic equipment and storage medium
CN116680101A (en) * 2023-05-11 2023-09-01 苏州浪潮智能科技有限公司 Method and device for detecting downtime of operating system, and method and device for eliminating downtime of operating system
CN117093465A (en) * 2023-10-17 2023-11-21 苏州元脑智能科技有限公司 Server log collection method, device, communication equipment and storage medium
CN117370107A (en) * 2023-09-21 2024-01-09 超聚变数字技术有限公司 BIOS log collection method and computing device
CN117453442A (en) * 2023-10-20 2024-01-26 苏州元脑智能科技有限公司 Recording method, device, equipment and storage medium for server error reporting information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
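Design of communication between management agent and monitoring module (管理代理与监控模块通信设计); Kang Songlin (康松林) et al.; Application Research of Computers (《计算机应用研究》); 2007-05-31; Vol. 24, No. 5; pp. 173-175 *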

Similar Documents

Publication Publication Date Title
US7428663B2 (en) Electronic device diagnostic methods and systems
US6769077B2 (en) System and method for remotely creating a physical memory snapshot over a serial bus
CN111414268B (en) Fault processing method and device and server
CN111274059B (en) Software exception handling method and device of slave device
CN114328102B (en) Equipment state monitoring method, equipment state monitoring device, equipment and computer readable storage medium
CN112286709B (en) Diagnosis method, diagnosis device and diagnosis equipment for server hardware faults
CN107315656A (en) Multi-core embedded PLC software recovery method and PLC
US5267246A (en) Apparatus and method for simultaneously presenting error interrupt and error data to a support processor
JPH0950424A (en) Dump sampling device and dump sampling method
CN114116280A (en) Interactive BMC self-recovery method, system, terminal and storage medium
EP3534259B1 (en) Computer and method for storing state and event log relevant for fault diagnosis
CN115373997A (en) Board card firmware abnormity monitoring and core data exporting method of multi-core SoC
CN117873771B (en) System downtime processing method, device, equipment, storage medium and server
EP2312443A2 (en) Information processing apparatus, method of controlling information processing apparatus and control program
CN111198832B (en) Processing method and electronic equipment
CN110825547B (en) PCIE card exception recovery device and method based on SMBUS
CN210721440U (en) PCIE card abnormity recovery device, PCIE card and PCIE expansion system
CN117873771A (en) System downtime processing method, device, equipment, storage medium and server
CN115543746A (en) Graphics processor monitoring method, system and device and electronic equipment
CN117311769B (en) Server log generation method and device, storage medium and electronic equipment
JPH11120154A (en) Device and method for access control in computer system
CN114327972A (en) Data processing method and device based on solid state disk
CN111767182A (en) SSD failure analysis method and device, computer equipment and storage medium
CN115220947A (en) Method for saving effective debugging information of firmware and obtaining reconstruction error field
CN115686914A (en) Fault recording method, computing device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant