CN117493057A

CN117493057A - Debugging method, equipment and medium for probabilistic startup failure of system

Info

Publication number: CN117493057A
Application number: CN202311477102.8A
Authority: CN
Inventors: 赵兴
Original assignee: Hexin Technology Co ltd; Shanghai Hexin Digital Technology Co ltd
Current assignee: Hexin Technology Co ltd; Shanghai Hexin Digital Technology Co ltd
Priority date: 2023-11-07
Filing date: 2023-11-07
Publication date: 2024-02-02

Abstract

The invention provides a debugging method, equipment and medium for probabilistic startup failure of a system. The method comprises: reading each start code in the system startup program during the current system startup process, and performing a first storage process and a second storage process on each start code; meanwhile, detecting whether the current system startup process is abnormal, and restarting the server system when a startup abnormality is detected; and extracting each start code from the second storage area to analyze the cause of the startup abnormality based on the start codes. When a startup abnormality occurs, the invention can analyze its cause by reading the start codes in the second storage area, thereby shortening the debugging time of the system software and improving the debugging effect.

Description

Debugging method, equipment and medium for probabilistic startup failure of system

Technical Field

The invention belongs to the technical field of computers, and particularly relates to a debugging method and equipment for a probabilistic startup failure of a system and a computer storage medium.

Background

Owing to the complexity of system firmware, multithreading conflicts and other causes, an existing server system (that is, the server operating system) has a certain probability of going down during startup; this is referred to as probabilistic downtime.

To improve the stability of software at run time, debugging is usually required during software development and production, so as to improve the performance and stability of the software and reduce the probability of downtime in actual use. For the probabilistic downtime problem, however, existing detection methods generally use a detection tool to perform specific checks on the server, for example hard disk space detection, disk read-write speed detection and server load detection. Such detection is inefficient: the abnormal program stage or position cannot be located quickly and accurately, and key information is lost during detection, so the cause of the downtime is analyzed and obtained inefficiently, which impairs both the efficiency and the effect of software debugging.

Therefore, how to quickly and accurately obtain key information in software execution after occurrence of probabilistic downtime has become a technical problem to be solved in the art.

Disclosure of Invention

In view of the above drawbacks in the prior art, the present invention aims to provide a method, an apparatus and a computer storage medium for debugging a probabilistic startup failure of a system, which are used for solving the problems that the existing method for debugging the probabilistic downtime of the system cannot quickly and accurately locate to an abnormal program stage or position, resulting in lower analysis and acquisition efficiency of the reason of the downtime.

To achieve the above and other related objects, the present invention provides, in a first aspect, a method for debugging a probabilistic startup failure of a system, which is applicable to a server, where the server includes a first storage area and a second storage area; the debugging method for the probabilistic startup failure of the system comprises the following steps:

reading each starting code in a system starting program in the current system starting process, and executing a first storage process and a second storage process on each starting code respectively;

meanwhile, detecting whether the current system starting process is abnormal, and restarting the server system when the starting abnormality is detected; extracting the starting code number in the second storage area to analyze the starting abnormality reason based on the starting code number;

the first storage process is to store each starting code number into the first storage area in sequence; and the second storage process is to transfer each starting code number in the first storage area to the second storage area in sequence.

In an embodiment of the present invention, the transferring each start code in the first storage area to the second storage area sequentially includes:

extracting a current starting code from the first storage area;

storing the current starting code into the second storage area in a repeated writing mode;

extracting a starting code positioned behind the current starting code from the first storage area to serve as a new current starting code, so as to execute a repeated writing process on the new current starting code; this step is repeated until exiting.

In an embodiment of the present invention, a storage rate of the second storage process is smaller than a storage rate of the first storage process.

In an embodiment of the present invention, the analyzing the reason for the start abnormality based on the start code includes:

extracting the several start codes most recently stored in the second storage area as startup abnormality codes; obtaining, based on the correspondence between start codes and program execution stages/modules, the program execution stage/module corresponding to each startup abnormality code as the abnormal startup stage/module; and analyzing the abnormal startup stage/module to obtain the cause of the startup abnormality.

In an embodiment of the present invention, the number of the second storage areas is several, and the second storage areas are used for correspondingly storing the start code number read in the system start process; the debugging method for the probabilistic startup failure of the system further comprises the following steps before each startup code in the startup program of the system is read:

determining a second storage area corresponding to the current system starting process in each second storage area;

and analyzing the reason of the starting abnormality based on the starting code, including:

extracting a current starting code corresponding to a current system starting process and extracting a historical starting code corresponding to a historical starting process; and comparing the current starting code with the corresponding information value of the historical starting code, obtaining a difference code with different information values, and analyzing the starting abnormality reason based on the difference code.

In an embodiment of the present invention, determining, in each of the second storage areas, a second storage area corresponding to a current system start-up procedure includes: acquiring the data storage time corresponding to each second storage area; and taking the second storage area with the maximum data storage time as the second storage area corresponding to the current starting process.

after determining a second storage area corresponding to the current starting process, clearing data stored in the second storage area;

and sequentially storing the starting codes read in the current starting process into a second storage area after data clearing.

In an embodiment of the present invention, the comparing the current start code with the corresponding information value of the historical start code to obtain a difference code with different information values includes:

extracting the stored starting code from a second storage area corresponding to the current system starting process to serve as a current starting code; extracting the stored starting code from a second storage area corresponding to the last system starting process to be used as a historical starting code;

comparing the current starting code with the historical starting code corresponding to the same reading sequence; if the current starting code is different from the historical starting code, the current starting code is used as a starting abnormal code; if the two are the same, extracting the stored starting code from a second storage area corresponding to the previous system starting process as a new historical starting code so as to re-execute the comparison based on the new historical starting code; the previous system starting process is the system starting process immediately preceding the system starting process that corresponds to the current historical starting code.

The present invention provides in a second aspect an electronic device comprising: the device comprises a processor and a memory, wherein the memory is in communication connection with the processor; the memory is used for storing a computer program, and the processor is used for executing the computer program stored by the memory, so that the electronic device executes the steps in the debugging method for the probabilistic boot failure of the system.

The present invention provides in a third aspect a computer storage medium storing a computer program which, when executed by a processor, implements the steps of the method for debugging a probabilistic boot failure of a system as any described above.

As described above, the debugging method, device and computer storage medium for probabilistic startup failure of a system provided by the invention are characterized in that each startup code in the startup process of the system is read, the read startup codes are stored in the first storage area, and the startup codes in the first storage area are stored in the second storage area, so that when startup abnormality occurs, the startup codes in the second storage area can be read to analyze the cause of the abnormality, thereby avoiding the loss of the startup codes in the storage process, improving the efficiency and accuracy of startup abnormality analysis, further greatly shortening the debugging time of system software and improving the debugging effect of software.

Drawings

Fig. 1 is a schematic diagram of an application scenario of the method for debugging a probabilistic startup failure of a system according to an embodiment of the present invention;

Fig. 2 is a schematic flow chart of a method for debugging a probabilistic boot failure of the system according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of the server in another embodiment in the debugging method for probabilistic startup failure of the system provided by the present invention;

Fig. 4 is a flowchart of another embodiment of a method for debugging a probabilistic boot failure of a system according to the present invention;

Fig. 5 is a schematic flow chart of step S100 in the debugging method for probabilistic startup failure of the system provided by the present invention;

Fig. 6 is a schematic structural diagram of the electronic device according to an embodiment of the invention.

DESCRIPTION OF SYMBOLS IN THE DRAWINGS

100-a server side; 110-a first storage area; 120-a second storage area; 5-an electronic device; 51-memory; 52-processor.

Detailed Description

Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present invention with reference to specific examples. The invention may also be practiced or applied through other, different embodiments, and the details of the present description may be modified or varied without departing from the spirit of the present invention. It should be noted that the following embodiments and the features in the embodiments may be combined with each other without conflict.

It should be noted that the illustrations provided in the following embodiments merely illustrate the basic concept of the present invention by way of illustration, and only the components related to the present invention are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, and the form, number and proportion of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complicated.

In order to solve the technical problems in the prior art, the invention firstly provides a debugging method for the probabilistic startup failure of a system, which is applicable to a server; the method aims at acquiring a starting code number in a server system (hereinafter referred to as a system) in the starting process of the system, so as to extract a starting abnormality code number related to starting abnormality when probabilistic starting failure occurs, and analyzing the reason of the system starting failure based on the starting abnormality code number.

The system starting process is a process of executing a system starting program by a server;

the start code is identification information for characterizing a program execution stage/module of the system start program in the execution process.

In one embodiment, the start code is information corresponding to a key instruction in a system start program; illustratively, when the server is a Power server, the start code includes 0x01, 0x02, 0x03, and 0x04; wherein 0x01 represents the end of power-up; 0x02 denotes general purpose input output port (General Purpose Input Output, GPIO) initialization; 0x03 denotes Flash memory initialization; 0x04 denotes the RAM code start-up.
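The correspondence between the example start codes and their boot stages can be sketched as a lookup table. The following Python sketch is illustrative only: the names `START_CODE_STAGES` and `describe_start_code` are hypothetical and do not appear in the disclosure, and only the four example codes listed above are included.

```python
# Hypothetical mapping of the four example start codes to boot stages.
START_CODE_STAGES = {
    0x01: "power-up complete",
    0x02: "GPIO initialization",
    0x03: "Flash memory initialization",
    0x04: "RAM code start-up",
}


def describe_start_code(code: int) -> str:
    """Return the boot stage for a start code, or a placeholder if unknown."""
    return START_CODE_STAGES.get(code, f"unknown stage (0x{code:02X})")
```

For instance, `describe_start_code(0x03)` returns "Flash memory initialization", while a code outside the table yields the "unknown stage" placeholder rather than raising an error.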

In another embodiment, the start code is an identification code pre-embedded in the system start program at a critical stage/process.

Referring to fig. 1, an application scenario diagram of a debugging method for probabilistic startup failure of the system in an embodiment is shown. As shown in fig. 1, the server 100 includes a first storage area 110 and a second storage area 120; the first storage area 110 is configured to cache the read start code; the second storage area 120 is configured to store each of the start codes in the first storage area.

The access speed corresponding to the first storage area is greater than that of the second storage area, so that the quick storage of each starting code in the first storage area is realized; illustratively, the access speed corresponding to the first storage area is not lower than 300 MB/s.

The size of the second storage area is not smaller than the data size of a starting code contained in the system starting program; illustratively, the second storage area is 64 bytes in size to ensure that all of the start codes can be stored.

In an alternative embodiment, the first storage area is a storage area provided in a Static Random-Access Memory (SRAM); the second storage region is a storage region provided in a complementary metal oxide semiconductor memory (Complementary Metal Oxide Semiconductor, CMOS).

Referring to fig. 2, a flow chart of an embodiment of the method for debugging the probabilistic boot failure of the system provided in the present application is shown; as shown in fig. 2, the method comprises the steps of:

S200, reading each starting code in the system starting program in the current system starting process, and executing a first storage process and a second storage process on each starting code respectively;

the first storage process is to sequentially store the read starting codes into the first storage area;

and the second storage process is to transfer each starting code number in the first storage area to the second storage area in sequence.

Specifically, the implementation manner of transferring each start code in the first storage area to the second storage area in turn includes the following steps:

extracting a current starting code from the first storage area;

storing the current starting code in the second storage area in a repeated writing mode to ensure that the current starting code can be successfully stored;

extracting a starting code number positioned behind the current starting code number from the first storage area as a new current starting code number so as to execute a repeated writing process on the new current starting code number; this step is repeated until exiting.
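The transfer steps above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the implementation of the disclosure: the `SecondStorage` class, the `transfer` function and the `max_attempts` parameter are hypothetical, and "repeated writing" is interpreted here as write, read back and retry until the stored value matches, which the description does not spell out.

```python
from collections import deque


class SecondStorage:
    """Hypothetical stand-in for the slower second storage area."""

    def __init__(self, size: int = 64) -> None:
        self.cells = bytearray(size)  # 64 bytes, as in the example above
        self.count = 0                # number of start codes stored so far

    def write(self, index: int, code: int) -> None:
        self.cells[index] = code

    def read(self, index: int) -> int:
        return self.cells[index]


def transfer(first_area: deque, second: SecondStorage, max_attempts: int = 3) -> None:
    """Move start codes from the first area to the second, in order."""
    while first_area and second.count < len(second.cells):
        code = first_area.popleft()    # current start code
        for _ in range(max_attempts):  # repeated writing (assumed: verify and retry)
            second.write(second.count, code)
            if second.read(second.count) == code:
                break
        second.count += 1              # move on to the next start code
```

Here the first storage area is modeled as a queue, so the fast producer (the first storage process) can append codes while the slower transfer drains them in reading order.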

S400, while executing the step S200, detecting whether a starting abnormality occurs in the current system starting process, if so, restarting the server system;

specifically, whether the server has a starting abnormal state in the execution process of executing the system starting program is detected, if not, the current system starting process is continuously executed; if so, re-executing the system starting process of the server to realize the restarting of the system starting program.

The implementation manner of detecting whether the current system starting process has a starting abnormality is the same as the existing starting abnormality detection manner, and is not described herein.

S600, extracting each starting code number in the second storage area to analyze starting abnormality reasons based on the starting code numbers.

Specifically, after the system is restarted, each starting code stored in the second storage area is obtained in an operating system of the server, so that the starting abnormality reason of the server is analyzed based on the obtained starting code.

In a specific implementation manner, the implementation manner of analyzing the reason of the starting abnormality based on the starting code number includes:

extracting n latest stored starting codes from the second storage area to serve as starting abnormal codes;

acquiring a program execution stage/module corresponding to the starting abnormal code as a starting abnormal stage/module in the system starting program based on the corresponding relation between the starting code and the program execution stage/module;

and analyzing the starting abnormal stage/module to obtain the reason of the starting abnormality.

Wherein n is any integer from 1 to 10.
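As an illustration of this analysis step, the following Python sketch takes the n most recently stored start codes and looks up the stage each one corresponds to. The function name `locate_abnormal_stages` and its parameters are hypothetical, and the stored codes are assumed to be listed oldest first.

```python
def locate_abnormal_stages(stored_codes, stage_of, n=1):
    """Return (code, stage) pairs for the n most recently stored start codes.

    stored_codes: start codes read from the second storage area, oldest first.
    stage_of: mapping from start code to program execution stage/module.
    """
    if not 1 <= n <= 10:
        raise ValueError("n must be an integer from 1 to 10")
    latest = stored_codes[-n:]  # the last n entries are the most recent
    return [(code, stage_of.get(code, "unknown")) for code in latest]
```

With `n=1`, the result names the last stage that was reached before the abnormality; a larger n gives more context about the stages leading up to it.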

In order to solve the technical problems in the prior art, the invention also provides another debugging method for the probabilistic startup failure of the system; in this embodiment, as shown in fig. 3, the server includes a first storage area and a plurality of second storage areas; each second storage area is used for respectively storing the starting code read in the single system starting process.

In an alternative embodiment, the number of the second storage areas is the same as the number of historical starting processes to be detected; the historical starting processes are the system starting processes that had already been executed when the system starting abnormality of the server occurred; and the historical starting processes to be detected are those historical starting processes that are analyzed for the cause of the abnormality when a system starting abnormality occurs.

Illustratively, when the history starting process to be detected is 4 times, the number of the second storage areas is also 4.

In this embodiment, before executing step S200, the debugging method for probabilistic boot failure of the system further includes, as shown in fig. 4:

S100, determining a second storage area corresponding to the current system starting process in each second storage area;

the second storage area corresponding to the current system starting process is a second storage area for storing the starting code in the current system starting process.

Specifically, determining, among the second storage areas, the second storage area corresponding to the current system start-up process includes the following steps, as shown in fig. 5:

S101, acquiring the data storage time corresponding to each second storage area;

the data storage time is the storage time length of the currently stored starting code in the second storage area;

S102, taking the second storage area with the largest data storage time as the second storage area corresponding to the current starting process.

In this embodiment, when executing step S200, the second storing process is to sequentially transfer each start code in the first storage area to a second storage area corresponding to the current system start process.

Specifically, after determining a second storage area corresponding to the current starting process, clearing the stored data in the second storage area; and transferring the starting codes read in the current starting process from the first storage area to a second storage area after data clearing.
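Steps S101 and S102, together with the clearing step, can be sketched as follows. This Python sketch is illustrative only: the `StorageArea` class, its `stored_at` field and the function `select_area_for_current_boot` are hypothetical, and the data storage time is assumed to be the elapsed time since the area's codes were stored.

```python
from dataclasses import dataclass, field


@dataclass
class StorageArea:
    """Hypothetical model of one second storage area."""
    stored_at: float                           # time at which its codes were stored
    codes: list = field(default_factory=list)  # start codes of one boot


def select_area_for_current_boot(areas, now):
    """Pick the area whose data has been stored longest, then clear it."""
    # S101/S102: the largest storage duration identifies the oldest boot record.
    target = max(areas, key=lambda area: now - area.stored_at)
    target.codes.clear()    # clear the stale record before reuse
    target.stored_at = now  # the area now belongs to the current boot
    return target
```

Reusing the oldest area in this way makes the set of second storage areas behave like a rotating log that always holds the most recent boots.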

In this embodiment, step S600, when executed, includes:

S600', extracting a current starting code corresponding to a current system starting process and extracting a historical starting code corresponding to a historical starting process; and comparing the current starting code with the corresponding information value of the historical starting code, obtaining a difference code with different information values, and analyzing the starting abnormality reason based on the difference code.

The corresponding historical starting code is the historical starting code that has the same reading sequence as the current starting code; illustratively, if the current starting code is the starting code stored after the nth reading of the system starting program in the current starting process, the corresponding historical starting code is the starting code stored after the nth reading in the historical starting process.

The difference code is a current starting code whose information value differs from that of the historical starting code in the same reading sequence.

Specifically, the implementation manner of comparing the current start code with the corresponding information value of the historical start code to obtain the difference code with different information values comprises the following steps:

extracting the stored starting code from a second storage area corresponding to the current system starting process by using an extracting tool, and taking the stored starting code as the current starting code; extracting the stored starting code number from a second storage area corresponding to the last system starting process by using an extracting tool, and taking the starting code number as a historical starting code number;

comparing the current start code with the historical start code corresponding to the same reading order, namely comparing the current start code corresponding to the first reading order with the information value of the historical start code corresponding to the first reading order; if the current starting code is different from the historical starting code, the current starting code is used as a starting abnormal code, and the starting abnormal codes between the current starting process and the historical starting process are obtained by analogy;

if the two are the same, namely, the starting abnormal code is not generated between the current starting process and the historical starting process, the stored starting code is extracted from the second storage area corresponding to the previous system starting process and is used as a new historical starting code, the comparison process is executed based on the new historical starting code, and the process is repeated until the system exits.

The previous system starting process is the last system starting process of the system starting process corresponding to the current historical starting code.
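The comparison procedure described above can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function name `find_difference_codes` is hypothetical, the historical records are assumed to be ordered from the most recent boot to the oldest, and only reading positions present in both records are compared.

```python
def find_difference_codes(current, history):
    """Compare the current boot's start codes with earlier boots.

    current: start codes of the current boot, in reading order.
    history: code lists of earlier boots, most recent first.
    Returns (position, current_code, historical_code) tuples from the first
    earlier boot that differs, or an empty list if none differs.
    """
    for past in history:
        diffs = [
            (i, cur, old)
            for i, (cur, old) in enumerate(zip(current, past))
            if cur != old  # same reading order, different information value
        ]
        if diffs:
            return diffs
        # No difference: fall back to the boot before this one.
    return []
```

An empty result means that no stored historical boot differs from the current one at any compared position, in which case the comparison cannot narrow down the abnormal stage by itself.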

and detecting and analyzing the starting abnormality stage/module to obtain the reason of the starting abnormality.

According to the debugging method for the probabilistic startup failure of the system, provided by the embodiment, the historical startup codes in the startup process of the corresponding system are stored by arranging a plurality of second storage areas, and the current startup codes and the historical startup codes in the same reading sequence are compared to realize the rapid and comprehensive acquisition of each startup abnormal code, so that the efficiency and accuracy of startup abnormal analysis are further improved.

In order to solve the technical problems in the prior art, the embodiment of the invention also provides an electronic device. Referring to fig. 6, which shows a schematic structural diagram of the electronic device, the electronic device 5 includes a memory 51 and a processor 52 connected to each other; the memory 51 is used for storing a computer program, and the processor 52 is used for executing the computer program stored in the memory 51, so that the electronic device, when running, implements the steps of the debugging method for the probabilistic startup failure of the system.

Alternatively, the number of the memories may be one or more, and the number of the processors may be one or more.

Optionally, according to the steps in the debugging method of the system probabilistic startup failure, the processor in the electronic device loads the instructions corresponding to the one or more application program processes into the memory, and the processor runs the application program stored in the memory, so that each function in the debugging method of the system probabilistic startup failure is correspondingly implemented, and details thereof are not repeated herein.

It should be noted that the memory includes, but is not limited to, a random access memory (Random Access Memory, RAM for short), and may further include a non-volatile memory, such as at least one magnetic disk memory. Likewise, the processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP for short), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field programmable gate array (Field Programmable Gate Array, FPGA for short) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when being called by a processor, implements the debugging method of the probabilistic startup failure of the system.

The computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disk read-only memory (CD-ROM), digital versatile disks (DVD), memory sticks, floppy disks, and mechanical coding devices.

The computer readable program described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.

In summary, according to the debugging method, device and computer storage medium for the probabilistic startup failure of a system provided by the invention, each start code in the system startup process is read, the read start codes are stored in the first storage area, and the start codes in the first storage area are transferred to the second storage area, so that when a startup abnormality occurs, the start codes in the second storage area can be read and the cause of the abnormality analyzed. This avoids the loss of start codes in the storage process, improves the efficiency and accuracy of startup abnormality analysis, greatly shortens the debugging time of the system software and improves the debugging effect. In addition, a plurality of second storage areas corresponding to the system startup processes are provided to store the start codes read in the corresponding historical startup processes; when a startup abnormality occurs, comparing the current start codes with the start codes of previous system startup processes allows the startup abnormality codes to be obtained more comprehensively and rapidly, which further reduces the influence of start code loss on abnormality analysis and further improves the efficiency and effect of software debugging.

The above embodiments are merely illustrative of the principles of the present invention and its effectiveness, and are not intended to limit the invention. Modifications and variations may be made to the above-described embodiments by those skilled in the art without departing from the spirit and scope of the invention. Accordingly, it is intended that all equivalent modifications and variations made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the invention be covered by the claims of the invention.

Claims

1. The debugging method for the probabilistic startup failure of the system is characterized by being suitable for a server, wherein the server comprises a first storage area and a second storage area;

the debugging method for the probabilistic startup failure of the system comprises the following steps of;

2. The method for debugging a probabilistic boot failure of a system according to claim 1, wherein the step of sequentially transferring each boot code in the first storage area to the second storage area includes:

extracting a current starting code from the first storage area;

extracting a starting code positioned behind the current starting code from the first storage area to serve as a new current starting code, so as to execute a repeated writing process on the new current starting code;

this step is repeated until exiting.

3. The method of claim 1, wherein the second storage process has a lower storage rate than the first storage process.

4. The method for debugging a probabilistic boot failure of a system according to claim 1, wherein analyzing a cause of a boot exception based on the boot code comprises:

extracting a starting abnormal code number which is a plurality of starting code numbers stored in the second storage area most recently;

based on the corresponding relation between the starting code and the program execution stage/module, the program execution stage/module corresponding to the starting abnormal code is obtained and used as the starting abnormal stage/module;

5. The method for debugging a probabilistic startup failure of a system according to claim 1, wherein the number of the second storage areas is several, and the second storage areas are used for correspondingly storing the startup code read in the startup process of the system;

the debugging method for the probabilistic startup failure of the system further comprises the following steps before each startup code in the startup program of the system is read:

6. The method for debugging a probabilistic system boot failure of claim 5, wherein determining a second memory area corresponding to a current system boot process in each of the second memory areas comprises:

acquiring the data storage time corresponding to each second storage area;

and taking the second storage area with the maximum data storage time as the second storage area corresponding to the current starting process.

7. The method for debugging a probabilistic boot failure of a system of claim 5, wherein the sequentially transferring each boot code in the first storage area to the second storage area comprises:

8. The method for debugging a probabilistic startup failure of a system of claim 5, wherein comparing the current startup code with the corresponding information value of the historical startup code to obtain a difference code with different information values comprises:

comparing the current starting code with the historical starting code corresponding to the same reading sequence; if the current starting code is different from the historical starting code, the current starting code is used as a starting abnormal code; if the two are the same, extracting the stored starting code from a second storage area corresponding to the previous system starting process as a new historical starting code so as to re-execute the comparison based on the new historical starting code;

9. An electronic device, comprising: the device comprises a processor and a memory, wherein the memory is in communication connection with the processor;

the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory, to cause the electronic device to execute the debugging method of the system probabilistic boot failure as claimed in any one of claims 1 to 8.

10. A computer storage medium storing a computer program, wherein the computer program when executed by a processor implements the method for debugging a probabilistic boot failure of a system according to any one of claims 1 to 8.