CN112395137A - Linux kernel exception processing method, equipment and device - Google Patents
Linux kernel exception processing method, equipment and device
- Publication number
- CN112395137A CN202110078299.2A
- Authority
- CN
- China
- Prior art keywords
- information
- exception
- kernel
- memory
- operating system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1415—Saving, restoring, recovering or retrying at system level
- G06F11/1441—Resetting or repowering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1479—Generic software techniques for error detection or fault masking
- G06F11/1482—Generic software techniques for error detection or fault masking by means of middleware or OS functionality
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
A method, device, and apparatus for handling Linux kernel exceptions are provided, wherein the Linux kernel comprises a first operating system kernel and a second operating system kernel. The handling method comprises the following steps: when an exception occurs in the first operating system kernel, capturing exception information and storing it in memory; starting the second operating system kernel; after the second operating system kernel has started successfully, extracting the exception information stored in memory and saving it to non-volatile memory; and restarting the first operating system kernel and resetting the hardware system. With this scheme, the Linux kernel can be prevented from hanging because of an exception, and the kernel exception information can be recorded.
Description
Technical Field
The present disclosure relates to computer technologies, and in particular to a method, a device, and an apparatus for handling Linux kernel exceptions.
Background
With the rapid development of network technology, network security threats keep growing, and security products are deployed ever more widely in networks. After many years of information security construction, China has made some achievements in virus prevention and in network and boundary security, but insufficient attention has been paid to securing the environment of the network security products that store and process data, even though this is the most important part of information security and its last line of defense. Some hackers exploit kernel defects in a network security product to cause a stack exception in the operating system, so that the product can no longer run normally and the connectivity and security of the whole network are affected. Because the Linux kernel cannot log and track such stacks, locating the exception is very difficult.
Because network security products are held to very strict standards for their own security, how to record a Linux kernel exception and automatically reset the system when one occurs is an important technical problem for security vendors.
Disclosure of Invention
The application provides a method, device, and apparatus for handling Linux kernel exceptions, which can prevent the Linux kernel from hanging because of an exception and can record kernel exception information.
The present disclosure provides a method for handling a Linux kernel exception, where the Linux kernel includes a first operating system kernel and a second operating system kernel, and the method includes:
when an exception occurs in the first operating system kernel, capturing exception information and storing the exception information in memory;
starting the second operating system kernel;
after the second operating system kernel has started successfully, extracting the exception information stored in memory, and saving the extracted exception information to non-volatile memory;
and restarting the first operating system kernel, and resetting the hardware system.
In an exemplary embodiment, before the starting of the second operating system kernel, the method further includes:
reserving a storage space of a preset size in memory as reserved memory when the first operating system kernel is initialized; after the first operating system kernel has started, loading the image file of the second operating system kernel into the reserved memory;
the starting of the second operating system kernel includes:
jumping to the second operating system kernel in the reserved memory, and running the second operating system kernel.
In an exemplary embodiment, the storing of the captured exception information in memory includes:
encoding the exception information recorded by the first operating system kernel and storing it in a note area of the vmcore file;
the extracting of the exception information stored in memory includes:
reading the encoded exception information from the note area of the vmcore file and decoding it.
In an exemplary embodiment, the capturing of exception information includes:
analyzing the corresponding information according to the exception type and storing it as the exception information, wherein the exception types include: illegal memory access, software deadlock, memory exhaustion, and other exceptions;
when the exception type is illegal memory access, software deadlock, or another exception, the analyzed information includes: description information, register information, and stack information;
when the exception type is memory exhaustion, the analyzed information includes: description information, stack information, and memory usage.
In an exemplary embodiment, the saving of the extracted exception information to non-volatile memory includes:
saving the extracted exception information as a text file and then storing the text file in the non-volatile memory;
in the text file, the exception information is classified by information type, and information of the same type is grouped into one area of the text file; the information types include: description information, stack information, register information, and memory usage ranking.
The present disclosure also provides an apparatus for handling a Linux kernel exception, where the Linux kernel includes a first operating system kernel and a second operating system kernel; the processing apparatus includes:
a storage control module, configured to capture exception information and store it in memory when an exception occurs in the first operating system kernel;
a kernel control module, configured to start the second operating system kernel;
the storage control module is further configured to extract the exception information stored in memory and save it to non-volatile memory after the kernel control module has successfully started the second operating system kernel;
the kernel control module is further configured to restart the first operating system kernel and reset the hardware system after the storage control module has saved the exception information to the non-volatile memory.
In an exemplary embodiment, the kernel control module is further configured to reserve a storage space of a preset size in memory as reserved memory when the first operating system kernel is initialized, and to load the image file of the second operating system kernel into the reserved memory when the first operating system kernel is started;
the starting of the second operating system kernel by the kernel control module includes:
the kernel control module jumps to the second operating system kernel in the reserved memory and runs the second operating system kernel.
In an exemplary embodiment, the storage control module includes:
an exception interpretation submodule, configured to capture exception information when an exception occurs in the first operating system kernel;
an exception dump/restore submodule, configured to encode the exception information captured by the exception interpretation submodule and store it in a note area of the vmcore file, and, when the kernel control module has successfully started the second operating system kernel, to read the encoded exception information from the note area of the vmcore file and decode it;
and an exception storage submodule, configured to save the decoded exception information to the non-volatile memory.
In an exemplary embodiment, the capturing of exception information by the exception interpretation submodule includes:
the exception interpretation submodule analyzes the corresponding information according to the exception type and stores it as the exception information; the exception types include: illegal memory access, software deadlock, memory exhaustion, and other exceptions; when the exception type is illegal memory access, software deadlock, or another exception, the analyzed information includes: description information, register information, and stack information; when the exception type is memory exhaustion, the analyzed information includes: description information, stack information, and memory usage.
In an exemplary embodiment, the saving of the decoded exception information to the non-volatile memory by the exception storage submodule includes:
the exception storage submodule saves the decoded exception information as a text file and then stores the text file in the non-volatile memory; in the text file, the exception information is classified by information type, and information of the same type is grouped into one area of the text file; the information types include: description information, stack information, register information, and memory usage ranking.
The present disclosure also provides a device for handling Linux kernel exceptions, where the Linux kernel includes a first operating system kernel and a second operating system kernel, and the device includes a memory and a processor; the memory is used for storing a program for handling Linux kernel exceptions, and the processor is used for reading and executing that program so as to perform the method of any one of the above embodiments.
The present disclosure also provides a storage medium, which is applied to a service system including a first operating system kernel and a second operating system kernel, where a program for handling Linux kernel exceptions is stored in the storage medium, and the program is configured to perform the method of any one of the foregoing embodiments when run.
Compared with the prior art, the method uses dual-kernel switching to record kernel exception information to a storage device and to reset the system automatically, so that an unattended security device can recover from a kernel exception on its own while the related exception information is recorded. This facilitates subsequent fault location and analysis by an administrator, restores service connectivity quickly, and avoids long service interruptions.
Other aspects will be apparent upon reading and understanding the attached drawings and detailed description.
Drawings
FIG. 1 is a flowchart of a handling method of linux kernel exceptions according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a device for handling linux kernel exceptions according to an embodiment of the present application;
FIG. 3 is a diagram illustrating connections between submodules in a processing apparatus and the components of the service system in some example embodiments;
FIG. 4 is a schematic diagram of the operation of a dual-kernel architecture submodule in some example embodiments;
FIG. 5 is a schematic diagram illustrating the operation of an exception dump/restore submodule in some example embodiments;
FIG. 6 is a schematic diagram of a kernel exception log in some example embodiments.
Detailed Description
The present application describes embodiments, but the description is illustrative rather than limiting and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with or instead of any other feature or element in any other embodiment, unless expressly limited otherwise.
The present application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features and elements disclosed in this application may also be combined with any conventional features or elements to form a unique inventive concept as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive aspects to form yet another unique inventive aspect, as defined by the claims. Thus, it should be understood that any of the features shown and/or discussed in this application may be implemented alone or in any suitable combination. Accordingly, the embodiments are not limited except as by the appended claims and their equivalents. Furthermore, various modifications and changes may be made within the scope of the appended claims.
Further, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other orders of steps are possible as will be understood by those of ordinary skill in the art. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Further, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present application.
Attacks on Linux kernel defects often cause the system of a network security product to restart or even hang, resulting in serious failures. If the kernel restarts, the kernel's exception stack information is usually printed to a console (a serial port, a display, and the like); if no console was connected in advance, the printed information is lost, so maintenance personnel cannot find and obtain it in time and cannot analyze the cause of the exception. If the kernel exception causes the system to hang, the whole device stops operating and cannot recover automatically, which results in a serious network outage.
Existing Linux systems handle an exception in one of two ways:
One approach is to enter another system by using the kdump mechanism of the Linux kernel. kdump is a kernel crash dump mechanism: if the kernel uses kdump, then when the system crashes, kdump starts a second kernel. The system then stays in this second (kdump) kernel and waits for an administrator to manually analyze and process the system crash information (the vmcore file) and to restart and restore the whole system. However, manual analysis requires using various Linux tools to analyze the assembly code in the vmcore file and find the location of the assertion, which places very high demands on the administrator's technical skill, and it is often difficult to find the corresponding CVE (Common Vulnerabilities and Exposures) entry. For vulnerabilities that have not been published, the cause is even harder to analyze. Moreover, manually restoring the system leads to a long network outage, which often causes great loss in service scenarios with strict real-time requirements.
The other approach is to do nothing and rely on the robustness of the Linux kernel design. In this case, the system may stay down and wait for a manual reset, which likewise leaves the service unrecoverable for a long time; or it may restart automatically to restore service, in which case the service recovers quickly but the error scene is overwritten, so the problem cannot be located and the cause of the exception cannot be traced.
An embodiment of the application provides a method for handling a Linux kernel exception, wherein the Linux kernel comprises a first operating system kernel and a second operating system kernel; the method is shown in FIG. 1 and comprises steps S110 to S140:
S110: when an exception occurs in the first operating system kernel, capturing exception information and storing it in memory;
S120: starting the second operating system kernel;
S130: after the second operating system kernel has started successfully, extracting the exception information stored in memory, and saving the extracted exception information to non-volatile memory;
S140: restarting the first operating system kernel, and resetting the hardware system.
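The four steps can be traced with a small user-space model. This is only an illustrative sketch under stated assumptions: the `System` class, its field names, and `handle_kernel_exception` are hypothetical names introduced here, and a real implementation runs inside the kernels rather than in Python.

```python
from dataclasses import dataclass, field


@dataclass
class System:
    """Toy model of the service system (all names are hypothetical)."""
    volatile_memory: dict = field(default_factory=dict)  # survives a kernel switch, not a reset
    nonvolatile_log: list = field(default_factory=list)  # e.g. a file on a CF card or hard disk
    running_kernel: str = "service"
    events: list = field(default_factory=list)


def handle_kernel_exception(system: System, exception_info: str) -> None:
    # S110: capture the exception information and keep it in memory.
    system.volatile_memory["exception_info"] = exception_info
    system.events.append("S110 captured")

    # S120: start the second (rescue) kernel.
    system.running_kernel = "rescue"
    system.events.append("S120 rescue kernel started")

    # S130: extract the information from memory and persist it.
    saved = system.volatile_memory.pop("exception_info")
    system.nonvolatile_log.append(saved)
    system.events.append("S130 persisted")

    # S140: restart the first kernel and reset the hardware;
    # volatile memory does not survive the reset.
    system.volatile_memory.clear()
    system.running_kernel = "service"
    system.events.append("S140 reset")
```

The point of the model is the ordering: the information is persisted in S130 before the reset in S140 discards volatile memory.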
The embodiment provides the system functionality of a Linux dual-kernel design, that is, an operating mechanism in which two kernels cooperate when a kernel exception occurs. Dual kernel means that the system contains two kernels:
1) The service kernel, i.e., the first operating system kernel above, is the resident operating system kernel for the user's service scenario. This kernel runs continuously as long as no kernel exception occurs.
2) The rescue kernel, i.e., the second operating system kernel above, is the operating system kernel responsible for recording exception information when the service kernel fails. It runs only briefly, when a service kernel exception occurs, and then performs the reset.
The service kernel cannot record the exception information by itself because, when a fatal exception occurs, the kernel has usually already crashed and can no longer access peripherals such as the hard disk normally. Therefore, another kernel that is operating normally (i.e., the rescue kernel) is needed to record the exception information and reset the system.
The native Linux kdump technology can enter a kdump kernel when certain kernel exceptions occur in the main service system, but the kdump system then waits silently, so the service system cannot resume operation automatically.
The native Linux kdump technique is triggered only by a very limited set of kernel exceptions. The embodiment of the application can trigger the dual-kernel system on almost all kernel exceptions and automatically complete the series of related operations.
The embodiment of the application can restart the device and return it to the normal service system under almost all kernel exception conditions, which keeps the service interruption as short as possible; and under almost all kernel exception conditions, the exception information is guaranteed to be recorded in a non-volatile storage device (such as, but not limited to, a CF card or a hard disk), where it can be viewed at any time after the service has recovered.
The embodiment of the application applies the dual-kernel system to a high-risk service system that is located at an Internet egress and exposed to a large volume of high-frequency network attacks. For attacks on the kernel, which is the lowest layer and where damage to the system is greatest, the embodiment can quickly restore service continuity after such damage, avoid long service interruptions, and retain the exception information.
In some exemplary embodiments, before the starting of the second operating system kernel, the method further includes:
reserving a storage space of a preset size in memory as reserved memory when the first operating system kernel is initialized; after the first operating system kernel has started, loading the image file of the second operating system kernel into the reserved memory;
the starting of the second operating system kernel includes:
jumping to the second operating system kernel in the reserved memory, and running the second operating system kernel.
In other embodiments, reserving the memory and saving the second operating system kernel image file at other points in time or in other manners is not excluded, and the present application does not limit this.
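The reserve, load, and jump sequence can be modeled in user space as follows. This is a minimal sketch, not the embodiment's implementation: `PhysicalMemory`, its method names, and the sizes are hypothetical, and placing the reserved region at the top of RAM is an arbitrary choice. In stock Linux kdump the corresponding steps are typically performed with the `crashkernel=` boot parameter and `kexec -p`; the source does not say whether this embodiment uses them unchanged.

```python
class PhysicalMemory:
    """Toy RAM model with a region reserved for the rescue kernel image."""

    def __init__(self, total_size: int, reserved_size: int) -> None:
        if not 0 < reserved_size < total_size:
            raise ValueError("reserved region must fit inside RAM")
        self.ram = bytearray(total_size)
        # Reservation happens once, at service-kernel initialization.
        self.reserved_start = total_size - reserved_size
        self.reserved_size = reserved_size
        self.image_len = 0

    def load_rescue_image(self, image: bytes) -> None:
        """After the service kernel has started, copy the image into the reserved region."""
        if len(image) > self.reserved_size:
            raise ValueError("rescue image does not fit in reserved memory")
        start = self.reserved_start
        self.ram[start:start + len(image)] = image
        self.image_len = len(image)

    def service_write(self, offset: int, data: bytes) -> None:
        """Ordinary service-kernel writes must stay below the reserved region."""
        if offset < 0 or offset + len(data) > self.reserved_start:
            raise PermissionError("write would touch reserved memory")
        self.ram[offset:offset + len(data)] = data

    def jump_to_rescue(self) -> bytes:
        """Return the image that execution would jump to on an exception."""
        if self.image_len == 0:
            raise RuntimeError("no rescue kernel image loaded")
        start = self.reserved_start
        return bytes(self.ram[start:start + self.image_len])
```

Keeping the image in memory that the service kernel never writes to is what lets the jump succeed even after the service kernel has corrupted its own state.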
In some exemplary embodiments, the storing of the captured exception information in memory includes:
encoding the exception information recorded by the first operating system kernel and storing it in a note area of the vmcore file;
the extracting of the exception information stored in memory includes:
reading the encoded exception information from the note area of the vmcore file and decoding it.
Exchanging information between the two kernels and extracting the essential exception information are key points of this embodiment. Because the system faces a scenario in which a fatal exception has occurred and the operating system's analysis tools have already failed, the most reliable way to preserve the on-site information is to carry it in a robust container, namely the note area of the vmcore file.
In other embodiments, transmitting the exception information between the two kernels in other manners is not excluded, and the present application does not limit this.
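The encode and decode halves of the note-area hand-off can be sketched with the standard ELF note layout (three 32-bit words for name size, descriptor size, and type, followed by the name and the descriptor, each padded to a 4-byte boundary). The note name `EXCINFO`, the type value, the little-endian byte order, and the JSON payload are all assumptions made for illustration; the source does not specify the encoding it uses.

```python
import json
import struct

NOTE_NAME = b"EXCINFO\0"   # hypothetical owner name, NUL-terminated
NOTE_TYPE = 0x45584301     # hypothetical note type value
HEADER = "<III"            # n_namesz, n_descsz, n_type (little-endian assumed)


def _pad4(data: bytes) -> bytes:
    """Pad to the 4-byte alignment used by ELF notes."""
    return data + b"\0" * (-len(data) % 4)


def encode_note(info: dict) -> bytes:
    """Service-kernel side: pack exception information into one note."""
    desc = json.dumps(info, sort_keys=True).encode("utf-8")
    header = struct.pack(HEADER, len(NOTE_NAME), len(desc), NOTE_TYPE)
    return header + _pad4(NOTE_NAME) + _pad4(desc)


def decode_note(blob: bytes) -> dict:
    """Rescue-kernel side: validate the note and recover the information."""
    namesz, descsz, ntype = struct.unpack_from(HEADER, blob, 0)
    name_start = struct.calcsize(HEADER)
    name = blob[name_start:name_start + namesz]
    if name != NOTE_NAME or ntype != NOTE_TYPE:
        raise ValueError("not an exception-information note")
    desc_start = name_start + namesz + (-namesz % 4)
    desc = blob[desc_start:desc_start + descsz]
    return json.loads(desc.decode("utf-8"))
```

Checking the name and type on the decode side lets the rescue kernel ignore unrelated notes that share the same area.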
In some exemplary embodiments, the capturing of exception information includes:
analyzing the corresponding information according to the exception type and storing it as the exception information; the exception types include: illegal memory access, software deadlock, memory exhaustion, and other exceptions; when the exception type is illegal memory access, software deadlock, or another exception, the analyzed information includes: description information, register information, and stack information; when the exception type is memory exhaustion, the analyzed information includes: description information, stack information, and memory usage.
In this embodiment, the four types of exception information, i.e., illegal memory access, software deadlock, memory exhaustion, and other exceptions, may be captured automatically by means of a kernel patch.
In this embodiment, a function for recording exception information is designed and called at each native exception point in the first operating system kernel, so as to record the various types of exception information.
The embodiment can record different exception information for the various kernel errors. The exception information includes stack information, which is the form most easily understood by IT personnel, so the cause of the exception can be understood without secondary analysis.
In other embodiments, which information is recorded as the exception information for each exception type may be designed according to the needs and conditions of the service system. The present application does not limit the exception types, the types of information analyzed and stored, or the like.
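The mapping from exception type to captured fields described above can be written as a lookup table. The sketch below is illustrative only: the enum, the field keys, and the `capture` helper are hypothetical names, and the `raw` dictionary stands in for whatever the kernel makes available at the exception point.

```python
from enum import Enum


class ExceptionType(Enum):
    ILLEGAL_MEMORY = "illegal memory"
    SOFTWARE_DEADLOCK = "software deadlock"
    MEMORY_EXHAUSTION = "memory exhaustion"
    OTHER = "other"


# Fields recorded per exception type, following the embodiment's description.
_COMMON = ("description", "registers", "stack")
FIELDS_BY_TYPE = {
    ExceptionType.ILLEGAL_MEMORY: _COMMON,
    ExceptionType.SOFTWARE_DEADLOCK: _COMMON,
    ExceptionType.OTHER: _COMMON,
    ExceptionType.MEMORY_EXHAUSTION: ("description", "stack", "memory_usage"),
}


def capture(exc_type: ExceptionType, raw: dict) -> dict:
    """Keep only the fields relevant to this exception type."""
    record = {"type": exc_type.value}
    for name in FIELDS_BY_TYPE[exc_type]:
        record[name] = raw.get(name, "<unavailable>")
    return record
```

A single recording function driven by such a table can be called from every exception point, which matches the idea of one function invoked at each native exception point.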
In some exemplary embodiments, the saving of the extracted exception information to non-volatile memory includes:
saving the extracted exception information as a text file and then storing the text file in the non-volatile memory; in the text file, the exception information is classified by information type, and information of the same type is grouped into one area of the text file; the information types include: description information, stack information, register information, and memory usage ranking.
This embodiment organizes, classifies, and presents the exception information when a kernel exception occurs, which facilitates subsequent analysis. In this embodiment, the exception information may be presented as a log, sorted from most recent to oldest by exception occurrence time. Each exception forms one complete record, an example format of which is shown in FIG. 6, containing the following information: occurrence time, description information, detailed exception information (such as stack information, register information, memory usage ranking, and the like), and an end-of-record marker.
After the native Linux kdump technology is triggered, the exception information is not automatically presented, classified, or stored, and the generated vmcore file containing the fault information is as large as physical memory; that is, on a server with 4 gigabytes of physical memory, the vmcore file generated by a single fault is also 4 gigabytes, which makes exporting it for offline analysis difficult. Analysis can then only be performed locally, and the process is labor-intensive, leaving the service system down for a long time. This embodiment extracts and stores the fault information selectively, so that it can be analyzed after the fault has been recovered. Moreover, the record generated by a single fault is only a few hundred bytes, which makes export and analysis very convenient.
In other embodiments, the types of information and the storage format may be designed according to the needs and conditions of the service system, which the present application does not limit.
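The log layout described above (occurrence time, description, detailed sections grouped by type, and an end-of-record marker, with the newest record first) can be sketched as a small formatter. The section titles, the marker string, and the function names are hypothetical; the actual layout of FIG. 6 is not reproduced in the text, so this is only one plausible rendering.

```python
from datetime import datetime

END_MARKER = "==== END OF RECORD ===="  # hypothetical end-of-record marker

# (dictionary key, section title) pairs; each type of information gets its own area.
SECTIONS = (
    ("description", "Description"),
    ("stack", "Stack"),
    ("registers", "Registers"),
    ("memory_usage", "Memory usage ranking"),
)


def format_record(occurred_at: datetime, info: dict) -> str:
    """Render one exception as a self-contained text record."""
    lines = [f"Time: {occurred_at.isoformat(sep=' ')}"]
    for key, title in SECTIONS:
        if key in info:
            lines.append(f"[{title}]")
            lines.append(str(info[key]))
    lines.append(END_MARKER)
    return "\n".join(lines)


def format_log(records: list) -> str:
    """Render (time, info) pairs as a log, most recent exception first."""
    ordered = sorted(records, key=lambda r: r[0], reverse=True)
    return "\n\n".join(format_record(t, info) for t, info in ordered)
```

A record of this shape stays in the range of a few hundred bytes for typical stack traces, which is what makes it easy to export compared with a full vmcore file.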
An embodiment of the application provides an apparatus for handling a Linux kernel exception, wherein the Linux kernel comprises a first operating system kernel and a second operating system kernel; as shown in FIG. 2, the processing apparatus includes:
a storage control module, configured to capture exception information and store it in memory when an exception occurs in the first operating system kernel;
a kernel control module, configured to start the second operating system kernel;
the storage control module is further configured to extract the exception information stored in memory and save it to non-volatile memory after the kernel control module has successfully started the second operating system kernel;
the kernel control module is further configured to restart the first operating system kernel and reset the hardware system after the storage control module has saved the exception information to the non-volatile memory.
In some exemplary embodiments, the kernel control module is further configured to reserve a storage space of a preset size in memory as reserved memory when the first operating system kernel is initialized, and to load the image file of the second operating system kernel into the reserved memory when the first operating system kernel is started;
the starting of the second operating system kernel by the kernel control module includes:
the kernel control module jumps to the second operating system kernel in the reserved memory and runs the second operating system kernel.
In other embodiments, reserving the memory and saving the second operating system kernel image file at other points in time or in other manners is not excluded, and the present application does not limit this.
In some exemplary embodiments, the storage control module comprises:
an exception interpretation submodule, configured to capture exception information when an exception occurs in the first operating system kernel;
an exception dump/restore submodule, configured to encode the exception information captured by the exception interpretation submodule and store it in a note area of the vmcore file, and, when the kernel control module has successfully started the second operating system kernel, to read the encoded exception information from the note area of the vmcore file and decode it;
and an exception storage submodule, configured to save the decoded exception information to the non-volatile memory.
In other embodiments, the storage control module may be divided into submodules in other ways, which the present application does not limit.
In other embodiments, transmitting the exception information between the two kernels in other manners is not excluded, and the present application does not limit this.
In some exemplary embodiments, the capturing of exception information by the exception interpretation submodule comprises:
the exception interpretation submodule analyzes the corresponding information according to the exception type and stores it as the exception information; the exception types include: illegal memory access, software deadlock, memory exhaustion, and other exceptions; when the exception type is illegal memory access, software deadlock, or another exception, the analyzed information includes: description information, register information, and stack information; when the exception type is memory exhaustion, the analyzed information includes: description information, stack information, and memory usage.
In other embodiments, which information is recorded as the exception information for each exception type may be designed according to the needs and conditions of the service system. The present application does not limit the exception types, the types of information analyzed and stored, or the like.
In some exemplary embodiments, the saving of the decoded exception information to the non-volatile memory by the exception storage submodule includes:
the exception storage submodule saves the decoded exception information as a text file and then stores the text file in the non-volatile memory; in the text file, the exception information is classified by information type, and information of the same type is grouped into one area of the text file; the information types include: description information, stack information, register information, and memory usage ranking.
In other embodiments, the types of information and the storage format may be designed according to the needs and conditions of the service system, which the present application does not limit.
The processing device is described below by using an example, and is applied to a business system comprising a business core, a rescue core, a volatile storage device (such as a memory) and a nonvolatile storage; the connection relationship between the sub-modules in the processing apparatus and each part in the business system is shown in fig. 3, and includes the following sub-modules:
The dual-kernel architecture submodule (namely the kernel control module) implements the framework for jumping from the service kernel to the rescue kernel when an exception occurs, and for restarting from the rescue kernel back into the service kernel.
The exception unloading/recovering (dump/restore) submodule is responsible for dumping the exception information in the service kernel and restoring it in the rescue kernel.
The exception interpretation submodule and the exception storage submodule are respectively responsible for analyzing and capturing the exception information in the service kernel, and for classifying and storing the exception information in the rescue kernel.
As shown in fig. 3, the whole business system operates as follows: the dual-kernel architecture submodule is responsible for scheduling the loading, starting, restarting and resetting of the two kernels, which ensures that the execution chain of the whole system can complete the entire exception recording and self-recovery process normally; the exception unloading/recovering submodule transfers and restores the exception information between the two kernels; and the exception interpretation and exception storage submodules extract the specific information for each kernel exception and store it by category.
The working principle of the dual-kernel architecture submodule in this example is described below. As shown in fig. 4, the dual-kernel architecture submodule is responsible for the operation of the whole business system. Under normal conditions the system runs in the service kernel; after an exception occurs, the system enters the rescue kernel to capture and record the exception information, and then restarts.
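The operating cycle described above can be viewed as a three-state machine. The sketch below is only an illustration of that cycle; the phase names and event names are hypothetical and are not taken from the application.

```python
from enum import Enum, auto


class Phase(Enum):
    SERVICE_RUNNING = auto()   # normal operation in the service kernel
    RESCUE_RECORDING = auto()  # rescue kernel captures and records the exception
    RESTARTING = auto()        # system restart back into the service kernel


# (current phase, event) -> next phase; event names are hypothetical.
TRANSITIONS = {
    (Phase.SERVICE_RUNNING, "exception"): Phase.RESCUE_RECORDING,
    (Phase.RESCUE_RECORDING, "record_saved"): Phase.RESTARTING,
    (Phase.RESTARTING, "boot_complete"): Phase.SERVICE_RUNNING,
}


def next_phase(phase, event):
    """Return the next phase, or stay in place if the event does not apply."""
    return TRANSITIONS.get((phase, event), phase)
```

Events that do not apply in the current phase are ignored, so a restart cannot be triggered before the exception record has been saved.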
The key technical points of the dual-kernel architecture submodule are as follows:
(1) When the service kernel starts and initializes, a certain amount of memory is forcibly reserved for loading the rescue kernel.
Implementation: kernel code modifies the kernel cmdline parameters to implement the function.
Implementation function/tool: void tb_adjust_x86_64_cmdline modifies the Linux boot memory reservation parameter.
(2) After the service kernel has started, the image file of the rescue kernel is loaded into the reserved memory.
Implementation: the operation is added to the startup configuration file.
Implementation function/tool: kexec
Specific operation: kexec -p /mnt/boot/cap_kernel.img --initrd=/mnt/boot/cap_rootfs.img
(3) After the service kernel encounters an exception, the system jumps to the rescue kernel stored in the reserved memory and runs it.
Implementation: kernel code calls the kexec jump function.
Implementation function/tool: void crash_kexec(struct pt_regs *)
(4) After the rescue kernel has finished recording the exception information, the system is restarted.
Implementation: the operation is added to the startup configuration file.
Implementation function/tool: reboot
Specific operation: reboot -f
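The first two steps above can be sketched from user space as string and argument-list construction. This is a hedged illustration, not the application's kernel code: it assumes the memory reservation parameter is the standard Linux `crashkernel=` boot parameter (the text does not name it), and the default size `128M` and both helper names are hypothetical.

```python
def ensure_crashkernel(cmdline, size="128M"):
    """Return the kernel command line with exactly one crashkernel= entry.

    Assumption: the reservation is expressed with the standard Linux
    `crashkernel=` parameter; the size value is a placeholder.
    """
    kept = [tok for tok in cmdline.split() if not tok.startswith("crashkernel=")]
    kept.append(f"crashkernel={size}")
    return " ".join(kept)


def kexec_load_argv(kernel_image, initrd_image):
    """Build the argv that loads a panic (rescue) kernel into reserved memory."""
    return ["kexec", "-p", kernel_image, f"--initrd={initrd_image}"]
```

Calling `kexec_load_argv("/mnt/boot/cap_kernel.img", "/mnt/boot/cap_rootfs.img")` reproduces the command given in step (2); passing the list to a process launcher avoids shell quoting problems.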
The working principle of the exception dump/restore submodule in this example is explained below, as shown in fig. 5.
The service kernel and the rescue kernel are two independent operating systems. When the business system switches from the service kernel to the rescue kernel, on the one hand the service kernel has already panicked and cannot access the hard disk, and on the other hand the data held in memory is volatile. How to transfer the exception information is therefore the key technical point of this submodule.
While running, the Linux operating system maintains a vmcore file. In the kdump mechanism this file is provided to the administrator for analyzing the cause of an exception, and it can be passed between the two kernels. In this example, when the service kernel encounters an exception, the exception information to be recorded is encoded and stored in a note area of the vmcore file. After the rescue kernel is entered, the information stored in the note area is read and decoded to obtain the corresponding exception log.
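The encode and decode round trip through a note area can be illustrated with the generic ELF note record layout (three 32-bit words for name size, descriptor size and type, followed by the name and the descriptor, each padded to a 4-byte boundary). The sketch below is an assumption-laden illustration: the application does not disclose its encoding, the note name `TBEXC` and type `0x1001` are hypothetical, and little-endian byte order is assumed.

```python
import struct

NOTE_NAME = b"TBEXC"  # hypothetical note owner name
NOTE_TYPE = 0x1001    # hypothetical note type value


def _pad4(data):
    """Pad bytes with NULs up to the next 4-byte boundary."""
    return data + b"\0" * (-len(data) % 4)


def encode_note(text, name=NOTE_NAME, note_type=NOTE_TYPE):
    """Pack text into one ELF-style note record (little-endian assumed)."""
    name_z = name + b"\0"  # the stored name includes its terminating NUL
    desc = text.encode("utf-8")
    header = struct.pack("<III", len(name_z), len(desc), note_type)
    return header + _pad4(name_z) + _pad4(desc)


def decode_note(blob):
    """Unpack one note record and return (name, type, text)."""
    namesz, descsz, note_type = struct.unpack_from("<III", blob, 0)
    name_start = 12
    desc_start = name_start + namesz + (-namesz % 4)
    name = blob[name_start:name_start + namesz - 1]  # drop the NUL
    desc = blob[desc_start:desc_start + descsz]
    return name, note_type, desc.decode("utf-8")
```

Because the header carries explicit sizes, the reader on the rescue side can skip notes it does not recognize and decode only the records whose name matches.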
The key technical points of the exception unloading/recovering submodule are as follows:
(1) The service kernel encodes the exception record and inserts it into the vmcore file.
Implementation: kernel code implements the function.
Implementation functions/tools:
void tb_print_kdump(const char *fmt, ...) encodes and formats the exception information;
void tb_appended_exclusion_to_vmcore_note(void) inserts the exception information into the note area of the vmcore file.
(2) The rescue kernel reads the exception information from the vmcore file.
Implementation: tools are developed or reused to implement the function.
Implementation functions/tools:
readelf: file information extraction tool
vmcore_note_parse: exception information decoding tool
Specific operations:
readelf -n /proc/vmcore > /tmp/vmcore_note_info
This reads the exception information from the note area of the vmcore file and stores it in a temporary file.
vmcore_note_parse /tmp/vmcore_note_info /mnt/boot/exception.txt
This decodes the exception information in the temporary file and records it in the exception log file.
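The two recovery commands above can be driven from a small script in the rescue kernel's startup sequence. The following is a sketch only: `recovery_steps` and `run_steps` are hypothetical helper names, and `vmcore_note_parse` is the application's own decoding tool, so it must already be installed in the rescue root file system for `run_steps` to succeed.

```python
import subprocess


def recovery_steps(tmp_path="/tmp/vmcore_note_info",
                   log_path="/mnt/boot/exception.txt"):
    """Return the recovery commands as (argv, stdout_path) pairs."""
    return [
        # Dump the note area of the vmcore file into a temporary file.
        (["readelf", "-n", "/proc/vmcore"], tmp_path),
        # Decode the temporary file into the exception log file.
        (["vmcore_note_parse", tmp_path, log_path], None),
    ]


def run_steps(steps, run=subprocess.run):
    """Execute each step in order, redirecting stdout where requested."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                run(argv, stdout=out, check=True)
```

Using `check=True` stops the sequence if the first command fails, so the decoder is never run on a missing or partial temporary file.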
The exception unloading/recovering submodule encodes the exception information captured by the exception interpretation submodule and inserts it into the vmcore file, and sends the recovered exception log file to the exception storage submodule to be saved in the nonvolatile memory.
The working principle of the exception interpretation submodule and the exception storage submodule in this example is explained below. These submodules mainly dig into the causes of each kind of kernel exception, trace the points in the kernel code where each exception is triggered, and locate the information needed for each exception.
The exceptions that the exception interpretation submodule supports and can capture include:
(1) Illegal memory (bad_area): a function in the kernel accesses an illegal address. The contents to be analyzed and stored in this case are:
Typical description: Unable to handle kernel pointer dereference at xxx
Register information: the variables and register values at the site of the exception at that moment;
Stack information: the function call relationships before and after the function in which the exception occurs.
(2) Software deadlock (soft lockup): a function in the kernel enters an endless loop. This is the problem that does the greatest damage to the user's business system: once it occurs, the business system is paralyzed and cannot recover. The contents to be analyzed and stored in this case are:
Typical description: soft lockup - CPU#0 stuck for xxxs!
Register information: the variables and register values at the site of the exception at that moment;
Stack information: the function call relationships before and after the function in which the endless loop occurs.
(3) Memory exhaustion (OOM): the memory of the entire operating system is exhausted. This case is more complicated: the system may simply have insufficient memory, a function may have a memory leak, or an attack may be causing a memory leak. This case most requires capturing enough information to locate the problem. The contents to be analyzed and stored are:
Typical description: BUG, Kernel panic for Out of memory
Stack information: the function call relationships before and after the event that triggered the out-of-memory exception; these have some reference value but are probably not the root cause of the problem.
Memory usage ranking: this is the important information for locating such problems; the applications with the highest memory usage are often the ones that are leaking memory or under attack.
(4) Other exceptions (trap): the three exceptions above account for the majority of Linux kernel exceptions; some less common exceptions, such as an invalid instruction or a divide-by-zero exception, can also be captured in this example. This case requires analyzing and storing: description, register information, and stack information.
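The four exception types listed above can be told apart by their typical description lines. The sketch below shows this with simple pattern matching in user space; it is an illustration only, since the application captures each exception at its trigger point in the kernel code rather than by matching log text, and the `classify` helper is a hypothetical name.

```python
import re

# Patterns follow the "typical description" lines quoted above.
_PATTERNS = (
    ("illegal memory", re.compile(r"unable to handle kernel", re.IGNORECASE)),
    ("software deadlock", re.compile(r"soft lockup", re.IGNORECASE)),
    ("memory exhaustion", re.compile(r"out of memory", re.IGNORECASE)),
)


def classify(description):
    """Map a typical description line to one of the four exception types."""
    for label, pattern in _PATTERNS:
        if pattern.search(description):
            return label
    return "other exceptions"
```

Anything that matches none of the three common patterns falls through to the "other exceptions" category, mirroring item (4).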
The exception storage submodule stores the exception information from the exception log file in a text file, classified by type of information (such as description, stack information, register information, and memory usage), and can store each piece of exception information as one kernel exception record log.
A complete kernel exception log in this example is shown in fig. 6, in which the memory usage ranking and the stack information are each displayed together in their own area.
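The memory usage ranking shown in the log can be produced by sorting per-application usage in descending order. This is a minimal sketch under assumptions: the input format (name and resident size in KiB), the `top_n` cut-off, and the output line format are hypothetical and are not taken from fig. 6.

```python
def rank_memory_usage(processes, top_n=5):
    """Return the top_n (name, resident_kib) pairs, highest usage first."""
    ranked = sorted(processes, key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


def format_ranking(ranked):
    """Render the ranking as numbered text lines for the exception log."""
    return [f"{i}. {name} {kib} KiB" for i, (name, kib) in enumerate(ranked, start=1)]
```

Placing the heaviest consumers first matches the observation above that applications with the highest memory usage are the most likely source of a leak or the target of an attack.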
For a system that continuously improves itself, it is not enough to recover quickly from a fault; the cause must also be analyzed effectively afterwards, so that a targeted solution and patch can be found and the same problem does not recur. In this example, first, the on-site information is restored into a text file that is simple and easy to store and transfer; second, the original sentence of the typical description printed when a typical Linux kernel exception occurs is retained verbatim, so that an administrator can search open-source websites or CVE publication websites and obtain relevant patches in a timely and effective manner.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
Claims (10)
1. A method for handling a Linux kernel exception, wherein the Linux kernel comprises a first operating system kernel and a second operating system kernel, the method comprising:
when an exception occurs in the first operating system kernel, capturing exception information and storing it in memory;
starting the second operating system kernel;
after the second operating system kernel has started successfully, extracting the exception information stored in the memory, and saving the extracted exception information to a nonvolatile memory;
and restarting the first operating system kernel, and resetting the hardware system.
2. The method for handling a Linux kernel exception according to claim 1, wherein
the capturing exception information and storing it in memory comprises:
encoding the exception information recorded by the first operating system kernel and storing it in a note area of the vmcore file;
the extracting the exception information stored in the memory comprises:
reading the encoded exception information from the note area of the vmcore file and decoding it.
3. The method for handling the linux kernel exception as recited in claim 2, wherein the capturing exception information comprises:
analyzing corresponding information according to different exception types and storing the information as exception information, wherein the exception types comprise: illegal memory, software deadlock, memory exhaustion, other exceptions;
when the exception type is illegal memory, software deadlock and other exceptions, the analyzed information comprises: description information, register information, and stack information;
when the exception type is memory exhaustion, the analyzed information includes: description information, stack information, and memory usage.
4. The method of handling a linux kernel exception according to claim 3,
the saving the extracted exception information to a non-volatile memory comprises:
storing the extracted exception information as a text file and then saving the text file to the nonvolatile memory; in the text file, the exception information is classified by type of information, and information of the same type is displayed together in one area; the categories of information include: description information, stack information, register information, and memory usage ranking.
5. A linux kernel exception handling device, the linux kernel including a first operating system kernel and a second operating system kernel, the handling device comprising:
the storage control module is used for capturing abnormal information and storing the abnormal information in the memory when the kernel of the first operating system is abnormal;
the kernel control module is used for starting a second operating system kernel;
the storage control module is further configured to extract the abnormal information stored in the memory and store the abnormal information in the nonvolatile memory after the kernel control module successfully starts the kernel of the second operating system;
the kernel control module is further configured to restart the first operating system kernel and reset the hardware system kernel after the storage control module stores the exception information in the nonvolatile memory.
6. The apparatus for handling the linux kernel exception as recited in claim 5, wherein the storage control module comprises:
the exception interpretation submodule is used for capturing exception information when the kernel of the first operating system is abnormal;
the exception unloading/recovering submodule is used for coding and storing the exception information captured by the exception interpretation submodule into a note area of the vmcore file; when the kernel control module successfully starts the kernel of the second operating system, reading the encoded abnormal information from the note area of the vmcore file and decoding;
and the exception storage submodule is used for storing the decoded exception information into the nonvolatile memory.
7. The apparatus for handling a Linux kernel exception according to claim 6, wherein
the capturing of exception information by the exception interpretation submodule comprises:
the exception interpretation submodule analyzes corresponding information according to different exception types and stores the information as exception information; the exception types include: illegal memory, software deadlock, memory exhaustion, other exceptions; when the exception type is illegal memory, software deadlock and other exceptions, the analyzed information comprises: description information, register information, and stack information; when the exception type is memory exhaustion, the analyzed information includes: description information, stack information, and memory usage.
8. The apparatus for handling a Linux kernel exception according to claim 7, wherein
the saving, by the exception storage submodule, of the decoded exception information to the nonvolatile memory comprises:
the exception storage submodule stores the decoded exception information in a text file and then stores the text file in the nonvolatile memory; in the text file, the exception information is classified by type of information, and information of the same type is displayed together in one area; the categories of information include: description information, stack information, register information, and memory usage ranking.
9. A device for handling a Linux kernel exception, wherein the Linux kernel comprises a first operating system kernel and a second operating system kernel, and the device comprises a memory and a processor, characterized in that the memory is used for storing a program for handling Linux kernel exceptions, and the processor is used for reading and executing the program so as to perform the method of any one of claims 1 to 4.
10. A storage medium, for use in a business system comprising a first operating system kernel and a second operating system kernel, wherein a program for handling Linux kernel exceptions is stored in the storage medium, and the program, when executed, performs the method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110078299.2A CN112395137B (en) | 2021-01-21 | 2021-01-21 | Linux kernel exception processing method, equipment and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112395137A (en) | 2021-02-23 |
CN112395137B (en) | 2021-11-09 |
Family
ID=74624961
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110078299.2A Active CN112395137B (en) | 2021-01-21 | 2021-01-21 | Linux kernel exception processing method, equipment and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112395137B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102819466A (en) * | 2012-06-29 | 2012-12-12 | 华为技术有限公司 | Method and device for processing operating system exceptions |
CN102929747A (en) * | 2012-11-05 | 2013-02-13 | 中标软件有限公司 | Method for treating crash dump of Linux operation system based on loongson server |
CN104063477A (en) * | 2014-06-30 | 2014-09-24 | 广东威创视讯科技股份有限公司 | Processing method and processing device for startup abnormalities of embedded system |
CN105426293A (en) * | 2015-10-29 | 2016-03-23 | 汉柏科技有限公司 | Method and system for recording kernel exception stack and vmcore file |
CN108121630A (en) * | 2016-11-29 | 2018-06-05 | 株式会社理光 | Electronic device, method for restarting and recording medium |
Non-Patent Citations (2)
Title |
---|
LINUX中国: ""使用 Kdump 检查 Linux 内核崩溃"", 《HTTPS://ZHUANLAN.ZHIHU.COM/P/28249684》 * |
YG@HUNTER: ""crash分析linux内核崩溃转储文件vmcore"", 《HTTPS://BLOG.CSDN.NET/WEIXIN_42915431/ARTICLE/DETAILS/105666507》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114115025A (en) * | 2021-11-24 | 2022-03-01 | 国汽智控(北京)科技有限公司 | Fault information saving method, device and equipment based on automatic driving system |
CN114115025B (en) * | 2021-11-24 | 2024-05-28 | 国汽智控(北京)科技有限公司 | Method, device and equipment for storing fault information based on automatic driving system |
Also Published As
Publication number | Publication date |
---|---|
CN112395137B (en) | 2021-11-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||