CN112395137A - Linux kernel exception processing method, equipment and device - Google Patents

Info

Publication number
CN112395137A
Authority
CN
China
Prior art keywords
information
exception
kernel
memory
operating system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110078299.2A
Other languages
Chinese (zh)
Other versions
CN112395137B (en)
Inventor
印朝晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Taiyi Xingchen Information Technology Co ltd
Original Assignee
Beijing Taiyi Xingchen Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Taiyi Xingchen Information Technology Co ltd
Priority to CN202110078299.2A
Publication of CN112395137A
Application granted
Publication of CN112395137B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1441 Resetting or repowering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1479 Generic software techniques for error detection or fault masking
    • G06F11/1482 Generic software techniques for error detection or fault masking by means of middleware or OS functionality

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A method, apparatus, and device for handling Linux kernel exceptions are provided. The Linux kernel comprises a first operating system kernel and a second operating system kernel, and the handling method comprises the following steps: when an exception occurs in the first operating system kernel, capturing exception information and storing it in memory; starting the second operating system kernel; after the second operating system kernel has started successfully, extracting the exception information stored in memory and saving it to non-volatile memory; and restarting the first operating system kernel and resetting the hardware system. With this scheme, the Linux kernel can be prevented from hanging because of an exception, and kernel exception information can be recorded.

Description

Linux kernel exception processing method, equipment and device
Technical Field
The present disclosure relates to computer technologies, and in particular, to a method, an apparatus, and a device for handling linux kernel exceptions.
Background
With the rapid development of network technology, network security threats keep growing, and security products are deployed ever more widely in networks. After many years of information security construction, China has made progress in virus prevention and in network and boundary security. However, the security of the environment in which network security products store and process data has not received enough attention, even though it is the most important part of information security and its last line of defense. Some attackers exploit kernel defects in a network security product to trigger stack exceptions in the operating system, so that the product can no longer run normally and the connectivity and security of the whole network are affected. Because the Linux kernel cannot log and trace such stacks, locating these exceptions is very difficult.
Because network security products are held to very strict standards for their own security, recording exception information and resetting automatically when the Linux kernel fails is an important technical problem for security vendors.
Disclosure of Invention
The application provides a method, apparatus, and device for handling Linux kernel exceptions, which can prevent the Linux kernel from hanging because of an exception and can record kernel exception information.
The present disclosure provides a method for processing an exception of a linux kernel, where the linux kernel includes a first operating system kernel and a second operating system kernel, and the method includes:
when an exception occurs in the first operating system kernel, capturing exception information and storing the exception information in memory;
starting the second operating system kernel;
after the second operating system kernel has started successfully, extracting the exception information stored in memory, and saving the extracted exception information to non-volatile memory;
and restarting the first operating system kernel, and resetting the hardware system kernel.
In an exemplary embodiment, before the booting the second operating system kernel, the method further includes:
reserving a storage space with a preset size in a memory as a reserved memory when the first operating system kernel is initialized; after the first operating system kernel is started, the image file of the second operating system kernel is imported into the reserved memory;
the launching a second operating system kernel comprises:
and jumping to the second operating system kernel in the reserved memory, and running the second operating system kernel.
In an exemplary embodiment, the storing the captured exception information in the memory includes:
encoding the exception information recorded by the first operating system kernel and storing it in a note area of the vmcore file;
the extracting the exception information stored in the memory comprises:
reading the encoded exception information from the note area of the vmcore file and decoding it.
In an exemplary embodiment, the capturing exception information includes:
analyzing corresponding information according to different exception types and storing the information as exception information, wherein the exception types comprise: illegal memory, software deadlock, memory exhaustion, other exceptions;
when the exception type is illegal memory, software deadlock and other exceptions, the analyzed information comprises: description information, register information, and stack information;
when the exception type is memory exhaustion, the analyzed information includes: description information, stack information, and memory usage.
In an exemplary embodiment, the saving the extracted exception information to non-volatile memory includes:
storing the extracted abnormal information into a non-volatile memory after storing the abnormal information as a text file;
in the text file, the abnormal information is classified according to the type of the information, and the information of the same type is concentrated in one area to be displayed in the text file; the categories of the information include: description information, stack information, register information, memory usage ranks.
The present disclosure also provides a device for handling an exception of a linux kernel, where the linux kernel includes a first operating system kernel and a second operating system kernel; the processing apparatus includes:
the storage control module is used for capturing abnormal information and storing the abnormal information in the memory when the kernel of the first operating system is abnormal;
the kernel control module is used for starting a second operating system kernel;
the storage control module is also used for extracting the abnormal information stored in the memory and storing the abnormal information into the nonvolatile memory after the kernel control module successfully starts the kernel of the second operating system;
the kernel control module is further configured to restart the first operating system kernel and reset the hardware system kernel after the storage control module stores the exception information in the nonvolatile memory.
In an exemplary embodiment, the kernel control module is further configured to reserve a storage space with a preset size in a memory as a reserved memory when the first operating system kernel is initialized; when the first operating system kernel is started, the image file of the second operating system kernel is imported into the reserved memory;
the kernel control module starts a second operating system kernel and comprises the following steps:
and the kernel control module jumps to the second operating system kernel in the reserved memory to run the second operating system kernel.
In an exemplary embodiment, the storage control module includes:
the exception interpretation submodule is used for capturing exception information when the kernel of the first operating system is abnormal;
the exception dump/restore submodule is used for encoding the exception information captured by the exception interpretation submodule and storing it in a note area of the vmcore file; and, after the kernel control module has successfully started the second operating system kernel, reading the encoded exception information from the note area of the vmcore file and decoding it;
and the exception storage submodule is used for storing the decoded exception information into the nonvolatile memory.
In an exemplary embodiment, the exception interpretation sub-module capturing exception information includes:
the exception interpretation submodule analyzes corresponding information according to different exception types and stores the information as exception information; the exception types include: illegal memory, software deadlock, memory exhaustion, other exceptions; when the exception type is illegal memory, software deadlock and other exceptions, the analyzed information comprises: description information, register information, and stack information; when the exception type is memory exhaustion, the analyzed information includes: description information, stack information, and memory usage.
In an exemplary embodiment, the saving, by the exception storage submodule, the decoded exception information to the nonvolatile memory includes:
the exception storage submodule stores the decoded exception information into a text file and then stores the text file into a nonvolatile memory; in the text file, the abnormal information is classified according to the type of the information, and the information of the same type is concentrated in one area to be displayed in the text file; the categories of the information include: description information, stack information, register information, memory usage ranks.
The present disclosure also provides a device for handling Linux kernel exceptions, where the Linux kernel includes a first operating system kernel and a second operating system kernel, and the device includes a memory and a processor; the memory is used for storing a program for handling Linux kernel exceptions, and the processor is used for reading and executing the program to perform the method of any one of the above embodiments.
The present disclosure also provides a storage medium, which is applied to a service system including a first operating system kernel and a second operating system kernel, where a program for handling linux kernel exception is stored in the storage medium, and the program is configured to execute the method in any one of the foregoing embodiments when running.
Compared with the prior art, the present application uses dual-kernel switching to record kernel exception information in a storage device and to reset the system automatically. A security device can therefore recover from a kernel exception automatically while unattended, the related exception information is recorded for later fault location and analysis by an administrator, service connections are restored quickly, and long service interruptions are avoided.
Other aspects will be apparent upon reading and understanding the attached drawings and detailed description.
Drawings
FIG. 1 is a flowchart of a handling method of linux kernel exceptions according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a device for handling linux kernel exceptions according to an embodiment of the present application;
FIG. 3 is a diagram illustrating connections between sub-modules in a processing device and various components in the business system in some example embodiments;
FIG. 4 is a schematic diagram of the operation of a dual-core architecture submodule in some example embodiments;
FIG. 5 is a schematic diagram illustrating the operation of an exception dump/restore submodule in some example embodiments;
FIG. 6 is a schematic diagram of a kernel exception log in some example embodiments.
Detailed Description
The present application describes embodiments, but the description is illustrative rather than limiting and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with or instead of any other feature or element in any other embodiment, unless expressly limited otherwise.
The present application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features and elements disclosed in this application may also be combined with any conventional features or elements to form a unique inventive concept as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive aspects to form yet another unique inventive aspect, as defined by the claims. Thus, it should be understood that any of the features shown and/or discussed in this application may be implemented alone or in any suitable combination. Accordingly, the embodiments are not limited except as by the appended claims and their equivalents. Furthermore, various modifications and changes may be made within the scope of the appended claims.
Further, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other orders of steps are possible as will be understood by those of ordinary skill in the art. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Further, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present application.
Attacks that target Linux kernel defects often cause the system of a network security product to restart or even hang, which is a serious failure. If the kernel restarts, the kernel exception stack information is usually printed only to a console (a serial port, a display, and the like); if no console was connected in advance, the printed information is lost, so maintenance personnel cannot obtain it in time and cannot analyze the cause of the exception. If the kernel exception causes the system to hang, the whole device stops operating and cannot recover automatically, resulting in a serious network outage.
The existing linux system has two processing modes when an exception occurs:
One processing mode uses the kdump mechanism of the Linux kernel to enter another system. kdump is a kernel crash dump mechanism: if it is enabled, then when the system crashes, kdump starts a second kernel. The system then stays in this second (kdump) kernel and waits for an administrator to manually analyze the system crash information (the vmcore file) and to restart and restore the whole system. However, manual analysis requires examining the assembly code in the vmcore file with various Linux tools and finding the location of the fault, which places very high demands on the administrator's technical skill, and it is often difficult to find the corresponding CVE (Common Vulnerabilities and Exposures) entry. For vulnerabilities that have not been published, the cause is even harder to analyze. In addition, manual recovery can cause long network downtime, which often means heavy losses in service scenarios with strict real-time requirements.
The other processing mode is to do nothing and rely on the robustness of the Linux kernel design. In this case, the system may stay down and wait for a manual reset, which also leaves the service unrecoverable for a long time; or the system may restart automatically to restore service. In the latter case the service is restored quickly, but the error scene is overwritten, so the problem can no longer be located and the cause of the exception cannot be traced.
The embodiment of the application provides a handling method of linux kernel exception, wherein the linux kernel comprises a first operating system kernel and a second operating system kernel; the processing method is shown in FIG. 1 and comprises steps S110-S140:
S110, when an exception occurs in the first operating system kernel, capturing exception information and storing it in memory;
S120, starting the second operating system kernel;
S130, after the second operating system kernel has started successfully, extracting the exception information stored in memory, and saving the extracted exception information to non-volatile memory;
S140, restarting the first operating system kernel, and resetting the hardware system kernel.
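The order of steps S110 to S140 can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the `System` class, its fields, and `handle_exception` are hypothetical names introduced here for illustration, and real kernels, memory, and resets are replaced by plain Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class System:
    """Toy model of the service system; every name here is hypothetical."""
    ram: dict = field(default_factory=dict)    # volatile memory shared by both kernels
    disk: list = field(default_factory=list)   # stands in for non-volatile storage
    running: str = "service"                   # which kernel is currently running
    trace: list = field(default_factory=list)  # order in which the steps ran


def handle_exception(system: System, info: str) -> None:
    # S110: capture the exception information and keep it in memory.
    system.ram["exception"] = info
    system.trace.append("S110")

    # S120: start the second (rescue) kernel.
    system.running = "rescue"
    system.trace.append("S120")

    # S130: the rescue kernel extracts the information from memory
    # and saves it to non-volatile storage.
    system.disk.append(system.ram.pop("exception"))
    system.trace.append("S130")

    # S140: reset and return to the first (service) kernel.
    system.ram.clear()  # volatile contents do not survive the reset
    system.running = "service"
    system.trace.append("S140")


system = System()
handle_exception(system, "illegal memory access")
print(system.trace, system.disk, system.running)
```

The point of the model is the ordering: the record reaches non-volatile storage in S130, before the reset in S140 discards everything held in volatile memory.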
The embodiment of the application implements a Linux dual-kernel system, designed as a mechanism in which the two kernels cooperate when a kernel exception occurs. Dual-kernel means that the system contains two kernels:
1) The service kernel, i.e., the first operating system kernel above, is the resident operating system kernel for the user's service scenario. As long as no kernel exception occurs, this kernel runs continuously.
2) The rescue kernel, i.e., the second operating system kernel above, is the operating system kernel responsible for recording exception information when the service kernel fails. It runs only briefly, when a service kernel exception occurs, and then resets.
The service kernel cannot record the exception information by itself because, when a fatal exception occurs, the kernel has often already crashed and can no longer access peripheral devices such as the hard disk normally. Another, normally operating kernel (i.e., the rescue kernel) is therefore needed to record the exception information and perform the reset.
The native Linux kdump technology can enter a kdump kernel when certain kernel exceptions occur in the main service system, but the kdump system then waits silently, so operation of the service system cannot be restored automatically.
The native Linux kdump technology can be triggered only by a very limited number of kernel exceptions. The embodiment of the application can bring the dual-kernel system into effect for almost all kernel exceptions and complete the series of related operations automatically.
The embodiment of the application can trigger the device to restart and return to the normal service system under almost all kernel exception conditions, which keeps service interruption as short as possible. Under almost all kernel exception conditions, it can also ensure that the exception information is recorded in a non-volatile storage device (such as, but not limited to, a CF card or a hard disk), so that the information can be viewed at any time after service is restored.
The embodiment of the application applies the dual-kernel system to high-risk service systems that sit at an Internet egress and face a large number of high-frequency network attacks. For attacks on the kernel, which is the lowest layer and where damage to the system is greatest, the embodiment of the application can quickly restore service continuity, avoid long service interruptions, and retain the exception information.
In some exemplary embodiments, before the starting the second operating system kernel, the method further includes:
reserving a storage space with a preset size in a memory as a reserved memory when the first operating system kernel is initialized; after the first operating system kernel is started, the image file of the second operating system kernel is imported into the reserved memory;
the launching a second operating system kernel comprises:
and jumping to the second operating system kernel in the reserved memory, and running the second operating system kernel.
In other embodiments, the memory may be reserved and the image file of the second operating system kernel stored at other points in time or in other manners; the present application does not limit this.
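The reserve, load, and jump sequence described above can be modeled as follows. This is a minimal sketch, not the patented implementation: the `Memory` class and its method names are hypothetical, and a `bytearray` stands in for physical memory. For orientation only, stock Linux kdump setups commonly reserve such a region with the `crashkernel=` boot parameter and load the capture kernel with `kexec -p`; the source does not state which mechanism the embodiment uses.

```python
class Memory:
    """Toy physical memory with one reserved region (hypothetical model)."""

    def __init__(self, total: int, reserved: int) -> None:
        if reserved >= total:
            raise ValueError("reserved region must be smaller than total memory")
        self.cells = bytearray(total)
        # Reserve the top `reserved` bytes when the first kernel initializes.
        self.reserved_start = total - reserved
        self.reserved_size = reserved
        self.image_len = 0

    def load_rescue_image(self, image: bytes) -> None:
        # After the first kernel has started, import the second kernel's image.
        if len(image) > self.reserved_size:
            raise ValueError("image does not fit in the reserved memory")
        start = self.reserved_start
        self.cells[start:start + len(image)] = image
        self.image_len = len(image)

    def jump_to_rescue(self) -> bytes:
        # "Jumping" is modeled as handing back the image that would be executed.
        if self.image_len == 0:
            raise RuntimeError("no rescue image loaded")
        start = self.reserved_start
        return bytes(self.cells[start:start + self.image_len])


mem = Memory(total=1024, reserved=256)
mem.load_rescue_image(b"RESCUE-KERNEL")
print(mem.jump_to_rescue())
```

Loading the image ahead of time matters because, once the first kernel has crashed, it may no longer be able to read an image from disk.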
In some exemplary embodiments, the storing the capture exception information in the memory includes:
encoding the exception information recorded by the first operating system kernel and storing it in a note area of the vmcore file;
the extracting the exception information stored in the memory comprises:
reading the encoded exception information from the note area of the vmcore file and decoding it.
Exchanging information between the two kernels and extracting the essential, valid exception information are key points of this embodiment. Because the system faces a scenario in which a fatal exception has occurred and the usual operating system analysis tools no longer work, the most reliable way to preserve the on-site information is to use a carrier that is still valid, namely the note area of the vmcore file.
In other embodiments, transmitting the exception information between the two kernels in other manners is not excluded, and the present application does not limit this.
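The encode and decode steps can be sketched with the standard ELF note layout: three 32-bit fields (name size, descriptor size, type) followed by the name and the descriptor, each padded to a 4-byte boundary. The source does not specify the encoding actually used, so the owner name `KEXC`, the note type value, the little-endian byte order, and the use of UTF-8 text as the payload are all assumptions made for illustration.

```python
import struct

NOTE_NAME = b"KEXC"   # hypothetical owner name, not taken from the source
NOTE_TYPE = 0x4B45    # hypothetical note type, not taken from the source


def _pad4(data: bytes) -> bytes:
    """Pad with zero bytes up to the next 4-byte boundary."""
    return data + b"\x00" * (-len(data) % 4)


def encode_note(text: str) -> bytes:
    """Service-kernel side: wrap exception text in an ELF-style note."""
    name = NOTE_NAME + b"\x00"       # the name size includes the trailing NUL
    desc = text.encode("utf-8")
    header = struct.pack("<III", len(name), len(desc), NOTE_TYPE)
    return header + _pad4(name) + _pad4(desc)


def decode_note(blob: bytes) -> str:
    """Rescue-kernel side: validate the note and recover the exception text."""
    namesz, descsz, note_type = struct.unpack_from("<III", blob, 0)
    name_start = 12                  # the header is three 4-byte fields
    name = blob[name_start:name_start + namesz]
    if name != NOTE_NAME + b"\x00" or note_type != NOTE_TYPE:
        raise ValueError("not an exception note")
    desc_start = name_start + namesz + (-namesz % 4)
    return blob[desc_start:desc_start + descsz].decode("utf-8")


blob = encode_note("Oops: NULL pointer")
print(decode_note(blob))
```

Checking the owner name and type on decode lets the rescue side skip any other notes stored in the same area.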
In some exemplary embodiments, the capturing of the anomaly information includes:
analyzing corresponding information according to different exception types and storing the information as exception information; the exception types include: illegal memory, software deadlock, memory exhaustion, other exceptions; when the exception type is illegal memory, software deadlock and other exceptions, the analyzed information comprises: description information, register information, and stack information; when the exception type is memory exhaustion, the analyzed information includes: description information, stack information, and memory usage.
In this embodiment, the four types of exception (illegal memory, software deadlock, memory exhaustion, and other exceptions) may be captured automatically by means of a kernel patch.
In this embodiment, a function for recording exception information is designed and is called at each native exception point in the first operating system kernel, so that the various types of exception information are recorded.
The embodiment can record different exception information for different kernel errors. The exception information includes stack information, which is the form most easily understood by IT personnel, so the cause of the exception can be understood without secondary analysis.
In other embodiments, the information recorded as exception information for each exception type may be designed according to the needs and conditions of the service system. The present application does not limit the exception types, the types of information analyzed and stored, or the like.
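The mapping from exception type to recorded information described above can be written down directly. The sketch below is illustrative only: the enum, the function names, and the dictionary keys are hypothetical, and the `snapshot` argument stands in for whatever the kernel patch can read at the exception point.

```python
from enum import Enum


class ExceptionType(Enum):
    ILLEGAL_MEMORY = "illegal memory"
    SOFTWARE_DEADLOCK = "software deadlock"
    MEMORY_EXHAUSTION = "memory exhaustion"
    OTHER = "other"


def fields_to_capture(kind: ExceptionType) -> tuple:
    """Return the information categories recorded for one exception type."""
    if kind is ExceptionType.MEMORY_EXHAUSTION:
        # Memory exhaustion records memory usage instead of registers.
        return ("description", "stack", "memory_usage")
    # Illegal memory, software deadlock, and other exceptions.
    return ("description", "registers", "stack")


def capture(kind: ExceptionType, snapshot: dict) -> dict:
    """Keep only the categories that apply to this exception type."""
    record = {"type": kind.value}
    for name in fields_to_capture(kind):
        record[name] = snapshot.get(name, "")
    return record


snapshot = {
    "description": "out of memory",
    "registers": "rip=...",
    "stack": "alloc_pages+0x...",
    "memory_usage": "procA 512MB",
}
print(capture(ExceptionType.MEMORY_EXHAUSTION, snapshot))
```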
In some exemplary embodiments, said saving the extracted exception information to non-volatile memory comprises:
storing the extracted abnormal information into a non-volatile memory after storing the abnormal information as a text file; in the text file, the abnormal information is classified according to the type of the information, and the information of the same type is concentrated in one area to be displayed in the text file; the categories of the information include: description information, stack information, register information, memory usage ranks.
This embodiment organizes, classifies, and presents the exception information produced when the kernel fails, which simplifies subsequent analysis. In this embodiment, the exception information may be presented as a log, sorted by occurrence time from most recent to oldest. Each exception forms one complete record, an example format of which is shown in FIG. 6, containing the following information: occurrence time, description information, detailed exception information (such as stack information, register information, memory usage ranking, and the like), and an end-of-record marker.
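A log of this shape could be produced along the following lines. This sketch does not reproduce the layout of FIG. 6, which is not available in the text: the section labels, the end-of-record marker string, the timestamp format, and the record dictionary keys are all hypothetical choices made for illustration.

```python
from datetime import datetime

# Order in which the detailed categories are grouped inside one record.
SECTION_ORDER = ("stack", "registers", "memory_usage_ranking")
END_MARKER = "==== END OF RECORD ===="  # hypothetical marker text


def format_record(record: dict) -> str:
    """Render one exception as a block of text, one area per category."""
    lines = [
        "Time: " + record["time"].strftime("%Y-%m-%d %H:%M:%S"),
        "Description: " + record["description"],
    ]
    for section in SECTION_ORDER:
        if section in record:
            lines.append("[" + section + "]")
            lines.extend(record[section])
    lines.append(END_MARKER)
    return "\n".join(lines)


def format_log(records: list) -> str:
    """Render all records, most recent exception first."""
    newest_first = sorted(records, key=lambda r: r["time"], reverse=True)
    return "\n\n".join(format_record(r) for r in newest_first)


records = [
    {"time": datetime(2021, 1, 20, 10, 0, 0),
     "description": "illegal memory access",
     "stack": ["func_a+0x10", "func_b+0x24"],
     "registers": ["pc=0x0"]},
    {"time": datetime(2021, 1, 21, 9, 0, 0),
     "description": "software deadlock",
     "stack": ["spin_lock+0x8"]},
]
print(format_log(records))
```

Grouping each category under its own label keeps every record self-describing, so a record of a few hundred bytes can be read without any other tool.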
After the native Linux kdump technology is triggered, the exception information is not automatically presented, classified, or stored, and the generated vmcore file containing the fault information is as large as physical memory: on a server with 4 gigabytes of physical memory, the vmcore file generated by a single fault is also 4 gigabytes, which makes it hard to export for offline analysis. The analysis can then only be performed locally, and it is labor-intensive, leaving the service system down for a long time. This embodiment instead extracts and stores only the relevant fault information, which can be analyzed after the fault has been recovered. A record generated by a single fault is only a few hundred bytes, so exporting and analyzing it is very convenient.
In other embodiments, the type of information and the format during storage may be designed according to needs and conditions of the service system, which are not limited in this application.
The embodiment of the application provides a device for processing exception of a linux kernel, wherein the linux kernel comprises a first operating system kernel and a second operating system kernel; as shown in fig. 2, the processing apparatus includes:
the storage control module is used for capturing abnormal information and storing the abnormal information in the memory when the kernel of the first operating system is abnormal;
the kernel control module is used for starting a second operating system kernel;
the storage control module is also used for extracting the abnormal information stored in the memory and storing the abnormal information into the nonvolatile memory after the kernel control module successfully starts the kernel of the second operating system;
the kernel control module is further configured to restart the first operating system kernel and reset the hardware system kernel after the storage control module stores the exception information in the nonvolatile memory.
In some exemplary embodiments, the kernel control module is further configured to reserve a storage space with a preset size in the memory as a reserved memory when the first operating system kernel is initialized; when the first operating system kernel is started, the image file of the second operating system kernel is imported into the reserved memory;
the kernel control module starts a second operating system kernel and comprises the following steps:
and the kernel control module jumps to the second operating system kernel in the reserved memory to run the second operating system kernel.
In other embodiments, the memory may be reserved and the image file of the second operating system kernel stored at other points in time or in other manners; the present application does not limit this.
In some exemplary embodiments, the storage control module comprises:
the exception interpretation submodule is used for capturing exception information when the kernel of the first operating system is abnormal;
the exception dump/restore submodule is used for encoding the exception information captured by the exception interpretation submodule and storing it in a note area of the vmcore file; and, after the kernel control module has successfully started the second operating system kernel, reading the encoded exception information from the note area of the vmcore file and decoding it;
and the exception storage submodule is used for storing the decoded exception information into the nonvolatile memory.
In other embodiments, the storage control module may be implemented by adopting other sub-module division modes, which is not limited in this application.
In other embodiments, transmitting the exception information between the two kernels in other manners is not excluded, and the present application does not limit this.
In some exemplary embodiments, the exception interpretation sub-module capturing exception information comprises:
the exception interpretation submodule analyzes corresponding information according to different exception types and stores the information as exception information; the exception types include: illegal memory, software deadlock, memory exhaustion, other exceptions; when the exception type is illegal memory, software deadlock and other exceptions, the analyzed information comprises: description information, register information, and stack information; when the exception type is memory exhaustion, the analyzed information includes: description information, stack information, and memory usage.
In other embodiments, the information recorded as exception information for each exception type may be designed according to the needs and conditions of the service system. The present application does not limit the exception types, the types of information analyzed and stored, or the like.
In some exemplary embodiments, the exception storage submodule saving the decoded exception information to the nonvolatile memory comprises:
the exception storage submodule stores the decoded exception information as a text file and then saves the text file to the nonvolatile memory. In the text file, the exception information is classified by information category, and information of the same category is grouped into one region of the file. The information categories include: description information, stack information, register information, and memory usage ranking.
In other embodiments, the information categories and the storage format may be designed according to the needs and conditions of the service system; this application does not limit them.
The processing device is described below by way of an example, applied to a service system comprising a service kernel, a rescue kernel, a volatile storage device (such as memory), and a nonvolatile memory. The connections between the submodules of the processing device and the parts of the service system are shown in Fig. 3. The device includes the following submodules:
The dual-kernel architecture submodule (that is, the kernel control module) implements the framework for jumping from the service kernel to the rescue kernel when an exception occurs, and for rebooting from the rescue kernel back into the service kernel.
The exception dump/restore submodule dumps the exception information in the service kernel and restores it in the rescue kernel.
The exception interpretation submodule and the exception storage submodule are responsible, respectively, for parsing and storing the exception information in the service kernel and for classifying and saving it in the rescue kernel.
As shown in Fig. 3, the whole service system operates as follows: the dual-kernel architecture submodule schedules the loading, starting, resetting, and restarting of the two kernels, and ensures that the system's execution chain completes the whole process of exception recording and self-recovery; the exception dump/restore submodule transfers and restores the exception information between the two kernels; and the exception interpretation and storage submodules extract the specific information for each kernel exception and save it by category.
The working principle of the dual-kernel architecture submodule in this example is described below. As shown in Fig. 4, this submodule governs the operation of the whole service system: under normal conditions the system runs in the service kernel; after an exception occurs, the system enters the rescue kernel to capture and record the exception information, and then reboots.
The key technical points of the dual-kernel architecture submodule are as follows:
(1) When the service kernel starts and initializes, a certain amount of memory is forcibly reserved for loading the rescue kernel.
Implementation: kernel code modifies the kernel cmdline parameters.
Implementation function/tool: void tb_adjust_x86_64_cmdline, which modifies the Linux boot memory-reservation parameter.
(2) After the service kernel has started, the image file of the rescue kernel is loaded into the reserved memory.
Implementation: an operation added to the startup configuration file.
Implementation function/tool: kexec
Specific operation: kexec -p /mnt/boot/cap_kernel.img --initrd=/mnt/boot/cap_rootfs.img
(3) When the service kernel encounters an exception, it jumps to the rescue kernel stored in the reserved memory and runs it.
Implementation: kernel code calls the kexec jump function.
Implementation function/tool: void crash_kexec(struct pt_regs *);
(4) After the rescue kernel finishes recording the exception information, the system is rebooted.
Implementation: an operation added to the startup configuration file.
Implementation function/tool: reboot
Specific operation: reboot -f
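The first two points above can be illustrated with a small user-space sketch in Python. It assumes that the memory-reservation parameter is the standard kdump `crashkernel=` option, which the text does not name, and the helper names and the default size of 128M are hypothetical. The sketch only builds strings; it does not touch the running kernel.

```python
def ensure_crashkernel(cmdline: str, size: str = "128M") -> str:
    """Return the kernel command line with exactly one crashkernel= entry.

    Assumption: the reserved region is requested through the standard
    crashkernel= boot parameter; the size is a placeholder.
    """
    kept = [tok for tok in cmdline.split() if not tok.startswith("crashkernel=")]
    kept.append(f"crashkernel={size}")
    return " ".join(kept)


def kexec_load_command(kernel_image: str, initrd_image: str) -> list:
    """Build the argv that loads a panic (rescue) kernel with kexec -p."""
    return ["kexec", "-p", kernel_image, f"--initrd={initrd_image}"]
```

On a real system the returned argv would be run from the startup configuration (for example through `subprocess.run`) with root privileges, which is why the sketch stops at constructing the command.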
The working principle of the exception dump/restore submodule in this example is explained below, as shown in fig. 5.
Because the service kernel and the rescue kernel are two independent operating systems, switching from the service kernel to the rescue kernel raises two problems: the service kernel has already panicked and can no longer access the hard disk, and the data held in memory is volatile. How to hand over the exception information is therefore the key technical point of this submodule.
The Linux operating system maintains a vmcore file while running. In the kdump mechanism this file is provided to the administrator for analyzing the cause of an exception, and it can be passed between the two kernels. In this example, when the service kernel encounters an exception, the exception information to be recorded is encoded and stored in the note area of the vmcore file. After the rescue kernel has been entered, the information stored in the note area is read and decoded to obtain the corresponding exception log.
The key technical points of the exception dump/restore submodule are as follows:
(1) The service kernel encodes the exception record and inserts it into the vmcore file.
Implementation: kernel code.
Implementation functions/tools:
void tb_print_kdump(const char *fmt, ...): encodes and formats the exception information;
void tb_appended_exclusion_to_vmcore_note(void): inserts the exception information into the note area of the vmcore file.
(2) The rescue kernel reads the exception information from the vmcore file.
Implementation: tools developed for the purpose together with existing tools.
Implementation functions/tools:
readelf: file information extraction tool;
vmcore_note_parse: exception information decoding tool.
Specific operations:
readelf -n /proc/vmcore > /tmp/vmcore_note_info
reads the exception information from the note area of the vmcore file and saves it to a temporary file;
vmcore_note_parse /tmp/vmcore_note_info /mnt/boot/exception.txt
decodes the exception information in the temporary file and records it in the exception log file.
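The text does not disclose how the exception record is encoded, so the following Python sketch only illustrates the container: the standard ELF note layout (three 32-bit header words, then a name and a descriptor, each padded to 4 bytes) that a vmcore note area uses. The note name `TBEXC`, the type value `0x1000`, and the little-endian byte order are assumptions made for the illustration.

```python
import struct


def _pad4(data: bytes) -> bytes:
    """Pad with NUL bytes to the next 4-byte boundary."""
    return data + b"\0" * (-len(data) % 4)


def pack_note(name: str, note_type: int, desc: bytes) -> bytes:
    """Serialize one ELF note: namesz, descsz, type, name, desc."""
    name_bytes = name.encode("ascii") + b"\0"
    header = struct.pack("<III", len(name_bytes), len(desc), note_type)
    return header + _pad4(name_bytes) + _pad4(desc)


def unpack_notes(blob: bytes):
    """Yield (name, type, desc) for every note in a note segment."""
    offset = 0
    while offset + 12 <= len(blob):
        namesz, descsz, note_type = struct.unpack_from("<III", blob, offset)
        offset += 12
        name = blob[offset:offset + namesz].rstrip(b"\0").decode("ascii")
        offset += namesz + (-namesz % 4)
        desc = blob[offset:offset + descsz]
        offset += descsz + (-descsz % 4)
        yield name, note_type, desc
```

A decoder in the rescue kernel would walk the notes in the same way, pick out the entries carrying the custom name, and turn their descriptors back into text.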
The exception dump/restore submodule encodes the exception information captured and stored by the exception interpretation submodule and inserts it into the vmcore file, and passes the recorded exception log file to the exception storage submodule to be saved in the nonvolatile memory. The working principle of the exception interpretation submodule and the exception storage submodule in this example is explained below. These submodules dig into the causes of each kernel exception, trace the points in the kernel code where each exception is triggered, and locate the information needed for each exception.
The exceptions that the exception interpretation submodule supports and can capture include:
(1) Illegal memory access (bad_area): a function in the kernel accesses an illegal address. The contents to be parsed and stored in this case are:
Typical description: unable to handle kernel pointer dereference at xxx
Register information: the variables and register values of the execution context at that moment;
Stack information: the function call chain around the faulting function.
(2) Software deadlock (soft lockup): a function in the kernel is stuck in an endless loop. This is the most damaging problem for the user's service system; once it occurs, the service system is paralyzed and cannot recover. The contents to be parsed and stored in this case are:
Typical description: soft lockup - CPU#0 stuck for xxxs!
Register information: the variables and register values of the execution context at that moment;
Stack information: the function call chain around the function in which the endless loop occurs.
(3) Memory exhaustion (OOM): the memory of the entire operating system is exhausted. This case is complicated: the system may simply have too little memory, a function may have a memory leak, or the leak may be caused by an attack. This case most requires capturing enough information to locate the problem, and the contents to be parsed and stored are:
Typical description: BUG: Kernel panic for Out of memory
Stack information: the function call chain around the event that triggered the out-of-memory exception; it has some reference value but is probably not the root cause of the problem.
Memory usage ranking: this is important information for locating such problems, because applications with high memory usage are often the ones that are leaking memory or under attack.
(4) Other exceptions (trap): the three exceptions above account for most Linux kernel exceptions; some less common exceptions, such as an illegal instruction or a divide-by-zero exception, can also be captured in this example. The information to be parsed and stored in this case includes: description, register information, and stack information.
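As a rough illustration of how the typical description lines can drive the classification, the Python sketch below matches substrings of the description against the four categories. The function name and the exact substrings are assumptions for this sketch; the kernel-side code in the example hooks the trigger points directly rather than matching text.

```python
def classify(description: str) -> str:
    """Map a typical description line to one of the four exception types."""
    text = description.lower()
    if "soft lockup" in text:
        return "software deadlock"
    if "out of memory" in text:
        return "memory exhaustion"
    if "unable to handle kernel" in text:
        return "illegal memory"
    # Anything else (illegal instruction, divide error, ...) is a trap.
    return "other exceptions"
```

The same preserved description line is what an administrator would paste into a search on an open source or CVE site, so the classifier and the human reader work from identical text.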
The exception storage submodule saves the exception information from the exception log file into a text file, classified by information category (such as description, stack information, register information, and memory usage); each exception can be saved as one kernel exception log record.
A complete kernel exception log in this example is shown in Fig. 6. In it, the memory usage ranking and the stack information are each displayed together in their own region.
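A minimal sketch of this classified layout, in Python, is given below. The exact format of Fig. 6 is not reproduced in the text, so the bracketed section headers, the section order, and the function names are hypothetical.

```python
# Hypothetical section names and order; the real log layout is in Fig. 6.
SECTION_ORDER = ("description", "stack", "registers", "memory_usage_ranking")


def rank_memory(usage_kb: dict, top: int = 5) -> list:
    """Sort processes by memory use, largest first, and keep the top entries."""
    return sorted(usage_kb.items(), key=lambda item: item[1], reverse=True)[:top]


def render_log(record: dict) -> str:
    """Write each information category into its own region of the text."""
    lines = []
    for section in SECTION_ORDER:
        if section not in record:
            continue
        lines.append(f"[{section}]")
        lines.extend(record[section])
        lines.append("")
    return "\n".join(lines)
```

For a memory-exhaustion record there is no register section, so `render_log` simply skips it, and the ranking produced by `rank_memory` ends up in one contiguous block.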
For a system that continuously improves itself, it is not enough to recover quickly from a fault; the cause must also be analyzed effectively afterwards, so that a targeted solution and patch can be found and the same problem does not recur. In this example, first, the on-site information is restored into a text file that is simple and easy to store and transfer; second, the original wording of the typical description printed for a typical Linux kernel exception is preserved, so that an administrator can search open source websites or CVE publication sites and obtain the relevant patches promptly.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.

Claims (10)

1. A method for handling a Linux kernel exception, wherein the Linux kernel comprises a first operating system kernel and a second operating system kernel, the method comprising:
when the first operating system kernel encounters an exception, capturing exception information and storing it in memory;
starting the second operating system kernel;
after the second operating system kernel has started successfully, extracting the exception information stored in the memory and saving the extracted exception information to a nonvolatile memory;
and restarting the first operating system kernel and resetting the hardware system.
2. The method for handling a Linux kernel exception according to claim 1, wherein
capturing the exception information and storing it in memory comprises:
encoding the exception information recorded by the first operating system kernel and storing it in a note area of the vmcore file;
and extracting the exception information stored in the memory comprises:
reading the encoded exception information from the note area of the vmcore file and decoding it.
3. The method for handling a Linux kernel exception according to claim 2, wherein capturing the exception information comprises:
parsing the information corresponding to each exception type and storing it as the exception information, wherein the exception types comprise: illegal memory access, software deadlock, memory exhaustion, and other exceptions;
when the exception type is illegal memory access, software deadlock, or another exception, the parsed information comprises: description information, register information, and stack information;
when the exception type is memory exhaustion, the parsed information comprises: description information, stack information, and memory usage.
4. The method for handling a Linux kernel exception according to claim 3, wherein
saving the extracted exception information to a nonvolatile memory comprises:
storing the extracted exception information as a text file and then saving the text file to the nonvolatile memory; in the text file, the exception information is classified by information category, and information of the same category is grouped into one region of the text file; the information categories comprise: description information, stack information, register information, and memory usage ranking.
5. A device for handling a Linux kernel exception, the Linux kernel comprising a first operating system kernel and a second operating system kernel, the device comprising:
a storage control module, configured to capture exception information and store it in memory when the first operating system kernel encounters an exception;
a kernel control module, configured to start the second operating system kernel;
wherein the storage control module is further configured to extract the exception information stored in the memory and save it to a nonvolatile memory after the kernel control module has successfully started the second operating system kernel;
and the kernel control module is further configured to restart the first operating system kernel and reset the hardware system after the storage control module has saved the exception information to the nonvolatile memory.
6. The device for handling a Linux kernel exception according to claim 5, wherein the storage control module comprises:
an exception interpretation submodule, configured to capture exception information when the first operating system kernel encounters an exception;
an exception dump/restore submodule, configured to encode the exception information captured by the exception interpretation submodule and store it in a note area of the vmcore file, and, after the kernel control module has successfully started the second operating system kernel, to read the encoded exception information from the note area of the vmcore file and decode it;
and an exception storage submodule, configured to save the decoded exception information to the nonvolatile memory.
7. The device for handling a Linux kernel exception according to claim 6, wherein
the exception interpretation submodule capturing exception information comprises:
the exception interpretation submodule parses the information corresponding to each exception type and stores it as the exception information; the exception types comprise: illegal memory access, software deadlock, memory exhaustion, and other exceptions; when the exception type is illegal memory access, software deadlock, or another exception, the parsed information comprises: description information, register information, and stack information; when the exception type is memory exhaustion, the parsed information comprises: description information, stack information, and memory usage.
8. The device for handling a Linux kernel exception according to claim 7, wherein
the exception storage submodule saving the decoded exception information to the nonvolatile memory comprises:
the exception storage submodule stores the decoded exception information as a text file and then saves the text file to the nonvolatile memory; in the text file, the exception information is classified by information category, and information of the same category is grouped into one region of the text file; the information categories comprise: description information, stack information, register information, and memory usage ranking.
9. A device for handling a Linux kernel exception, the Linux kernel comprising a first operating system kernel and a second operating system kernel, the device comprising a memory and a processor, wherein the memory stores a program for handling Linux kernel exceptions, and the processor reads and executes the program to perform the method of any one of claims 1 to 4.
10. A storage medium for use in a service system comprising a first operating system kernel and a second operating system kernel, wherein the storage medium stores a program for handling Linux kernel exceptions, and the program, when executed, performs the method according to any one of claims 1 to 4.
CN202110078299.2A 2021-01-21 2021-01-21 Linux kernel exception processing method, equipment and device Active CN112395137B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110078299.2A CN112395137B (en) 2021-01-21 2021-01-21 Linux kernel exception processing method, equipment and device


Publications (2)

Publication Number Publication Date
CN112395137A true CN112395137A (en) 2021-02-23
CN112395137B CN112395137B (en) 2021-11-09

Family

ID=74624961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110078299.2A Active CN112395137B (en) 2021-01-21 2021-01-21 Linux kernel exception processing method, equipment and device

Country Status (1)

Country Link
CN (1) CN112395137B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102819466A (en) * 2012-06-29 2012-12-12 华为技术有限公司 Method and device for processing operating system exceptions
CN102929747A (en) * 2012-11-05 2013-02-13 中标软件有限公司 Method for treating crash dump of Linux operation system based on loongson server
CN104063477A (en) * 2014-06-30 2014-09-24 广东威创视讯科技股份有限公司 Processing method and processing device for startup abnormalities of embedded system
CN105426293A (en) * 2015-10-29 2016-03-23 汉柏科技有限公司 Method and system for recording kernel exception stack and vmcore file
CN108121630A (en) * 2016-11-29 2018-06-05 株式会社理光 Electronic device, method for restarting and recording medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LINUX中国: ""使用 Kdump 检查 Linux 内核崩溃"", 《HTTPS://ZHUANLAN.ZHIHU.COM/P/28249684》 *
YG@HUNTER: ""crash分析linux内核崩溃转储文件vmcore"", 《HTTPS://BLOG.CSDN.NET/WEIXIN_42915431/ARTICLE/DETAILS/105666507》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114115025A (en) * 2021-11-24 2022-03-01 国汽智控(北京)科技有限公司 Fault information saving method, device and equipment based on automatic driving system
CN114115025B (en) * 2021-11-24 2024-05-28 国汽智控(北京)科技有限公司 Method, device and equipment for storing fault information based on automatic driving system

Also Published As

Publication number Publication date
CN112395137B (en) 2021-11-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant