CN115292077A - Kernel exception handling method and system - Google Patents

Info

Publication number
CN115292077A
Authority
CN
China
Prior art keywords
kernel
remote host
crash
instruction
exception
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210821112.8A
Other languages
Chinese (zh)
Inventor
李国辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Autel Intelligent Automobile Corp Ltd
Original Assignee
Autel Intelligent Automobile Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Autel Intelligent Automobile Corp Ltd filed Critical Autel Intelligent Automobile Corp Ltd
Priority to CN202210821112.8A priority Critical patent/CN115292077A/en
Publication of CN115292077A publication Critical patent/CN115292077A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing

Abstract

The embodiments of the application relate to the field of kernel exception handling and disclose a kernel exception handling method and system. The kernel exception handling system comprises an electronic device, a network console and a remote host; the electronic device is communicatively connected to the network console and the remote host, and the network console is communicatively connected to the remote host. When the kernel of the operating system of the electronic device crashes, the electronic device sends the context information of the kernel crash to the remote host and starts the network console. After receiving the context information, the remote host interacts with the network console to obtain the corresponding crash log and accurately analyze the cause of the kernel crash. No large-capacity storage is needed for a Vmcore file, which removes the storage requirement that the Vmcore file would otherwise impose.

Description

Kernel exception handling method and system
Technical Field
The embodiment of the application relates to the technical field of kernel processing, in particular to a kernel exception handling method and system.
Background
At present, an electronic device such as a computer typically runs the Linux operating system, although other types of operating systems may also be used. Taking Linux as an example, a kernel crash of the operating system means a crash of the Linux kernel; other operating systems and kernels are of course also possible. A kernel crash, also called a kernel exception, means that the kernel has encountered an unrecoverable error, such as an instruction access address error or an instruction content error. When a kernel exception occurs, the system typically needs to be restarted to recover.
At present, the cause of a kernel crash is generally analyzed from the memory dump file Vmcore generated by Kdump, the kernel's memory dump mechanism, but this approach has a high memory requirement.
Disclosure of Invention
The embodiments of the application aim to provide a kernel exception handling method and system in which, when a kernel exception occurs, the electronic device switches to a kernel-mode network console, the network console interacts with a remote host, and the cause of the kernel crash is analyzed, so that no large-capacity storage is needed for a Vmcore file.
In order to solve the technical problem, the embodiment of the application adopts the following technical scheme:
In a first aspect, an embodiment of the present application provides a kernel exception handling method, which is applied to a kernel exception handling system, where the system includes an electronic device, a network console, and a remote host; the electronic device is communicatively connected to the network console and the remote host, and the network console is communicatively connected to the remote host; the method comprises the following steps:
when the kernel of the operating system of the electronic device crashes, the electronic device sends the context information of the kernel crash to the remote host, and the electronic device starts the network console;
after receiving the context information, the remote host generates a debugging command and sends the debugging command to the network console;
the network console receives the debugging command sent by the remote host, and sends a crash log corresponding to the kernel crash to the remote host according to the debugging command;
and after receiving the crash log, the remote host analyzes the cause of the kernel crash.
In some embodiments, the method further comprises:
the electronic device determines whether the current instruction exception is a user-mode instruction exception;
if the current instruction exception is not a user-mode instruction exception, the electronic device determines that the current instruction exception is a kernel-mode instruction exception;
the electronic device calls an interrupt function to disable interrupts in response to the kernel-mode exception.
In some embodiments, the electronic device sends context information of the kernel crash to the remote host by using a polling mechanism; the remote host receives the context information by using the polling mechanism; and the network console receives the debugging command sent by the remote host by using a polling mechanism.
In some embodiments, after the remote host sends the debug command to the network console, the method further comprises:
the electronic equipment acquires a crash log in a kernel log buffer area;
the electronic device sends the crash log to the network console using a polling mechanism.
In some embodiments, the remote host analyzing the cause of the kernel crash after receiving the crash log includes:
after receiving the crash log, the remote host uses an exception analysis program to call a crash tool to analyze the crash log and obtain the cause of the kernel crash.
In some embodiments, the remote host using the exception analysis program to call the crash tool to analyze the crash log and obtain the cause of the kernel crash includes:
the remote host obtains, from the crash log, the first instruction content and the corresponding address of the instruction in which the exception occurred;
the remote host uses the exception analysis program to call the crash tool to determine whether the instruction in which the exception occurred is a memory access instruction;
if the remote host determines that the instruction in which the exception occurred is a memory access instruction, the remote host determines whether the memory access address corresponding to that instruction is abnormal;
and if the memory access address corresponding to that instruction is abnormal, the remote host determines that a data access exception occurred and traces the cause of the data access exception.
In some embodiments, the method further comprises:
if the remote host determines that the instruction in which the exception occurred is not a memory access instruction, the remote host obtains second instruction content from the exception analysis program according to the address corresponding to that instruction;
the remote host determines whether the second instruction content is consistent with the first instruction content;
if the first instruction content and the second instruction content differ, the remote host determines that the memory holding the instruction in which the exception occurred has been modified;
and the remote host analyzes why the memory holding that instruction was modified.
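The two analysis branches described above can be sketched as a small decision function. This is only an illustrative sketch, not the applicant's implementation: the `CrashRecord` layout, the `image` mapping standing in for the instruction content known to the exception analysis program, the `valid_ranges` check and the category strings are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrashRecord:
    """Fields a remote host might extract from a crash log (hypothetical layout)."""
    fault_pc: int                  # address of the instruction that raised the exception
    first_content: int             # instruction word as captured in the crash log
    is_memory_access: bool         # whether that instruction is a memory access instruction
    access_address: Optional[int]  # address it tried to access, if any


def address_is_abnormal(addr: Optional[int], valid_ranges) -> bool:
    """Treat an address as abnormal when it lies outside every known-valid range."""
    if addr is None:
        return True
    return not any(lo <= addr < hi for lo, hi in valid_ranges)


def classify_crash(rec: CrashRecord, image: dict, valid_ranges) -> str:
    """Return a coarse crash category following the two branches in the text.

    `image` maps instruction addresses to the expected instruction words
    (the "second instruction content").
    """
    if rec.is_memory_access:
        # Branch 1: memory access instruction, so check the accessed address.
        if address_is_abnormal(rec.access_address, valid_ranges):
            return "data-access-exception"
        return "undetermined"
    # Branch 2: compare logged instruction content with the expected content.
    second_content = image.get(rec.fault_pc)
    if second_content is not None and second_content != rec.first_content:
        return "instruction-memory-modified"
    return "undetermined"
```

Either non-"undetermined" result only narrows the search; tracing the actual cause would still require further memory data, which the text says the remote host requests from the network console.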
In some embodiments, after the memory access address corresponding to the instruction in which the exception occurred is found to be abnormal, or after the remote host determines that the memory holding that instruction has been modified, the method further includes:
the remote host obtains memory data from the network console, where the memory data relates to the cause of the abnormal memory access address or the cause of the memory modification.
In a second aspect, an embodiment of the present application further provides a kernel exception handling system, where the system includes an electronic device, a network console, and a remote host; the electronic device is in communication connection with the network console and the remote host, and the network console is in communication connection with the remote host, wherein:
the electronic device is configured to send the context information of the kernel crash to the remote host when a kernel crash occurs in the operating system of the electronic device, and to start the network console;
the remote host is used for generating a debugging command after receiving the context information and sending the debugging command to the network console;
the network console is used for receiving a debugging command sent by the remote host and sending a crash log corresponding to the kernel crash to the remote host according to the debugging command;
and the remote host is also used for analyzing the kernel crash reason after receiving the crash log.
In some embodiments, the electronic device is further configured to:
determining whether the current instruction exception is a user-mode instruction exception;
if the current instruction exception is not a user-mode instruction exception, determining that the current instruction exception is a kernel-mode instruction exception;
and calling an interrupt function to disable interrupts in response to the kernel-mode exception.
The beneficial effects of the embodiments of the application are as follows. Unlike the prior art, the kernel exception handling method and system provided by the embodiments of the application involve an electronic device, a network console and a remote host. When the kernel of the operating system of the electronic device crashes, the electronic device sends the context information of the kernel crash to the remote host and starts the network console. After receiving the context information, the remote host interacts with the network console to obtain the corresponding crash log and accurately analyze the cause of the kernel crash, so no large-capacity storage is needed for a Vmcore file.
Drawings
One or more embodiments are illustrated by way of example in the accompanying drawings, in which like reference numerals refer to similar elements and which are not to scale unless otherwise specified.
FIG. 1 is a schematic diagram of a kernel exception handling system according to the present application;
FIG. 2 is a diagram of a hardware configuration of a controller in an embodiment of an electronic device of the present application;
FIG. 3 is a flowchart illustrating an embodiment of a kernel exception handling method according to the present application;
FIG. 4 is a schematic diagram of one embodiment of a crash log of the kernel exception handling method of the present application;
FIG. 5 is a schematic structural diagram of the interaction between a remote host and a network console in the kernel exception handling method of the present application.
Detailed Description
The present application will be described in detail with reference to specific examples. The following examples will help those skilled in the art to further understand the present application, but are not intended to limit it in any way. It should be noted that various changes and modifications can be made by one skilled in the art without departing from the spirit of the application, and all of these fall within the scope of protection of the present application.
In order to make the objects, technical solutions and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the application and do not limit it.
It should be noted that, if not conflicted, the various features of the embodiments of the present application may be combined with each other within the scope of protection of the present application. Additionally, while functional block divisions are performed in system schematics, with logical sequences shown in flowcharts, in some cases the steps shown or described may be performed in a different order than the block divisions in the systems, or in the flowcharts. Further, the terms "first," "second," and the like, as used herein do not limit the data and the order of execution, but merely distinguish between the same or similar items that have substantially the same function and effect.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
In addition, the technical features mentioned in the embodiments of the present application described below may be combined with each other as long as they do not conflict with each other.
At present, an electronic device such as a computer typically runs the Linux operating system, although other types of operating systems may also be used. Taking Linux as an example, a kernel crash of the operating system means a crash of the Linux kernel; the electronic device may of course also run another type of operating system and kernel, which is not limited herein. A kernel crash, also called a kernel exception, means that the kernel has encountered an unrecoverable error, such as an instruction access address error or an instruction content error.
When the kernel of the Linux operating system of the electronic device crashes, Kdump, the memory dump mechanism of the Linux kernel, can be used to capture the on-site memory data at the time of the kernel crash. Kdump relies on a capture kernel resident in memory. When the kernel fails, kexec, a Linux kernel patch, boots the capture kernel while bypassing hardware initialization such as the BIOS (Basic Input/Output System), so the memory data of the production kernel is preserved and can be read by the capture kernel to produce the dump.
The production kernel is the kernel the Linux system uses in its normal working state. The capture kernel is the kernel to which the Kdump mechanism switches the Linux system when the production kernel crashes. kexec can boot a new kernel directly from the currently running kernel.
In some embodiments, the existing way of analyzing kernel exceptions depends mainly on a large Vmcore file. The Vmcore file is a memory dump file generated by the kdump mechanism when a Linux kernel crashes; it retains kernel debugging information from the time of the crash, including memory stack information, instruction memory contents and register information, and can be used to analyze the cause of the kernel crash. However, for reasons of cost, size, power consumption or security, embedded systems often have no large-capacity file system in which to store Vmcore files, so the kdump mechanism and similar mechanisms are difficult to use there.
Moreover, if the kdump mechanism is used, memory must be reserved for the capture kernel, at least 64 MB, and this memory cannot be used while the production kernel is working. Since the capture kernel is used only when the kernel crashes, and a kernel crash is a low-probability accidental event, reserving at least 64 MB of system memory is not cost-effective.
Therefore, the present embodiment provides a kernel exception handling system, as shown in fig. 1, a kernel exception handling system 100 includes an electronic device 101, a network console 102, and a remote host 103; the electronic device 101 is communicatively coupled to the network console 102 and the remote host 103, and the network console 102 is communicatively coupled to the remote host 103.
The electronic device 101 is configured to, when a kernel crash occurs in an operating system of the electronic device 101, send context information of the kernel crash to the remote host 103, and the electronic device 101 starts the network console 102;
the remote host 103 is configured to generate a debugging command after receiving the context information, and send the debugging command to the network console 102;
the network console 102 is configured to receive a debugging command sent by the remote host 103, and send a crash log corresponding to the kernel crash to the remote host 103 according to the debugging command;
the remote host 103 is further configured to analyze a kernel crash reason after receiving the crash log.
The electronic device 101, the network console 102, and the remote host 103 may be computers. The Linux operating system is installed on the electronic device 101, and a kernel crash of the operating system means a crash of the Linux kernel.
In some embodiments, the electronic device 101 is further configured to:
determining whether the current instruction exception is a user-mode instruction exception;
if the current instruction exception is not a user-mode instruction exception, determining that the current instruction exception is a kernel-mode instruction exception;
and calling an interrupt function to disable interrupts in response to the kernel-mode exception.
The electronic device 101 determines whether a kernel crash has occurred in its operating system by determining whether the current instruction exception is a user-mode instruction exception or a kernel-mode instruction exception.
In some embodiments, as shown in fig. 2, fig. 2 is a schematic hardware structure diagram of the controller 11 in one embodiment of the electronic device 101. The controller 11 of the electronic device 101 is configured to execute method steps executed by the electronic device 101 in the kernel exception handling method, for example, when a kernel crash occurs in an operating system of the electronic device 101, the controller 11 of the electronic device 101 starts the network console 102 and the like, and sends context information of the kernel crash to the remote host 103.
Further, the controller 11 of the electronic apparatus 101 includes:
one or more processors 111 and a memory 112. FIG. 2 takes one processor 111 and one memory 112 as an example.
The processor 111 and the memory 112 may be connected by a bus or other means, and fig. 2 illustrates the connection by the bus as an example.
The memory 112, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the kernel exception handling method in the embodiments of the present application. By running the non-volatile software programs, instructions, and modules stored in the memory 112, the processor 111 executes the various functional applications and data processing of the controller 11, that is, implements the kernel exception handling method of the method embodiments.
The memory 112 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the electronic device 101, and the like. Further, the memory 112 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 112 may optionally include memory located remotely from the processor 111, which may be connected to the electronic device 101 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 112 and, when executed by the one or more processors 111, perform the steps performed by the electronic device 101 in any of the method embodiments described above, e.g., performing the method steps performed by the electronic device 101 in steps S301 to S304 of the method of fig. 3 described below.
The product can execute the kernel exception handling method provided by the embodiment of the application, and has corresponding functional modules and beneficial effects of the execution method. For details of the technique not described in detail in this embodiment, reference may be made to the kernel exception handling method provided in the embodiment of the present application.
Embodiments of the present application provide a non-transitory computer-readable storage medium, which stores computer-executable instructions, which are executed by one or more processors, such as one processor 111 in fig. 2, and can cause the one or more processors to perform steps performed by the electronic device 101 in any of the method embodiments described above, for example, the method steps performed by the electronic device 101 in the method steps S301 to S304 in fig. 3 described below.
The above-described embodiments of the electronic device 101 are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
Similar to the electronic device 101, the network console 102 and the remote host 103 also include a controller (not shown), the controller (not shown) of the network console 102 performs the method steps performed by the network console 102 in the kernel exception handling method, and the controller (not shown) of the remote host 103 performs the method steps performed by the remote host 103 in the kernel exception handling method.
According to the kernel exception handling system 100 provided by the embodiment of the application, by arranging the remote host 103 and the network console 102, when the kernel of the Linux operating system of the electronic device 101 is crashed, the crash log is sent to the remote host 103 to analyze the reason of the kernel crash, a large-capacity file system is not required to store the Vmcore file, memory data to be analyzed can be accurately obtained according to the requirement of exception positioning, and the requirement for storing the Vmcore file is saved.
FIG. 3 is a schematic flowchart of an embodiment of the kernel exception handling method of the present application. The method may be executed by the kernel exception handling system 100 and includes steps S301 to S304.
S301: when a kernel crash occurs in the operating system of the electronic device 101, the electronic device 101 sends context information of the kernel crash to the remote host 103, and the electronic device 101 starts the network console 102.
The operating system of the electronic device 101 may be the Linux operating system. A kernel crash, also called a kernel exception, means that the kernel has encountered an unrecoverable error, such as an instruction access address error or an instruction content error; when the kernel crashes, the operating system generally needs to be restarted to recover.
A kernel crash of the operating system of the electronic device 101 causes interrupts to be disabled, and if the serial port used during the development stage has been removed, data cannot be transmitted through the general network API in its place. To transfer commands, the electronic device 101 therefore sends data and commands to the network interface, and receives commands from the network interface, by polling, which enables interaction between the electronic device 101 and the remote host 103 and between the electronic device 101 and the network console 102. The polling mechanism of the Linux kernel is Netpoll. Netpoll is a purely polling-based packet transmit and receive mechanism provided at the device interface layer on the basis of NAPI; it depends only on the network device driver, not on the interrupt mechanism or the protocol stack, and UDP messages are sent and received through it. The polling mechanism thus provides a way to send messages to the outside even when interrupts or the protocol stack are not working, so that the remote host 103 can be informed of the condition of the electronic device 101.
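The idea of polling a device instead of waiting for an interrupt can be illustrated with a user-space analogy. The sketch below does not use or reproduce the kernel's Netpoll API; `FakeNic`, `poll_rx`, `xmit` and `poll_for_frame` are hypothetical names, and the "device" is just an in-memory queue.

```python
from collections import deque
from typing import Optional


class FakeNic:
    """In-memory stand-in for a network device driver (hypothetical)."""

    def __init__(self) -> None:
        self._rx = deque()  # frames waiting to be picked up by poll_rx()
        self.sent = []      # frames handed to the device for transmission

    def inject(self, frame: bytes) -> None:
        """Simulate a frame arriving from the wire."""
        self._rx.append(frame)

    def poll_rx(self) -> Optional[bytes]:
        """Return one pending frame, or None; never blocks and uses no callback."""
        return self._rx.popleft() if self._rx else None

    def xmit(self, frame: bytes) -> None:
        """Transmit synchronously by handing the frame straight to the device."""
        self.sent.append(frame)


def poll_for_frame(nic: FakeNic, max_polls: int) -> Optional[bytes]:
    """Ask the device for data up to max_polls times instead of waiting for an interrupt."""
    for _ in range(max_polls):
        frame = nic.poll_rx()
        if frame is not None:
            return frame
    return None
```

The point of the design is that the caller drives the device: nothing in the receive path depends on an interrupt handler being able to run, which is the property the text relies on after a crash.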
Therefore, when the kernel of the operating system of the electronic device 101 crashes, the electronic device 101 sends the context information of the kernel crash to the network interface by using the polling mechanism, the remote host 103 receives the context information through the network interface, and the electronic device 101 with the kernel crash starts the network console 102 connected thereto.
The context information at the time of a kernel crash is an effective means of locating kernel and driver problems, so it is sent over the network to the remote host 103, which makes it convenient for the remote host 103 to locate the cause of the kernel crash. Meanwhile, the electronic device 101 starts the network console 102 connected to it and switches to the kernel-mode network console 102 mechanism, so that the network console 102 and the remote host 103 can analyze and locate the exception.
In some embodiments, in order to determine whether a kernel crash occurs in the operating system, the electronic device 101 may determine a current instruction, and therefore, the method may further include:
the electronic device 101 determines whether the current instruction exception is a user-mode instruction exception;
if the current instruction exception is not a user-mode instruction exception, the electronic device 101 determines that the current instruction exception is a kernel-mode instruction exception;
the electronic device 101 calls an interrupt function to disable interrupts in response to the kernel-mode exception.
Specifically, the operating system of the electronic device 101 may continuously detect whether the current instruction raises an exception. Instruction exceptions include user-mode instruction exceptions and kernel-mode instruction exceptions. User mode and kernel mode result from dividing the architecture of the Linux operating system at the system call boundary; they correspond to the execution states for non-privileged and privileged instructions in the CPU instruction set, and the CPU provides different execution levels so that instructions run with the corresponding privilege.
Therefore, if the operating system of the electronic device 101 finds that the current instruction has raised an exception, the electronic device 101 first determines whether it is a user-mode instruction exception. If it is, the exception is not a kernel-mode instruction exception, and the kernel has not crashed.
If the current instruction exception is not a user-mode instruction exception, the electronic device 101 determines that it is a kernel-mode instruction exception and that the kernel of its operating system has crashed. The electronic device 101 then calls an interrupt function to disable interrupts in response to the kernel-mode exception. The interrupt function may be the panic function: in the Linux operating system, when a fault is found that prevents execution from continuing, the panic function is called to terminate the current instruction and disable kernel interrupts, and the operating system, unable to recover by itself, enters a crashed state.
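The user-mode versus kernel-mode decision can be sketched as follows. The text does not say how the electronic device makes this decision or which CPU it runs on; as an assumption for illustration only, the sketch uses the x86 convention in which the low two bits of the saved code segment selector give the privilege level, with ring 3 meaning user mode. The function names and the returned action strings are hypothetical.

```python
USER_PRIVILEGE_LEVEL = 3  # x86 ring 3 (user mode); an assumption for illustration


def is_user_mode_exception(saved_cs: int) -> bool:
    """Return True if the exception was raised while executing in user mode."""
    return (saved_cs & 0x3) == USER_PRIVILEGE_LEVEL


def handle_instruction_exception(saved_cs: int) -> str:
    """Decide what to do with an instruction exception.

    A user-mode exception only affects the offending task, so the kernel keeps
    running. Anything else is treated as a kernel-mode exception, which leads
    to the crash path (disable interrupts, report over the network).
    """
    if is_user_mode_exception(saved_cs):
        return "signal-task"
    return "panic"
```

For example, with selector values of the kind used on x86-64 Linux, `handle_instruction_exception(0x33)` yields `"signal-task"` and `handle_instruction_exception(0x10)` yields `"panic"`. Other architectures encode the privilege level differently, so only the shape of the decision carries over.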
When the electronic device 101 determines that a kernel crash has occurred in the operating system, the on-site information at the time of the crash (including registers, stack information, and the like) is printed to Logbuf, the buffer in the electronic device 101 that records the kernel log. The electronic device 101 then sends the context information of the kernel crash to the remote host 103 through the network interface and starts the network console 102. The context information may be obtained from the kernel log buffer Logbuf.
When the kernel of the operating system of the electronic device 101 crashes, the electronic device 101 interacts with the remote host 103 in polling mode (Netpoll) to send the context information at the time of the crash. This uses far less memory than the capture kernel of the kdump approach: the capture kernel needs tens of megabytes of memory, whereas here only the network console 102 occupies some instruction memory, a few kilobytes, which is almost negligible.
S302: after receiving the context information, the remote host 103 generates a debug command and sends the debug command to the network console 102.
The remote host 103 uses a polling mechanism to receive the context information that the electronic device 101 sends through the network interface, then generates a debug command and sends it to the network console 102, again using the polling mechanism.
S303: the network console 102 receives the debug command sent by the remote host 103 and, according to the debug command, sends a crash log corresponding to the kernel crash to the remote host 103.
The network console 102 uses a polling mechanism to receive the debug command from the network interface and then parses it. According to the debug command, the network console 102 sends the crash log corresponding to the kernel crash to the remote host 103.
To enable the network console 102 to obtain the crash log corresponding to the kernel crash, after the remote host 103 sends the debug command to the network console 102, the method may further include:
the electronic device 101 acquires the crash log from the kernel log buffer;
the electronic device 101 sends the crash log to the network console 102 using a polling mechanism.
Specifically, when a kernel crash occurs in the electronic device 101, the network console 102 is started and the electronic device 101 obtains the crash log from the kernel log buffer Logbuf. Because interrupts are disabled as a result of the kernel crash, the electronic device 101 transmits the crash log to the network interface using a polling mechanism, and the network console 102 receives the crash log through the network interface.
After the network console 102 obtains the crash log, it sends the crash log to the remote host 103.
S304: after receiving the crash log, the remote host 103 analyzes the cause of the kernel crash.
Specifically, after the remote host 103 receives the crash log, it analyzes the cause of the kernel crash directly. The dump file Vmcore that the kdump mechanism generates when the kernel of the electronic device 101 crashes is not needed; that is, the cause of the kernel crash does not have to be analyzed from a Vmcore file. Because the analysis is performed on the remote host 103, the electronic device 101 does not need to store a large Vmcore data file in one pass, which saves storage space.
In some embodiments, step S304 may include:
after receiving the crash log, the remote host 103 uses an exception analysis program to call the crash tool to analyze the crash log, thereby obtaining the cause of the kernel crash.
Specifically, an exception analysis program, the crash wrapper, is provided on the remote host 103. The exception analysis program can receive the memory data generated by the kernel crash and the symbol table file vmlinux generated when the kernel is compiled, and can call the crash tool to analyze the data and thereby determine the cause of the kernel crash.
In some embodiments, the step in which the remote host 103, after receiving the crash log, uses an exception analysis program to call the crash tool to analyze the crash log and obtain the cause of the kernel crash may include:
the remote host 103 obtains, from the crash log, the first instruction content and the corresponding address of the instruction in which the exception occurred;
the remote host 103 uses the exception analysis program to call the crash tool to determine whether the instruction in which the exception occurred is a memory access instruction;
if the remote host 103 determines that the instruction in which the exception occurred is a memory access instruction, the remote host 103 determines whether the memory access address corresponding to that instruction is abnormal;
if the memory access address corresponding to the instruction in which the exception occurred is abnormal, the remote host 103 determines that a data access anomaly has occurred and traces the cause of the data access anomaly.
Specifically, the remote host 103 is provided with the exception analysis program crash wrapper. After the remote host 103 obtains the crash log, it obtains from the crash log the first instruction content and the corresponding address of the instruction in which the exception occurred. The crash log obtained by the remote host 103 is shown in fig. 4; from it, the instruction in which the exception occurred can be identified as the instruction running at 0xc010ea48.
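Extracting the first instruction content and its address from a crash log can be sketched as follows. This is a hedged illustration, not the crash wrapper itself: fig. 4 is not reproduced here, so the log layout is an assumption modeled on a 32-bit ARM Linux oops (a `pc : [<...>]` register line and a `Code:` line whose parenthesized word is the faulting instruction). The sample log text and the instruction word in it are invented for the example; only the address 0xc010ea48 comes from the description. The function name `parse_faulting_instruction` is hypothetical.

```python
import re

# Assumed layout: "pc : [<c010ea48>]" on the register line.
PC_RE = re.compile(r"\bpc\s*:\s*\[<([0-9a-fA-F]+)>\]")
# Assumed layout: "Code: <words...> (<faulting word>)".
CODE_RE = re.compile(r"Code:.*\(([0-9a-fA-F]+)\)")


def parse_faulting_instruction(crash_log: str):
    """Return (address, instruction_word) or None if either field is missing."""
    pc = PC_RE.search(crash_log)
    code = CODE_RE.search(crash_log)
    if pc is None or code is None:
        return None
    return int(pc.group(1), 16), int(code.group(1), 16)


# Invented sample log; only the pc value is taken from the description.
SAMPLE_LOG = """\
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pc : [<c010ea48>]    lr : [<c010ea30>]    psr: 60000013
Code: e59f3010 e3a02000 (e5832000)
"""

address, instruction = parse_faulting_instruction(SAMPLE_LOG)
print(hex(address), hex(instruction))
```

Logs from other architectures or kernel versions lay these fields out differently, so the two regular expressions would need to be adapted to the actual format.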
The remote host 103 then uses the exception analysis program crash wrapper to call the crash tool for further analysis. The crash tool provides a number of analysis commands for analyzing the cause of a kernel crash; as shown in fig. 5, these include the bt command, the dis command, the sym command, the log command, and so on. Different analysis commands represent different analysis modes, and the remote host 103 calls different commands of the crash tool to analyze different exception causes.
Taking the analysis of a data access anomaly as an example, the remote host 103 uses a command of the crash tool to determine whether the instruction in which the exception occurred is a memory access instruction. If it is, the remote host 103 determines whether the memory access address corresponding to that instruction is abnormal; specifically, this can be determined in combination with the register contents in the crash log. If the memory access address is abnormal, the remote host 103 determines that a data access anomaly has occurred and obtains memory data from the network console 102, where the memory data corresponds to the cause of the abnormal memory access address. The remote host 103 then traces the cause of the data access anomaly; specifically, the cause can be traced according to the disassembled instructions in the symbol table file vmlinux, so that the cause of the kernel crash is obtained. Correspondingly, if the memory access address corresponding to the instruction in which the exception occurred is not abnormal, the remote host 103 goes on to determine other causes of the kernel crash.
Correspondingly, if the remote host 103 determines that the instruction in which the exception occurred is not a memory access instruction, the remote host 103 obtains a second instruction content from the exception analysis program according to the address corresponding to that instruction, and determines whether the second instruction content is consistent with the first instruction content. If the first instruction content differs from the second instruction content, the remote host 103 determines that the memory holding the instruction in which the exception occurred has been modified. The remote host 103 obtains memory data from the network console 102, where the memory data corresponds to the cause of the memory modification, and analyzes why the memory holding the instruction was modified, thereby determining the cause of the kernel crash. Further, the memory may have been modified because of a physical memory failure or a data access violation, either of which results in a kernel crash.
Further, if the remote host 103 determines that the second instruction content is consistent with the first instruction content, the remote host 103 determines that the memory holding the instruction in which the exception occurred has not been modified, and goes on to determine other causes of the kernel crash.
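The branching described in the three paragraphs above can be summarized as a small decision function. The sketch below is an illustration only: the names `Verdict` and `classify_crash` are hypothetical, and the four inputs are assumed to have been established already (for example with the crash tool and the register contents of the crash log), which is the part this sketch deliberately leaves out.

```python
from enum import Enum


class Verdict(Enum):
    DATA_ACCESS_ANOMALY = "data access anomaly"
    INSTRUCTION_MEMORY_MODIFIED = "instruction memory modified"
    OTHER = "other kernel crash cause"


def classify_crash(
    is_memory_access: bool,
    access_address_abnormal: bool,
    first_instruction: int,
    second_instruction: int,
) -> Verdict:
    """Apply the decision flow of the description to already-gathered facts.

    first_instruction: instruction content taken from the crash log.
    second_instruction: instruction content obtained for the same address
    from the exception analysis program.
    """
    if is_memory_access:
        # Memory access instruction: the question is whether its address is bad.
        if access_address_abnormal:
            return Verdict.DATA_ACCESS_ANOMALY
        return Verdict.OTHER
    # Not a memory access: compare what ran with what should be at that address.
    if first_instruction != second_instruction:
        return Verdict.INSTRUCTION_MEMORY_MODIFIED
    return Verdict.OTHER


print(classify_crash(True, True, 0xE5832000, 0xE5832000).value)
print(classify_crash(False, False, 0xE5832000, 0xE1A00000).value)
```

In both non-`OTHER` outcomes the description has the remote host fetch the relevant memory data from the network console before tracing the cause further; `OTHER` corresponds to continuing with the remaining analysis commands.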
The analysis of the kernel crash cause may follow the same approach as analysis based on a Vmcore file. The difference is that the remote host 103 of the present application does not need to store all the data of a large Vmcore file in one pass; it only needs to obtain the corresponding memory data from the network console 102 when a given analysis command is used, so that the required memory data can be located precisely and storage space is saved. The use of the remaining analysis commands may follow the Vmcore-based analysis of kernel crash causes and is not repeated here.
Moreover, because interrupts cannot be used once the kernel exception has occurred, the network console 102 interacts with the remote host 103 in polling mode to send and receive data while the cause of the kernel crash is being analyzed. Unlike the prior art, in which the Linux kernel sends the kernel crash log only once and then shuts down or restarts the operating system, the remote host 103 sends different analysis commands to the network console 102 during the analysis. A command may be a command for reading memory data or a command for reading device hardware registers, and the network console 102 selectively sends the corresponding memory data, register contents, and the like according to the analysis command sent by the remote host 103, so that the exception can be located precisely and the cause of the kernel exception analyzed comprehensively.
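The on-demand exchange described above, in which the console returns only what each command asks for, can be sketched as a command dispatcher. Everything named here is an assumption made for illustration: the class `ToyConsole`, the text commands `log`, `rd <hex address> <length>` and `reg <name>`, and the `ERR ...` replies are hypothetical and are not taken from the patent or from any real console protocol.

```python
class ToyConsole:
    """Hypothetical network console that answers one debug command at a time."""

    def __init__(self, base: int, memory: bytes, registers: dict, crash_log: str):
        self.base = base            # address of memory[0]
        self.memory = memory        # the only memory this toy model exposes
        self.registers = registers  # register name -> value
        self.crash_log = crash_log

    def handle(self, command: str) -> bytes:
        parts = command.split()
        try:
            if parts == ["log"]:
                return self.crash_log.encode()
            if len(parts) == 3 and parts[0] == "rd":
                offset = int(parts[1], 16) - self.base
                length = int(parts[2])
                if offset < 0 or length < 0 or offset + length > len(self.memory):
                    return b"ERR out of range"
                # Only the requested slice is returned, never the whole memory.
                return self.memory[offset:offset + length]
            if len(parts) == 2 and parts[0] == "reg":
                return f"{self.registers[parts[1]]:#010x}".encode()
        except (ValueError, KeyError):
            return b"ERR bad argument"
        return b"ERR unknown command"


console = ToyConsole(
    base=0xC010EA40,
    memory=bytes(range(16)),
    registers={"pc": 0xC010EA48},
    crash_log="pc : [<c010ea48>]",
)
print(console.handle("rd 0xc010ea48 4"))  # four bytes starting at offset 8
print(console.handle("reg pc"))
```

The point of the design is that the amount of data transferred and stored on the remote host scales with the questions asked, not with the size of the device's memory, which is the contrast with a full Vmcore dump drawn in the description.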
According to the kernel exception handling method provided by the embodiments of the present application, a remote host 103 and a network console 102 are provided so that, when the kernel of the Linux operating system of the electronic device 101 crashes, the crash log is sent to the remote host 103 for analysis of the cause of the kernel crash. No large-capacity file system is required to store a Vmcore file, the memory data to be analyzed can be obtained precisely according to the needs of locating the exception, and the storage otherwise required for the Vmcore file is saved.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Within the idea of the invention, technical features of the above embodiments or of different embodiments may also be combined, steps may be implemented in any order, and many other variations of the different aspects of the invention exist, which are not described in detail for the sake of brevity. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A kernel exception handling method, applied to a kernel exception handling system, wherein the system comprises an electronic device, a network console, and a remote host; the electronic device is communicatively connected to the network console and the remote host, and the network console is communicatively connected to the remote host; the method comprises:
when the kernel of the operating system of the electronic device crashes, the electronic device sends context information of the kernel crash to the remote host, and the electronic device starts the network console;
after receiving the context information, the remote host generates a debug command and sends the debug command to the network console;
the network console receives the debug command sent by the remote host and, according to the debug command, sends a crash log corresponding to the kernel crash to the remote host;
and after receiving the crash log, the remote host analyzes the cause of the kernel crash.
2. The method of claim 1, further comprising:
the electronic device determines whether the current instruction exception is a user-mode instruction exception;
if the current instruction exception is not a user-mode instruction exception, the electronic device determines that the current instruction exception is a kernel-mode instruction exception;
the electronic device calls an interrupt function to disable the interrupt in which the kernel-mode exception occurred.
3. The method of claim 1, wherein the electronic device sends the context information of the kernel crash to the remote host using a polling mechanism; the remote host receives the context information using the polling mechanism; and the network console receives the debug command sent by the remote host using the polling mechanism.
4. The method of claim 1, wherein after the remote host sends the debug command to the network console, the method further comprises:
the electronic device acquires the crash log from a kernel log buffer;
and the electronic device sends the crash log to the network console using a polling mechanism.
5. The method of claim 1, wherein the remote host analyzing the cause of the kernel crash after receiving the crash log comprises:
after receiving the crash log, the remote host uses an exception analysis program to call a crash tool to analyze the crash log and obtain the cause of the kernel crash.
6. The method of claim 5, wherein the remote host, after receiving the crash log, using an exception analysis program to call a crash tool to analyze the crash log and obtain the cause of the kernel crash comprises:
the remote host obtains, from the crash log, the first instruction content and the corresponding address of the instruction in which the exception occurred;
the remote host uses the exception analysis program to call the crash tool to determine whether the instruction in which the exception occurred is a memory access instruction;
if the remote host determines that the instruction in which the exception occurred is a memory access instruction, the remote host determines whether the memory access address corresponding to that instruction is abnormal;
and if the memory access address corresponding to the instruction in which the exception occurred is abnormal, the remote host determines that a data access anomaly has occurred and traces the cause of the data access anomaly.
7. The method of claim 6, further comprising:
if the remote host determines that the instruction in which the exception occurred is not a memory access instruction, the remote host obtains a second instruction content from the exception analysis program according to the address corresponding to that instruction;
the remote host determines whether the second instruction content is consistent with the first instruction content;
if the first instruction content differs from the second instruction content, the remote host determines that the memory holding the instruction in which the exception occurred has been modified;
and the remote host analyzes why the memory holding the instruction in which the exception occurred was modified.
8. The method of claim 6, wherein after it is determined that the memory access address corresponding to the instruction in which the exception occurred is abnormal, or after the remote host determines that the memory holding the instruction in which the exception occurred has been modified, the method further comprises:
the remote host obtains memory data from the network console, wherein the memory data corresponds to the cause of the abnormal memory access address or the cause of the memory modification.
9. A kernel exception handling system, comprising an electronic device, a network console, and a remote host; the electronic device is communicatively connected to the network console and the remote host, and the network console is communicatively connected to the remote host, wherein:
the electronic device is configured to send context information of a kernel crash to the remote host when the kernel crash occurs in the operating system of the electronic device, and to start the network console;
the remote host is configured to generate a debug command after receiving the context information and to send the debug command to the network console;
the network console is configured to receive the debug command sent by the remote host and, according to the debug command, to send a crash log corresponding to the kernel crash to the remote host;
and the remote host is further configured to analyze the cause of the kernel crash after receiving the crash log.
10. The system of claim 9, wherein the electronic device is further configured to:
determine whether the current instruction exception is a user-mode instruction exception;
if the current instruction exception is not a user-mode instruction exception, determine that the current instruction exception is a kernel-mode instruction exception;
and call an interrupt function to disable the interrupt in which the kernel-mode exception occurred.
CN202210821112.8A 2022-07-13 2022-07-13 Kernel exception handling method and system Pending CN115292077A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210821112.8A CN115292077A (en) 2022-07-13 2022-07-13 Kernel exception handling method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210821112.8A CN115292077A (en) 2022-07-13 2022-07-13 Kernel exception handling method and system

Publications (1)

Publication Number Publication Date
CN115292077A true CN115292077A (en) 2022-11-04

Family

ID=83822425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210821112.8A Pending CN115292077A (en) 2022-07-13 2022-07-13 Kernel exception handling method and system

Country Status (1)

Country Link
CN (1) CN115292077A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116680208A (en) * 2022-12-16 2023-09-01 荣耀终端有限公司 Abnormality recognition method and electronic device

Similar Documents

Publication Publication Date Title
US8266395B2 (en) Detecting attempts to change memory
KR101759379B1 (en) Memory dump with expanded data and user privacy protection
EP1668509B1 (en) Method and apparatus for monitoring and resetting a co-processor
US8099636B2 (en) System and method for protecting memory stacks using a debug unit
US20120042215A1 (en) Request processing system provided with multi-core processor
US8893122B2 (en) Virtual computer system and a method of controlling a virtual computer system on movement of a virtual computer
US20140122421A1 (en) Information processing apparatus, information processing method and computer-readable storage medium
US20090083736A1 (en) Virtualized computer, monitoring method of the virtualized computer and a computer readable medium thereof
CN101937344B (en) Computer and method for quickly starting same
WO2014209251A2 (en) Recovery after input/ouput error-containment events
US10514972B2 (en) Embedding forensic and triage data in memory dumps
CN115292077A (en) Kernel exception handling method and system
CN106997313B (en) Signal processing method and system of application program and terminal equipment
US20090144733A1 (en) Virtual machine system and control method of virtual machine system
CN115904793B (en) Memory transfer method, system and chip based on multi-core heterogeneous system
CN115576734B (en) Multi-core heterogeneous log storage method and system
CN107818034B (en) Method and device for monitoring running space of process in computer equipment
US20160292108A1 (en) Information processing device, control program for information processing device, and control method for information processing device
US20210390022A1 (en) Systems, methods, and apparatus for crash recovery in storage devices
WO2022186859A1 (en) Uncorrectable memory error recovery for virtual machine hosts
CN103123603A (en) Echnique to improve performance of software breakpoint handling
CN101311909A (en) System peculiarity diagnose method
WO2023169289A1 (en) Method and apparatus for switching execution status of process
JP5832408B2 (en) Virtual computer system and control method thereof
EP4167095A1 (en) Systems, methods, and devices for accessing a device operating system over an interconnect

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 518055 401, Building B1, Nanshan Zhiyuan, No. 1001, Xueyuan Avenue, Changyuan Community, Taoyuan Street, Nanshan District, Shenzhen, Guangdong

Applicant after: Shenzhen Saifang Technology Co.,Ltd.

Address before: 518000 room 701, building B1, Nanshan wisdom garden, 1001 Xueyuan Avenue, Changyuan community, Taoyuan Street, Nanshan District, Shenzhen City, Guangdong Province

Applicant before: Shenzhen Daotong Intelligent Automobile Co.,Ltd.
