CN113934561A - Fault positioning method, device, system, hardware platform and storage medium - Google Patents


Publication number
CN113934561A
Authority
CN
China
Prior art keywords
command
kernel
state information
hardware platform
linux system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010605429.9A
Other languages
Chinese (zh)
Inventor
袁俊卿
薛雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Loongson Technology Corp Ltd
Original Assignee
Loongson Technology Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Loongson Technology Corp Ltd filed Critical Loongson Technology Corp Ltd
Priority to CN202010605429.9A priority Critical patent/CN113934561A/en
Publication of CN113934561A publication Critical patent/CN113934561A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy

Abstract

An embodiment of the invention provides a fault location method applied to a hardware platform that runs a Linux system and is configured with a debugging interface. The method comprises: when the kernel of the Linux system fails, receiving a first command from a control device through the debugging interface; in response to the first command, entering a debugging state to traverse and execute the processes in the kernel of the Linux system, and recording the process state information of each process during the traversal; receiving a second command from the control device through the debugging interface; reading the process state information in response to the second command; and sending the process state information to the control device through the debugging interface, so that the control device can locate the faulty target process in the kernel according to the process state information. The embodiment of the invention enables kernel faults of the Linux system to be located quickly.

Description

Fault positioning method, device, system, hardware platform and storage medium
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a fault location method, a fault location apparatus, a fault location system, a hardware platform, and a storage medium.
Background
With the development of the computer market, the Linux operating system has come into wide use. The Linux operating system is derived from the relatively mature Unix operating system and is a monolithic kernel system, where the kernel is the system software that provides functions such as the hardware abstraction layer, disk and file system control, and multitasking.
Because the kernel's implementation mechanisms are complex, locating a kernel fault on a system platform running the Linux operating system after the kernel has crashed is extremely difficult.
At present, when the kernel of a Linux operating system crashes, fault location is usually attempted manually, for example by searching reference material and analyzing code by hand. This approach consumes considerable manpower, material resources, and time, and it makes fault location inefficient and its accuracy hard to guarantee.
Disclosure of Invention
In view of the above problems, embodiments of the present invention are proposed to provide a fault location method that overcomes, or at least partially solves, those problems, so that a kernel fault of a Linux system can be located quickly.
Correspondingly, embodiments of the invention also provide a fault location device, a fault location system, a hardware platform, and a storage medium, to ensure the implementation and application of the method.
In order to solve the above problem, an embodiment of the present invention discloses a fault location method, which is applied to a hardware platform running a Linux system, where the hardware platform is configured with a debug interface, and the method includes:
under the condition that a kernel of the Linux system fails, receiving a first command from a control device through the debugging interface;
in response to the first command, entering a debugging state to traverse and execute processes in a kernel of the Linux system, and recording process state information of each process in the traversing process;
receiving a second command from the control device through the debugging interface;
reading the process state information in response to the second command;
and sending the process state information to the control device through the debugging interface, so that the control device can locate the faulty target process in the kernel according to the process state information.
An embodiment of the invention also discloses a fault location device applied to a hardware platform running a Linux system, where the hardware platform is configured with a debugging interface, and the device includes:
a first receiving module, configured to receive a first command from the control device through the debugging interface when the kernel of the Linux system fails;
a traversal module, configured to, in response to the first command, enter a debugging state to traverse and execute the processes in the kernel of the Linux system, and to record the process state information of each process during the traversal;
a second receiving module, configured to receive a second command from the control device through the debugging interface;
a reading module, configured to read the process state information in response to the second command;
and a sending module, configured to send the process state information to the control device through the debugging interface, so that the control device can locate the faulty target process in the kernel according to the process state information.
The embodiment of the invention also discloses a hardware platform which comprises a memory and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs are configured to be executed by one or more processors and comprise instructions for:
under the condition that a kernel of the Linux system fails, receiving a first command from a control device through the debugging interface;
in response to the first command, entering a debugging state to traverse and execute processes in a kernel of the Linux system, and recording process state information of each process in the traversing process;
receiving a second command from the control device through the debugging interface;
reading the process state information in response to the second command;
and sending the process state information to the control device through the debugging interface, so that the control device can locate the faulty target process in the kernel according to the process state information.
The embodiment of the invention also discloses a readable storage medium, and when instructions in the storage medium are executed by a processor of a hardware platform, the hardware platform can execute one or more fault location methods in the embodiment of the invention.
The embodiment of the invention has the following advantages:
In the embodiment of the invention, the hardware platform can receive a first command from the control device through the configured debugging interface, enter a debugging state in response to the first command so as to traverse and execute the processes in the kernel of the Linux system, and record the process state information of each process in the kernel. After the first command has been executed, the hardware platform may receive a second command from the control device, read the process state information in response to the second command, and send it to the control device, so that the control device can analyze the process state information and locate the faulty target process. In other words, the control device sends control commands to the debugging interface of the hardware platform and thereby directs the CPU of the hardware platform to execute them; the hardware platform reads the process state information and returns it to the control device. The control device can therefore be used to locate a kernel fault of the hardware platform quickly, which improves the efficiency and accuracy of fault location for the Linux system kernel and saves manpower, material resources, and time.
Drawings
FIG. 1 is a flow chart of the steps of one embodiment of a fault location method of the present invention;
FIG. 2 is a flow chart of the execution of a target function provided by an embodiment of a fault location method of the present invention;
FIG. 3 is a block diagram of a fault locating device according to an embodiment of the present invention;
FIG. 4 is a block diagram of a fault location system embodiment of the present invention;
FIG. 5 is a block diagram illustrating a hardware platform for fault location, according to an example embodiment.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Method embodiment
Referring to fig. 1, a flowchart of a first embodiment of a fault location method according to the present invention is shown, and is applied to a hardware platform running a Linux system, where the hardware platform is configured with a debug interface, and the method specifically includes the following steps:
step 101, receiving a first command from a control device through the debugging interface when the kernel of the Linux system fails;
step 102, in response to the first command, entering a debugging state to traverse and execute the processes in the kernel of the Linux system, and recording the process state information of each process during the traversal;
step 103, receiving a second command from the control device through the debugging interface;
step 104, reading the process state information in response to the second command;
step 105, sending the process state information to the control device through the debugging interface, so that the control device can locate the faulty target process in the kernel according to the process state information.
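The five steps above can be sketched as a small simulation. This is only an illustration of the command and response order, not the patented implementation: the HardwarePlatform class, its method names, and the "name state" record format are all hypothetical, and no real debugging interface is involved.

```python
from dataclasses import dataclass, field


@dataclass
class HardwarePlatform:
    """Toy stand-in for the platform under test (hypothetical model)."""
    processes: dict                      # process name -> state string
    in_debug: bool = False
    cache: list = field(default_factory=list)   # plays the role of the recorded state

    def handle_first_command(self):
        # Steps 101-102: enter the debugging state, traverse the
        # processes and record each one's state information.
        self.in_debug = True
        self.cache = [f"{name} {state}" for name, state in self.processes.items()]

    def handle_second_command(self):
        # Steps 103-105: read the recorded information and return it
        # to the control device.
        return "\n".join(self.cache)


def locate_fault(platform):
    """Control-device side: issue both commands and collect the records."""
    platform.handle_first_command()
    return platform.handle_second_command()


platform = HardwarePlatform({"init": "running", "kworker": "blocked"})
report = locate_fault(platform)
print(report)
```

The control device only ever sees what the second command returns, which is why the state must be recorded on the platform side before it is read.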
The fault positioning method of the embodiment of the invention can be applied to a hardware platform running a Linux system, and a debugging interface is configured on a mainboard of the hardware platform. The Linux system is an integrated kernel system, wherein a kernel refers to system software which provides functions of a hardware abstraction layer, disk and file system control, multitask and the like.
The debugging interface is used to carry commands between the control device and the hardware platform and to send process state information from the hardware platform to the control device. The embodiments of the invention do not limit the specific type of debugging interface. Preferably, the debugging interface is an Ejtag interface. For convenience of description, the embodiments use the Ejtag interface as an example; other types of debugging interface can be handled by analogy.
Ejtag (Enhanced Joint Test Action Group) is a specification that extends the basic structure and functions of the IEEE 1149.1 protocol. An Ejtag debugging tool comprises a hardware part and a software part: the software part can be deployed on the control device of the invention, and the hardware part is the Ejtag interface. The Ejtag debugging tool supports reading and writing registers and memory, disassembly, execution of user-written programs, gdb (GNU Debugger) remote debugging, and a scripting language.
It should be noted that the present invention does not limit the type of the hardware platform. For example, the hardware platform may be a Loongson platform with Loongson CPUs, each of which supports Ejtag debugging.
The control device may be a device on which the Ejtag debugging tool software is installed, for example an electronic device such as a PC (personal computer), a server, or a remote control terminal. The control device may be connected through a converter to the Ejtag interface of the hardware platform under test, so as to control the Central Processing Unit (CPU) of the hardware platform to execute specific instructions and thereby debug the Linux system kernel of the hardware platform.
The converter receives a command sent by the control device, converts it into a command that the hardware platform can recognize, that is, a command conforming to the Ejtag protocol, and forwards it to the Ejtag interface of the hardware platform. The converter can communicate with the control device over Ethernet and can be connected to the Ejtag interface through a bus, thereby enabling communication between the control device and the hardware platform.
The Linux system kernel of the hardware platform runs a plurality of processes. When one of them fails, the control device can send a first command to the hardware platform; because the control device is connected to the Ejtag interface of the hardware platform through the converter, the first command reaches the Ejtag interface by way of the converter. The Ejtag interface can operate the CPU directly, without depending on the operating system above it, and make the CPU execute specified machine code instructions, for example reading and writing registers, reading and writing memory, and setting breakpoints.
The first command may be used to put the Linux system of the hardware platform into a debugging state, such as the Ejtag debug state. The debugging state is a state in which program execution is halted. While execution is halted at a given moment, the various states of the kernel can be observed, which makes it easier to control how the kernel runs. By putting the hardware platform into the debugging state, the embodiment of the invention can control the CPU of the hardware platform to traverse and execute the processes in the kernel of the Linux system and to record the process state information of each process during the traversal.
In response to the first command, the CPU of the hardware platform enters the debugging state, traverses and executes the processes in the kernel of the Linux system, and records the process state information of each process during the traversal. The faulty target process in the kernel can then be located from this process state information.
In an optional embodiment of the present invention, the first command carries a first address of a target function and a second address of a target cache, and step 102 of, in response to the first command, entering a debugging state to traverse and execute the processes in the kernel of the Linux system and recording the process state information of each process during the traversal, includes:
in response to the first command, running the target function according to the first address, where the target function is used to traverse and execute the processes in the kernel of the Linux system and to store the process state information of each process during the traversal in the target cache corresponding to the second address.
The target function may be designed in advance according to a service requirement or an actual debugging requirement and written into the kernel file of the Linux system beforehand; for example, it may be a custom function named dump_for_tasks. The target cache may be a buffer area owned by the Linux system kernel, such as the memory location corresponding to __log_buf, and is used to temporarily store the data produced during the process traversal.
Optionally, running the target function according to the first address includes: controlling the CPU pointer of the hardware platform to jump to the first address so as to run the target function.
For example, after the Ejtag interface of the hardware platform receives the first command, the CPU pointer is made to jump to the first address carried in the first command, that is, to the start address of the target function dump_for_tasks, so that the target function runs.
In an optional embodiment of the invention, before the first command is received from the control device, the method may further include: disassembling the kernel file of the Linux system to obtain the first address of the target function and the second address of the target cache.
As noted above, the target function is written into the kernel file of the Linux system in advance; the kernel file here is the vmlinux file. Disassembling the vmlinux file yields the first address of the target function within the kernel file and the second address of the target cache within the kernel file, and these two addresses can be used as input parameters of the first command.
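One way to picture the address lookup is the sketch below. It is an assumption-laden illustration: instead of disassembly output it parses a symbol listing in the "address type name" layout that the nm tool prints for an ELF file, and the sample listing and its addresses are invented for the example. Real values would come from the actual vmlinux file.

```python
def find_symbol_addresses(symbol_table, names):
    """Return {name: address} for the requested symbols.

    Each line is expected to look like "<hex address> <type> <name>".
    """
    found = {}
    for line in symbol_table.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2] in names:
            found[parts[2]] = int(parts[0], 16)
    return found


# Made-up sample listing; these addresses are placeholders, not real ones.
SAMPLE = """\
ffffffff80200000 T _text
ffffffff80201000 T dump_for_tasks
ffffffff80d00000 b __log_buf
"""

addrs = find_symbol_addresses(SAMPLE, {"dump_for_tasks", "__log_buf"})
first_address = addrs["dump_for_tasks"]    # entry point of the target function
second_address = addrs["__log_buf"]        # start of the target cache
print(hex(first_address), hex(second_address))
```

The two values play the roles of the first address and the second address that the commands carry.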
Referring to fig. 2, a flowchart of the execution of the target function according to an embodiment of the fault location method of the present invention is shown.
In an optional embodiment of the present invention, the target function may include a first calling function and a second calling function, and running the target function may include:
step S11, traversing and executing the processes in the kernel of the Linux system by running the first calling function;
step S12, by running the second calling function, saving the process state information of each process during the traversal to the target cache corresponding to the second address.
In the embodiment of the present invention, the target function may call system functions of the Linux system. For example, the first calling function may be for_each_process; invoking the for_each_process macro traverses and executes all processes in the kernel of the Linux system.
As another example, the second calling function may be show_stack. While the processes in the kernel of the Linux system are being traversed and executed, show_stack can be called to obtain process state information in real time and to store the process state information of each process in the target cache corresponding to the second address.
Optionally, before step S12, the method may further include: judging whether the traversal is complete; executing step S12 if it is not complete, and executing step S22 if it is complete.
In an optional embodiment of the present invention, the target function may further include a third calling function and a fourth calling function, and before the first calling function is executed, the method may further include:
step S21, starting interrupt protection for the process in the kernel by running the third calling function;
and after determining that all process traversals are complete, the method further comprises:
and step S22, executing the fourth calling function to finish interrupt protection on the process in the kernel.
When the kernel of the Linux system fails, running the third calling function, such as write_lock_irq, starts interrupt protection for the processes in the kernel. Interrupt protection locks out interrupts so that the subsequent code of the target function can execute safely and is not interrupted by other interrupts while it runs.
After it is determined that all processes of the kernel have been traversed and executed, running the fourth calling function, such as write_unlock_irq, ends interrupt protection for the processes in the kernel, that is, releases the interrupt lock.
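The control flow of the target function (steps S21, S11, S12, S22) can be sketched as follows. This is a user-space simulation written for illustration only: the real target function would be kernel code, and the process list, the "name: state" record format, and the trace list used to show the lock and unlock order are all hypothetical.

```python
def dump_for_tasks(processes, target_cache, trace):
    """Simulate the target function's control flow (not kernel code)."""
    trace.append("lock")                 # S21: third calling function (write_lock_irq in the text)
    try:
        for name, state in processes:    # S11: first calling function (for_each_process in the text)
            # S12: second calling function (show_stack in the text) records the state
            target_cache.append(f"{name}: {state}")
    finally:
        trace.append("unlock")           # S22: fourth calling function (write_unlock_irq in the text)


cache, trace = [], []
dump_for_tasks([("init", "running"), ("storage_a", "blocked")], cache, trace)
print(cache, trace)
```

The try/finally pairing mirrors the requirement that interrupt protection is ended once the traversal is over, so the lock is never left held.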
Optionally, after receiving the first command from the control device in step 101, the method may further include:
step S31, receiving a third command from the control equipment through the debugging interface;
step S32, in response to the third command, exiting the debugging state of the Linux system so that subsequent commands can be executed.
The third command is used to exit the debugging state. After it is determined that all processes of the Linux system kernel have been traversed and executed, the control device can issue the third command to the hardware platform. The hardware platform receives the third command from the control device through the debugging interface (such as the Ejtag interface), so that the kernel exits the debugging state, such as the Ejtag debug state. The processes in the kernel of the Linux system can then continue to run, and subsequent commands, such as the second command, can be executed.
Exiting the debugging state of the Linux system further includes: putting the CPU of the hardware platform into a fixed cycle state.
When the kernel receives the first command and enters the debugging state, the CPU pointer of the hardware platform jumps to the first address so that the kernel runs the target function. When the kernel receives the third command and exits the debugging state, the CPU pointer no longer needs to stay at the first address; it is instead pointed at an arbitrary address or at a specified address, and the CPU enters the fixed cycle state starting from that address. In the fixed cycle state, the CPU starts running from the function at that address and cycles through the functions in the kernel according to a preset fixed cycle rule. The choice of the specific address and the design of the loop are left to those skilled in the art, and the invention does not limit them.
After it is determined that all kernel processes of the Linux system of the hardware platform have been traversed, the control device can issue a second command to the hardware platform; the second command is used to read the process state information recorded under the first command. The hardware platform receives the second command from the control device through the Ejtag interface and reads the process state information in response.
In an optional embodiment of the present invention, the second address is carried in the second command, and the step 104 of reading the process state information in response to the second command specifically includes:
step S41, in response to the second command, reading stored data with a preset length from a target cache corresponding to the second address;
and step S42, saving the read storage data as a log file.
Once the first command has caused all kernel processes of the Linux system of the hardware platform to be traversed, the process state information of each process has been written to the target cache. At this point, the control device may issue a second command to the hardware platform, where the second command carries the second address corresponding to the target cache.
The hardware platform can receive the second command from the control device through the Ejtag interface, read stored data of a preset length from the target cache corresponding to the second address, and save the stored data as a log file. The stored data includes the process state information of each process recorded while the target function was running.
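Steps S41 and S42 amount to copying a fixed-length byte range into a file. A minimal sketch, assuming the platform memory is modelled as a Python bytearray with a hypothetical base address; the function name, the base value, and the sample contents are invented for the example.

```python
import os
import tempfile


def read_cache_to_log(memory, base, address, size, log_path):
    """Copy `size` bytes starting at `address` into a log file."""
    offset = address - base
    if offset < 0 or offset + size > len(memory):
        raise ValueError("requested range lies outside the simulated memory")
    data = bytes(memory[offset:offset + size])   # S41: read the preset length
    with open(log_path, "wb") as log_file:       # S42: save as a log file
        log_file.write(data)
    return data


BASE = 0x1000                      # hypothetical base of the simulated memory
memory = bytearray(64)
memory[16:29] = b"init: running"   # pretend the target function wrote this
log_path = os.path.join(tempfile.mkdtemp(), "aa.log")
data = read_cache_to_log(memory, BASE, BASE + 16, 13, log_path)
print(data)
```

Because the read length is fixed in advance, the log may contain bytes beyond what the target function actually wrote; any analysis has to tolerate that padding.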
The first command, the second command, and the third command in the embodiment of the present invention are exemplified as follows:
set pc 0xfffffff8022bee4 (address of dump_for_tasks obtained by disassembly); cont;
fget aa.log 0xfffffffffff80d7daf0 (__log_buf) 0x200000 (size);
(1) the first command: set pc address, used to put the Linux system of the hardware platform into the debugging state and to point the CPU pointer of the hardware platform at the first address (address) corresponding to the target function, so that the target function is executed;
(2) the third command: cont, used to make the Linux system of the hardware platform exit the debugging state;
(3) the second command: fget filename address size, used to make the CPU of the hardware platform read stored data of a preset length (size) from the second address (address) corresponding to the target cache and save it as a log file named filename.
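The three command forms listed above can be assembled programmatically. The sketch below only formats strings in the "set pc address", "cont", and "fget filename address size" shapes given in the text; the helper names are hypothetical and the two addresses are small placeholder values, not real kernel addresses.

```python
def first_command(first_address):
    """Enter the debugging state and jump to the target function."""
    return f"set pc {first_address:#x}"


def third_command():
    """Exit the debugging state."""
    return "cont"


def second_command(filename, second_address, size):
    """Read `size` bytes at the target cache address into a log file."""
    return f"fget {filename} {second_address:#x} {size:#x}"


# Placeholder addresses chosen for illustration only.
FIRST_ADDRESS = 0x1000
SECOND_ADDRESS = 0x2000

script = [
    first_command(FIRST_ADDRESS),
    third_command(),
    second_command("aa.log", SECOND_ADDRESS, 0x200000),
]
print("\n".join(script))
```

The order shown (first, third, second) follows the example in the text; the second command may also be issued before the debugging state is exited.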
It should be noted that the first command is sent to the hardware platform by the control device when the kernel of the Linux system fails. The third command is sent to the hardware platform by the control device after it is determined that all processes of the Linux system kernel have been traversed and executed, that is, after the first command has been executed. The second command may be sent to the hardware platform by the control device after the first command has been executed and the hardware platform has received the third command, that is, after the Linux system of the hardware platform has exited the debugging state. The second command may also be sent when the hardware platform has not received the third command, that is, while the Linux system of the hardware platform is still in the debugging state.
After executing the second command to read the recorded process state information, the hardware platform may send the process state information to the control device through the Ejtag interface, so that the control device can locate the faulty target process in the kernel according to the process state information.
Optionally, step 105 of sending the process state information to the control device through the debugging interface, so that the control device can locate the faulty target process in the kernel according to the process state information, may specifically include: sending the log file to the control device, so that the control device locates the faulty target process in the kernel according to the process state information recorded in the log file.
Through the debugging interface (such as the Ejtag interface), the hardware platform sends the log file containing the process state information to the control device, so that a person skilled in the art can analyze the log file at the control device, or the control device can analyze it autonomously, and the faulty target process in the kernel can be located from the process state information in the log file. For example, the process state information includes a running state, a blocking state, a ready state, and so on. The running state means that the process is executing; the blocking state means that the process temporarily cannot run because it is waiting for some event to occur; the ready state means that the process is ready and can start executing as soon as it gets the opportunity. The control device can identify, in the log file, the process state information that is in the blocking state. If the process state information of a storage function A in the kernel of the Linux system is in the blocking state, the control device can, by analyzing the log file, quickly determine that the process corresponding to storage function A has failed; that process is the target process.
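The autonomous analysis on the control device can be sketched as a scan for blocked processes. The "name: state" log layout and the state keywords used here are assumptions made for the example; the text does not specify the log format, and real stack-dump output would need a different parser.

```python
def find_blocked_processes(log_text):
    """Return the names of processes whose recorded state is 'blocked'."""
    blocked = []
    for line in log_text.splitlines():
        name, sep, state = line.partition(":")
        if sep and state.strip() == "blocked":
            blocked.append(name.strip())
    return blocked


# Hypothetical log contents in an assumed "name: state" layout.
LOG = "init: running\nstorage_a: blocked\nshell: ready\n"
targets = find_blocked_processes(LOG)
print(targets)
```

Here the process named storage_a plays the role of the process corresponding to storage function A in the example above.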
In summary, in the embodiment of the present invention, the hardware platform may receive a first command from the control device through the configured debugging interface, enter a debugging state in response to the first command so as to traverse and execute the processes in the kernel of the Linux system, and record the process state information of each process in the kernel. After the first command has been executed, the hardware platform may receive a second command from the control device, read the process state information in response to the second command, and send it to the control device, so that the control device can analyze the process state information and locate the faulty target process. The control device thus sends control commands to the debugging interface of the hardware platform and directs the CPU of the hardware platform to execute them; the hardware platform reads the process state information and returns it to the control device. The control device can therefore be used to locate a kernel fault of the hardware platform quickly, which improves the efficiency and accuracy of fault location for the Linux system kernel and saves manpower, material resources, and time.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combinations of acts, but those skilled in the art will recognize that the present invention is not limited by the described order of acts, because some steps may be performed in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments and that the acts involved are not necessarily required by the invention.
Device embodiment
Referring to fig. 3, a block diagram of a fault location apparatus according to an embodiment of the present invention is shown, and is applied to a hardware platform running a Linux system, where the hardware platform is configured with a debug interface, and the apparatus may specifically include the following modules:
a first receiving module 301, configured to receive, through the debug interface, a first command from a control device when a kernel of the Linux system fails;
a traversing module 302, configured to, in response to the first command, enter a debugging state to traverse and execute processes in a kernel of the Linux system, and record process state information of each process during the traversal;
a second receiving module 303, configured to receive a second command from the control device through the debug interface;
a reading module 304, configured to read the process state information in response to the second command;
a sending module 305, configured to send the process state information to the control device through the debug interface, so that the control device locates, according to the process state information, a target process in the kernel that has a fault.
Optionally, the first command carries a first address of a target function and a second address of a target cache, and the traversal module 302 includes:
a target function running submodule, configured to run the target function according to the first address in response to the first command, where the target function is used to traverse and execute processes in a kernel of the Linux system and to store the process state information of each process during the traversal into a target cache corresponding to the second address.
Optionally, the target function includes a first calling function and a second calling function, and the target function running submodule includes:
a first running unit, configured to traverse and execute the processes in the kernel of the Linux system by running the first calling function;
a second running unit, configured to store the process state information of each process during the traversal into the target cache corresponding to the second address by running the second calling function.
Optionally, the target function further includes a third calling function and a fourth calling function, and the apparatus further includes:
a third running module, configured to start interrupt protection for the processes in the kernel by running the third calling function;
a fourth running module, configured to stop interrupt protection for the processes in the kernel by running the fourth calling function.
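The way a target function might combine the four called functions can be sketched as follows. This is a minimal Python model under stated assumptions: `FakeCpu`, the four helper function names, and the record format are hypothetical, and a real kernel routine would be native code rather than Python. The sketch only shows the ordering: interrupt protection starts before the traversal and stops after it, even if the traversal fails.

```python
class FakeCpu:
    """Hypothetical model of a CPU interrupt-enable flag."""

    def __init__(self):
        self.interrupts_enabled = True
        self.events = []


def start_interrupt_protection(cpu):
    # Plays the role of the "third calling function" (name assumed).
    cpu.interrupts_enabled = False
    cpu.events.append("irq_off")


def stop_interrupt_protection(cpu):
    # Plays the role of the "fourth calling function" (name assumed).
    cpu.interrupts_enabled = True
    cpu.events.append("irq_on")


def traverse_processes(process_table):
    # Plays the role of the "first calling function": visit every process.
    yield from process_table


def store_state(target_cache, proc):
    # Plays the role of the "second calling function": record one process.
    target_cache.append(f"pid={proc['pid']} state={proc['state']}")


def target_function(cpu, process_table, target_cache):
    start_interrupt_protection(cpu)
    try:
        for proc in traverse_processes(process_table):
            cpu.events.append(f"visit {proc['pid']}")
            store_state(target_cache, proc)
    finally:
        # Always re-enable interrupts, even if the traversal raises.
        stop_interrupt_protection(cpu)


cpu = FakeCpu()
cache = []
target_function(cpu, [{"pid": 1, "state": "S"}, {"pid": 2, "state": "D"}], cache)
print(cache)
print(cpu.events)
```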
Optionally, the apparatus further comprises:
a disassembling module, configured to disassemble the kernel file of the Linux system to obtain the first address of the target function and the second address of the target cache.
Optionally, the second address is carried in the second command, and the reading module 304 includes:
a data reading submodule, configured to read, in response to the second command, stored data of a preset length from the target cache corresponding to the second address;
a storage submodule, configured to save the read stored data into a log file.
Optionally, the sending module 305 includes:
a log sending submodule, configured to send the log file to the control device, so that the control device locates the faulty target process in the kernel according to the process state information recorded in the log file.
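Reading a preset length from the target cache and saving the result as a log file can be sketched as follows. This is an illustrative Python model only: the `bytearray` standing in for platform memory, the value of `PRESET_LENGTH`, the offset used as the second address, the NUL padding convention, and the log file name are all assumptions.

```python
import os
import tempfile

PRESET_LENGTH = 64  # assumed read size in bytes


def read_target_cache(memory, second_address, length=PRESET_LENGTH):
    # Read a fixed-length window starting at the given address.
    return bytes(memory[second_address:second_address + length])


def save_log(data, path):
    # Drop the NUL padding that follows the recorded text, then write the log.
    text = data.split(b"\x00", 1)[0].decode("ascii", errors="replace")
    with open(path, "w", encoding="utf-8") as log:
        log.write(text)
    return text


# Simulated memory with two process records placed at the assumed address.
memory = bytearray(256)
second_address = 0x80
record = b"1 init S\n42 kworker D\n"
memory[second_address:second_address + len(record)] = record

log_path = os.path.join(tempfile.mkdtemp(), "process_state.log")
logged = save_log(read_target_cache(memory, second_address), log_path)
print(logged)
```

A fixed read length keeps the second command simple, since the control device does not need to know in advance how many records were written; the cost is that the log writer has to trim unused space, which this sketch does by cutting at the first NUL byte.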
Optionally, the apparatus further comprises:
a third receiving module, configured to receive a third command from the control device through the debug interface;
an exit module, configured to exit the debugging state of the Linux system in response to the third command, so as to continue executing subsequent commands.
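The role of the third command can be pictured as a transition in a small state machine. The Python sketch below is hypothetical: the `Mode` values, the command names `"first"`, `"second"`, and `"third"`, and the recorded text are placeholders, and the sequence first, third, second is only one ordering consistent with the description.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    DEBUG = "debug"


class PlatformStateMachine:
    """Hypothetical command handler; command names are placeholders."""

    def __init__(self):
        self.mode = Mode.NORMAL
        self.recorded_state = None
        self.history = []

    def handle(self, command):
        if command == "first":
            # Enter the debug state and record process state information.
            self.mode = Mode.DEBUG
            self.recorded_state = "1 init S\n42 kworker D"
            result = None
        elif command == "third":
            # Exit the debug state so the system can continue executing.
            self.mode = Mode.NORMAL
            result = None
        elif command == "second":
            # Read back whatever the first command recorded.
            result = self.recorded_state
        else:
            raise ValueError(f"unknown command: {command}")
        self.history.append((command, self.mode))
        return result


machine = PlatformStateMachine()
machine.handle("first")
machine.handle("third")
state_info = machine.handle("second")
print(machine.history)
print(state_info)
```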
Optionally, the debugging interface is an Ejtag interface.
Referring to fig. 4, a block diagram of a fault location system of an embodiment of the present invention is shown, the fault location system including: a control device 401, a converter 402, and a hardware platform 403; the control device 401 is connected with the hardware platform 403 through the converter 402; the hardware platform 403 is configured with a debug interface;
the converter is used for carrying out protocol conversion on the command from the control equipment and sending the command to the debugging interface, and carrying out protocol conversion on the data from the debugging interface and sending the data to the control equipment;
the hardware platform comprises: a first receiving module, a traversing module, a second receiving module, a reading module and a sending module; wherein,
the first receiving module is used for receiving a first command from the control device through the debugging interface under the condition that the kernel of the Linux system fails;
the traversing module is used for responding to the first command, entering a debugging state to traverse and execute processes in a kernel of the Linux system, and recording process state information of each process in the traversing process;
the second receiving module is used for receiving a second command from the control equipment through the debugging interface;
the reading module is used for responding to the second command and reading the process state information;
the sending module is configured to send the process state information to the control device through the debugging interface, so that the control device locates a target process in the kernel, where a fault occurs, according to the process state information.
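The converter's two-way protocol conversion can be sketched in a few lines. The source does not specify either wire format, so everything below is an assumption made for illustration: the opcode numbering, the 1-byte opcode plus 4-byte big-endian address command frame, and the 2-byte length-prefixed data frame are hypothetical.

```python
import struct

# Assumed opcode numbering for the three commands.
OPCODES = {"first": 0x01, "second": 0x02, "third": 0x03}


def host_to_debug(command, address=0):
    # Control device -> debug interface:
    # 1-byte opcode followed by a 4-byte big-endian address.
    return struct.pack(">BI", OPCODES[command], address)


def debug_to_host(frame):
    # Debug interface -> control device:
    # 2-byte big-endian payload length, then the payload itself.
    (length,) = struct.unpack_from(">H", frame, 0)
    return frame[2:2 + length]


cmd_frame = host_to_debug("second", 0x80001000)
data_frame = struct.pack(">H", 5) + b"42 D\n"
print(cmd_frame.hex())
print(debug_to_host(data_frame))
```

Keeping the conversion in a separate component means the control device and the debug interface can each use their own native format, which is the role the converter plays in the system described above.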
In summary, in the embodiment of the present invention, the hardware platform may receive a first command from the control device through the configured debug interface, enter a debug state in response to the first command to traverse and execute processes in the kernel of the Linux system, and record the process state information of each process. The hardware platform may then receive a second command from the control device, read the process state information in response to the second command, and send it to the control device, so that the control device can analyze the process state information to locate the fault. The control device can therefore flexibly direct the hardware platform to traverse the processes in the kernel, read the process state information, and return it, which allows faults in the kernel of the Linux system to be located quickly. This improves the efficiency and accuracy of fault location for the kernel of the Linux system and saves manpower, material resources, and time.
Because the device embodiment is substantially similar to the method embodiment, it is described only briefly; for related details, refer to the corresponding description of the method embodiment.
Fig. 5 is a block diagram illustrating a hardware platform 500 for fault location, according to an example embodiment. For example, the hardware platform 500 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 5, hardware platform 500 may include one or more of the following components: processing component 502, memory 504, power component 506, multimedia component 508, audio component 510, input/output (I/O) interface 512, sensor component 514, and communication component 516. It should be noted that the hardware platform 500 runs a Linux system and is configured with a debug interface.
The processing component 502 generally controls the overall operation of the hardware platform 500, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 502 may include one or more processors 520 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 502 can include one or more modules that facilitate interaction between the processing component 502 and other components. For example, the processing component 502 can include a multimedia module to facilitate interaction between the multimedia component 508 and the processing component 502.
The memory 504 is configured to store various types of data to support operation of the hardware platform 500. Examples of such data include instructions for any application or method operating on hardware platform 500, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 504 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 506 provides power to the various components of hardware platform 500. The power component 506 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for hardware platform 500.
The multimedia component 508 includes a screen that provides an output interface between the hardware platform 500 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 508 includes a front facing camera and/or a rear facing camera. When the hardware platform 500 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 510 is configured to output and/or input audio signals. For example, audio component 510 includes a Microphone (MIC) configured to receive external audio signals when hardware platform 500 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 504 or transmitted via the communication component 516. In some embodiments, audio component 510 further includes a speaker for outputting audio signals.
The I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 514 includes one or more sensors for providing various aspects of state assessment for the hardware platform 500. For example, sensor component 514 may detect an open/closed state of hardware platform 500, the relative positioning of components, such as a display and keypad of hardware platform 500, the change in position of hardware platform 500 or a component of hardware platform 500, the presence or absence of user contact with hardware platform 500, the orientation or acceleration/deceleration of hardware platform 500, and the change in temperature of hardware platform 500. The sensor component 514 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 514 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 514 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 516 is configured to facilitate communication between the hardware platform 500 and other devices in a wired or wireless manner. The hardware platform 500 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 516 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 516 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the hardware platform 500 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions is also provided, such as the memory 504 comprising instructions that are executable by the processor 520 of the hardware platform 500 to perform the above-described method. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer readable storage medium, wherein instructions in the storage medium, when executed by a processor of a hardware platform, enable the hardware platform to perform a fault location method applied to a hardware platform running a Linux system, the hardware platform being configured with a debug interface, the method comprising:
under the condition that a kernel of the Linux system fails, receiving a first command from a control device through the debugging interface;
in response to the first command, entering a debugging state to traverse and execute processes in a kernel of the Linux system, and recording process state information of each process in the traversing process;
receiving a second command from the control device through the debugging interface;
reading the process state information in response to the second command;
and sending the process state information to the control equipment through the debugging interface so that the control equipment can position the target process with the fault in the kernel according to the process state information.
The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The fault location method, device, system, hardware platform and storage medium provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is only intended to help in understanding the method and core idea of the present invention. Meanwhile, a person skilled in the art may, following the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (12)

1. A fault location method is applied to a hardware platform running a Linux system, wherein the hardware platform is configured with a debugging interface, and the method comprises the following steps:
under the condition that a kernel of the Linux system fails, receiving a first command from a control device through the debugging interface;
in response to the first command, entering a debugging state to traverse and execute processes in a kernel of the Linux system, and recording process state information of each process in the traversing process;
receiving a second command from the control device through the debugging interface;
reading the process state information in response to the second command;
and sending the process state information to the control equipment through the debugging interface so that the control equipment can position the target process with the fault in the kernel according to the process state information.
2. The method according to claim 1, wherein the first command carries a first address of a target function and a second address of a target cache, and the traversing executes processes in a kernel of the Linux system in response to the first command, and records process state information of each process in the traversing process, including:
and responding to the first command, running the target function according to the first address, wherein the target function is used for traversing and executing processes in a kernel of the Linux system, and storing process state information of each process in the traversing process into a target cache corresponding to the second address.
3. The method of claim 2, wherein the target function comprises a first calling function and a second calling function, and wherein executing the target function comprises:
traversing and executing the process in the kernel of the Linux system by operating the first calling function;
and storing the process state information of each process in the traversal process into a target cache corresponding to the second address by operating the second calling function.
4. The method of claim 3, wherein the target function further comprises a third calling function and a fourth calling function, and wherein prior to executing the first calling function, the method further comprises:
starting interrupt protection for the process in the kernel by running the third calling function;
after determining that all process traversals are complete, the method further comprises:
and executing the fourth calling function to finish interrupt protection on the process in the kernel.
5. The method of any of claims 2 to 4, wherein prior to receiving the first command from the control device, the method further comprises:
and disassembling the kernel file of the Linux system to obtain a first address of the target function and a second address of the target cache.
6. The method according to any one of claims 2 to 4, wherein the second address is carried in the second command, and the reading the process status information in response to the second command comprises:
responding to the second command, and reading storage data with a preset length from a target cache corresponding to the second address;
saving the read storage data as a log file;
the sending the process state information to the control device to enable the control device to locate the target process with the fault in the kernel according to the process state information includes:
and sending the log file to the control equipment so that the control equipment positions the target process with the fault in the kernel according to the process state information recorded in the log file.
7. The method of claim 1, wherein after the receiving the first command from the control device and before the receiving the second command from the control device, the method further comprises:
receiving a third command from the control device through the debugging interface;
and in response to the third command, exiting the debugging state of the Linux system to continue executing subsequent commands.
8. The method of any of claims 1 to 7, wherein the debug interface is an Ejtag interface.
9. A fault location device, applied to a hardware platform running a Linux system, wherein the hardware platform is configured with a debugging interface, and the device comprises:
the first receiving module is used for receiving a first command from the control equipment through the debugging interface under the condition that the kernel of the Linux system has a fault;
the traversal module is used for responding to the first command, entering a debugging state, so as to traverse and execute processes in a kernel of the Linux system, and recording process state information of each process in the traversal process;
the second receiving module is used for receiving a second command from the control equipment through the debugging interface;
a reading module, configured to read the process state information in response to the second command;
and the sending module is used for sending the process state information to the control equipment through the debugging interface so that the control equipment can position the target process with the fault in the kernel according to the process state information.
10. A fault location system, characterized in that the fault location system comprises: a control device, a converter, and a hardware platform; the control equipment is connected with the hardware platform through the converter; the hardware platform is configured with a debugging interface;
the converter is used for carrying out protocol conversion on the command from the control equipment and sending the command to the debugging interface, and carrying out protocol conversion on the data from the debugging interface and sending the data to the control equipment;
the hardware platform comprises: a first receiving module, a traversing module, a second receiving module, a reading module and a sending module; wherein,
the first receiving module is used for receiving a first command from the control device through the debugging interface under the condition that the kernel of the Linux system fails;
the traversing module is used for responding to the first command, entering a debugging state to traverse and execute processes in a kernel of the Linux system, and recording process state information of each process in the traversing process;
the second receiving module is used for receiving a second command from the control equipment through the debugging interface;
the reading module is used for responding to the second command and reading the process state information;
the sending module is configured to send the process state information to the control device through the debugging interface, so that the control device locates a target process in the kernel, where a fault occurs, according to the process state information.
11. A hardware platform running a Linux system and configured with a debug interface, the hardware platform comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
under the condition that a kernel of the Linux system fails, receiving a first command from a control device through the debugging interface;
in response to the first command, entering a debugging state to traverse and execute processes in a kernel of the Linux system, and recording process state information of each process in the traversing process;
receiving a second command from the control device through the debugging interface;
reading the process state information in response to the second command;
and sending the process state information to the control equipment through the debugging interface so that the control equipment can position the target process with the fault in the kernel according to the process state information.
12. A readable storage medium, characterized in that the instructions in the storage medium, when executed by a processor of a hardware platform, enable the hardware platform to perform the fault localization method according to one or more of the method claims 1-8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010605429.9A CN113934561A (en) 2020-06-29 2020-06-29 Fault positioning method, device, system, hardware platform and storage medium

Publications (1)

Publication Number Publication Date
CN113934561A (en) 2022-01-14

Family ID: 79272901

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010605429.9A Pending CN113934561A (en) 2020-06-29 2020-06-29 Fault positioning method, device, system, hardware platform and storage medium

Country Status (1)

Country Link
CN (1) CN113934561A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024002325A1 (en) * 2022-07-01 2024-01-04 北京比特大陆科技有限公司 Positioning method for faulty chip on computing device, device, and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070266376A1 (en) * 2006-05-11 2007-11-15 Samsung Electronics Co., Ltd. Kernel-aware debugging system, medium, and method
US20190318081A1 (en) * 2018-04-16 2019-10-17 International Business Machines Corporation Injecting trap code in an execution path of a process executing a program to generate a trap address range to detect potential malicious code

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
NERD呱呱: "Hangzhou Dianzi University Operating Systems Lab 2 Report", HTTPS://BLOG.CSDN.NET/QQ_36285879/ARTICLE/DETAILS/88771623 *
NERD呱呱: "Hangzhou Dianzi University Operating Systems Lab 2 Report", HTTPS://BLOG.CSDN.NET/QQ_36285879/ARTICLE/DETAILS/88771623, 24 March 2019 (2019-03-24), pages 1 - 3 *
PONY: "EJTAG Debugging Based on the Loongson-1 IP Core", HTTPS://BLOG.CSDN.NET/HUGUOHU2006/ARTICLE/DETAILS/6896590, 22 October 2011 (2011-10-22), pages 1 - 6 *
李学勇, 孙甲霞, 付俊辉, 成继福: "Computer Operating Systems, 4th Edition", Xidian University Press, pages: 42 - 47 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination