CN116662134A - Linux kernel performance tracking tool based on eBPF - Google Patents

Linux kernel performance tracking tool based on eBPF

Info

Publication number
CN116662134A
Authority
CN
China
Prior art keywords
kernel
ebpf
tracking
data
tool
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310564324.7A
Other languages
Chinese (zh)
Inventor
李宏杰
常盛华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anyang Institute of Technology
Original Assignee
Anyang Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anyang Institute of Technology
Priority to CN202310564324.7A
Publication of CN116662134A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses an eBPF-based Linux kernel performance tracking tool comprising a user-mode eperf tool framework and a kernel-mode netlink comm communication module. The user-mode eperf tool converts the tracking target into BPF bytecode executable by the kernel-mode eBPF engine and loads that bytecode into the kernel eBPF engine through the bpf() system call, which then traces the relevant kprobe points. The kernel-mode netlink comm communication module collects the raw kprobe data captured by eBPF and transmits it to user mode through a netlink interface, where the user-mode eperf tool stores the raw binary tracking data in a file. The tool can be applied in fields such as cloud computing, high-performance computing, and embedded systems for kernel performance optimization, performance problem tracking, and performance analysis.

Description

Linux kernel performance tracking tool based on eBPF
Technical Field
The invention relates to an eBPF-based Linux kernel performance tracking tool. It belongs to the technical field of computer operating systems and is mainly applied in fields such as cloud computing, high-performance computing, and embedded systems.
Background
The Linux operating system is widely used in big data and cloud computing, high-performance computing, desktop and mobile operating systems, embedded systems, and other fields; it is among the most widely deployed and most complex pieces of system software. Most modern operating systems are divided into a user mode and a kernel mode, and user-mode programs mostly interact with the operating system kernel through POSIX interface semantics. The kernel provides multitasking, per-process virtual address spaces, the file abstraction, and so on. Because kernel mode is strictly abstracted and isolated from user mode, user-mode developers need not attend to hardware details, and processes are independent of one another, which makes user-mode development easier. By contrast, developing Linux kernel functions and modules is very difficult, mainly because Linux is a monolithic kernel: all kernel threads share one memory address space and run in a privileged state, so an error in any thread may crash the operating system. In addition, the Linux kernel consists of many subsystems, such as process scheduling, memory management, storage, networking, and device drivers; the storage subsystem alone comprises the VFS layer, the block layer, the device driver layer, and others. Linux kernel development and performance tuning therefore rely mostly on tracing and profiling, and gdb-like debuggers are rarely used: because of the system's complexity, most problems arise from coupling between layers, are well hidden, and are difficult to reproduce.
To make it easier to track the running state of kernel functions, the Linux kernel provides a performance tracking infrastructure: kprobe. The kprobe technique supports dynamically inserting probe points into a target function, which makes it possible to collect state information from a running production system without altering the original execution flow. But kprobe only provides a set of in-kernel function interfaces, which by itself is far from a complete kernel function tracking solution. ftrace is built into the Linux kernel as a kprobe-based auxiliary toolkit and exposes a set of user-mode access interfaces through debugfs. However, ftrace supports only a limited set of functions, such as obtaining the call graph of kernel functions (function_graph); it is inflexible, cannot fully exploit the capabilities kprobe provides, and cannot trace the performance of arbitrary kernel functions.
To fully exploit kprobe, a way is needed to call the kernel function interface that kprobe provides directly. The SystemTap tool operates the kprobe function interface directly by generating a Linux kernel module, which gives more flexible kernel function tracking. SystemTap provides a C-like scripting interface and lets users trace any kernel function by writing scripts, so it is more flexible. However, SystemTap has a fatal disadvantage: it compiles the user script directly into a kernel module and then calls the kprobe interface by inserting that module into the running kernel. Because Linux is a monolithic kernel, a kernel module is loaded directly into the kernel address space with very broad privileges, which poses a serious safety problem: any small flaw can crash or hang the kernel. For this reason SystemTap has never been accepted by the kernel community since its creation and is not recommended for production environments, which greatly limits its range of application.
eBPF evolved from BPF. BPF was originally an in-kernel interface for filtering network packets; it supports dynamic packet filtering rules by interpreting user-supplied instructions in a BPF virtual machine. eBPF extends and enhances the BPF language and decouples BPF from the network packet filter, turning it into a general-purpose virtual machine execution environment inside the Linux kernel. Through eBPF, a user can run arbitrary sandboxed programs in the kernel, which gives the Linux kernel much greater flexibility.
Combining eBPF with kprobe opens new possibilities for tracking kernel function behavior, and the bpftrace tool was developed on this idea. bpftrace is modeled on SystemTap, but whereas SystemTap compiles the trace script into a kernel module, bpftrace compiles it into an eBPF program that is handed to the kernel's eBPF sandbox for execution through the bpf() system call. Because the eBPF sandbox executes only a restricted eBPF instruction set and performs strict safety checks on each program before running it, a loaded eBPF program is guaranteed not to disturb the kernel's running environment. Indeed, various eBPF-based solutions (XDP, Cilium, and others) are widely deployed in production, which has demonstrated this safety. bpftrace supports kprobe tracking and is used much like SystemTap: the user specifies function tracking points (probes) in a bpftrace script, and when a probe is hit, the corresponding function in the script is executed. The tool makes it possible to trace any kernel function safely and is suitable for locating kernel module problems and capturing information. However, bpftrace is not well suited to kernel function performance tracking, mainly for the following reasons: 1. bpftrace was designed mainly to replace unsafe tracing tools such as SystemTap, tracing kernel tracepoints, kprobe points, and so on more flexibly through a SystemTap-like script. Its script mechanism makes it flexible but not suited to kernel performance tracking; 2. while a bpftrace script runs, its execution state information is kept in eBPF maps, whose space is limited. Kernel function tracking generates large amounts of data (hundreds of MB or even several GB), so bpftrace can only track a small number of kernel tracking points for a short time. It is therefore better suited to tracing specific kernel functions for problem localization or optimization than to performance tracking.
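A back-of-the-envelope calculation illustrates the data-volume argument above. The sketch below is not part of the patent: the event rate, record size, trace duration, and map capacity are all illustrative assumptions chosen only to show the order of magnitude.

```python
# Illustrative estimate of raw trace volume; every number here is an assumption.

def trace_volume_bytes(events_per_second, record_bytes, seconds):
    """Total bytes produced when every probe hit emits one fixed-size record."""
    return events_per_second * record_bytes * seconds


def fits_in_map(volume_bytes, max_entries, value_bytes):
    """True if the whole trace could be held in one map of fixed capacity."""
    return volume_bytes <= max_entries * value_bytes


if __name__ == "__main__":
    # Hypothetical workload: 200,000 probe hits per second, 48-byte records, 5 minutes.
    volume = trace_volume_bytes(200_000, 48, 300)
    print(f"raw trace size: {volume / 2**30:.2f} GiB")
    # Hypothetical map: 65,536 entries of 48 bytes each (3 MiB in total).
    print("fits in map:", fits_in_map(volume, 65_536, 48))
```

With these made-up figures the trace comes to roughly 2.7 GiB against a 3 MiB map, which is the kind of gap that motivates streaming records out of the kernel instead of accumulating them in a map.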
perf is a performance analysis tool commonly used on Linux, implemented on an event sampling mechanism. It can collect hardware events, software events, and kernel-predefined tracepoints. However, perf has two serious drawbacks: 1. Poor accuracy. Because it is based on event sampling, samples can be missed; functions that execute quickly are especially hard to capture. As a result, perf output often does not accurately reflect the performance of the observed object; 2. Poor flexibility. The supported event categories are limited and fixed. For example, perf cannot observe a particular kernel function that contains no tracepoint.
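The accuracy argument can be made concrete with a toy model. The following sketch is an illustration only, not a model of how perf works internally: it assumes a sampler that fires at fixed intervals and a workload of short calls, and it compares how many calls each approach would observe. The call pattern and sampling period are hypothetical.

```python
# Toy model: periodic sampling versus entry/exit tracing. All numbers are assumptions.

def calls_seen_by_sampling(calls, period_us):
    """Count calls that contain at least one sample instant (multiples of period_us)."""
    seen = 0
    for start_us, duration_us in calls:
        end_us = start_us + duration_us
        # First sample instant at or after the start of the call (ceiling division).
        first_sample = -(-start_us // period_us) * period_us
        if first_sample < end_us:
            seen += 1
    return seen


def calls_seen_by_tracing(calls):
    """Entry/exit probes fire on every call, so none are missed."""
    return len(calls)


if __name__ == "__main__":
    # Hypothetical workload: one 20 us call every 700 us, 1,000 calls in total.
    calls = [(i * 700, 20) for i in range(1000)]
    # Hypothetical sampler firing every 1,000 us.
    print("sampling saw:", calls_seen_by_sampling(calls, 1000))
    print("tracing saw:", calls_seen_by_tracing(calls))
```

Under these assumptions the sampler lands inside only 100 of the 1,000 calls, while probes on function entry and return account for every one of them.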
The English abbreviations used here are explained as follows:
eBPF: extended Berkeley Packet Filter
Linux: GNU/Linux, a free-to-use and freely distributable UNIX-like operating system
blktrace: a Linux block-layer I/O performance tracking tool
perf: a Linux performance analysis tool based on event sampling
kprobe: kernel probes, the Linux kernel function tracking technique
Disclosure of Invention
To address the problems of existing Linux performance tracking tools, the invention provides a design and implementation of an eBPF-based Linux kernel performance tracking tool.
The technical scheme of the invention is as follows:
the eBPF-based Linux kernel performance tracking tool comprises a user-mode eperf tool framework and a kernel-mode netlink comm communication module;
the user-mode eperf tool converts the tracking target into BPF bytecode executable by the kernel-mode eBPF engine and loads the bytecode into the kernel eBPF engine through the bpf() system call to trace the relevant kprobe points; the user-mode eperf tool also parses the obtained raw binary tracking data asynchronously and outputs it in a user-understandable or visual form;
the kernel-mode netlink comm communication module collects the raw kprobe data captured by eBPF and transmits it to user mode through a netlink interface, where the user-mode eperf tool stores the raw binary tracking data in a file.
An eBPF-based Linux kernel performance tracking method comprises the following steps:
first, constructing the user-mode eperf tool framework;
second, constructing netlink comm, a communication module for large data volumes between the eBPF kernel space and user space, which transmits the information traced by eBPF to user space through netlink and stores it there; this overcomes the limitation of the native BPF map, makes large-volume tracing possible, and is one of the key points of the scheme;
third, defining the captured data and its format, including the timestamps of kprobe and kretprobe points, function stacks, function arguments, return values, and the like, and converting this information into binary data packets in a specified format that also carry verification information;
fourth, data analysis: a user-mode performance data analysis engine outputs the raw tracking data in an understandable or visual form.
Beneficial effects: the invention provides a design and implementation of an eBPF-based Linux kernel performance tracking tool, with two main points: 1. a perf-like tool is re-implemented on top of BPF; 2. because the native BPF map mechanism is limited in space and cannot support tracing large volumes of data, a data storage mechanism based on netlink is introduced to move the massive data generated during tracing to user mode for storage, which makes a BPF-based performance tracing tool feasible. The tool can therefore trace the performance of any kernel function in the Linux kernel. Based on the tool framework, a variety of analysis plug-ins can be designed flexibly to provide different kinds of kernel tracking and performance analysis displays. Thanks to the safety properties of eBPF, the tool can be used directly in production environments for problem tracking. The scheme can be applied in fields such as cloud computing, high-performance computing, and embedded systems for kernel performance optimization, performance problem tracking, and performance analysis.
Drawings
Fig. 1 is a schematic diagram of the present invention.
Detailed Description
The present invention will be described in detail with reference to fig. 1.
The invention provides a design and implementation of an eBPF-based Linux kernel performance tracking tool. The tool is provisionally named eperf (eBPF perf) and consists of two main parts: the user-mode eperf tool and the kernel-mode netlink comm communication module. The user-mode eperf tool converts the kernel object to be observed (the tracking target) into BPF bytecode executable by the kernel eBPF engine and loads it into that engine through the bpf() system call. The user-mode eperf tool also parses the raw binary trace data and outputs it in a form the user can understand or visualize. This step is asynchronous: analysis runs only after the trace has produced its raw data. The kernel-mode netlink comm communication module collects the raw kprobe data captured by eBPF and transmits it to user mode through a netlink interface, where the user-mode eperf tool stores the raw binary tracking data in a file.
The specific implementation steps of the scheme are as follows:
First, we implement a framework, based on eBPF and kprobe, that supports tracking arbitrary kernel symbols through eBPF scripts; this is eperf in Fig. 1. Following the bpftrace implementation, the framework converts the function or event to be tracked into eBPF bytecode and then loads it into the kernel for execution through the eBPF framework, thereby tracing the relevant kprobe points. In essence this process translates and compiles the trace target into bytecode that the kernel eBPF engine can execute. For example, when the observed object is a specified kernel module, the "execution engine" resolves the list of all kernel functions in that module and compiles the trace logic into BPF bytecode. The execution engine then passes the bytecode to the kernel eBPF engine for execution through the bpf() system call.
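The patent does not say how the execution engine obtains the function list of a kernel module. One plausible source, assumed here purely for illustration, is /proc/kallsyms, whose lines have the form `<address> <type> <name>` followed by a tab and `[<module>]` for module symbols. The sketch below filters such text for the function (text) symbols of one module; the module and symbol names in the sample are made up.

```python
# Sketch: list the text (function) symbols that belong to one kernel module,
# given text in the line format of /proc/kallsyms.

def module_functions(kallsyms_text, module):
    """Return function symbol names that kallsyms attributes to `module`."""
    names = []
    for line in kallsyms_text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # built-in symbols carry no "[module]" column
        _address, sym_type, name, owner = fields
        # 't' and 'T' mark local and global symbols in the text (code) section.
        if owner == f"[{module}]" and sym_type in ("t", "T"):
            names.append(name)
    return names


if __name__ == "__main__":
    # Made-up sample in kallsyms format; real addresses and symbols will differ.
    sample = (
        "ffffffff81000000 T _stext\n"
        "ffffffffc0a00000 t demo_read\t[demo_mod]\n"
        "ffffffffc0a00100 T demo_write\t[demo_mod]\n"
        "ffffffffc0a00200 d demo_state\t[demo_mod]\n"
        "ffffffffc0b00000 t other_fn\t[other_mod]\n"
    )
    print(module_functions(sample, "demo_mod"))
```

On a live system the input would come from reading /proc/kallsyms. Not every symbol found this way can necessarily be probed, since some functions are inlined or excluded from kprobes, so a real tool would need to handle attach failures.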
Second, to overcome the limited space of the map mechanism, we design netlink comm, a communication module between the eBPF kernel space and user space built on Linux netlink; information traced by eBPF is transmitted through netlink and stored in user space. The native Linux eBPF framework relies mainly on the map mechanism for both control-plane interaction with user space and data transfer. However, the map mechanism is geared toward control-plane interaction; performance tracing must collect massive data from a large number of tracked points, which the map mechanism cannot accommodate. Our framework therefore uses netlink: when eBPF captures a relevant kprobe event in kernel mode, the data is passed to user mode through netlink and saved to a file there.
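The user-mode half of this step, receiving netlink messages and appending their payloads to a file, can be sketched as follows. This is a minimal illustration rather than the patent's implementation: it handles only the standard 16-byte netlink message header and 4-byte alignment, and the message type `0x20` is a hypothetical value chosen for trace records. On Linux the buffer would come from a socket opened with `socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, protocol)`, where the protocol number is whatever the kernel module registers; here the input is fabricated so the example runs anywhere.

```python
import io
import struct

# struct nlmsghdr: u32 length (header included), u16 type, u16 flags, u32 seq, u32 port id.
NLMSG_HDR = struct.Struct("=IHHII")
NLMSG_ALIGNTO = 4


def nlmsg_align(length):
    """Round a length up to the netlink alignment boundary."""
    return (length + NLMSG_ALIGNTO - 1) & ~(NLMSG_ALIGNTO - 1)


def iter_payloads(buffer):
    """Yield (type, payload) for each netlink message packed in one receive buffer."""
    offset = 0
    while offset + NLMSG_HDR.size <= len(buffer):
        length, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(buffer, offset)
        if length < NLMSG_HDR.size or offset + length > len(buffer):
            break  # truncated or corrupt message: stop rather than guess
        yield msg_type, buffer[offset + NLMSG_HDR.size:offset + length]
        offset += nlmsg_align(length)


def save_trace(buffer, out_file, trace_type):
    """Append the payload of every message of `trace_type`; return bytes written."""
    written = 0
    for msg_type, payload in iter_payloads(buffer):
        if msg_type == trace_type:
            written += out_file.write(payload)
    return written


def pack_message(msg_type, payload):
    """Build one padded netlink message (used here only to fabricate input)."""
    length = NLMSG_HDR.size + len(payload)
    header = NLMSG_HDR.pack(length, msg_type, 0, 0, 0)
    return (header + payload).ljust(nlmsg_align(length), b"\0")


if __name__ == "__main__":
    TRACE_TYPE = 0x20  # hypothetical message type for trace records
    buf = pack_message(TRACE_TYPE, b"abcde") + pack_message(TRACE_TYPE, b"xyz")
    out = io.BytesIO()  # stands in for the trace file opened in binary append mode
    print(save_trace(buf, out, TRACE_TYPE), out.getvalue())
```

Writing payloads to the file untouched keeps the collection path cheap, which fits the design in which parsing is deferred to a later analysis phase.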
Next, we define the captured data and its format, which mainly includes the timestamps of kprobe and kretprobe points, function stacks, function arguments, and return values. This information is converted into binary data packets in a specified format that also carry verification information. This makes the transfer to user mode more efficient while ensuring data integrity.
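The patent does not publish the packet layout, so the following is a hypothetical format that shows the idea of a fixed binary record with verification information. It assumes a magic number, an event kind (kprobe entry or kretprobe return), a process ID, a nanosecond timestamp, a function address, and one 64-bit value (an argument or the return value), followed by a CRC-32 of the record. Function stacks, which the patent also mentions, are omitted to keep the sketch short.

```python
import struct
import zlib

# Hypothetical record layout (little-endian): magic u32, kind u8, 3 pad bytes,
# pid u32, timestamp_ns u64, function address u64, value u64.
RECORD = struct.Struct("<IB3xIQQQ")
CRC = struct.Struct("<I")
MAGIC = 0x45504552  # arbitrary marker chosen for this sketch
KPROBE_ENTRY, KRETPROBE_RETURN = 0, 1


def encode(kind, pid, timestamp_ns, func_addr, value):
    """Pack one event and append a CRC-32 of the packed bytes."""
    body = RECORD.pack(MAGIC, kind, pid, timestamp_ns, func_addr, value)
    return body + CRC.pack(zlib.crc32(body))


def decode(packet):
    """Verify length, checksum, and magic, then return the event fields."""
    if len(packet) != RECORD.size + CRC.size:
        raise ValueError("unexpected packet length")
    body = packet[:RECORD.size]
    (crc,) = CRC.unpack(packet[RECORD.size:])
    if zlib.crc32(body) != crc:
        raise ValueError("checksum mismatch")
    magic, kind, pid, timestamp_ns, func_addr, value = RECORD.unpack(body)
    if magic != MAGIC:
        raise ValueError("bad magic")
    return kind, pid, timestamp_ns, func_addr, value


if __name__ == "__main__":
    packet = encode(KPROBE_ENTRY, 4321, 1_000_000, 0xFFFFFFFFC0A00000, 7)
    print(len(packet), decode(packet))
```

A fixed-size record lets the analysis phase walk the trace file with simple offset arithmetic, and the checksum lets it reject a record damaged in transit instead of silently misreading it.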
Finally, we implement a user-mode performance data analysis engine that presents the raw trace data in an understandable or visual form. eperf splits kernel performance tracing and analysis into two asynchronous phases, for two reasons: 1. Under heavy load a large number of kprobe events, and hence a large amount of raw trace data, is generated, and real-time analysis is difficult, so the trace phase concentrates on collecting raw data. 2. A complete performance analysis requires the complete data for each kprobe, and performance tracking usually covers a specific time window. For example, we trace a specific module for 5 minutes and analyze only after the complete raw data has been collected, which yields a complete view of its operation.
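One analysis such an engine might perform offline is pairing each kprobe entry with its kretprobe return to obtain per-function latency. The sketch below is an assumption about what that analysis could look like, not the patent's engine: it takes already-decoded events as `(timestamp_ns, tid, func, kind)` tuples sorted by time, and the function names in the example are only labels.

```python
from collections import defaultdict

ENTRY, RETURN = 0, 1


def latency_stats(events):
    """Pair entry and return events per (thread, function).

    events: iterable of (timestamp_ns, tid, func, kind), sorted by timestamp.
    Returns {func: (calls, total_ns, max_ns)}.
    """
    open_calls = defaultdict(list)          # (tid, func) -> stack of entry times
    stats = defaultdict(lambda: [0, 0, 0])  # func -> [calls, total_ns, max_ns]
    for timestamp_ns, tid, func, kind in events:
        key = (tid, func)
        if kind == ENTRY:
            open_calls[key].append(timestamp_ns)
        elif open_calls[key]:
            # The stack lets recursive calls pair with their innermost entry.
            elapsed = timestamp_ns - open_calls[key].pop()
            entry = stats[func]
            entry[0] += 1
            entry[1] += elapsed
            entry[2] = max(entry[2], elapsed)
        # A return with no recorded entry (trace began mid-call) is ignored.
    return {func: tuple(values) for func, values in stats.items()}


if __name__ == "__main__":
    events = [
        (100, 1, "vfs_read", ENTRY),
        (150, 2, "vfs_read", ENTRY),
        (180, 1, "vfs_read", RETURN),
        (400, 2, "vfs_read", RETURN),
        (500, 3, "vfs_write", RETURN),  # unmatched return, dropped
    ]
    for func, (calls, total_ns, max_ns) in latency_stats(events).items():
        print(f"{func}: calls={calls} avg={total_ns / calls:.0f}ns max={max_ns}ns")
```

Because this pairing needs both ends of every call, it works best on a complete trace file, which is consistent with running analysis only after collection has finished.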
The eBPF-based Linux kernel performance tracking tool and method make full use of the safety properties of eBPF and the kernel kprobe mechanism to track Linux kernel performance efficiently and safely, overcoming the shortcomings of existing trace tools. First, a framework that supports tracking any kernel symbol through an eBPF script is implemented on eBPF and kprobe; second, to overcome the limited space of the eBPF map mechanism, a communication mechanism between the eBPF kernel space and user space is built on Linux netlink, and the information traced in the kernel is transmitted through netlink and stored in user space; third, the captured data and its format are defined, mainly comprising the timestamps of kprobe and kretprobe points, function stacks, function arguments, and return values, and this information is transmitted to user space and stored in binary form; finally, a user-mode performance data analysis and display tool presents the raw tracking data in an understandable or visual form. The method can be widely applied to Linux kernel performance tracking and analysis in fields such as cloud computing, high-performance computing, and embedded systems.

Claims (2)

1. An eBPF-based Linux kernel performance tracking tool, characterized by comprising a user-mode eperf tool framework and a kernel-mode netlink comm communication module;
the user-mode eperf tool converts the tracking target into BPF bytecode executable by the kernel-mode eBPF engine and loads the bytecode into the kernel eBPF engine through the bpf() system call to trace the relevant kprobe points; the user-mode eperf tool also parses the obtained raw binary tracking data asynchronously and outputs it in a user-understandable or visual form;
the kernel-mode netlink comm communication module collects the raw kprobe data captured by eBPF and transmits it to user mode through a netlink interface, where the user-mode eperf tool stores the raw binary tracking data in a file.
2. An eBPF-based Linux kernel performance tracking method, which uses the eBPF-based Linux kernel performance tracking tool according to claim 1 and comprises the following steps:
first, constructing the user-mode eperf tool framework;
second, constructing netlink comm, a communication module for large data volumes between the eBPF kernel space and user space, which transmits the information traced by eBPF to user space through netlink and stores it there;
third, defining the captured data and its format, including the timestamps of kprobe and kretprobe points, function stacks, function arguments, return values, and the like, and converting this information into binary data packets in a specified format that also carry verification information;
fourth, data analysis: a user-mode performance data analysis engine outputs the raw tracking data in an understandable or visual form.
CN202310564324.7A 2023-05-17 2023-05-17 Linux kernel performance tracking tool based on eBPF Pending CN116662134A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310564324.7A CN116662134A (en) 2023-05-17 2023-05-17 Linux kernel performance tracking tool based on eBPF

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310564324.7A CN116662134A (en) 2023-05-17 2023-05-17 Linux kernel performance tracking tool based on eBPF

Publications (1)

Publication Number Publication Date
CN116662134A 2023-08-29

Family

ID=87725270

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310564324.7A Pending CN116662134A (en) 2023-05-17 2023-05-17 Linux kernel performance tracking tool based on eBPF

Country Status (1)

Country Link
CN (1) CN116662134A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117215901A (en) * 2023-11-09 2023-12-12 华南师范大学 Programming exercise evaluation method, system, equipment and medium based on dynamic tracking
CN117215901B (en) * 2023-11-09 2024-03-08 华南师范大学 Programming exercise evaluation method, system, equipment and medium based on dynamic tracking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination