CN116662134A - Linux kernel performance tracking tool based on eBPF - Google Patents
- Publication number
- CN116662134A
- Authority
- CN
- China
- Prior art keywords
- kernel
- ebpf
- tracking
- data
- tool
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
- G06F11/3476—Data logging
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Stored Programmes (AREA)
Abstract
The invention discloses an eBPF-based Linux kernel performance tracking tool comprising a user-mode eperf tool framework and a kernel-mode netlink comm communication module. The user-mode eperf tool converts a tracking target into BPF bytecode executable by the kernel-mode eBPF engine and loads the bytecode into that engine through the bpf() system call, so that the relevant kprobe points are traced. The kernel-mode netlink comm communication module collects the raw kprobe data captured by eBPF and transmits it to user mode through a netlink interface, where the user-mode eperf tool stores the raw binary trace data in a file. The tool can be applied in fields such as cloud computing, high-performance computing, and embedded systems for kernel performance optimization, performance problem tracking, and performance analysis.
Description
Technical Field
The invention relates to an eBPF-based Linux kernel performance tracking tool. It belongs to the technical field of computer operating systems and is mainly applied in fields such as cloud computing, high-performance computing, and embedded systems.
Background
The Linux operating system is widely used in big data and cloud computing, high-performance computing, desktop and mobile operating systems, embedded systems, and other fields; it is among the most widely deployed and most complex pieces of system software. Modern operating systems are mostly divided into a user mode and a kernel mode, and user-mode programs mostly interact with the operating system kernel through POSIX interface semantics. The kernel provides a multitasking mechanism, per-process virtual memory spaces, file abstractions, and so on. Because kernel mode is strictly abstracted and isolated from user mode, user-mode development does not need to attend to hardware details, and processes are independent of one another, which makes user-mode development convenient. In contrast, developing Linux kernel functions and modules is very difficult, mainly because Linux is a monolithic-kernel operating system: all kernel threads share one memory address space and run in a privileged state, so an exception in any thread may crash the operating system. In addition, the Linux kernel consists of many subsystems, such as the process scheduling, memory management, storage, network, and driver subsystems; the storage subsystem, for example, further comprises a VFS layer, a block layer, a device driver layer, and so on. Linux kernel development and performance tuning therefore mostly rely on tracing and profiling techniques, and gdb-like tools are rarely used: because of the system's complexity, problems mostly arise in the coupling between layers, are well hidden, and are difficult to reproduce.
To make it easier to trace the running state of kernel functions, the Linux kernel provides a performance tracing infrastructure: kprobe. kprobe supports dynamically inserting probe points into a target function, which makes it possible to collect state information from a running system without affecting the original execution flow. However, kprobe provides only a set of kernel function interfaces, which is far from a complete kernel function tracing solution. ftrace is built into the Linux kernel as a kprobe-based auxiliary toolkit and exposes a set of user-mode access interfaces through debugfs. However, ftrace supports only a limited set of functions, such as obtaining the call graph of kernel functions (function_graph); it is inflexible, cannot fully exploit the capabilities provided by kprobe, and cannot trace the performance of arbitrary kernel functions.
To fully exploit kprobe's capabilities, a way to call the kernel function interface provided by kprobe directly is needed. The SystemTap tool operates the kprobe function interface directly by generating a Linux kernel module, and thereby achieves more flexible kernel function tracing. SystemTap provides a C-like scripting interface that lets users trace any kernel function by writing scripts, which gives it good flexibility. However, SystemTap has a fatal disadvantage: it compiles the user script directly into a kernel module and then calls the kprobe interface by inserting that module into the running kernel. Since Linux is a monolithic kernel, a kernel module is loaded directly into the kernel address space and has very broad privileges, which creates a serious safety problem: any small flaw can crash or hang the kernel. For this reason SystemTap has not been accepted by the kernel community since its creation and is not recommended for production environments, which greatly limits its range of application.
eBPF evolved from BPF. BPF was originally an in-kernel interface for filtering network packets; its filtering function supports dynamic packet filtering rules by interpreting and executing user instructions in a BPF language virtual machine. eBPF extends and enhances the BPF language and decouples BPF from the network packet filter, turning it into a general-purpose virtual machine execution environment inside the Linux kernel. Through eBPF, a user can run sandboxed programs in the kernel, which gives the Linux kernel greater flexibility and many new possibilities.
Combining eBPF with kprobe opens new possibilities for tracing kernel function behavior, and the bpftrace tool was developed on this idea. bpftrace is modeled on SystemTap, but where SystemTap compiles the trace script into a kernel module, bpftrace compiles it into an eBPF program that is handed to the kernel eBPF sandbox for execution through the bpf() system call. Because the eBPF sandbox executes only a restricted eBPF instruction set and performs strict safety checks on each eBPF program before running it, an executed eBPF program is guaranteed not to disturb the kernel runtime environment. Indeed, various eBPF-based schemes (XDP, Cilium, etc.) are already widely deployed in production environments, which demonstrates their safety. bpftrace supports kprobe tracing and is used in the same way as SystemTap: the user specifies function trace points (probes) in a bpftrace script, and when a trace point is hit, the corresponding function in the trace script is executed. The tool can safely trace any kernel function and is suitable for locating kernel module problems, capturing information, and similar tasks. However, bpftrace is not suitable for kernel function performance tracking, mainly for the following reasons:
1. bpftrace was designed mainly as a safe replacement for tracing tools such as SystemTap, tracing kernel tracepoints, kprobe points, and the like through SystemTap-style scripts. Its scripting mechanism makes it flexible but poorly suited to kernel performance tracking;
2. While running, a bpftrace script keeps its execution state in eBPF maps, whose space is limited. Kernel function tracing generates large amounts of data (hundreds of MB or even GB), so bpftrace is suitable only for tracing a small number of kernel trace points over a short period. It is therefore better suited to tracing specific kernel functions for problem localization or optimization than to performance tracking.
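To make the "hundreds of MB or even GB" figure concrete, the following back-of-the-envelope sketch estimates raw trace volume. The event rate, record size, and duration are assumed values chosen for illustration only; none of them comes from the patent.

```python
def trace_volume_bytes(events_per_second: int, record_bytes: int, seconds: int) -> int:
    """Total raw trace size for a steady stream of fixed-size records."""
    return events_per_second * record_bytes * seconds


# Assumed workload (hypothetical numbers): 100,000 probe hits per second,
# 64 bytes per record, traced for 5 minutes.
total = trace_volume_bytes(events_per_second=100_000, record_bytes=64, seconds=5 * 60)
print(f"{total / 1e9:.2f} GB")  # 1.92 GB
```

Under these assumptions a five-minute trace already approaches 2 GB, which is far more than a fixed-size in-kernel map is normally sized to hold.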
perf is a performance analysis tool commonly used on Linux and is implemented on an event sampling mechanism. It can collect hardware events, software events, and kernel-predefined tracepoints. However, perf has two serious drawbacks: 1. Poor accuracy. Because perf is based on event sampling, samples can be missed; this is especially true when the observed function executes quickly, in which case it is difficult to capture. As a result, perf results often fail to reflect the performance of the observed object accurately; 2. Poor flexibility. The supported event categories are limited and fixed. For example, perf cannot observe a specific kernel function that does not contain a tracepoint.
The English abbreviations and terms used are explained as follows:
eBPF: extended Berkeley Packet Filter
Linux: GNU/Linux, a free-to-use and freely distributable UNIX-like operating system
blktrace: a Linux block-layer I/O performance tracing tool
perf: a Linux performance analysis tool based on event sampling
kprobe: kernel probes, the Linux kernel function tracing mechanism
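The missed-sample problem can be illustrated with a deliberately simplified model: a sampler that fires at fixed intervals only "sees" a function call if a tick falls inside the call's execution window. This is an assumption-laden sketch, not a model of how perf is implemented, and all timings are hypothetical.

```python
def calls_seen_by_sampler(calls, period):
    """Count calls whose [start, end) window contains a sampler tick.

    calls:  list of (start, end) times in microseconds
    period: sampler fires at t = 0, period, 2 * period, ...
    """
    seen = 0
    for start, end in calls:
        # First tick at or after `start` (integer ceiling division).
        first_tick = -(-start // period) * period
        if first_tick < end:
            seen += 1
    return seen


# Hypothetical workload: 100 calls, each lasting 50 us, one starting every 330 us,
# observed by a sampler with a 1000 us period.
calls = [(i * 330, i * 330 + 50) for i in range(100)]
print(calls_seen_by_sampler(calls, period=1000), "of", len(calls), "calls sampled")
```

In this toy setup only 5 of the 100 short calls coincide with a sampler tick, whereas a kprobe placed on the function would fire on every call.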
Disclosure of Invention
To address the problems of existing Linux performance tracking tools, the invention provides a design and implementation of an eBPF-based Linux kernel performance tracking tool.
The technical scheme of the invention is as follows:
The eBPF-based Linux kernel performance tracking tool comprises a user-mode eperf tool framework and a kernel-mode netlink comm communication module;
the user-mode eperf tool converts a tracking target into BPF bytecode executable by the kernel-mode eBPF engine and loads the bytecode into the kernel eBPF engine through the bpf() system call to trace the relevant kprobe points; the user-mode eperf tool also asynchronously parses the raw binary trace data obtained and outputs it in a user-understandable or visual form;
the kernel-mode netlink comm communication module collects the raw kprobe data captured by eBPF and transmits it to user mode through a netlink interface, where the user-mode eperf tool stores the raw binary trace data in a file.
An eBPF-based Linux kernel performance tracking method comprises the following steps:
first, constructing the user-mode eperf tool framework;
second, constructing netlink comm, a large-data-volume communication module between the eBPF kernel space and user space, and transmitting the information traced by eBPF to user space through netlink, where it is stored; this overcomes the limitation of the native BPF map, makes tracing with large data volumes possible, and is one of the key points of the patent scheme;
third, designing the definition and format of the captured data, which includes the timestamps of the kprobe and kretprobe points, function stacks, function input parameters, return values, and so on, and converting this information into binary data packets in a specified format that also contain verification information;
fourth, data analysis: outputting the raw trace data in an understandable or visual form through the user-mode performance data analysis engine.
Beneficial effects: the invention provides a design and implementation of an eBPF-based Linux kernel performance tracking tool. The scheme has two key points: 1. a perf-like tool is reimplemented using BPF; 2. because the native BPF map mechanism is space-limited and cannot support tracing large amounts of data, a data storage mechanism based on netlink is introduced, and the massive data generated during tracing is stored in user mode, which makes a BPF-based trace tool feasible. The tool can therefore trace the performance of any kernel function in the Linux kernel. On top of the tool framework, various analysis plug-ins can be designed flexibly to implement many kinds of kernel tracing and performance analysis displays. Thanks to the safety properties of eBPF, the tool can be applied directly in production environments for problem tracking. The scheme can be applied in fields such as cloud computing, high-performance computing, and embedded systems for kernel performance optimization, performance problem tracking, and performance analysis.
Drawings
Fig. 1 is a schematic diagram of the present invention.
Detailed Description
The present invention will be described in detail with reference to fig. 1.
The invention provides a design and implementation of an eBPF-based Linux kernel performance tracking tool. The tool is provisionally named eperf (ebpf perf) and consists of two main parts: the user-mode eperf tool and the kernel-mode netlink comm communication module. The user-mode eperf tool converts the kernel observation object to be traced (the tracking target) into BPF bytecode executable by the kernel eBPF engine and loads the bytecode into that engine through the bpf() system call. The user-mode eperf tool also parses the raw binary trace data obtained and outputs it in a form the user can understand or visualize. This step is asynchronous: the analysis is performed after the trace has produced the raw trace data. The kernel-mode netlink comm communication module collects the raw kprobe data captured by eBPF and transmits it to user mode through a netlink interface, where the user-mode eperf tool stores the raw binary trace data in a file.
The specific implementation steps of the scheme are as follows:
First, based on eBPF and kprobe, we implement a framework that supports tracing arbitrary kernel symbols through eBPF scripts, i.e., eperf in Fig. 1. Following the bpftrace implementation, the framework converts the function or event to be traced into eBPF bytecode and then loads the bytecode into the kernel for execution through the eBPF framework, thereby tracing the relevant kprobe points. In essence, this process translates and compiles the tracking target into bytecode that the kernel eBPF engine can execute. Taking a designated kernel module as the observation object, for example, the "execution engine" parses the module to obtain the list of all its kernel functions and compiles the trace logic into BPF bytecode. The execution engine then passes the bytecode to the kernel eBPF engine for execution through the bpf() system call.
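The patent does not say how the execution engine obtains a module's function list. One plausible approach, shown here purely as an assumption, is to filter text in the `/proc/kallsyms` format (`address type name [module]`) for text symbols owned by the target module. The function name `module_functions` and the sample symbols are illustrative.

```python
def module_functions(kallsyms_text: str, module: str) -> list[str]:
    """Return the function (text) symbols that belong to `module`."""
    names = []
    for line in kallsyms_text.splitlines():
        fields = line.split()
        # Symbols owned by a module have four fields: address, type, name, [module].
        if len(fields) != 4:
            continue
        _addr, sym_type, name, owner = fields
        # 't' / 'T' mark local / global symbols in the text (code) section.
        if owner == f"[{module}]" and sym_type in ("t", "T"):
            names.append(name)
    return names


# Illustrative sample in /proc/kallsyms format; addresses are made up.
SAMPLE = """\
ffffffff81000000 T _stext
ffffffffc0a01000 t nvme_queue_rq\t[nvme]
ffffffffc0a01200 T nvme_submit_cmd\t[nvme]
ffffffffc0a02000 d nvme_dev_attrs\t[nvme]
ffffffffc0b00000 t ext4_readpage\t[ext4]
"""

print(module_functions(SAMPLE, "nvme"))
```

On a live system the same function could be fed the contents of `/proc/kallsyms`; each returned name would then become one kprobe/kretprobe pair in the generated trace logic.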
Second, to overcome the space limitation of the map mechanism, a communication module between the eBPF kernel space and user space, netlink comm, is designed on top of Linux netlink; the information traced by eBPF is transmitted through netlink and stored in user space. The native Linux eBPF framework relies mainly on the map mechanism for control-plane interaction with user space and for data transfer. However, the map mechanism is oriented toward control-plane interaction; a performance trace scenario must collect massive data from a large number of traced points, and the map mechanism cannot meet this requirement. Our framework is therefore built on the netlink mechanism: when eBPF captures a relevant kprobe event in kernel mode, the data is transferred to user mode through netlink and saved to a file there.
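The user-mode half of this path can be sketched as follows: split a received buffer into netlink messages using the standard 16-byte `struct nlmsghdr` framing with 4-byte alignment, and append each payload to a file-like sink. The message type and the helper names are hypothetical, since the patent does not specify them; a real receiver would obtain `buf` from `recv()` on an `AF_NETLINK` socket bound to whatever protocol number the kernel module registers.

```python
import io
import struct

# struct nlmsghdr: nlmsg_len (u32), nlmsg_type (u16), nlmsg_flags (u16),
# nlmsg_seq (u32), nlmsg_pid (u32), in native byte order.
NLMSG_HDR = struct.Struct("=IHHII")
NLMSG_ALIGNTO = 4


def nlmsg_align(length: int) -> int:
    """Round `length` up to the netlink alignment boundary."""
    return (length + NLMSG_ALIGNTO - 1) & ~(NLMSG_ALIGNTO - 1)


def split_payloads(buf: bytes) -> list[bytes]:
    """Extract the payload of every complete netlink message in `buf`."""
    payloads, offset = [], 0
    while offset + NLMSG_HDR.size <= len(buf):
        length, _type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(buf, offset)
        if length < NLMSG_HDR.size or offset + length > len(buf):
            break  # truncated or malformed message: stop parsing
        payloads.append(buf[offset + NLMSG_HDR.size:offset + length])
        offset += nlmsg_align(length)
    return payloads


def append_payloads(buf: bytes, sink) -> int:
    """Write all payloads in `buf` to a binary file object; return bytes written."""
    return sum(sink.write(p) for p in split_payloads(buf))


def build_message(payload: bytes, msg_type: int = 0x10) -> bytes:
    """Helper for the demo: frame a payload as the kernel side would (type is hypothetical)."""
    length = NLMSG_HDR.size + len(payload)
    return (NLMSG_HDR.pack(length, msg_type, 0, 0, 0) + payload).ljust(nlmsg_align(length), b"\0")


buf = build_message(b"abc") + build_message(b"defgh")
sink = io.BytesIO()  # stands in for the raw trace file
append_payloads(buf, sink)
print(sink.getvalue())
```

Keeping the receiver this thin matches the design in the text: the trace phase only moves raw bytes to storage, and interpretation is deferred to the analysis phase.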
Third, we design the definition and format of the captured data, which mainly includes the timestamps of the kprobe and kretprobe points, function stacks, function input parameters, return values, and so on. This information is converted into binary data packets in a specified format that also contain verification information. This makes the transfer of data to user mode more efficient while ensuring data integrity.
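The patent does not publish the packet layout, so the following is only a minimal sketch of what such a record could look like: a fixed little-endian header, a variable-length call stack, and a trailing CRC32 as the verification information. Field order, field widths, and the choice of CRC32 are all assumptions.

```python
import struct
import zlib

# Hypothetical header: timestamp_ns (u64), kind (u8), func_addr (u64),
# value (i64: first argument on entry, return value on return), stack_depth (u16).
HEADER = struct.Struct("<QBQqH")
KIND_ENTRY, KIND_RETURN = 0, 1  # kprobe hit / kretprobe hit


def pack_record(timestamp_ns, kind, func_addr, value, stack):
    """Serialize one probe event and append a CRC32 of everything before it."""
    body = HEADER.pack(timestamp_ns, kind, func_addr, value, len(stack))
    body += struct.pack(f"<{len(stack)}Q", *stack)
    return body + struct.pack("<I", zlib.crc32(body))


def unpack_record(packet):
    """Verify the checksum and decode a packet produced by pack_record."""
    body, (crc,) = packet[:-4], struct.unpack("<I", packet[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("checksum mismatch: record is corrupted")
    timestamp_ns, kind, func_addr, value, depth = HEADER.unpack_from(body)
    stack = list(struct.unpack_from(f"<{depth}Q", body, HEADER.size))
    return timestamp_ns, kind, func_addr, value, stack


packet = pack_record(1_000_000, KIND_RETURN, 0xFFFFFFFF81234560, -5, [0x1000, 0x2000])
print(len(packet), unpack_record(packet))
```

A compact binary layout like this keeps per-event overhead small, and the checksum lets the analysis phase detect and skip records damaged in transit.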
Finally, we implement a user-mode performance data analysis engine that presents the raw trace data in an understandable or visual form. eperf splits kernel performance data tracing and analysis into two asynchronous phases, for two reasons: 1. Under heavy load, a large number of kprobe events, and thus a large amount of raw trace data, are generated, and real-time analysis is difficult to achieve, so the trace phase mainly collects raw data. 2. A complete performance analysis requires the complete data of every kprobe, and performance tracking is usually carried out over specific time periods. For example, we trace a specific module for 5 minutes and obtain the complete raw data before analyzing it to get a complete view of the operation.
The eBPF-based Linux kernel performance tracking tool and method make full use of the safety properties of eBPF and the kernel kprobe mechanism, so that Linux kernel performance can be traced efficiently and safely, and the shortcomings of existing trace tools are overcome. First, a framework that supports tracing arbitrary kernel symbols through eBPF scripts is implemented on the basis of eBPF and kprobe; second, to overcome the space limitation of the eBPF map mechanism, a communication mechanism between the eBPF kernel space and user space is designed on top of Linux netlink, and the information traced in the kernel is transmitted through netlink and stored in user space; third, the definition and format of the captured data are designed, mainly comprising the timestamps of the kprobe and kretprobe points, function stacks, function input parameters, return values, and so on, and this information is transmitted to user space and stored in binary form; finally, a user-mode performance data analysis and display tool is implemented, which presents the raw trace data in an understandable or visual form. The method can be widely applied to performance tracking and analysis of Linux kernels in fields such as cloud computing, high-performance computing, and embedded systems.
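As a sketch of what the offline analysis phase might compute, the code below pairs each kprobe (entry) event with the matching kretprobe (return) event and aggregates per-function latency. The event tuple shape, in particular the thread id used to pair events, is an assumption; the patent lists timestamps, stacks, parameters, and return values but not a thread identifier.

```python
from collections import defaultdict


def function_latencies(events):
    """Aggregate call latency per function from time-ordered probe events.

    events: iterable of (timestamp_ns, kind, tid, func) with kind "entry" or "return".
    """
    open_calls = defaultdict(list)  # (tid, func) -> stack of entry timestamps
    durations = defaultdict(list)   # func -> list of completed call durations
    for ts, kind, tid, func in events:
        key = (tid, func)
        if kind == "entry":
            open_calls[key].append(ts)
        elif kind == "return" and open_calls[key]:
            # Pop the most recent entry so recursive calls pair up correctly.
            durations[func].append(ts - open_calls[key].pop())
    return {
        func: {"calls": len(d), "avg_ns": sum(d) / len(d), "max_ns": max(d)}
        for func, d in durations.items()
    }


# Hypothetical decoded trace: two threads calling vfs_read, one calling vfs_write.
events = [
    (100, "entry", 1, "vfs_read"),
    (150, "entry", 2, "vfs_read"),
    (400, "return", 1, "vfs_read"),
    (450, "return", 2, "vfs_read"),
    (500, "entry", 1, "vfs_write"),
    (520, "return", 1, "vfs_write"),
]
print(function_latencies(events))
```

Because every entry needs its matching return, this kind of summary is only reliable once the full trace for the period is available, which is the reason the text gives for running analysis after collection.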
Claims (2)
1. An eBPF-based Linux kernel performance tracking tool, characterized by comprising a user-mode eperf tool framework and a kernel-mode netlink comm communication module;
the user-mode eperf tool converts a tracking target into BPF bytecode executable by the kernel-mode eBPF engine and loads the bytecode into the kernel eBPF engine through the bpf() system call to trace the relevant kprobe points; the user-mode eperf tool also asynchronously parses the raw binary trace data obtained and outputs it in a user-understandable or visual form;
the kernel-mode netlink comm communication module collects the raw kprobe data captured by eBPF and transmits it to user mode through a netlink interface, where the user-mode eperf tool stores the raw binary trace data in a file.
2. An eBPF-based Linux kernel performance tracking method using the eBPF-based Linux kernel performance tracking tool according to claim 1, comprising the following steps:
first, constructing the user-mode eperf tool framework;
second, constructing netlink comm, a large-data-volume communication module between the eBPF kernel space and user space, and transmitting the information traced by eBPF to user space through netlink, where it is stored;
third, designing the definition and format of the captured data, which includes the timestamps of the kprobe and kretprobe points, function stacks, function input parameters, return values, and so on, and converting this information into binary data packets in a specified format that also contain verification information;
fourth, data analysis: outputting the raw trace data in an understandable or visual form through the user-mode performance data analysis engine.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310564324.7A CN116662134A (en) | 2023-05-17 | 2023-05-17 | Linux kernel performance tracking tool based on eBPF |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310564324.7A CN116662134A (en) | 2023-05-17 | 2023-05-17 | Linux kernel performance tracking tool based on eBPF |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116662134A (en) | 2023-08-29 |
Family
ID=87725270
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310564324.7A Pending CN116662134A (en) | 2023-05-17 | 2023-05-17 | Linux kernel performance tracking tool based on eBPF |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116662134A (en) |
-
2023
- 2023-05-17 CN CN202310564324.7A patent/CN116662134A/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117215901A (en) * | 2023-11-09 | 2023-12-12 | 华南师范大学 | Programming exercise evaluation method, system, equipment and medium based on dynamic tracking |
CN117215901B (en) * | 2023-11-09 | 2024-03-08 | 华南师范大学 | Programming exercise evaluation method, system, equipment and medium based on dynamic tracking |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8108839B2 (en) | Method and apparatus for tracing execution of computer programming code using dynamic trace enablement | |
US7150006B2 (en) | Techniques for managed code debugging | |
EP1170661A2 (en) | Method and system for improving performance of applications that employ a cross-language interface | |
US10614227B2 (en) | Method and system for identifying functional attributes that change the intended operation of a compiled binary extracted from a target system | |
CN112181833A (en) | Intelligent fuzzy test method, device and system | |
US20120151450A1 (en) | Platform-Agnostic Diagnostic Data Collection and Display | |
CN102508775A (en) | Interactive automation test system | |
CN101183332A (en) | Method and device for automatically generating testing datasets by program content | |
CN116662134A (en) | Linux kernel performance tracking tool based on eBPF | |
CN109542444B (en) | JAVA application monitoring method, device, server and storage medium | |
CN115328796A (en) | Software vulnerability auxiliary positioning method and system for ARM architecture | |
CN115705250A (en) | Monitoring stack usage to optimize programs | |
CN112612697A (en) | Software defect testing and positioning method and system based on byte code technology | |
CN116719579A (en) | AI model observability realization method and device, electronic equipment and storage medium | |
CN109344083B (en) | Program debugging method, device and equipment and readable storage medium | |
CN112861138A (en) | Software security analysis method and analysis device, electronic device, and storage medium | |
CN116521231A (en) | Reference model for SPARC V8 instruction set dynamic simulation verification | |
EP2587380B1 (en) | Runtime environment and method for non-invasive monitoring of software applications | |
CN112905474B (en) | Hardware-based advanced program dynamic control flow tracking method and device | |
CN109062797B (en) | Method and device for generating information | |
US11106522B1 (en) | Process memory resurrection: running code in-process after death | |
Andrzejak et al. | Confguru-A system for fully automated debugging of configuration errors | |
RU2390821C1 (en) | Dynamic instrumentation technique | |
CN117555811B (en) | Embedded software analysis method, device and storage medium based on static symbol execution | |
Lim et al. | Modeling code manipulation in JIT compilers |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||