CN118075156A - eBPF-based RDMA network monitoring system and eBPF-based RDMA network monitoring method - Google Patents

Publication number
CN118075156A
Authority
CN
China
Prior art keywords
rdma
data
rdma network
network monitoring
monitoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410054310.5A
Other languages
Chinese (zh)
Inventor
黄昌盛
袁麒景
左海余
李天硕
邓梁
张璞
李俊
何益鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202410054310.5A priority Critical patent/CN118075156A/en
Publication of CN118075156A publication Critical patent/CN118075156A/en
Pending legal-status Critical Current

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses an eBPF-based RDMA network monitoring system and an eBPF-based RDMA network monitoring method. The system comprises: a control component module, used for issuing acquisition instructions, setting monitoring parameters and managing monitoring strategies, so as to control the start and stop of data acquisition and manage the monitoring of the RDMA network; an eBPF acquisition component module, used for acquiring RDMA network monitoring data; an aggregation component module, used for aggregating the RDMA network monitoring data received from a plurality of nodes and analyzing it to associate the data with the corresponding application program; and an index component module, used for analyzing and processing the aggregated RDMA network monitoring data, obtaining key performance indexes and visualizing them on the Prometheus platform. The invention enables deep monitoring of the RDMA network, greatly improves the flexibility, maintainability and performance of the monitoring system, and effectively improves the overall performance and reliability of network services.

Description

eBPF-based RDMA network monitoring system and eBPF-based RDMA network monitoring method
Technical Field
The invention relates to the field of RDMA network monitoring, in particular to an RDMA network monitoring system and method based on eBPF.
Background
In current network communication technology, RDMA (Remote Direct Memory Access) is a key technology for efficient data transmission. RDMA allows one computer in the network to directly access the memory of another computer without the data passing through the operating system, which greatly reduces CPU load and improves data transmission efficiency. RDMA is mainly implemented through protocols such as InfiniBand and RoCE (RDMA over Converged Ethernet), and is widely used in data centers, high-performance computing and storage networks.
However, in conventional RDMA network monitoring methods, it is often necessary to rely on specific hardware or operating system level support. For example, some monitoring tools need to run on a Network Interface Card (NIC), or require an operating system to provide a specific monitoring interface. These methods tend to be inflexible and may have an impact on system performance.
In recent years, the advent of eBPF (extended Berkeley Packet Filter) technology has provided new possibilities for solving this problem. eBPF is a powerful facility in the Linux kernel that allows a user to run user-defined programs in the kernel without changing the kernel code or rebooting the system. eBPF can be used for various kinds of system-level monitoring and network packet analysis, but its application in the field of RDMA monitoring is still at a preliminary stage.
Although RDMA plays a vital role in fields such as high-performance computing and big data processing, existing RDMA monitoring methods have several obvious defects and limitations in practical application. These drawbacks, and the need for solutions to them, are discussed below.
Limited monitoring perspective: existing monitoring methods focus mainly on network-layer indexes, such as traffic statistics and device states, and often cannot reach the kernel layer of RDMA operation. For example, it is difficult for a monitoring system to obtain critical information such as fine-grained memory access patterns and the processing status of data packets, which limits the overall understanding and analysis of network behavior.
RDMA Verbs event correlation challenge: RDMA involves a variety of Verbs that are triggered at different times, so correlating the resulting events is a pressing problem. Because the timing and context of these events tend to be scattered and complex, correlating them effectively is critical to understanding and optimizing RDMA operations, but it is also extremely challenging.
Insufficient responsiveness: in a high-speed network environment, a traditional monitoring system often cannot reflect real-time changes of network state in time. In particular, when faced with sudden network events or performance fluctuations, existing monitoring tools may not provide fast and accurate feedback, which delays fault diagnosis and response.
Hardware dependence: most existing monitoring schemes rely on the hardware capabilities of the RDMA switch. This dependence limits the versatility and applicability of the monitoring solution, making it difficult to accommodate diverse network environments and devices from different vendors.
Performance versus monitoring trade-off: to achieve detailed monitoring, systems often need to sacrifice some network performance, especially in high-traffic environments. This trade-off is particularly acute in application scenarios that require high performance, where network administrators may be forced to choose between monitoring accuracy and network performance.
In summary, while RDMA provides a powerful network communication function, existing monitoring methods have significant limitations in practical applications. These limitations not only affect the accuracy and efficiency of monitoring, but also increase the complexity and cost of network management. Therefore, it is particularly urgent and important to develop a new, more efficient and comprehensive RDMA monitoring method.
Disclosure of Invention
The invention aims at overcoming the defects of the prior art and provides an RDMA network monitoring system and method based on eBPF. The monitoring scheme of the invention is more comprehensive and efficient, and can realize deep monitoring of RDMA operation, and simultaneously keep the minimum influence on network performance.
The aim of the invention is realized by the following technical scheme: an embodiment of the present invention provides an RDMA network monitoring system based on eBPF, including:
a control component module, used for issuing acquisition instructions, setting monitoring parameters and managing monitoring strategies, so as to control the start and stop of data acquisition and the monitoring of the RDMA network;
an eBPF acquisition component module, configured to acquire RDMA network monitoring data after receiving an acquisition instruction sent from the control component module; wherein the RDMA network monitoring data includes RDMA control flow events, RDMA connection establishment events and RDMA data flow events;
an aggregation component module, comprising a multi-node aggregation module and an analysis module, wherein the multi-node aggregation module is used for aggregating the RDMA network monitoring data received from a plurality of acquisition nodes, and the analysis module is used for analyzing the aggregated RDMA network monitoring data so as to correlate the acquired RDMA network monitoring data with the corresponding application programs; and
an index component module, used for analyzing and processing the aggregated RDMA network monitoring data output by the aggregation component module to obtain key performance indexes, converting the key performance indexes into a visual representation and displaying it on the Prometheus platform.
Further, the control component module is specifically configured to perform the following:
(a1) In the monitoring task initialization stage, issuing an acquisition instruction to activate the RDMA acquisition capability, which specifically comprises attaching user-space uprobe probes and kernel-space tracepoint probes at the locations where user-written code calls RDMA Verbs;
(a2) Managing the loading of the eBPF program and its attachment (link) to the system kernel, so as to control the start of data acquisition; wherein the eBPF program is the executor of data acquisition and is used to capture the key data of RDMA network operation in real time;
(a3) When data acquisition is no longer needed or system resources need to be released, the user disables the uprobe probes that are not needed by modifying the configuration file, so as to control the stop of data acquisition;
(a4) Performing fine-grained control of the monitoring component according to preset monitoring strategies and real-time decisions, so as to control the monitoring of the RDMA network.
Further, the key data of RDMA network operation includes RDMA connection establishment data, RDMA control flow data and RDMA data flow data;
Control of the monitoring component includes real-time regulation of eBPF programs, monitoring of the routing of data streams, and management of temporary buffers.
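The configuration-driven start and stop control described in (a1) to (a3) can be illustrated with a minimal Python sketch. It assumes a hypothetical configuration layout in which each probe name maps to an on/off flag; the function name `plan_probe_changes` and the probe names are illustrative assumptions, not taken from the invention, and the actual attach and detach calls made through an eBPF loader are omitted.

```python
from typing import Dict, Set, Tuple

def plan_probe_changes(attached: Set[str], config: Dict[str, bool]) -> Tuple[Set[str], Set[str]]:
    """Return (to_attach, to_detach) so that the attached probes match the config."""
    wanted = {name for name, enabled in config.items() if enabled}
    return wanted - attached, attached - wanted

# Hypothetical probe names: uprobes on Verbs calls, a tracepoint on a system call.
attached = {"uprobe:ibv_post_send", "uprobe:ibv_create_qp"}
config = {
    "uprobe:ibv_post_send": True,
    "uprobe:ibv_post_recv": True,
    "uprobe:ibv_create_qp": False,  # control-flow probe switched off by the user
    "tracepoint:syscalls:sys_enter_write": True,
}

to_attach, to_detach = plan_probe_changes(attached, config)
```

Computing a difference against the currently attached set, rather than detaching and re-attaching everything, is one way to keep probes that stay enabled running without interruption while the configuration changes.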
Further, the operation flow of the eBPF acquisition component module specifically includes:
(b1) After receiving an acquisition instruction sent by a control component module, starting to acquire RDMA network monitoring data;
(b2) Monitoring the RDMA network in real time to collect RDMA control flow events, the RDMA control flow events including RDMA device lifecycle management events and queue pair management events; the device lifecycle management events comprise activation and deactivation operation events of the device and all system call events related to device state changes, and the queue pair management events comprise creation and destruction operation events of a queue pair;
(b3) Monitoring system calls related to connection establishment, and collecting the RDMA connection establishment events exchanged during the TCP handshake process of RDMA; wherein an RDMA connection establishment event includes a global identifier, a key, and an address;
(b4) Monitoring RDMA operations to collect RDMA data flow events, wherein the RDMA operations include data send and receive operations, and the RDMA data flow events include send operation events, receive operation events, and completion queue events; send operation events are collected by monitoring all ibv_post_send calls, receive operation events are collected by monitoring all ibv_post_recv calls, and completion queue events are obtained by collecting ibv_poll_cq events;
(b5) Writing the collected RDMA control flow events, RDMA connection establishment events and RDMA data flow events into the ring buffer (ringbuffer) of the kernel;
(b6) In a single-machine environment, reading the collected RDMA network monitoring data from the ring buffer of the kernel, and performing preliminary aggregation of the collected RDMA network monitoring data by using identifiers, so as to realize single-machine-side data aggregation.
Further, using identifiers to associate the RDMA network monitoring data with the corresponding application or service, so as to implement single-machine-side data aggregation, specifically includes:
File descriptor level aggregation: first, the sys_enter_write and sys_exit_read system call events are aggregated based on the same file descriptor;
Queue pair number level aggregation: the aggregated sys_enter_write and sys_exit_read data is then further aggregated with the ibv_post_send and ibv_post_recv events based on the same queue pair number;
Work request identifier level aggregation: finally, the aggregated ibv_post_send and ibv_post_recv data is aggregated with the ibv_poll_cq events by using the same work request identifier, so that single-machine-side data aggregation is realized.
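The three aggregation levels above can be sketched in Python as a sequence of joins. This is a minimal illustration under stated assumptions, not the invention's implementation: the event dictionaries and their field names (`fd`, `qp_num`, `wr_id`, `ts_ns`, `op`) are hypothetical, and it is assumed that the handshake data seen on a file descriptor reveals the queue pair number, which is what allows the first level to be linked to the second.

```python
from collections import defaultdict

def aggregate(syscall_events, post_events, cq_events):
    """Join events in three levels: fd, then queue pair number, then work request ID."""
    # Level 1: group the write/read system call events by file descriptor.
    by_fd = defaultdict(list)
    for ev in syscall_events:
        by_fd[ev["fd"]].append(ev)

    # Assumption: a handshake event on the fd carries the queue pair number,
    # so each fd group can be indexed by that number for the next level.
    conn_by_qp = {}
    for fd, events in by_fd.items():
        for ev in events:
            if "qp_num" in ev:
                conn_by_qp[ev["qp_num"]] = {"fd": fd, "syscalls": events}

    # Index completions by (qp_num, wr_id) for the level 3 join.
    completions = {(c["qp_num"], c["wr_id"]): c for c in cq_events}

    records = []
    for post in post_events:
        # Level 2: attach the connection that shares the post's queue pair number.
        conn = conn_by_qp.get(post["qp_num"])
        # Level 3: attach the completion that shares the work request identifier.
        cqe = completions.get((post["qp_num"], post["wr_id"]))
        records.append({
            "fd": conn["fd"] if conn else None,
            "qp_num": post["qp_num"],
            "wr_id": post["wr_id"],
            "op": post["op"],
            "completed": cqe is not None,
            "latency_ns": cqe["ts_ns"] - post["ts_ns"] if cqe else None,
        })
    return records

syscalls = [
    {"name": "sys_enter_write", "fd": 5, "qp_num": 17},
    {"name": "sys_exit_read", "fd": 5},
]
posts = [
    {"name": "ibv_post_send", "op": "send", "qp_num": 17, "wr_id": 1, "ts_ns": 1_000},
    {"name": "ibv_post_recv", "op": "recv", "qp_num": 17, "wr_id": 2, "ts_ns": 1_200},
]
cqes = [{"name": "ibv_poll_cq", "qp_num": 17, "wr_id": 1, "ts_ns": 3_400}]

records = aggregate(syscalls, posts, cqes)
```

The sketch keys completions by the pair (queue pair number, work request identifier) rather than by the identifier alone, because work request identifiers are chosen by the application and may repeat across queue pairs; this is a design choice of the sketch.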
Further, the aggregation component module is specifically configured to perform the following:
(c1) Aggregating the RDMA network monitoring data of a plurality of acquisition nodes through the multi-node aggregation module; in this process, the multi-node aggregation module receives the RDMA network monitoring data acquired by the eBPF acquisition component modules of the plurality of acquisition nodes and fuses the data into a coherent data set;
(c2) During communication, mapping the global identifiers used by the RDMA network to IP addresses through the analysis module, and retaining the mapping relations between all IP addresses and global identifiers, so that the destination IP address of RDMA network monitoring data can be looked up and filled in according to the destination global identifier;
(c3) Persisting the RDMA network monitoring data to a time-series database through the analysis module;
(c4) Mapping the thread group ID collected in each event to the corresponding application program through the analysis module.
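Steps (c2) and (c4) amount to two lookups that enrich each event. The following Python sketch shows one possible shape, assuming hypothetical field names (`dst_gid`, `tgid`) and in-memory tables. As an additional assumption beyond the text, when a global identifier is missing from the retained table the sketch falls back to decoding an IPv4-mapped IPv6 address, a form commonly used for RoCE v2 global identifiers.

```python
import ipaddress

def gid_to_ip(gid, gid_table):
    """Resolve a GID string to an IP address string, or None if it cannot be resolved."""
    addr = ipaddress.IPv6Address(gid)
    if addr in gid_table:
        return gid_table[addr]          # retained mapping, as in step (c2)
    mapped = addr.ipv4_mapped           # fallback assumption: ::ffff:a.b.c.d form
    return str(mapped) if mapped is not None else None

def enrich(event, gid_table, tgid_to_app):
    """Return a copy of the event with destination IP and application name filled in."""
    out = dict(event)
    out["dst_ip"] = gid_to_ip(event["dst_gid"], gid_table)
    out["app"] = tgid_to_app.get(event["tgid"], "unknown")  # step (c4)
    return out

# Hypothetical tables; keys are parsed addresses so that different spellings match.
gid_table = {ipaddress.IPv6Address("fe80::1"): "10.0.0.5"}
tgid_to_app = {4242: "trainer"}

known = enrich({"tgid": 4242, "dst_gid": "fe80:0000:0000:0000:0000:0000:0000:0001"},
               gid_table, tgid_to_app)
fallback = enrich({"tgid": 7, "dst_gid": "0000:0000:0000:0000:0000:ffff:c0a8:010a"},
                  gid_table, tgid_to_app)
```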
Further, the key performance indexes include the delay from the issuing of an RDMA data packet until it reaches the network card, the delay from the network card sending the data packet until the peer network card receives it, and the state of the RDMA event, wherein the state of the RDMA event is success or failure.
Further, the index component module specifically includes:
(d1) Analyzing and processing the aggregated RDMA network monitoring data processed by the aggregation component module through the index component module to obtain standardized key performance indexes;
(d2) The key performance indicators are exported to Grafana and Prometheus platform, converted to visual representation form by Grafana, and displayed on Prometheus platform.
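As an illustration of step (d2), key performance indexes can be rendered in the Prometheus text exposition format, which a Prometheus server can scrape and Grafana can then chart. The sketch below is a simplified formatter written for this purpose; the metric name, label names and sample values are hypothetical, and a production exporter would more likely rely on an existing client library.

```python
def escape_label(value: str) -> str:
    """Escape backslash, double quote and newline as the exposition format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def to_exposition(name, help_text, samples):
    """Render gauge samples, given as (labels, value) pairs, as exposition text."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(
            f'{key}="{escape_label(str(val))}"' for key, val in sorted(labels.items())
        )
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"

# Hypothetical metric: latency between posting a work request and its completion.
text = to_exposition(
    "rdma_post_to_completion_latency_ns",
    "Latency between posting a work request and its completion.",
    [({"app": "trainer", "qp_num": 17, "status": "success"}, 2400)],
)
```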
A second aspect of the embodiment of the invention provides a monitoring method based on the above eBPF-based RDMA network monitoring system, which comprises the following steps:
(1) The user configures the monitoring parameters, and the control component module sends an acquisition instruction to the eBPF acquisition component module;
(2) After receiving the acquisition instruction sent by the control component module, the eBPF acquisition component module dynamically attaches the uprobe probes and the tracepoint probes according to the user configuration, so as to acquire RDMA network monitoring data; wherein the RDMA network monitoring data includes RDMA control flow events, RDMA connection establishment events and RDMA data flow events;
(3) Each event of the RDMA network monitoring data is sent to the aggregation component module after it is triggered; the RDMA network monitoring data of a plurality of acquisition nodes of the same RDMA network is aggregated by the multi-node aggregation module of the aggregation component module, and the aggregated RDMA network monitoring data is then analyzed by the analysis module, so as to correlate the acquired RDMA network monitoring data with the corresponding application program and analyze the performance and state of the RDMA network;
(4) The index component module obtains the aggregated RDMA network monitoring data output by the aggregation component module, analyzes and processes it to obtain key performance indexes, converts the key performance indexes into a visual representation and displays it on the Prometheus platform.
Compared with the prior art, the invention has the beneficial effects that:
(1) The invention innovatively uses an eBPF framework that makes uprobes on RDMA Verbs dynamically pluggable, allowing users to add or remove uprobe probes according to actual needs without restarting the system or interrupting running services; this greatly improves the flexibility, maintainability and performance of the monitoring system, and addresses the trade-off between system performance and monitoring in observability systems.
(2) The invention provides a unique multidimensional aggregation strategy that realizes data integration from a single machine to multiple nodes and breaks through the limitations of traditional monitoring tools; the strategy not only correlates parameters such as file descriptors, queue pair numbers and work request identifiers, but can also automatically identify and integrate key performance indexes, which resolves the pain point that RDMA Verbs events are discrete and difficult to correlate.
(3) For the RDMA (remote direct memory access) protocol stack, the invention establishes a monitoring link running from the RDMA control flow to the RDMA data flow and realizes transparent monitoring of the full user-space link, solving the major problem that RDMA protocol events are difficult to collect because RDMA bypasses the kernel, and thereby significantly improving the ability to monitor the critical path of network performance.
(4) The index component module not only supports exporting the aggregated data to Prometheus, but also realizes advanced data visualization through Grafana; through customized visual dashboards and intelligent analysis algorithms, the module provides in-depth network performance analysis and fault diagnosis capability, allows users to customize and extend the analysis functions according to specific requirements, and greatly improves the usability of the data and the depth of analysis.
(5) The invention obviously improves the observability and analysis depth of RDMA network operation; the invention breaks through the limitation of the traditional monitoring technology, and realizes the comprehensive monitoring of key links such as RDMA memory operation, data transmission and the like; the invention can instantly capture and accurately analyze various network events, including the transmission state and the memory access mode of the data packet; the time for problem diagnosis is greatly shortened, and the insight of the whole RDMA network operation is enhanced, so that the overall performance and reliability of network service are effectively improved.
Drawings
FIG. 1 is a block diagram of the RDMA network monitoring system based on eBPF of the present invention;
FIG. 2 is a fine grain deployment architecture diagram of FIG. 1 in accordance with the present invention;
FIG. 3 is a diagram of eBPF architecture of a detailed monitoring system on the single machine side of the present invention;
FIG. 4 is a flowchart illustrating operation of the RDMA network monitoring system of the present invention;
FIG. 5 is a graph of event aggregation in accordance with the present invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the application. The word "if" as used herein may be interpreted as "when ..." or "upon ..." or "in response to a determination", depending on the context. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The present invention will be described in detail with reference to the accompanying drawings. The features of the examples and embodiments described below may be combined with each other without conflict.
The eBPF-based RDMA network monitoring system of the invention includes a control component module, an eBPF acquisition component module, an aggregation component module, and an index component module, as shown in FIG. 1. FIG. 2 is a finer-grained deployment architecture diagram of the framework of the invention. Its core is a set of monitoring nodes, each of which is responsible for collecting data on local RDMA transfers. Within each node, RDMA probes and eBPF techniques are used in combination to capture network operation information in real time. The aggregation component modules of these nodes aggregate the data, which is managed uniformly by Prometheus. Finally, the data is visualized through Grafana, providing intuitive performance indexes.
In this embodiment, the control component module is configured to issue acquisition instructions, set monitoring parameters, and manage monitoring strategies, so as to control the start and stop of data acquisition and the monitoring of the RDMA network.
It should be appreciated that in the overall architecture the first link is the control component module, which plays a vital role: it is responsible for the global strategy that directs the monitoring activities and for finely managing and regulating the overall data acquisition process, as shown in FIG. 3.
It should be noted that embodiments of the present invention provide an eBPF framework that makes uprobes on RDMA Verbs dynamically pluggable, allowing users to add or remove uprobe probes according to actual needs without restarting the system or interrupting running services; this feature greatly enhances the flexibility and maintainability of the monitoring system, enabling the user to conduct in-depth analysis and optimization of specific events or performance bottlenecks without interfering with existing network operations.
Further, the control component module is specifically configured to perform the following:
(a1) In the monitoring task initialization stage, an acquisition instruction is issued to activate the RDMA acquisition capability, which specifically includes attaching user-space uprobe probes and kernel-space tracepoint probes at the locations where user-written code calls RDMA Verbs.
It should be understood that the more uprobe probes are attached, the greater the performance impact on the program. To protect the performance of the user program, the user can therefore dynamically attach non-critical uprobe probes, such as the uprobe probes for the RDMA control flow, only when needed, and later disable the probes that are no longer needed by modifying the configuration file.
(a2) Managing the loading (load) of the eBPF program and its attachment (link) to the system kernel, so as to control the start of data acquisition; wherein the eBPF program is the executor of data acquisition and is used to capture the key data of RDMA network operation in real time.
Further, the key data of RDMA network operation includes RDMA connection establishment data, RDMA control flow data, and RDMA data flow data.
(a3) When data acquisition is no longer needed or system resources need to be released, the user disables the uprobe probes that are not needed by modifying the configuration file, so as to control the stop of data acquisition, which helps ensure the efficient use of system resources.
(a4) Performing fine-grained control of the monitoring component according to preset monitoring strategies and real-time decisions, so as to control the monitoring of the RDMA network and thereby ensure smooth data flow and continuous monitoring.
Further, control of the monitoring component includes, but is not limited to: real-time regulation of eBPF programs, monitoring of data flow routing, and temporary buffer management.
It should be appreciated that through these precise and flexible management steps, the control component module ensures that the monitoring system maintains efficient and stable operation under a variety of workload and network conditions, and through a series of preset monitoring strategies and real-time decisions, it achieves precise control of the monitoring activities on the link, thereby maximizing the effectiveness of the monitoring activities while ensuring the economy of the system resources. The control component module can support comprehensive monitoring and deep analysis of RDMA network operation through the dynamic regulation mechanism.
In this embodiment, the eBPF acquisition component module is configured to acquire RDMA network monitoring data after receiving an acquisition instruction sent from the control component module. Wherein the RDMA network monitor data includes RDMA control flow events, RDMA establish connection events, and RDMA data flow events.
One of the key points of the invention is the design of the eBPF acquisition component module, which uses eBPF technology to reach into the core of the RDMA protocol stack and can accurately capture detailed information about memory access and data packet processing. The memory access details specifically comprise the memory access address and the memory content, and the data packet processing details specifically comprise RDMA connection establishment events and RDMA data flow events (mainly memory content). This in-depth monitoring gives network administrators much greater transparency, allowing more accurate and comprehensive analysis of network behavior and performance.
It should be understood that the acquisition instruction sent by the control component module is passed on to the eBPF acquisition component module, which is the data capture engine of the system. The eBPF acquisition component module can reach into the kernel layer of the system, monitor the RDMA network, and realize the complete flow from data acquisition to intra-event aggregation by performing fine-grained monitoring of RDMA control flow events, RDMA connection establishment events and RDMA data flow events, as shown in FIG. 4. By exploiting the high performance of eBPF, RDMA network events are captured in real time with minimal impact on system performance, providing raw data for subsequent analysis.
It should be noted that, in the field of RDMA network monitoring, the collection and aggregation of monitoring data is the basis for gaining insight into network performance. The embodiment of the invention establishes a monitoring link that runs from the RDMA control flow, through RDMA connection establishment, to the RDMA data flow, which realizes transparent monitoring of the full user-space link and solves the major problem that RDMA events are difficult to collect because RDMA bypasses the kernel, thereby significantly improving the ability to monitor the critical path of network performance.
Further, as shown in fig. 4, the operation flow of the eBPF acquisition component module specifically includes:
(b1) After receiving the collection instruction sent by the control component module, the RDMA network monitoring data is started to be collected.
(B2) Monitoring an RDMA network in real-time to collect RDMA control flow events, including RDMA device lifecycle management events and Queue Pair (QP) management events; wherein the device lifecycle management events include activation and deactivation operation events of the device and all system call events related to device state changes, the queue management events include creation and destruction operation events of the queue pair, and the like. These events are critical to maintaining the stability of the network connection and the overall performance of the analysis system.
It should be noted that, the focus of the RDMA control flow event collection is to monitor the lifecycle management event of the RDMA device and the management event of the Queue Pair (QP). eBPF the acquisition component module plays a key role in the process and is responsible for tracking and recording the activation and the deactivation of equipment and system call events related to the change of the state of the equipment, so that the monitoring data can reflect the real-time state of the equipment. In addition, during the creation and destruction of the queue pair, the eBPF acquisition component module is also responsible for acquiring data, recording these key events, helping to analyze the establishment and termination of RDMA connections, the use of the queue pair, and the like.
(B3) Monitoring system calls related to connection establishment, collecting RDMA connection establishment events exchanged in a TCP handshake process of RDMA; the RDMA setup connection event includes Key information such as a global identifier (Global Identifier, GID), a Key (Key), and an Address (Address).
It should be appreciated that in RDMA communication, establishing a connection is a prerequisite for data transfer; the connection establishment phase involves a TCP/IP protocol stack that needs to exchange critical information such as global identifiers, keys and addresses. The eBPF acquisition component module carefully records the exchange process of key information such as GID, key and address, and provides a reliable data basis for successful establishment of RDMA connection.
(b4) Monitoring RDMA operations to collect RDMA data flow events, wherein the RDMA operations include data send and receive operations, and the RDMA data flow events include send operation events, receive operation events, and completion queue events; send operation events are collected by monitoring all ibv_post_send calls, receive operation events are collected by monitoring all ibv_post_recv calls, and completion queue events are obtained by collecting ibv_poll_cq events.
It should be appreciated that an ibv_post_send call represents a data send request in the RDMA network, an ibv_post_recv call represents a data receive request in the RDMA network, and ibv_poll_cq events are used to confirm the completion status of send and receive operations.
(b5) Writing the collected RDMA control flow events, RDMA connection establishment events, and RDMA data flow events to the ringbuffer of the kernel.
It should be noted that the eBPF acquisition component module records the collected data by means of the ringbuffer. The eBPF ringbuffer mechanism allows efficient data transfer: when an RDMA event occurs, the eBPF acquisition component module captures the corresponding data, i.e., an RDMA control flow event, an RDMA connection establishment event, or an RDMA data flow event, and writes it into the ringbuffer of the kernel; when the data is needed later, a user-space program reads it from the kernel.
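As an illustration of the user-space side of this step, the following minimal Python sketch decodes fixed-size event records such as a user-space program might read back from the ringbuffer. The record layout, the field names, and the numeric event-kind codes are assumptions made for this example; the source does not specify them.

```python
import struct
from typing import Iterator, NamedTuple

# Hypothetical record layout (not taken from the source): little-endian
# u64 timestamp_ns, u32 tgid, u32 qp_num, u64 wr_id, u8 kind, 7 padding bytes.
RECORD = struct.Struct("<QIIQB7x")

# Hypothetical numeric codes for the three event families named in the text.
KINDS = {0: "control_flow", 1: "connection_establishment", 2: "data_flow"}


class RdmaEvent(NamedTuple):
    timestamp_ns: int
    tgid: int
    qp_num: int
    wr_id: int
    kind: str


def encode_event(ts: int, tgid: int, qp_num: int, wr_id: int, kind_code: int) -> bytes:
    """Pack one record the way the (assumed) kernel-side producer would."""
    return RECORD.pack(ts, tgid, qp_num, wr_id, kind_code)


def decode_events(buf: bytes) -> Iterator[RdmaEvent]:
    """Yield every complete record in buf; a trailing partial record is ignored."""
    usable = len(buf) - len(buf) % RECORD.size
    for ts, tgid, qp_num, wr_id, code in RECORD.iter_unpack(buf[:usable]):
        yield RdmaEvent(ts, tgid, qp_num, wr_id, KINDS.get(code, "unknown"))
```

A fixed-size record keeps the consumer simple, because each record can be decoded without scanning for delimiters.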
(b6) In a single-machine environment, reading the collected RDMA network monitoring data from the ringbuffer of the kernel, and performing preliminary aggregation on it using identifiers, so as to achieve single-machine-side data aggregation.
It should be appreciated that after the RDMA network monitoring data has been collected through the above steps, the single-machine-side aggregation stage is performed, as shown in FIG. 4. The module uses a series of identifiers, such as the process ID (PID) and the queue pair number (QpNum), to ensure that the data accurately reflects the specifics of RDMA network operations.
Further, in a single-machine environment, the collected RDMA network monitoring data first undergoes preliminary aggregation, as shown in FIG. 5. That is, identifiers are used to associate the RDMA network monitoring data with the corresponding application or service, so as to achieve single-machine-side data aggregation, which specifically includes:
File descriptor (fd) level aggregation: first, the system calls sys_enter_write and sys_exit_read are aggregated based on the same file descriptor, so that the aggregated data reflects the activity of a specific RDMA connection;
Queue pair number (QpNum) level aggregation: then, the aggregated data of sys_enter_write and sys_exit_read is further aggregated with the ibv_post_send and ibv_post_recv events based on the same queue pair number, forming a data view of the queue pair;
Work request identifier (WrId) level aggregation: finally, the aggregated data of the ibv_post_send and ibv_post_recv events is aggregated with the ibv_poll_cq events using the same work request identifier, which completes single-machine-side data aggregation and establishes a complete record of each RDMA packet-sending event.
Through the above steps, RDMA network monitoring data is comprehensively collected and aggregated, which effectively addresses a major pain point: RDMA verbs events are discrete and difficult to associate with one another.
It should be understood that the aggregated data of sys_enter_write and sys_exit_read is the data aggregated at the file descriptor level, and the aggregated data of the ibv_post_send and ibv_post_recv events is the data aggregated at the queue pair number level.
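The three aggregation levels described above can be sketched in Python as follows. This is only an illustrative sketch: the dictionary-based event shape and the field names (`name`, `fd`, `qp_num`, `wr_id`, `ts`, `status`) are assumptions for the example, as is the premise that each syscall record already carries the queue pair number exchanged over that file descriptor and that work request identifiers do not collide across queue pairs.

```python
from collections import defaultdict


def aggregate(events):
    """Join events by fd, then by QpNum, then by WrId; returns one view per QP."""
    # Level 1: group the connection-related syscalls by file descriptor.
    by_fd = defaultdict(list)
    for e in events:
        if e["name"] in ("sys_enter_write", "sys_exit_read"):
            by_fd[e["fd"]].append(e)

    # Level 2: attach each fd group and each posted work request to its QP.
    by_qp = defaultdict(lambda: {"fds": set(), "work_requests": {}})
    for fd, group in by_fd.items():
        for e in group:
            by_qp[e["qp_num"]]["fds"].add(fd)
    for e in events:
        if e["name"] in ("ibv_post_send", "ibv_post_recv"):
            by_qp[e["qp_num"]]["work_requests"][e["wr_id"]] = {
                "op": e["name"],
                "posted_ns": e["ts"],
                "completed_ns": None,
                "status": None,
            }

    # Level 3: match each completion to a posted work request by WrId.
    for e in events:
        if e["name"] != "ibv_poll_cq":
            continue
        for view in by_qp.values():
            wr = view["work_requests"].get(e["wr_id"])
            if wr is not None and wr["completed_ns"] is None:
                wr["completed_ns"] = e["ts"]
                wr["status"] = e["status"]
                break
    return dict(by_qp)
```

Each resulting view links a file descriptor, a queue pair, and the posted and completed timestamps of its work requests, which is the information needed to derive per-request delays later on.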
In this embodiment, the aggregation component module includes a multi-node aggregation module and an analysis module. The multi-node aggregation module aggregates the RDMA network monitoring data received from multiple collection nodes, and the analysis module analyzes the aggregated RDMA network monitoring data so as to associate the collected data with the corresponding application.
One of the key points of the invention is the introduction of an aggregation component module that aggregates and analyzes RDMA network monitoring data collected from eBPF probes to evaluate the efficiency of RDMA operations in real time, including monitoring of key performance indicators, such as delays in the delivery of data packets to network cards, to ensure optimal operating conditions of the network.
It should be appreciated that the aggregation component module is a data processing hub responsible for integrating and aggregating the RDMA network monitoring data distributed across multiple collection nodes. It allows the system to analyze data not only on a single node but also across multiple nodes, and this centralized data processing provides the monitoring system with a global view.
Further, the aggregation component module specifically includes:
(c1) Aggregating the RDMA network monitoring data of multiple collection nodes through the multi-node aggregation module; in this process, the multi-node aggregation module receives the RDMA network monitoring data collected by the eBPF acquisition component modules of the multiple collection nodes and fuses it into a coherent data set, providing a unified view for subsequent analysis.
(c2) During communication, mapping the GIDs used by the RDMA network to IP addresses through the analysis module, and retaining the mapping relationships between all IP addresses and GIDs, so that the destination IP address associated with RDMA network monitoring data can be looked up and filled in according to the destination GID.
It should be noted that the aggregation component module also solves a problem unique to RDMA communication, namely the mapping of GIDs to IP addresses. Because RDMA uses the GID rather than a traditional IP address as the unique identifier, the aggregator side retains the mapping relationships between all IP addresses and GIDs, so that the destination IP information associated with RDMA data can be looked up and filled in according to the destination GID.
It should be appreciated that the GID serves as the unique identifier of a device in RDMA communication; it is a long (128-bit) number that uniquely identifies each device in the RDMA network. The GID is similar to the IP address in traditional networks, but it is designed specifically for RDMA. GIDs belong to the devices or endpoints of an RDMA network, and each device using RDMA technology has one or more GIDs that uniquely identify it in the network.
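A minimal Python sketch of such a GID-to-IP lookup table is given below. The class name `GidIpTable`, its methods, and the event field names `dst_gid` and `dst_ip` are hypothetical and chosen only for this example. The sketch relies on the fact that a 128-bit GID is conventionally written in the same colon-separated hexadecimal form as an IPv6 address, so the standard `ipaddress` module can be used to normalize it.

```python
import ipaddress


class GidIpTable:
    """Keeps the GID -> IP mapping and fills in the destination IP of events."""

    def __init__(self):
        self._ip_by_gid = {}

    @staticmethod
    def _normalize(gid: str) -> str:
        # Expand to the full 8-group form so that abbreviated and
        # zero-padded spellings of the same GID compare equal.
        return ipaddress.IPv6Address(gid).exploded

    def register(self, gid: str, ip: str) -> None:
        """Record that the device identified by gid is reachable at ip."""
        self._ip_by_gid[self._normalize(gid)] = str(ipaddress.ip_address(ip))

    def complete(self, event: dict) -> dict:
        """Return a copy of event with dst_ip set (None if the GID is unknown)."""
        enriched = dict(event)
        enriched["dst_ip"] = self._ip_by_gid.get(self._normalize(event["dst_gid"]))
        return enriched
```

Returning a copy rather than mutating the event keeps the raw collected record intact for persistence.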
(c3) Persisting the RDMA network monitoring data to a time-series database through the analysis module, which ensures the durability and reliability of the data and provides a basis for long-term performance monitoring and analysis.
(c4) Mapping the thread group ID (tgid) in each collected event to its corresponding application through the analysis module.
It should be appreciated that the RDMA network monitoring data collected by the eBPF acquisition component module includes RDMA control flow events, RDMA connection establishment events, and RDMA data flow events, and the thread group ID (tgid) in each of these events is mapped to its corresponding application. For example, the tgid of an event can be obtained through bpf_get_current_pid_tgid, and the corresponding application can be identified from the tgid. This mapping is critical to understanding the relationships between events and applications, and it improves the usability of the monitoring data and the accuracy of the analysis.
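As a rough illustration, the Python sketch below shows how a user-space analysis step might split the 64-bit value returned by bpf_get_current_pid_tgid (thread group ID in the upper 32 bits, thread ID in the lower 32 bits) and resolve the tgid to a process name. Resolving the name through `/proc/<tgid>/comm` is an assumption of this example, not something the source specifies, and the function names are hypothetical.

```python
from pathlib import Path


def split_pid_tgid(value: int):
    """Return (tgid, tid) from the 64-bit bpf_get_current_pid_tgid result."""
    return (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF


def application_name(tgid: int, proc_root: str = "/proc"):
    """Look up the process name for tgid; None if the process is gone."""
    try:
        return Path(proc_root, str(tgid), "comm").read_text().strip()
    except OSError:
        return None
```

Because a process may exit before its events are analyzed, the lookup returns `None` instead of raising, so that the caller can keep the event and mark the application as unknown.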
In this embodiment, the index component module is configured to analyze and process the aggregated RDMA network monitoring data output by the aggregation component module to obtain key performance indicators, convert the key performance indicators into visual representations, and display them on the Prometheus platform.
It should be understood that in the eBPF-based RDMA monitoring framework, the index component module plays a key role: it converts the RDMA network monitoring data processed by the aggregation component module into standardized monitoring metrics and exports them to the observability platform Prometheus. The index component module is implemented so as to integrate seamlessly with monitoring tools such as Prometheus, which facilitates persistent storage of the data and provides a basis for further data analysis and visualization.
Further, the key performance indicators include the delay from an RDMA data packet being issued to its arrival at the local network card, the delay from the data packet leaving the local network card to its arrival at the peer network card, the state of the RDMA event, and the like, wherein the state of an RDMA event is either success or failure.
Further, the index component module specifically includes:
(d1) Analyzing and processing, through the index component module, the aggregated RDMA network monitoring data output by the aggregation component module, to obtain standardized key performance indicators.
(d2) Exporting the key performance indicators to the Grafana and Prometheus platforms, where they are converted by Grafana into visual representations, such as charts and dashboards, and displayed on the Prometheus platform.
It should be appreciated that the Prometheus platform is an open-source system monitoring and alerting platform with a powerful query language and storage capabilities, and it supports long-term trend analysis and immediate alerting on data. Grafana is a widely used data visualization platform that converts key performance indicators into intuitive charts and dashboards and can be customized according to user requirements. This visualization capability greatly strengthens network administrators' ability to monitor, troubleshoot, and optimize performance, enabling them to identify problems quickly and make data-based decisions.
In addition, considering the diversity of network environments and application scenarios, Grafana can be customized according to user needs, and key performance indicators such as the type and frequency of monitored events can be adjusted to the user's specific requirements, so that the system of the invention is applicable to a wide range of network environments while also meeting the needs of specific users.
In summary, through the innovative components described above, the invention improves the efficiency and precision of network event monitoring and greatly improves the reliability, performance, and maintenance efficiency of RDMA networks, and its modular architecture ensures the scalability and sustainability of the system.
It should be noted that the embodiment of the present invention also provides a monitoring method, which is implemented based on the eBPF-based RDMA network monitoring system of the above embodiment.
The monitoring method specifically comprises the following steps:
(1) The user configures the monitoring parameters, and the control component module sends an acquisition instruction to the eBPF acquisition component module.
(2) After receiving the acquisition instruction sent by the control component module, the eBPF acquisition component module dynamically attaches uprobe probes and tracepoint probes according to the user configuration to collect RDMA network monitoring data, wherein the RDMA network monitoring data includes RDMA control flow events, RDMA connection establishment events, and RDMA data flow events.
(3) After each event of the RDMA network monitoring data is triggered, it is sent to the aggregation component module; the multi-node aggregation module of the aggregation component module aggregates the RDMA network monitoring data from multiple collection nodes of the same RDMA network, and the analysis module then analyzes the aggregated data to associate the collected RDMA network monitoring data with the corresponding application and to analyze the performance and state of the RDMA network.
(4) The index component module obtains the aggregated RDMA network monitoring data output by the aggregation component module, analyzes and processes it to obtain key performance indicators, converts the key performance indicators into visual representations, and displays them on the Prometheus platform.
Illustratively, at a company in the financial technology field, traditional RDMA monitoring approaches could not provide adequate support for scenarios with stringent low-latency requirements. The company's network architecture is complex and demands extremely high data transmission efficiency and stability, and even a minor delay may affect transaction decisions and execution. After adopting the eBPF-based RDMA monitoring method, the company can monitor the operation of its RDMA network in real time and in depth. The method accurately tracks and analyzes each key stage of the data transmission process, and it also detects and responds quickly to potential network problems in real time. The results show that, after the monitoring method was applied, data transmission became more efficient and stable and the overall performance of the system improved, effectively supporting the company's core business requirements.
The invention significantly improves the observability and analysis depth of RDMA network operation. It overcomes the limitations of traditional monitoring techniques and achieves comprehensive monitoring of key stages such as RDMA memory operations and data transmission. It can capture and accurately analyze various network events in real time, including packet transmission states and memory access patterns. It greatly shortens problem diagnosis time and strengthens insight into the operation of the whole RDMA network, thereby effectively improving the overall performance and reliability of network services.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. An eBPF-based RDMA network monitoring system, comprising:
a control component module, configured to issue acquisition instructions, set monitoring parameters, and manage monitoring strategies, so as to control the start and stop of data acquisition and to manage the monitoring of the RDMA network;
an eBPF acquisition component module, configured to collect RDMA network monitoring data after receiving an acquisition instruction sent by the control component module; wherein the RDMA network monitoring data includes RDMA control flow events, RDMA connection establishment events, and RDMA data flow events;
an aggregation component module, comprising a multi-node aggregation module and an analysis module, wherein the multi-node aggregation module is configured to aggregate the RDMA network monitoring data received from a plurality of collection nodes, and the analysis module is configured to analyze the aggregated RDMA network monitoring data so as to associate the collected RDMA network monitoring data with the corresponding application; and
an index component module, configured to analyze and process the aggregated RDMA network monitoring data output by the aggregation component module to obtain key performance indicators, convert the key performance indicators into visual representations, and display them on the Prometheus platform.
2. The eBPF-based RDMA network monitoring system as claimed in claim 1, wherein the control component module specifically comprises:
(a1) In the monitoring task initialization stage, issuing an acquisition instruction to activate the RDMA acquisition capability, which specifically comprises attaching user-space uprobe probes and kernel-space tracepoint probes at the code locations where user-written code calls RDMA Verbs;
(a2) Managing the loading of the eBPF program and its linking to the system kernel, so as to control the start of data acquisition; wherein the eBPF program is the executor of data acquisition and is used to capture the key data of RDMA network operations in real time;
(a3) When data acquisition is no longer needed or system resources need to be released, the user disables the unneeded uprobe probes by modifying the configuration file, so as to control the stop of data acquisition;
(a4) Performing fine-grained control of the monitoring component according to a preset monitoring strategy and real-time decisions, so as to control the monitoring of the RDMA network.
3. The eBPF-based RDMA network monitoring system as recited in claim 2, wherein the critical data for RDMA network operations includes RDMA setup connection data, RDMA control flow data, and RDMA data flow data;
The monitoring component includes real-time regulation of eBPF programs, monitoring of routing of data streams, and management of temporary buffers.
4. The eBPF-based RDMA network monitoring system as claimed in claim 1, wherein the eBPF acquisition component module is configured to:
(b1) After receiving an acquisition instruction sent by a control component module, starting to acquire RDMA network monitoring data;
(b2) Monitoring the RDMA network in real time to collect RDMA control flow events, the RDMA control flow events including RDMA device lifecycle management events and queue pair management events; the device lifecycle management events comprise device activation and deactivation events and all system call events related to device state changes, and the queue pair management events comprise queue pair creation and destruction events;
(b3) Monitoring system calls related to connection establishment, and collecting the RDMA connection establishment events exchanged during the TCP handshake process of RDMA; wherein an RDMA connection establishment event includes a global identifier, a key, and an address;
(b4) Monitoring RDMA operations to collect RDMA data flow events, wherein the RDMA operations include data send and receive operations, and the RDMA data flow events include send operation events, receive operation events, and completion queue events; send operation events are collected by monitoring all ibv_post_send calls, receive operation events are collected by monitoring all ibv_post_recv calls, and completion queue events are obtained by collecting ibv_poll_cq events;
(b5) Writing the collected RDMA control flow events, RDMA connection establishment events, and RDMA data flow events into the ringbuffer of the kernel;
(b6) In a single machine environment, the collected RDMA network monitoring data is read from ringbuffer of the kernel, and the collected RDMA network monitoring data is primarily aggregated by using the identifier so as to realize single machine side data aggregation.
5. The eBPF-based RDMA network monitoring system as recited in claim 4, wherein using the identifiers to associate the RDMA network monitoring data with the corresponding application or service, so as to achieve single-machine-side data aggregation, specifically comprises:
File descriptor level aggregation: first, the system calls sys_enter_write and sys_exit_read are aggregated based on the same file descriptor;
Queue pair number level aggregation: then, the aggregated data of sys_enter_write and sys_exit_read is further aggregated with the ibv_post_send and ibv_post_recv events based on the same queue pair number;
Work request identifier level aggregation: finally, the aggregated data of the ibv_post_send and ibv_post_recv events is aggregated with the ibv_poll_cq events using the same work request identifier, so as to achieve single-machine-side data aggregation.
6. The eBPF-based RDMA network monitoring system as claimed in claim 1, wherein the aggregation component module specifically comprises:
(c1) Aggregating the RDMA network monitoring data of a plurality of collection nodes through the multi-node aggregation module, wherein in this process the multi-node aggregation module receives the RDMA network monitoring data collected by the eBPF acquisition component modules of the plurality of collection nodes and fuses it into a coherent data set;
(c2) During communication, mapping the global identifiers used by the RDMA network to IP addresses through the analysis module, and retaining the mapping relationships between all IP addresses and global identifiers, so that the destination IP address associated with RDMA network monitoring data can be looked up and filled in according to the destination global identifier;
(c3) Persisting the RDMA network monitoring data to a time-series database through the analysis module, so as to achieve data persistence;
(c4) Mapping the thread group ID in each collected event to its corresponding application through the analysis module.
7. The eBPF-based RDMA network monitoring system as claimed in claim 1, wherein the key performance indicators include the delay from an RDMA data packet being issued to its arrival at the local network card, the delay from the data packet leaving the local network card to its arrival at the peer network card, and the state of the RDMA event, the state of the RDMA event being success or failure.
8. The eBPF-based RDMA network monitoring system as claimed in claim 1, wherein the index component module specifically comprises:
(d1) Analyzing and processing the aggregated RDMA network monitoring data processed by the aggregation component module through the index component module to obtain standardized key performance indexes;
(d2) The key performance indicators are exported to Grafana and Prometheus platform, converted to visual representation form by Grafana, and displayed on Prometheus platform.
9. A monitoring method based on the eBPF-based RDMA network monitoring system as claimed in any one of claims 1 to 8, comprising the following steps:
(1) The user configures the monitoring parameters, and the control component module sends an acquisition instruction to the eBPF acquisition component module;
(2) After receiving the acquisition instruction sent by the control component module, the eBPF acquisition component module dynamically attaches uprobe probes and tracepoint probes according to the user configuration to collect RDMA network monitoring data; wherein the RDMA network monitoring data includes RDMA control flow events, RDMA connection establishment events, and RDMA data flow events;
(3) After each event of the RDMA network monitoring data is triggered, it is sent to the aggregation component module; the multi-node aggregation module of the aggregation component module aggregates the RDMA network monitoring data from a plurality of collection nodes of the same RDMA network, and the analysis module then analyzes the aggregated data to associate the collected RDMA network monitoring data with the corresponding application and to analyze the performance and state of the RDMA network;
(4) The index component module obtains the aggregated RDMA network monitoring data output by the aggregation component module, analyzes and processes it to obtain key performance indicators, converts the key performance indicators into visual representations, and displays them on the Prometheus platform.
CN202410054310.5A 2024-01-15 2024-01-15 EBPF-based RDMA network monitoring system and eBPF-based RDMA network monitoring method Pending CN118075156A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410054310.5A CN118075156A (en) 2024-01-15 2024-01-15 EBPF-based RDMA network monitoring system and eBPF-based RDMA network monitoring method

Publications (1)

Publication Number Publication Date
CN118075156A true CN118075156A (en) 2024-05-24

Family

ID=91106597

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410054310.5A Pending CN118075156A (en) 2024-01-15 2024-01-15 EBPF-based RDMA network monitoring system and eBPF-based RDMA network monitoring method

Country Status (1)

Country Link
CN (1) CN118075156A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination