WO2023279815A1

WO2023279815A1 - Performance monitoring system and related method

Info

Publication number: WO2023279815A1
Application number: PCT/CN2022/089717
Authority: WO
Inventors: 彭大成; 鲁强
Original assignee: 华为技术有限公司
Priority date: 2021-07-08
Filing date: 2022-04-28
Publication date: 2023-01-12

Abstract

A performance monitoring system, which is used for monitoring the performance of an application on a collected node. The system comprises a collection node (100) and an analysis node (200), wherein the collection node (100) is connected to a collected node by means of a hardware debugging interface of the collected node, and the collection node (100) is used for reading performance information by means of the hardware debugging interface of the collected node; and the analysis node (200) is used for receiving the performance information sent by the collection node, and analyzing the performance information to obtain an analysis result. The collection node (100) collects performance information by means of the hardware debugging interface, and no collection software needs to be installed, such that software compatibility limitations are surmounted; and the method is applicable to performance monitoring of various types of collected nodes, which have hardware debugging interfaces, and has a higher availability.

Description

Performance monitoring system and related method

[Corrected 28.04.2022 under Rule 26]
This application claims priority to Chinese patent application No. 202110770996.4, filed with the State Intellectual Property Office of China on July 8, 2021 and titled "Method and System for Information Collection", and to Chinese patent application No. 202111043661.9, filed with the State Intellectual Property Office of China on September 7, 2021 and titled "Performance Monitoring System and Related Methods", the entire contents of which are incorporated in this application by reference.

Technical field

The present application relates to the field of computer technology, and in particular to a performance monitoring system, a performance monitoring method, a collection node, a computer-readable storage medium, and a computer program product.

Background technique

With the continuous development of computer technology, a large number of computer applications (also referred to as applications for short) for realizing different functions have been produced. As users have higher requirements on application performance, how to improve application performance has gradually become a major concern of the industry. An essential step in improving application performance is performance monitoring.

Currently, the industry provides some software-based performance monitoring solutions. These solutions require corresponding software to be installed on the collected nodes and the analysis nodes. When a user needs performance monitoring, the collection software collects performance information, such as the number of occurrences of hardware events (for example, a specified instruction being called or a cache miss) and the function call stack at the time a given hardware event occurs. The analysis node then analyzes this performance information with the analysis software and presents the analysis results to the user in charts and other forms.

However, software usually has compatibility limitations. For example, when the collected node uses a specific operating system, the collection software may not work normally. Based on this, the industry urgently needs to provide a performance monitoring solution with high availability.

Contents of the invention

This application provides a performance monitoring system. The collection node in the performance monitoring system collects performance information through the hardware debugging interface of the collected node, without installing collection software. This removes the limitation of software compatibility, makes the system suitable for performance monitoring of all kinds of collected nodes that have a hardware debugging interface, and gives it high availability. The present application also provides a corresponding performance monitoring method, a collection node, a computer-readable storage medium, and a computer program product.

In a first aspect, the present application provides a performance monitoring system. The performance monitoring system may be a hardware system with a performance monitoring function. The system is used to monitor the performance of the application on the collected nodes. Wherein, the collected node may be a device running an application, such as a server, or a personal computer such as a desktop computer, a notebook computer, or a smart phone.

The performance monitoring system includes collection nodes and analysis nodes. The collection node is connected with the collected node through the hardware debugging interface of the collected node. The collection node is used to read performance information through the hardware debugging interface of the collected node, and the analysis node is used to receive the performance information sent by the collection node, analyze the performance information, and obtain an analysis result.

In this system, the collection node collects performance information through the hardware debugging interface, and no collection software needs to be installed. This removes the limitation of software compatibility, so the system is suitable for performance monitoring of various collected nodes that have hardware debugging interfaces and has high availability. Moreover, the collection node reads the performance information from the hardware debugging interface in a read-only manner, which does not use the resources of the collected node and does not affect the operation of the application on the collected node. On the one hand, this makes the collected performance information more accurate and improves the effect of performance monitoring. On the other hand, the environment of the collected node is completely isolated from the impact of the performance monitoring system.

In some possible implementations, the collected node includes registers, such as registers related to performance or application running status, and the collection node can read the data in the registers of the collected node through the hardware debugging interface of the collected node to obtain performance information.

The collection node reads the data in the relevant registers through the hardware debugging interface of the collected node to collect performance information. On the one hand, this removes the limitation of software compatibility. On the other hand, reading directly through the hardware debugging interface, instead of reading through collection software, enables high-speed data export and improves collection efficiency.

In some possible implementation manners, the data volume of the performance information in the collected node is relatively large, and the collected node may be provided with an independent storage component for buffering the performance information. The independent storage component is a storage component in the collected node that is independent of the main part of the central processing unit, for example a cache or a buffer. The collection node can read, through the hardware debugging interface of the collected node, the data transmitted from the register of the collected node to the independent storage component of the collected node to obtain performance information.

The independent storage component can bridge the speed gap between high-speed devices and low-speed devices, reduce the limitations imposed by low-speed devices, and increase the collection speed of performance information, thereby improving the efficiency of performance monitoring.

In some possible implementation manners, the analysis node is further configured to receive the collection prompt information input by the user, and the collection node is specifically configured to read the performance information through the hardware debugging interface of the collected node according to the collection prompt information. In this way, performance information can be collected on demand according to user requirements, and personalized performance monitoring can be realized.

In some possible implementation manners, the collection prompt information includes a code collection range of the application and at least one collection item, and each collection item corresponds to a hardware event. The code collection range indicates the code fragments for which performance information needs to be collected for performance monitoring. It should be noted that a code fragment may be the compiled form of a source code fragment. A collection item indicates an indicator used for performance monitoring of the above code fragment, and the indicator may be represented by a performance-related hardware event. The hardware events may include, for example, cache miss events and branch misprediction events.

Specifically, the user can first determine, in the application whose performance needs to be monitored, the source code fragment that needs to be tracked, and then view the compiled code to obtain the address of the compiled code fragment corresponding to that source code fragment, so that the collection node can identify, based on the address, the code fragment for which performance information needs to be collected. In addition, the user can configure a register address through the configuration interface provided by the analysis node. After the analysis node sends the register address to the collection node, the collection node can collect the data in the register corresponding to the register address, so as to obtain performance information.
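As a rough illustration, the collection prompt information described above could be modeled as follows. This is only a sketch under stated assumptions: the names `CollectionItem`, `CollectionPrompt`, and `in_scope`, the event names, and the example addresses are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CollectionItem:
    """One collection item; each item corresponds to one hardware event."""
    hardware_event: str      # hypothetical event name, e.g. "cache_miss"
    register_address: int    # hypothetical address of the register for this event


@dataclass(frozen=True)
class CollectionPrompt:
    """Collection prompt information: a code collection range plus collection items."""
    code_start: int          # address of the first compiled instruction in the range
    code_end: int            # address one past the last instruction in the range
    items: List[CollectionItem]

    def in_scope(self, program_counter: int) -> bool:
        """Return True if the address lies inside the code collection range."""
        return self.code_start <= program_counter < self.code_end


# Example values are made up for illustration only.
prompt = CollectionPrompt(
    code_start=0x4000,
    code_end=0x4800,
    items=[
        CollectionItem("cache_miss", 0x1000),
        CollectionItem("branch_misprediction", 0x1008),
    ],
)
```

Treating the range as half-open (`code_end` excluded) is a design choice of this sketch; the application does not specify how the range boundaries are expressed.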

In some possible implementations, the collected node is provided with registers for performance monitoring, such as a set of registers in the performance monitoring unit, or another set of registers used to obtain hardware snapshots for performance tuning. The collection node is specifically configured to configure the configuration register in the collected node according to the collection prompt information, so as to gate the line between the target register in the collected node and the independent storage component of the collected node. The target register is a register that matches the collection prompt information, and may be at least one of the above-mentioned registers for performance monitoring. When the application is running, the collection node can read, through the hardware debugging interface of the collected node, the data transmitted from the target register to the independent storage component to obtain performance information.

Because the collection node of the performance monitoring system can configure the configuration register in the collected node according to the collection prompt information, and thereby gate the line between the target register in the collected node and the independent storage component, performance information can be read on demand through the hardware debugging interface of the collected node.
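One way to picture this gating is as a bit mask written to the configuration register, with one bit per register line. The sketch below assumes such a layout purely for illustration; the bit assignments in `GATE_BITS` and the function names are hypothetical, and the application does not describe the actual register format.

```python
# Hypothetical mapping from hardware event to the bit that gates its register's line.
GATE_BITS = {
    "cache_miss": 0,
    "branch_misprediction": 1,
    "cache_hit": 2,
}


def build_gate_mask(selected_events):
    """Return a configuration-register value with one bit set per selected event."""
    mask = 0
    for event in selected_events:
        mask |= 1 << GATE_BITS[event]
    return mask


def gated_events(mask):
    """List the events whose target registers are gated through by the mask."""
    return [event for event, bit in GATE_BITS.items() if mask & (1 << bit)]


# Select the registers that match two collection items.
mask = build_gate_mask(["cache_miss", "branch_misprediction"])
```

Only the lines selected by the mask feed the independent storage component, which is what lets the collection node read performance information on demand rather than exporting every register.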

In some possible implementations, the system includes a plurality of analysis nodes, and the collection node is further configured to receive an analysis node address list configured by a user, and send a join request to the plurality of analysis nodes according to the analysis node address list. Correspondingly, at least one analysis node among the plurality of analysis nodes is configured to send a join success notification to the collection node. In this way, flexible networking can be realized. Moreover, the collection node and the analysis node are networked separately, so the transmitted performance information does not load the live network. This reduces the pressure on the live network and allows more, and more detailed, performance information to be transmitted.

In some possible implementation manners, before sending the join request to the multiple analysis nodes, the collection node is further configured to add the multiple analysis nodes according to the analysis node address list, so that the performance information collected by the collection node is shared by the multiple analysis nodes.

Even if an individual analysis node fails, other analysis nodes can perform analysis based on the performance information collected by the collection node, thereby realizing performance monitoring and improving the robustness and reliability of the performance monitoring system. Moreover, as long as a remote analysis node is added to a collection node, the analysis node can access the collection node. For the collected nodes, there is no need to replace the collection nodes because there are new analysis nodes. In this way, the movement of nodes can be reduced and the flexibility of networking can be improved.

In some possible implementation manners, the collection node has a hardware debugging interface, and the hardware debugging interface of the collection node is connected to the hardware debugging interface of the collected node through a cable. Therefore, the collection node and the collected node can transmit performance information over the path formed by the hardware debugging interface of the collected node, the cable, and the hardware debugging interface of the collection node. Performance information is thus collected through hardware, which removes the limitation of software compatibility and improves collection efficiency.

In some possible implementation manners, there are multiple collected nodes, and the network topology of the collection node and the multiple collected nodes connected to it is a daisy chain topology. Through the daisy chain topology, one collection node can collect the performance information of multiple collected nodes. Specifically, the collection node can be connected to multiple collected nodes without plugging and unplugging. After the process of collecting performance information is triggered, the collection node can read the performance information of the multiple collected nodes in a time-shared manner.

In this way, the number of collection nodes that need to be configured can be greatly reduced, and the workload of installation and debugging of collection nodes is simplified.
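The time-shared reading over a daisy chain can be pictured as a simple round-robin schedule. The sketch below assumes one collected node is read per time slot and that nodes are visited in chain order; both the scheduling policy and the names `time_shared_reads` and `node-a` to `node-c` are assumptions made for illustration.

```python
from itertools import cycle, islice


def time_shared_reads(node_ids, total_slots):
    """Return the order in which nodes on the chain are read, one node per time slot."""
    return list(islice(cycle(node_ids), total_slots))


# Three collected nodes on one chain, five read slots.
schedule = time_shared_reads(["node-a", "node-b", "node-c"], 5)
# schedule == ["node-a", "node-b", "node-c", "node-a", "node-b"]
```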

In some possible implementation manners, the collection node is powered by alternating current or by a battery. A collection node powered by alternating current is used to collect the performance information of an application on a fixed collected node, and a collection node powered by a battery is used to collect the performance information of an application on a mobile collected node. Fixed collected nodes can be computing devices such as servers in large data centers, and mobile collected nodes can be robots, electric vehicles, drones, virtual reality wearable devices, and the like.

That is to say, the performance monitoring system of the present application can be applied to different performance tuning scenarios according to different performance tuning environments, and has high usability.

Furthermore, some collected nodes may be sensitive to weight or power consumption. Therefore, the collection node can also be equipped with a lighter battery and use an energy-saving communication module, such as a Zigbee communication module, to complete the transmission of performance information with low energy consumption.

In a second aspect, the present application provides a performance monitoring method. The method is applied to a performance monitoring system, the system is used to monitor the performance of the application on the collected node, the system includes a collection node and an analysis node, and the collection node is connected to the collected node through the hardware debugging interface of the collected node. The method includes:

The collection node reads the performance information through the hardware debugging interface of the collected node;

The analysis node receives the performance information sent by the collection node, analyzes the performance information, and obtains an analysis result.

In some possible implementation manners, the collection node reads the performance information through the hardware debugging interface of the collected node, including:

The collection node reads the data in the register of the collected node through the hardware debugging interface of the collected node to obtain performance information; or,

The collection node reads the data transmitted from the register of the collected node to the independent storage component of the collected node through the hardware debugging interface of the collected node, and obtains performance information.

In some possible implementations, the method also includes:

The analysis node receives the collection prompt information input by the user;

The collection node reads the performance information through the hardware debugging interface of the collected node, including:

According to the collection prompt information, the performance information is read through the hardware debugging interface of the collected node.

In some possible implementation manners, the collection prompt information includes a code collection scope of the application and at least one collection item, and each collection item corresponds to a hardware event.

The collection node configures the configuration register in the collected node according to the collection prompt information, so as to gate the line between the target register in the collected node and the independent storage component in the collected node;

When the application is running, the collection node reads the data transmitted from the target register to the independent storage unit through the hardware debugging interface of the collected node to obtain performance information.

In some possible implementations, the system includes multiple analysis nodes, and the method further includes:

The collection node receives a user-configured analysis node address list;

The collection node sends a join request to the plurality of analysis nodes according to the analysis node address list;

At least one analysis node among the plurality of analysis nodes sends a joining success notification to the collection node.

In some possible implementation manners, before the collection node sends joining requests to the multiple analysis nodes, the method further includes:

Adding the multiple analysis nodes according to the analysis node address list, so that the performance information collected by the collection node is shared by the multiple analysis nodes.

In some possible implementation manners, the collection node has a hardware debugging interface, and the hardware debugging interface of the collection node is connected to the hardware debugging interface of the collected node through a cable.

In some possible implementation manners, the collected nodes include multiple nodes, and a network topology of the collection node and the multiple collected nodes connected to the collection node is a daisy chain topology.

In some possible implementation manners, the collection node is powered by alternating current or battery;

The collection node powered by the alternating current collects the performance information of the application in a fixed collected node; or,

The collection node powered by the battery collects the performance information of the application in the mobile collected node.

In a third aspect, the present application provides a collection node. The collection node is connected to the collected node through the hardware debugging interface of the collected node, and the collection node is used to perform the steps performed by the collection node in the performance monitoring method described in the second aspect or any implementation manner of the second aspect of the present application.

In a fourth aspect, the present application provides a computer-readable storage medium, where an instruction is stored in the computer-readable storage medium, and the instruction instructs a device to execute the steps performed by the analysis node in the performance monitoring method described in the second aspect or any implementation manner of the second aspect.

In a fifth aspect, the present application provides a computer program product containing instructions, which, when run on a device, causes the device to execute the steps performed by the analysis node in the performance monitoring method described in the second aspect or any implementation manner of the second aspect.

On the basis of the implementation manners provided in the foregoing aspects, the present application may further be combined to provide more implementation manners.

Description of drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the following briefly introduces the drawings required in the embodiments.

FIG. 1 is a schematic structural diagram of a performance monitoring system provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of a configuration interface provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a hardware structure of a collection node provided in an embodiment of the present application;

FIG. 4 is a schematic diagram of a hardware structure of a collection node provided in an embodiment of the present application;

FIG. 5 is a schematic diagram of a hardware structure of a collection node provided in an embodiment of the present application;

FIG. 6 is a schematic diagram of a hardware structure of a collection node provided in an embodiment of the present application;

FIG. 7 is a flowchart of a performance monitoring method provided in an embodiment of the present application;

FIG. 8 is a schematic diagram of a configuration code collection range provided by an embodiment of the present application;

FIG. 9 is a schematic diagram of a configuration collection item provided by an embodiment of the present application.

Detailed description

The terms "first" and "second" in the embodiments of the present application are used for description purposes only, and cannot be understood as indicating or implying relative importance or implicitly indicating the quantity of indicated technical features. Thus, a feature defined as "first" and "second" may explicitly or implicitly include one or more of these features.

In order to facilitate the understanding of the embodiments of the present application, first, some terms involved in the present application are explained.

An application (APP), also called an application program, refers to a computer program written for a certain special purpose of a user. Applications are usually deployed on hardware devices, such as personal computers such as desktops, laptops, and smart phones, or servers. The hardware device runs the application by executing the program code of the application, and then realizes the functions of the application.

Application performance specifically refers to the running performance of the application. The running performance can be characterized by the resource utilization when the application is running. The resources may be of different types, such as computing resources, storage resources, and network resources. Computing resources may include processor resources such as a central processing unit (CPU) and a graphics processing unit (GPU); storage resources may include internal memory, external storage (also called auxiliary storage), cache, and other resources; and network resources may include bandwidth and other resources. Based on this, running performance can be characterized by indicators such as CPU utilization, memory utilization, and cache hit ratio.

Performance monitoring refers to the continuous detection of the running performance of the application. Large chip design companies provide a wealth of performance monitoring components, such as performance monitoring unit (PMU) registers, CoreSight™ on-chip monitoring components, the eXtended Debug Port (XDP), and so on.

At present, the performance monitoring solutions provided by the industry build on the above performance monitoring components: performance monitoring software is developed, and performance monitoring is then implemented through that software. Specifically, the collection software is installed on the collected nodes, and the analysis software is installed on the analysis nodes. When the user needs performance monitoring, the collection software can, according to the user's requirements, collect hardware events such as cache misses and branch mispredictions, or performance information such as the function call stack when specific hardware events occur. The analysis node then analyzes this performance information with the analysis software and presents the analysis results to the user in charts and other forms. However, software usually has compatibility limitations. For example, when the collected node uses a specific operating system, the collection software may not work normally. Based on this, the industry urgently needs a performance monitoring solution with high availability.

In view of this, an embodiment of the present application provides a performance monitoring system. The performance monitoring system may be a hardware system with a performance monitoring function. The system is used to monitor the performance of the application on the collected nodes. Wherein, the collected node may be a device running an application, such as a server or a PC.

In order to make the technical solution of the present application clearer and easier to understand, the performance monitoring system provided by the embodiment of the present application will be introduced below with reference to the accompanying drawings.

Referring to the structural diagram of the performance monitoring system shown in FIG. 1, the performance monitoring system 10 includes at least one collection node 100 and at least one analysis node 200. The collection node 100 has a hardware entity, such as a computing box. The computing box includes a hardware debugging interface adapted to the hardware debugging interface of the collected node. It should be noted that the hardware debugging interface of the collection node 100 and the hardware debugging interface of the collected node may be natively adapted, or may be adapted through an adapter. For example, if the collection node 100 supports hardware debugging interface A and the collected node supports hardware debugging interface B, the collection node 100 can use an adapter to convert calls to hardware debugging interface A into calls to hardware debugging interface B, so as to adapt to the collected node.
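The conversion from interface A to interface B follows the familiar adapter pattern. The sketch below shows the idea in software terms only; the class names `InterfaceB` and `AToBAdapter` and the method names `read_word` and `read_register` are hypothetical, and a real adapter between hardware debugging interfaces would likely also involve electrical and protocol conversion that is not modeled here.

```python
class InterfaceB:
    """Stand-in for the hardware debugging interface of the collected node."""

    def __init__(self, registers):
        self._registers = dict(registers)

    def read_word(self, address):
        # Interface B's (hypothetical) native read call.
        return self._registers[address]


class AToBAdapter:
    """Exposes interface A's call (`read_register`) on top of interface B."""

    def __init__(self, interface_b):
        self._b = interface_b

    def read_register(self, address):
        # Convert the interface-A call into the equivalent interface-B call.
        return self._b.read_word(address)


# The collection node only ever issues interface-A calls.
adapter = AToBAdapter(InterfaceB({0x10: 42}))
value = adapter.read_register(0x10)
```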

The analysis node 200 may be a hardware device on which analysis software (i.e., performance analysis software) is deployed, such as a server, a PC, or a smart phone. The server on which the analysis software is deployed may be a cloud server or a server in a local data center.

FIG. 1 illustrates an example in which a performance monitoring system 10 includes m collection nodes and n analysis nodes, where m and n are positive integers. Specifically, the collection node 100 and the analysis node 200 are connected through a network, and the user can configure different networking modes on the collection node 100. Through the configuration on the collection node 100 side, one analysis node 200 can access multiple collection nodes 100 at the same time. If permissions for multiple analysis nodes 200 are opened on the collection node 100, one collection node 100 can also be shared by multiple analysis nodes 200, that is, the performance information collected by one collection node 100 can be shared by multiple analysis nodes 200.

The collection node 100 may be installed on the collected node. The collected nodes can be large, medium, or small devices. Large devices can include computing clusters in data centers (for example, server clusters), medium-sized devices can be robots, streaming media devices, electric vehicles, and the like, and small devices can be drones and virtual reality (VR) devices. The collected node has a hardware debugging interface, which may be, for example, a tracing interface. The collection node 100 may be connected to the collected node through the hardware debugging interface of the collected node.

The collection node 100 is configured to read performance information through a hardware debugging interface of the collected node. The performance information may include one or more of the number of hardware events counted by PMU registers, application snapshots, and tracing information.

The CPU can count various hardware events through the PMU. For example, the CPU can access a first PMU register through the PMU to obtain the number of occurrences of cache miss events, and can access a second PMU register through the PMU to obtain the number of occurrences of branch misprediction events. An application snapshot reflects the state of the application at a certain time. In the performance tuning scenario, an application snapshot refers to a data group formed by combining information generated by the chip's internal registers (different from the PMU registers). From this data group, the state of the chip executing the application program code at runtime can be derived in reverse. The state can be, for example, any of running, hibernating, waiting, resident, or monitoring. The tracing information includes any one or more of the running status of the application and the function call stack when the hardware event occurs.

The collection node 100 can read the data in the register of the collected node through the hardware debugging interface of the collected node, so as to obtain the performance information. Further, considering that the data volume of the performance information is relatively large, an independent storage component may also be provided in the collected node for buffering the data in the register. The independent storage component is a storage component independent of the CPU main body in the collected node, and may be a cache, a buffer, or the like. The collection node 100 can read, through the hardware debugging interface of the collected node, the data transmitted from the register of the collected node to the independent storage component to obtain performance information.
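The buffering role of the independent storage component can be sketched as a bounded queue that the collected node fills and the collection node drains. This is a software model for illustration only: the class name `IndependentStorage`, the capacity, and the drop-oldest policy when the buffer is full are assumptions, not details given in the application.

```python
from collections import deque


class IndependentStorage:
    """Stand-in for the buffer that sits between the registers and the debug interface."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest sample when full (an assumed policy).
        self._samples = deque(maxlen=capacity)

    def push(self, sample):
        """Collected-node side: a register value is transferred into the buffer."""
        self._samples.append(sample)

    def drain(self):
        """Collection-node side: return all buffered samples and empty the buffer."""
        samples = list(self._samples)
        self._samples.clear()
        return samples


storage = IndependentStorage(capacity=4)
for register_value in [10, 11, 12, 13, 14]:
    storage.push(register_value)

first_read = storage.drain()    # the four most recent samples
second_read = storage.drain()   # nothing new has been buffered since the first read
```

Draining in batches is what lets a slower reader keep up with a faster producer, which is the bridging effect attributed to the independent storage component.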

The analysis node 200 is configured to receive the performance information sent by the collection node 100, analyze the performance information, and obtain an analysis result. Specifically, the analysis node 200 can analyze the received performance information in a statistical manner. For example, the analysis node 200 can determine the sum of the number of occurrences of cache hit events and the number of occurrences of cache miss events, and then determine the ratio of the number of occurrences of cache hit events to that sum, so as to obtain the cache hit ratio. The analysis result may include this cache hit ratio.
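The cache hit ratio calculation described above can be written out as a short function. The function name `cache_hit_ratio`, the choice to return `None` when no events were counted, and the example counts are assumptions of this sketch.

```python
def cache_hit_ratio(hit_count, miss_count):
    """Return hits / (hits + misses), or None if no cache events were counted."""
    total = hit_count + miss_count
    if total == 0:
        return None  # the ratio is undefined without any events
    return hit_count / total


# Example counts are made up: 900 cache hit events and 100 cache miss events.
ratio = cache_hit_ratio(hit_count=900, miss_count=100)
```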

Further, the analysis node 200 can also present the analysis results in a graph form. Specifically, the analysis node 200 may present the analysis results to the user through any one or more of a line graph, a histogram, a flame graph, or a table. The analysis node 200 may also generate a performance monitoring report according to at least one of a line graph, a histogram, a flame graph or a table of the analysis results, and output the performance monitoring report.

In some possible implementation manners, the collection node 100 may perform preparatory work first, for example, the collection node 100 may perform network configuration, and then collect performance information. Specifically, the collection node 100 may receive a user-configured analysis node address list. Wherein, the analysis node address list includes the address of at least one analysis node 200, and the address may include at least one of a uniform resource locator (URL) address, an Internet protocol (IP) address, or a message queuing telemetry transport (MQTT) address. The collection node 100 performs network configuration by adding the address of at least one analysis node 200. Further, the collection node 100 may add addresses of multiple analysis nodes 200, so that the performance information collected by the collection node 100 may be shared by multiple analysis nodes 200.

It should be noted that the analysis node address list may include an identifier of at least one analysis node 200 and an address of the analysis node 200. Wherein, the identifier of the analysis node 200 may be, for example, a universally unique identifier (UUID) of the analysis node 200.
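The analysis node address list can be pictured as a list of (identifier, address) records. The sketch below assumes a hypothetical plain-text format with one "UUID address" pair per line; the record type, the function name, the line format and the example addresses are illustrative only and are not defined by this application.

```python
import uuid
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AnalysisNodeRecord:
    node_id: uuid.UUID  # identifier of the analysis node (a UUID)
    address: str        # URL, IP, or MQTT address, kept as an opaque string


def parse_address_list(text: str) -> List[AnalysisNodeRecord]:
    """Parse a hypothetical 'UUID address' per-line list, skipping blanks and # comments."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        node_id, address = line.split(maxsplit=1)
        records.append(AnalysisNodeRecord(uuid.UUID(node_id), address.strip()))
    return records
```

Keeping the address as an opaque string lets one list mix URL, IP and MQTT addresses, which matches the "at least one of" wording above.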

After the network configuration of the collection node 100 is completed, it can be restarted. Then the collection node 100 may send a join request to the analysis node 200 according to the address of the added analysis node 200. When the collection node 100 sends a join request to multiple analysis nodes 200, at least one analysis node 200 among the multiple analysis nodes 200 may save the identifier and address of the collection node 100, and then return a join success notification to the collection node 100. Similarly, the identifier of the collection node 100 may be the UUID of the collection node, and the address of the collection node may be at least one of a URL address, an IP address or an MQTT address.

When the collection node 100 joins successfully, the analysis node 200 may instruct the collection node 100 to collect performance information. In some possible implementation manners, the analysis node 200 may receive collection prompt information input by a user. The collection prompt information is used to prompt the collection node 100 to collect performance information. For example, the collection prompt information may include a code collection range of the application and at least one collection item. Each collection item can correspond to a hardware event. Then the collection node 100 can read the performance information through the hardware debugging interface of the collected node according to the collection prompt information.

Wherein, the analysis node 200 may provide a configuration interface, and the configuration interface is a user interface (UI) supporting user interaction. The UI may be a graphical user interface (GUI) or a command user interface (CUI). The following description takes a GUI configuration interface as an example.

Referring to the schematic diagram of the configuration interface 20 shown in FIG. 2, the configuration interface 20 includes a first input box 202 for configuring a code collection range and a second input box 206 for configuring a collection item. The user can input the address of the code segment to be collected in the first input box 202, thereby configuring the code collection range. Similarly, the user can input the hardware events to be collected in the second input box 206, so as to configure the collection items.

In some possible implementations, the configuration interface 20 further includes a browsing control 204, and the user can trigger the browsing control 204 to browse the code file, and then select a code segment in the code file to configure the code collection range. The configuration interface 20 may also include a drop-down control 208. When the drop-down control 208 is triggered, the configuration interface 20 displays a drop-down box 210, and the user can select hardware events to be collected from the drop-down box 210 to configure the collection items.

The configuration interface 20 also includes a confirm control 212 and a cancel control 214. When the confirm control 212 is triggered, the analysis node 200 can generate a configuration file according to the code collection range and collection items configured by the user. Further, the analysis node 200 may deliver the configuration file to the collection node 100.

The collection node 100 can read the configuration file, configure the configuration register in the collected node according to the collection prompt information carried in the configuration file, and select the connection between the target register in the collected node and the independent storage unit in the collected node. Wherein, the target register includes a register matching the collection prompt information, and the independent storage component refers to a storage component independent of the CPU main body in the collected node. When the application is running, the collection node 100 can read the data transmitted from the target register to the independent storage unit through the hardware debugging interface of the collected node to obtain performance information.

The key to implementing performance monitoring by the performance monitoring system 10 lies in the collection node 100, and the hardware structure of the collection node 100 will be described in detail below.

Referring to the schematic diagram of the hardware structure of the collection node 100 shown in FIG. 3, the collection node 100 includes a hardware debugging interface 102, a control unit 104, a network transceiver unit 106 and a network output interface 108. Wherein, the hardware debugging interface 102 and the network transceiver unit 106 are respectively connected to the control unit 104, and the network output interface 108 is connected to the network transceiver unit 106. In some possible implementation manners, the collection node 100 further includes a configuration interface 103. The configuration interface 103 is connected to the control unit 104.

The hardware debugging interface 102 is adapted to the hardware debugging interface of the collected node, so as to transmit information, for example performance information, between the collection node 100 and the collected node. Wherein, the hardware debugging interface 102 of the collection node 100 can be connected with the hardware debugging interface of the collected node through a cable. Further, the collection node 100 may be connected to multiple collected nodes in a daisy chain topology, so as to implement performance monitoring on multiple collected nodes.

The configuration interface 103 is used to receive an analysis node address list for network configuration. The configuration interface 103 is also used to receive the configuration file sent by the analysis node 200, for example, to receive the collection prompt information carried in the configuration file. The information transmission of the configuration interface 103 may be implemented in various manners. For example, the configuration interface 103 can implement information transmission through electronic means such as a universal synchronous/asynchronous receiver/transmitter (USART), a universal asynchronous receiver/transmitter (UART), a serial peripheral interface (SPI), an inter-integrated circuit (I2C) bus, or buttons. For another example, the configuration interface 103 can implement information transmission through the infrared communication technology proposed by the Infrared Data Association (IrDA), or through ultrasonic waves, lasers, and the like.

The control unit 104 is used to read the performance information from the collected node through the hardware debugging interface 102, forward the read performance information to the network transceiver unit 106, and output it through the network output interface 108. Wherein, the control unit 104 is also used to establish a connection with the analysis node 200 according to the configuration of the configuration interface 103 when starting up for the first time after the network configuration is completed, so as to transmit information with the analysis node 200. For example, the control unit 104 may receive a configuration file, and configure the collection node 100 and the collected node, so that the collection node 100 collects performance information according to the collection prompt information.

The network transceiver unit 106 is used to receive the performance information from the control unit 104, and encapsulate the performance information according to a protocol, so that the network output interface 108 forwards the encapsulated performance information to the analysis node 200. Wherein, the network transceiver unit 106 can support at least one protocol, for example, at least one of Transmission Control Protocol/Internet Protocol (TCP/IP), User Datagram Protocol (UDP), Zigbee, Bluetooth, Wi-Fi, MQTT, WirelessHART, Modbus, and industry standard architecture (ISA).
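The encapsulation step can be illustrated with a simple length-prefixed frame, a common way to carry discrete messages over a stream transport such as TCP. The frame layout (4-byte big-endian length followed by a JSON payload), the field names and the function names below are assumptions made for this sketch; this application does not define a wire format.

```python
import json
import struct

# Hypothetical frame header: 4-byte unsigned big-endian payload length.
HEADER = struct.Struct("!I")


def encapsulate(node_id: str, counters: dict) -> bytes:
    """Wrap performance counters in a length-prefixed JSON frame."""
    payload = json.dumps({"node": node_id, "counters": counters}, sort_keys=True).encode("utf-8")
    return HEADER.pack(len(payload)) + payload


def decapsulate(frame: bytes) -> dict:
    """Recover the message from a frame produced by encapsulate()."""
    (length,) = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return json.loads(payload.decode("utf-8"))
```

The explicit length lets the receiver split a byte stream back into individual messages; a datagram protocol such as UDP would not need the prefix.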

The network output interface 108 is used to forward the performance information (for example, the encapsulated performance information) to the analysis node 200 in a wireless or wired manner. Among them, the wired manner includes optical fiber, network cable, etc., and the wireless manner includes fifth generation (5G), fourth generation (4G), third generation (3G) or second generation (2G) mobile communication, etc.

Considering that there are various environments where performance tuning is required, in order to adapt to different performance tuning scenarios, the collection node 100 may have multiple implementation forms. For example, the collection node 100 may be powered by alternating current (AC) or by a battery. When the collected node is a fixed collected node, the performance information of the application in the collected node can be collected through the collection node 100 powered by AC; when the collected node is a mobile collected node, the performance information of the application in the collected node can be collected through the collection node 100 powered by a battery.

The implementation manners of the collection node 100 in different scenarios are introduced respectively below.

In a large data center scenario, the collected nodes are computing devices such as servers in the large data center, and these computing devices are usually fixed, so the collection node 100 can be fixed, and one collection node 100 can support collecting performance information of applications in multiple collected nodes. For example, one collection node 100 may support collecting performance information of applications in 64 collected nodes. Specifically, the collection node 100 can remain connected to the 64 collected nodes without plugging and unplugging. After the process of collecting performance information is triggered, the collection node 100 can read the performance information of the 64 collected nodes in a time-sharing manner.
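Time-shared reading can be sketched as a round-robin loop in which each collected node gets one time slot per round. The function name, the `read_node` callback and the returned dictionary layout are hypothetical; the actual scheduling used by the collection node 100 is not specified in this application.

```python
from typing import Callable, Dict, List


def read_time_shared(
    node_ids: List[str],
    read_node: Callable[[str], dict],
    rounds: int = 1,
) -> Dict[str, List[dict]]:
    """Read each collected node in turn, one node per time slot, for `rounds` rounds."""
    samples: Dict[str, List[dict]] = {node_id: [] for node_id in node_ids}
    for _ in range(rounds):
        for node_id in node_ids:
            # Only one collected node is read at a time over the shared debug link.
            samples[node_id].append(read_node(node_id))
    return samples
```

In this sketch `read_node` stands in for a read over the hardware debugging interface, so the same loop works for 64 nodes or any other count.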

Referring to the schematic structural diagram of the collection node 100 shown in FIG. 4, in a large-scale data center scenario, the collection node 100 can be powered by an AC power supply. Therefore, power consumption does not need to be considered for the network transceiver unit 106 and the network output interface 108, and both can be implemented by a 5G module. The main part of the 5G module is used to realize the function of the network transceiver unit 106, and the 5G antenna of the 5G module is used to realize the function of the network output interface 108. The hardware debugging interface 102 may be a Joint Test Action Group (JTAG) interface, and the control unit 104 may be realized by a CoreSight component, for example, the control unit 104 may be a CoreSight specification reading module.

In medium mobile device scenarios, such as robots and electric vehicles, in order to avoid consuming the power supply of the collected nodes (i.e., robots or electric vehicles), the collection node 100 usually has its own battery. Referring to the schematic structural diagram of the collection node 100 shown in FIG. 5, the collection node 100 is powered by a battery, and the network transceiver unit 106 and the network output interface 108 of the collection node 100 are realized by a 5G module. The hardware debugging interface 102 of the collection node 100 can be a JTAG interface, and the control unit 104 of the collection node 100 may be implemented by a CoreSight component, for example, the control unit 104 may be a CoreSight specification reading module.

In lightweight device scenarios, such as drones or wearable devices, considering that drones or wearable devices are sensitive to power consumption or weight, the collection node 100 is generally not equipped with an overly heavy battery and adopts other energy-saving means. Referring to the schematic structural diagram of the collection node 100 shown in FIG. 6, the collection node 100 is powered by a battery with a relatively small weight, the hardware debugging interface 102 of the collection node 100 can be a JTAG interface, and the control unit 104 of the collection node 100 can be implemented by a CoreSight component, for example, the control unit 104 may be a CoreSight specification reading module. The network transceiver unit 106 and the network output interface 108 of the collection node 100 are realized by an energy-saving communication module such as a Zigbee communication module.

FIG. 1 to FIG. 6 illustrate the structure of the performance monitoring system 10 and the collection node 100 in the embodiment of the present application in detail. Next, the performance monitoring method of the embodiment of the present application is introduced from the perspective of the performance monitoring system 10.

Referring to the flow chart of the performance monitoring method shown in Figure 7, the method includes:

S702: The collection node 100 receives the analysis node address list configured by the user, and adds the address of the analysis node 200.

The analysis node address list includes the address of at least one analysis node 200. The address of the analysis node 200 may be any one or more of a URL address, an IP address or an MQTT address. The collection node 100 can be powered on first, and then the network of the collection node 100 is configured through the configuration interface 103 of the collection node 100. Specifically, the collection node 100 receives the analysis node address list through the configuration interface 103, and then adds the address of at least one analysis node 200 in the analysis node address list, so as to configure the network of the collection node 100.

When the collection node 100 adds addresses of multiple analysis nodes 200, the performance information collected by the collection node 100 can be shared by the multiple analysis nodes 200. In this embodiment, the case where the collection node 100 adds addresses of multiple analysis nodes 200 is used for illustration. Furthermore, the collection node 100 can connect multiple collected nodes in a daisy chain form, and as long as a remote analysis node 200 is added to a collection node 100, the analysis node 200 can access the collected nodes connected to this collection node 100, and there is no need to replace the collection node 100 because a new analysis node 200 is added. In this way, the movement of nodes can be reduced and the flexibility of networking can be improved.

In this embodiment, each record in the analysis node address list may include the identifier and address of an analysis node 200. Wherein, the identifier of the analysis node 200 may be the UUID of the analysis node 200, and the UUID of the analysis node 200 is written into the analysis node 200 before leaving the factory, so as to distinguish it from other analysis nodes 200.

S704: The collection node 100 sends a join request to multiple analysis nodes 200 according to the analysis node address list.

Specifically, after the network configuration is completed, the collection node 100 can be restarted. Then the collection node 100 may send a join request to the multiple analysis nodes 200 according to the addresses of the multiple analysis nodes 200 added in the analysis node address list. The join request is specifically used to join the network where the analysis node 200 is located. The joining request may carry the identification and address of the collection node 100, so that the analysis node 200 may add the collection node 100 to the network of the analysis node 200 based on the identification and address.

Wherein, when the collection node 100 sends a join request to multiple analysis nodes 200, the join request may be sent in a polling manner. In other possible implementation manners of this embodiment of the present application, the collection node 100 may send the join request in other manners, for example, in a concurrent manner.

S706: At least one analysis node 200 sends a join success response.

At least one analysis node 200 among the plurality of analysis nodes 200 can add the collection node 100, for example, by saving the identifier and address of the collection node 100, so as to add the collection node 100 to the network of the analysis node 200. The analysis node 200 then returns a join success response to the collection node 100. The join success response is used to notify the collection node 100 that it has joined successfully.
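The join procedure of S704 to S706 can be sketched as an in-process simulation in which the collection node polls each analysis node in list order. The class, the `accepting` flag and the function names are hypothetical stand-ins for real network exchanges, which this application does not detail.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnalysisNode:
    address: str
    accepting: bool = True  # assumption: a node may decline or be unreachable
    members: Dict[str, str] = field(default_factory=dict)  # collection node id -> address

    def handle_join(self, node_id: str, node_address: str) -> bool:
        """Save the collection node's identifier and address; True means join success."""
        if not self.accepting:
            return False
        self.members[node_id] = node_address
        return True


def join_by_polling(node_id: str, node_address: str, analysis_nodes: List[AnalysisNode]) -> List[str]:
    """Send the join request to each analysis node in turn and return those that accepted."""
    joined = []
    for node in analysis_nodes:
        if node.handle_join(node_id, node_address):
            joined.append(node.address)
    return joined
```

A concurrent variant would issue the requests in parallel instead of looping; the bookkeeping on the analysis node side stays the same.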

S708: The analysis node 200 receives the collection prompt information input by the user.

The collection prompt information is used to prompt the collection node 100 to collect performance information. Based on this, the collection prompt information may include a code collection range and at least one collection item. Wherein, the code collection range is used to indicate the code segments for which performance information needs to be collected and whose performance is then monitored. The code collection range can be represented by the address of the code segment. Each collection item can correspond to a hardware event. For example, one collection item may correspond to a cache miss event, and another collection item may correspond to a branch misprediction event.

Next, the scenario of performance monitoring of servers in a large data center is taken as an example to illustrate the process in which the user configures the code collection range and the collection items.

Referring to the schematic diagram of configuring the code collection range shown in FIG. 8, when configuring the code collection range, the user can first determine the source code segment 802 to be tracked from the application whose performance needs to be monitored, and then view the compiled code to obtain the address of the code segment 804 produced by compiling the source code segment, so that the collection node 100 can determine the code segment for which performance information needs to be collected according to the address. Wherein, the address of the compiled code segment 804 can be characterized by a start offset and a length.
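A code collection range characterized by a start offset and a length can be modeled as a half-open address interval. The class and method names below are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeRange:
    start_offset: int  # address of the first byte of the compiled code segment
    length: int        # size of the segment in bytes

    @property
    def end_offset(self) -> int:
        """First address past the end of the segment (exclusive bound)."""
        return self.start_offset + self.length

    def contains(self, address: int) -> bool:
        """True if the address falls inside [start_offset, start_offset + length)."""
        return self.start_offset <= address < self.end_offset
```

For a segment starting at 0x1000 with length 0x40, address 0x103F is inside the range and 0x1040 is not.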

Next, referring to the schematic diagram of configuring the collection items shown in FIG. 9, in this example, a tracing module is added inside the CPU chip of the server. That is, the CPU chip of the server includes two parts: the CPU main body and the tracing module. Processes or threads usually run on the CPU main body and occupy its resources. The tracing module is connected to some registers of the CPU main body through an independent tracing channel to obtain performance information.

Specifically, when the CPU chip is manufactured, registers related to performance or program operation are connected to the tracing channel. The user can configure a register address through the configuration interface provided by the analysis node 200, and the analysis node 200 sends the register address to the collection node 100, so that the collection node 100 can, through the configuration module in the tracing module, connect the register corresponding to the register address to the independent storage component in the tracing module via the tracing channel. In this way, when the CPU chip is running, the performance information in the above registers is automatically pushed to the corresponding independent storage component. Since the tracing module is independent of the CPU main body, when the CPU chip runs the current application, the tracing module only passively receives performance information, and as an independent circuit, it does not affect the operation of the CPU main body. Furthermore, users can also configure parameters such as the collection frequency and the overflow value, so that performance information is collected according to these parameters.
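The gating behavior described above can be pictured with a small software model: a configured register address is routed to a slot of the independent storage component, and values from registers that were not configured are ignored. This is only a simulation with hypothetical names; the real tracing module is a hardware circuit and is not implemented this way.

```python
from typing import Dict, List


class TracingModule:
    """Software model of the tracing module's gating, for illustration only."""

    def __init__(self, slot_count: int) -> None:
        self.storage: List[int] = [0] * slot_count  # independent storage component
        self.gates: Dict[int, int] = {}             # register address -> storage slot

    def gate(self, register_address: int, slot: int) -> None:
        """Model of writing the configuration register: route a register to a slot."""
        if not 0 <= slot < len(self.storage):
            raise IndexError("slot out of range")
        self.gates[register_address] = slot

    def push(self, register_address: int, value: int) -> None:
        """Model of the hardware pushing a register value over the tracing channel."""
        slot = self.gates.get(register_address)
        if slot is not None:
            self.storage[slot] = value
```

In the model, the CPU main body never reads `storage`; only the collection node would, through the hardware debugging interface.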

It should be noted that the tracing-related pins in the CPU chip of the server can be connected to an external hardware debugging interface on the main board, and the collection node 100 has a corresponding hardware debugging interface connected to the hardware debugging interface of the server, so that the performance information can be collected through the hardware debugging interface.

S710: The analysis node 200 generates a configuration file according to the collection prompt information.

Specifically, the analysis node 200 may assemble different collection prompt information configured by the user into a configuration file. In some embodiments, the analysis node 200 may acquire a configuration file template, and then fill different collection prompt information into corresponding positions of the configuration file template, thereby generating a configuration file. The configuration file usually has a specific format, for example, a format recognizable by the collection node 100 .
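Filling collection prompt information into a configuration file template can be sketched as follows. The JSON layout, the field names and the event names are hypothetical; this application only states that the configuration file has a format recognizable by the collection node 100.

```python
import json
from string import Template

# Hypothetical template; $-placeholders mark the positions filled from user input.
CONFIG_TEMPLATE = Template(
    '{"code_range": {"start_offset": $start, "length": $length}, '
    '"collection_items": $items, "frequency_hz": $frequency}'
)


def build_config(start: int, length: int, items: list, frequency_hz: int) -> str:
    """Fill the template with the collection prompt information and validate the result."""
    text = CONFIG_TEMPLATE.substitute(
        start=start,
        length=length,
        items=json.dumps(items),  # serialize the list of hardware event names
        frequency=frequency_hz,
    )
    json.loads(text)  # raises ValueError if the filled template is not valid JSON
    return text
```

Validating the filled template before sending it catches malformed input on the analysis node rather than on the collection node.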

S712: The analysis node 200 sends the configuration file to the collection node 100.

The analysis node 200 may automatically send the configuration file to the collection node 100 after generating the configuration file, or may send the configuration file to the collection node 100 in response to a user-triggered configuration file download operation, which is not limited in this embodiment.

S714: The collection node 100 collects performance information from the hardware debugging interface of the collected node according to the configuration file.

Specifically, the collection node 100 receives the configuration file, and can configure the configuration register in the collected node according to the collection prompt information in the configuration file, for example, the configuration register in the configuration module (as shown in FIG. 9) of the collected node, so as to gate the line between the target register in the collected node and the independent storage component in the collected node. The target register may be a register matching the collection prompt information, such as a register corresponding to a register address in the collection prompt information. Wherein, when the target register is a plurality of registers, the plurality of registers may be connected to different storage spaces of the independent storage component. In this way, when the application is running, the collection node 100 can respectively read different performance information from different storage spaces of the independent storage component through the hardware debugging interface of the collected node.

Wherein, the collection node 100 may read the data in the corresponding storage space at intervals according to the collection frequency in the configuration file, so as to collect performance information.
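Reading the storage spaces at intervals derived from the collection frequency can be sketched as a polling loop. The `read_slot` callback stands in for a read through the hardware debugging interface, and the function name, parameter names and slot mapping are assumptions of this sketch.

```python
import time
from typing import Callable, Dict, List


def collect_samples(
    read_slot: Callable[[int], int],
    slots: Dict[str, int],
    frequency_hz: float,
    sample_count: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, int]]:
    """Read every configured storage slot `sample_count` times at `frequency_hz`."""
    if frequency_hz <= 0:
        raise ValueError("frequency_hz must be positive")
    interval = 1.0 / frequency_hz
    samples = []
    for index in range(sample_count):
        # One sample holds the current value of each collection item's slot.
        samples.append({name: read_slot(slot) for name, slot in slots.items()})
        if index + 1 < sample_count:
            sleep(interval)  # wait one collection period between reads
    return samples
```

Passing `sleep` as a parameter keeps the loop testable without real delays.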

S716: The collection node 100 sends the performance information to the analysis node 200.

S718: The analysis node 200 analyzes the performance information, and obtains an analysis result.

Specifically, the analysis node 200 may analyze the performance information through a statistical method, so as to obtain an analysis result. For example, the analysis node 200 may determine the cache hit rate based on the number of cache miss events and the number of cache hit events.

Further, the analysis node 200 can also present the analysis results to the user in the form of graphs. For example, the analysis node 200 may generate at least one of a line graph, a histogram, a flame graph or a table according to the analysis result, and then present the line graph, histogram, flame graph or table to the user.

The above S702 to S706 are a specific implementation of networking the collection node 100 and the analysis node 200, and the performance monitoring method of the embodiment of the present application may also be executed without executing S702 to S706. The above S708 to S714 are a specific implementation of the collection node 100 reading the performance information through the hardware debugging interface of the collected node.

In addition, the collection node 100 may also include any one or more of a position sensor, an acceleration sensor, an air temperature sensor, an air pressure sensor, or the like, so that the collection node 100 may also return position information, acceleration information, air temperature information, or air pressure information, and the analysis node 200 can analyze the operation status of the application under complex external conditions.

Based on the above description, the performance monitoring method provided by the embodiment of the present application reads the performance information by using the hardware debugging interface of the collected node without installing collection software, which breaks through the limitation of software, can realize performance monitoring of collected nodes running different operating systems, and has high availability. Moreover, when the method collects performance information, no additional process or thread is started, and the application on the collected node is not affected. The analysis results obtained by analyzing the performance information are therefore more accurate, which improves the reliability of performance monitoring.

In addition, the method supports free configuration of the network, and the transmission of performance information by the collection node 100 does not generate load on the current network, which reduces the pressure on the current network and allows more, and more detailed, performance information to be transmitted. In addition, one collection node 100 can be connected to one or more collected nodes, thus reducing the number of collection nodes 100 that need to be configured, and reducing the workload of installing and debugging collection nodes 100.

The performance monitoring system 10, the collection node 100 and the performance monitoring method performed by the performance monitoring system 10 provided by the embodiment of the present application are introduced above in conjunction with the accompanying drawings. Next, the computer-readable storage medium and the computer program product provided by the embodiment of the present application are described.

The embodiment of the present application also provides a computer-readable storage medium. The computer-readable storage medium may be any available medium that a computing device can store, or a data storage device such as a data center that includes one or more available media. The available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., solid state drive), etc. The computer-readable storage medium includes instructions, and the instructions instruct the computing device to execute the steps performed by the analysis node 200 in the above performance monitoring method applied to the performance monitoring system 10.

The embodiment of the present application also provides a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on the computing device, the processes or functions according to the embodiments of the present application will be generated in whole or in part.

The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer or data center to another website, computer or data center in a wired manner (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (such as infrared, wireless, microwave, etc.).

The computer program product may be a software installation package that can be downloaded and executed on a computing device if any of the aforementioned performance monitoring methods are required.

The description of the process or structure corresponding to each of the above drawings has its own emphasis. For the part that is not described in detail in a certain process or structure, you can refer to the relevant description of other processes or structures.

Claims

A performance monitoring system, characterized in that the system is used to monitor the performance of an application on a collected node, the system includes a collection node and an analysis node, and the collection node is connected to the collected node through the hardware debugging interface of the collected node:

The collection node is configured to read performance information through a hardware debugging interface of the collected node;

The analysis node is configured to receive the performance information sent by the collection node, analyze the performance information, and obtain an analysis result.
The system according to claim 1, wherein the collection node is specifically used for:

The performance information is obtained by reading the data in the register of the collected node through the hardware debugging interface of the collected node.
The system according to claim 1, wherein the collection node is specifically used for:

The performance information is obtained by reading the data transmitted from the register of the collected node to the independent storage unit of the collected node through the hardware debugging interface of the collected node.
The system according to any one of claims 1 to 3, wherein the analysis node is also used for:

Receive the collection prompt information input by the user;

The collection node is specifically used for:

According to the collection prompt information, the performance information is read through the hardware debugging interface of the collected node.
The system according to claim 4, wherein the collection prompt information includes a code collection range of the application and at least one collection item, and each collection item corresponds to a hardware event.
The system according to claim 4, wherein the collection node is specifically used for:

According to the collection prompt information, configure the configuration register in the collected node to gate the line between the target register in the collected node and the independent storage component of the collected node, the target register including a register that matches the collection prompt information;

When the application is running, the data transmitted from the target register to the independent storage unit is read through the hardware debugging interface of the collected node to obtain performance information.
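The two steps above (gate the selected registers, then read what reaches the storage component) can be mimicked with a small simulation. The bit-mask encoding of the configuration register, the list used as the storage component, and all function names are assumptions for illustration; real hardware would define its own register layout.

```python
from typing import Dict, Iterable, List, Tuple


class SimulatedCollectedNode:
    """Toy model of a collected node; not a description of real hardware."""

    def __init__(self, counters: Dict[int, int]):
        self.counters = dict(counters)  # register index -> current value
        self.config_register = 0        # assumed: bit i set routes counter i
        self.storage: List[Tuple[int, int]] = []  # independent storage component

    def run_application_step(self) -> None:
        # Only gated (selected) registers have their data transmitted to storage.
        for index, value in sorted(self.counters.items()):
            if self.config_register & (1 << index):
                self.storage.append((index, value))


def configure_gating(node: SimulatedCollectedNode, targets: Iterable[int]) -> None:
    """Step 1: set one bit per target register in the configuration register."""
    mask = 0
    for index in targets:
        mask |= 1 << index
    node.config_register = mask


def read_storage(node: SimulatedCollectedNode) -> List[Tuple[int, int]]:
    """Step 2: drain the storage component, as a debug-port read would."""
    data, node.storage = node.storage, []
    return data


node = SimulatedCollectedNode({0: 111, 1: 222, 2: 333})
configure_gating(node, [0, 2])
node.run_application_step()
samples = read_storage(node)
print(samples)
```

Register 1 is never gated in this example, so its value does not appear in the samples, which is the filtering effect the configuration register is meant to provide.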
The system according to any one of claims 1 to 3, wherein the system includes a plurality of analysis nodes, and the collection node is also used for:

Receive an analysis node address list configured by a user;

Send a join request to the plurality of analysis nodes according to the analysis node address list;

At least one analysis node in the plurality of analysis nodes is used for:

Send a join success notification to the collection node.
The system according to claim 7, wherein the collection node is also used for:

Before sending the join request to the plurality of analysis nodes, add the plurality of analysis nodes according to the analysis node address list, so that the performance information collected by the collection node is shared by the plurality of analysis nodes.
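The add-then-join sequence of the two preceding claims can be sketched as follows. The in-process `network` dictionary, the boolean return standing in for the join success notification, and the example addresses are assumptions for illustration; the claims do not specify a transport or message format.

```python
from typing import Dict, Iterable, List


class AnalysisNodeStub:
    """Hypothetical analysis node that answers join requests."""

    def __init__(self, address: str, accepts: bool = True):
        self.address = address
        self.accepts = accepts
        self.collectors: List[str] = []

    def handle_join(self, collector_id: str) -> bool:
        # Returning True models sending a join success notification.
        if self.accepts:
            self.collectors.append(collector_id)
        return self.accepts


class Collector:
    def __init__(self, collector_id: str, network: Dict[str, AnalysisNodeStub]):
        self.collector_id = collector_id
        self.network = network        # address -> reachable analysis node
        self.pending: List[str] = []  # nodes added from the address list
        self.joined: List[str] = []   # nodes that confirmed the join

    def add_analysis_nodes(self, address_list: Iterable[str]) -> None:
        """Add the analysis nodes named in the user-configured address list."""
        self.pending = list(address_list)

    def send_join_requests(self) -> List[str]:
        """Send a join request to every added node; record the successes."""
        for address in self.pending:
            node = self.network.get(address)
            if node is not None and node.handle_join(self.collector_id):
                self.joined.append(address)
        return self.joined


network = {
    "10.0.0.1": AnalysisNodeStub("10.0.0.1"),
    "10.0.0.2": AnalysisNodeStub("10.0.0.2", accepts=False),
}
collector = Collector("collector-1", network)
collector.add_analysis_nodes(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
joined = collector.send_join_requests()
print(joined)
```

Only one node confirms in this example, which is consistent with the claim wording that at least one of the analysis nodes sends a join success notification.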
The system according to any one of claims 1 to 3, wherein the collection node has a hardware debugging interface, and the hardware debugging interface of the collection node is connected to the hardware debugging interface of the collected node through a cable.
The system according to claim 9, wherein there are a plurality of collected nodes, and the network topology formed by the collection node and the plurality of collected nodes connected to the collection node is a daisy chain topology.
The system according to any one of claims 1 to 3, wherein the collection node is powered by alternating current or by a battery, the collection node powered by alternating current is used to collect performance information of an application in a fixed collected node, and the collection node powered by the battery is used to collect performance information of an application in a mobile collected node.
A performance monitoring method, characterized in that it is applied to a performance monitoring system, the system is used to monitor the performance of an application on a collected node, the system includes a collection node and an analysis node, the collection node is connected to the collected node through a hardware debugging interface of the collected node, and the method includes:

The collection node reads the performance information through the hardware debugging interface of the collected node;

The analysis node receives the performance information sent by the collection node, analyzes the performance information, and obtains an analysis result.
The method according to claim 12, wherein the collection node reads the performance information through the hardware debugging interface of the collected node, comprising:

The collection node reads the data in the register of the collected node through the hardware debugging interface of the collected node to obtain performance information.
The method according to claim 12, wherein the collection node reads the performance information through the hardware debugging interface of the collected node, comprising:

The collection node reads the data transmitted from the register of the collected node to the independent storage component of the collected node through the hardware debugging interface of the collected node, and obtains performance information.
The method according to any one of claims 12 to 14, further comprising:

The analysis node receives the collection prompt information input by the user;

The collection node reads the performance information through the hardware debugging interface of the collected node, including:

According to the collection prompt information, the performance information is read through the hardware debugging interface of the collected node.
The method according to claim 15, wherein the collection prompt information includes the code collection range of the application and at least one collection item, and each collection item corresponds to a hardware event.
The method according to claim 15, wherein the collection node reads the performance information through the hardware debugging interface of the collected node, comprising:

The collection node configures a configuration register in the collected node according to the collection prompt information, so as to gate the line between a target register in the collected node and the independent storage component in the collected node;

When the application is running, the collection node reads, through the hardware debugging interface of the collected node, the data transmitted from the target register to the independent storage component to obtain the performance information.
The method according to any one of claims 12 to 14, wherein the system comprises a plurality of analysis nodes, and the method further comprises:

The collection node receives a user-configured analysis node address list;

The collection node sends a join request to the plurality of analysis nodes according to the analysis node address list;

At least one analysis node among the plurality of analysis nodes sends a join success notification to the collection node.
The method according to claim 18, wherein, before the collection node sends the join request to the plurality of analysis nodes, the method further comprises:

Adding the multiple analysis nodes according to the analysis node address list, so that the performance information collected by the collection node is shared by the multiple analysis nodes.
The method according to any one of claims 12 to 14, wherein the collection node has a hardware debugging interface, and the hardware debugging interface of the collection node is connected to the hardware debugging interface of the collected node through a cable.
The method according to any one of claims 12 to 14, wherein there are a plurality of collected nodes, and the network topology formed by the collection node and the plurality of collected nodes connected to the collection node is a daisy chain topology.
The method according to any one of claims 12 to 14, wherein the collection node is powered by alternating current or battery;

The collection node reads the performance information through the hardware debugging interface of the collected node, including:

The collection node powered by the alternating current collects the performance information of the application in a fixed collected node; or,

The collection node powered by the battery collects the performance information of the application in the mobile collected node.
A collection node, characterized in that the collection node is connected to a collected node through a hardware debugging interface of the collected node, and the collection node is used to execute the steps performed by the collection node in the performance monitoring method according to any one of claims 12 to 22.
A computer-readable storage medium, characterized in that it includes computer-readable instructions, and when the computer-readable instructions are run on a computing device or a computing device cluster, the computing device or the computing device cluster is caused to execute the steps performed by the analysis node in the performance monitoring method according to any one of claims 12 to 22.
A computer program product, characterized in that it comprises computer-readable instructions which, when run on a computing device or a cluster of computing devices, cause the computing device or the cluster of computing devices to perform the steps performed by the analysis node in any one of the foregoing performance monitoring methods.