CN111858243A

CN111858243A - Multi-hardware event monitoring count value estimation method based on exponential increase

Info

Publication number: CN111858243A
Application number: CN202010678027.1A
Authority: CN
Inventors: 王一超; 王杰; 文敏华; 韦建文; 林新华
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2020-07-15
Filing date: 2020-07-15
Publication date: 2020-10-30
Anticipated expiration: 2040-07-15
Also published as: CN111858243B

Abstract

A multi-hardware event monitoring count value estimation method based on exponential growth is characterized in that a life cycle of a working process is maintained through a main thread, a related data structure is created and initialized, a slave thread control signal is sent, a monitored application is operated, hardware event scheduling, timing alternate monitoring and post-processing estimation are carried out through the slave thread responding to the life cycle signal of the main thread, and a multi-hardware event monitoring count value is obtained through reading a hardware event counting register built in a CPU. According to the invention, the hardware event count value on the non-monitoring time slice is filled through the exponential growth estimation algorithm, so that the accuracy of the multi-hardware event monitoring count library based on the MPX technology can be improved, and the usability of the monitoring result under the MPX is enhanced.

Description

Multi-hardware event monitoring count value estimation method based on exponential increase

Technical Field

The invention relates to a technique in the field of semiconductor performance optimization, and in particular to a method, based on exponential growth, for estimating the monitored count values of multiple hardware events.

Background

Currently, hardware event registers collect data in one of two modes. The first is OCOE (one counter, one event): each register records a single hardware event for the entire program run, so every occurrence of that event is recorded. The second is multiplexing (MPX): the available time of each register is divided into time slices by time-division multiplexing, and different hardware events are monitored in turn on different slices. A register can therefore report data for a hardware event only for the time slices during which that event occupied the register.
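The difference between the two modes can be illustrated with a small simulation. This is a minimal sketch, not PAPI code: the event names, the plain round-robin order, and the helper names `mpx_schedule` and `monitored_slices` are all assumptions made for illustration.

```python
from itertools import cycle

def mpx_schedule(events, num_registers, num_slices):
    """Assign events to registers slice by slice in plain round-robin order."""
    # Never place the same event twice in one slice.
    width = min(num_registers, len(events))
    order = cycle(events)
    return [[next(order) for _ in range(width)] for _ in range(num_slices)]

def monitored_slices(schedule, event):
    """Indices of the time slices during which `event` occupied a register."""
    return [t for t, active in enumerate(schedule) if event in active]

events = ["EV_A", "EV_B", "EV_C", "EV_D"]  # hypothetical event names

# MPX: four events share two registers, so each event is seen on some slices only.
mpx = mpx_schedule(events, num_registers=2, num_slices=6)
# OCOE: one register per event, so each event is seen on every slice.
ocoe = mpx_schedule(events, num_registers=4, num_slices=6)

print(monitored_slices(mpx, "EV_A"))
print(monitored_slices(ocoe, "EV_A"))
```

Under MPX the slices missing from the first list are exactly the ones whose count values have to be estimated afterwards.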

Disclosure of Invention

The prior art has the following problems: values on unmonitored time slices are missing under MPX, the precision of MPX is insufficient, and the behavior patterns of all hardware events cannot be fully covered, which lowers the precision of MPX measurement results. To address these problems, the invention provides a multi-hardware event monitoring count value estimation method based on exponential growth. On the unmonitored time slices between two consecutive sampled values, the event count is assumed to evolve gradually from the initial value to the final value through growth by a constant multiple. The hardware event count values on the unmonitored time slices are filled in by an exponential growth estimation algorithm, which improves the accuracy of MPX-based multi-hardware event monitoring and counting libraries and makes monitoring results obtained under MPX more usable.
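The constant-multiple assumption can be written out as a short worked derivation. This is one reading of the assumption, not the formulas of the publication: write $C_i$ and $C_{i+1}$ for two consecutive monitored count values of one event and $n$ for the number of unmonitored time slices between them. The symbols $k_i$, $q_i$ and $\hat{C}_{i,j}$ are introduced here for illustration only.

```latex
\begin{align*}
k_i &= \frac{C_{i+1}}{C_i}
  && \text{ratio of two consecutive monitored values} \\
q_i &= k_i^{\,1/(n+1)}
  && \text{so that } C_i \, q_i^{\,n+1} = C_{i+1} \\
\hat{C}_{i,j} &= C_i \, q_i^{\,j}, \quad j = 1, \dots, n
  && \text{estimate on the $j$-th unmonitored slice}
\end{align*}
```

For example, with $C_i = 100$, $C_{i+1} = 800$ and $n = 2$, the per-slice multiple is $q_i = 8^{1/3} = 2$, giving estimates of 200 and 400 on the two unmonitored slices.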

The invention is realized by the following technical scheme:

The invention relates to a multi-hardware event monitoring count value estimation method based on exponential growth. A main thread maintains the life cycle of the workflow: it creates and initializes the related data structures and a slave thread, sends control signals to the slave thread, and runs the monitored application. The slave thread responds to the life-cycle signals of the main thread and performs hardware event scheduling, timed alternate monitoring, and post-processing estimation. The multi-hardware event monitoring count values are obtained by reading the hardware event counting registers built into the CPU.

Technical effects

The invention as a whole remedies the insufficient precision of existing MPX, which arises because its mechanism monitors hardware events only intermittently and data are therefore missing. Compared with the prior art, the method markedly improves the precision of hardware event monitoring software in MPX mode, and thereby overcomes the reliability problem that the insufficient precision of MPX mode causes in actual industrial production.

Drawings

FIG. 1 is a schematic flow diagram of the present invention;

FIG. 2 is a schematic diagram of a hardware event scheduling process;

FIG. 3 is a diagram of the original MPX estimation strategy;

FIG. 4 is a schematic diagram of the MPX estimation strategy in the present method.

Detailed Description

As shown in FIG. 1, the present embodiment relates to a method, based on exponential growth, for estimating the monitored count values of multiple CPU hardware events, which includes the following steps:

Step 1: The main thread of the PAPI hardware performance collection framework initializes the current state and creates a slave thread for monitoring hardware events.

The state initialization comprises: initializing the PAPI internal global variables, obtaining current operating system information, creating a file descriptor for recording hardware event count values, enabling MPX mode, and setting the monitoring time-slice length.

Step 2: After global initialization is complete, the main thread creates an event set (eventset) for holding the events to be monitored, binds the event set to the slave thread, and starts the life cycle of the workflow.

Step 3: The main thread adds the hardware events to be monitored to the event set created in step 2.

Step 4: The main thread sends a monitoring start signal to the slave thread. The slave thread then periodically reads the monitoring result from the hardware event counting registers built into the CPU, at the time-slice length set in step 1, and writes the result to the file descriptor corresponding to the current hardware event.

The monitoring result comprises time-series data of the monitored hardware event count value, the program running time, and the monitored duration of the current hardware event.

Step 5: The main thread starts the monitored program. The slave thread monitors the hardware events added in step 3 in turn, according to the monitoring time slice set during initialization, and collects the monitoring results and writes them to the file descriptors as in step 4, yielding the hardware event time-series data.

As shown in FIG. 2, the timed alternate monitoring comprises the following steps:

Step 5.1: The scheduling system creates as many queues as there are event registers, with each queue corresponding one-to-one to an event register.

Step 5.2: Each hardware event is stored in the queues of all event registers capable of monitoring that event.

Step 5.3: All queues are randomly ordered.

Step 5.4: The events at the heads of all queues are checked. When a head-of-queue event is repeated, that event is moved to the back of its queue, and the events behind it each move forward by one position.

Step 5.5: Step 5.4 is repeated until no head-of-queue event is repeated. Every non-head event that duplicates a current head-of-queue event is then moved to the tail of its queue, and the events behind it each move forward by one position.

Step 5.6: Each current head-of-queue event is placed on its corresponding event register for monitoring.

Step 5.7: When the current time slice expires, the event is taken off the event register and placed at the tail of the corresponding queue.

Step 5.8: Steps 5.4 to 5.7 are repeated until the end signal from the main thread is received.
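The queue-based scheduling of steps 5.1 to 5.7 might be sketched as follows. This is a simplified greedy variant under stated assumptions, not the implementation of the embodiment: "randomly ordered" is read as shuffling the events inside each queue, a register is left idle (`None`) if every event in its queue is already being monitored elsewhere, and the register names, event names and function names are hypothetical.

```python
import random
from collections import deque

def build_queues(capabilities, seed=None):
    """Steps 5.1-5.3: one shuffled queue per register.

    `capabilities` maps a register name to the events it is able to count.
    """
    rng = random.Random(seed)
    queues = {}
    for register, events in capabilities.items():
        events = list(events)
        rng.shuffle(events)
        queues[register] = deque(events)
    return queues

def pick_heads(queues):
    """Steps 5.4-5.5: make the head-of-queue events pairwise distinct."""
    chosen = {}
    for register, queue in queues.items():
        for _ in range(len(queue)):
            if queue[0] not in chosen.values():
                chosen[register] = queue[0]
                break
            queue.rotate(-1)  # head goes to the back, the rest move forward
        else:
            chosen[register] = None  # assumption: leave the register idle
    # Non-head copies of events that are now being monitored go to the tail.
    active = {event for event in chosen.values() if event is not None}
    for register, queue in queues.items():
        duplicates = [e for e in queue if e in active and e != chosen[register]]
        for event in duplicates:
            queue.remove(event)
            queue.append(event)
    return chosen

def end_of_slice(queues, chosen):
    """Step 5.7: the monitored head event moves to the tail of its queue."""
    for register, event in chosen.items():
        if event is not None:
            queues[register].rotate(-1)

# Two registers that can each count the same three hypothetical events.
queues = build_queues({"r0": ["EV_A", "EV_B", "EV_C"],
                       "r1": ["EV_A", "EV_B", "EV_C"]}, seed=0)
history = []
for _ in range(6):  # step 5.8, with a fixed slice count instead of an end signal
    chosen = pick_heads(queues)   # step 5.6 would program the registers here
    history.append(dict(chosen))
    end_of_slice(queues, chosen)
print(history)
```

Moving the non-head duplicates to the tail keeps an event that was just monitored on one register from being scheduled again immediately on another, so the remaining events get their turn sooner.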

Step 6: After the monitored program ends, the main thread sends a stop signal to the slave thread, the slave thread stops monitoring, and the process proceeds to step 7.

Step 7: The slave thread performs post-processing estimation on the hardware event time-series data collected in step 5 and sends the result to the main thread, which outputs it.

The hardware event time sequence data comprises: the actual hardware event accumulated count value c, the accumulated running time r of the monitored program, and the accumulated monitored time e of the monitored hardware event.

The post-processing estimation comprises the following specific steps:

Step 7.1) The slave thread performs first-order differencing on all the hardware event time-series data to obtain the differenced values, namely: the actual single-time-slice count value C, the single-time-slice running time R of the monitored program, and the single-time-slice monitored duration E of the monitored hardware event.

Step 7.2) The slave thread reads the time-series data of one hardware event and defines the i-th differenced count value $C_i$, the (i+1)-th differenced count value $C_{i+1}$, the i-th differenced running time $R_i$, and the i-th differenced monitored duration $E_i$. It then calculates, in sequence, the ratio of the (i+1)-th count value to the i-th count value, $C_{i+1}/C_i$; the growth multiple of a single time slice; and the number n of unmonitored time slices.

Step 7.3) Repeated estimation: the j-th unmonitored count value between the i-th count value and the (i+1)-th count value is estimated for each j in turn, until the count values on all n unmonitored time slices have been estimated. All monitored values and estimated values are then accumulated to obtain the total count value of the current hardware event.

Step 7.4) Steps 7.2 to 7.3 are repeated until the total count values of all monitored hardware events have been obtained, which completes the post-processing of the hardware event monitoring counts.
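The post-processing of steps 7.1 to 7.3 might look like the sketch below for a single event. Several points are assumptions of this sketch rather than statements of the publication: the cumulative readings (c, r, e) are taken at the start of each monitored slice and once at the end of the run, so that $R_i$ spans the i-th monitored slice plus the unmonitored slices that follow it; the number of unmonitored slices is taken as $n = R_i / E_i - 1$; the per-slice multiple is $(C_{i+1}/C_i)^{1/(n+1)}$; and linear interpolation is used as a fallback when a count of zero makes the ratio unusable.

```python
def diff(series):
    """Step 7.1: first-order difference of a cumulative series."""
    return [b - a for a, b in zip(series, series[1:])]

def estimate_total(c, r, e):
    """Steps 7.2-7.3: monitored counts plus exponentially filled gaps."""
    C, R, E = diff(c), diff(r), diff(e)
    total = float(sum(C))  # counts actually observed on monitored slices
    for i in range(len(C) - 1):
        if E[i] <= 0:
            continue
        # Assumed: unmonitored slices between monitored samples i and i+1.
        n = max(round(R[i] / E[i]) - 1, 0)
        if n == 0:
            continue
        if C[i] > 0 and C[i + 1] > 0:
            # Constant multiple q per slice, so that C[i] * q**(n+1) == C[i+1].
            q = (C[i + 1] / C[i]) ** (1.0 / (n + 1))
            fills = [C[i] * q ** j for j in range(1, n + 1)]
        else:
            # Assumed fallback: the ratio is undefined, interpolate linearly.
            step = (C[i + 1] - C[i]) / (n + 1)
            fills = [C[i] + step * j for j in range(1, n + 1)]
        total += sum(fills)
    return total

# One event monitored on slices 0 and 3 of a four-slice run (100 ms slices).
c = [0, 100, 900]   # cumulative count: 100 on slice 0, 800 on slice 3
r = [0, 300, 400]   # cumulative program running time in ms
e = [0, 100, 200]   # cumulative monitored time in ms
print(estimate_total(c, r, e))
```

In this example the two unmonitored slices are filled with values close to 200 and 400, so the estimated total is about 1500 rather than the 900 that was directly observed. Step 7.4 corresponds to calling `estimate_total` once per monitored event.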

Step 8: The slave thread stops and exits; the main thread frees the data structures allocated during the run, then stops and exits.

Step 9: The hardware event monitoring and counting operation ends.
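The main-thread and slave-thread life cycle described in steps 1 to 9 can be sketched with two signals. This is a minimal illustration using Python threads, not PAPI code: `read_counter` is a hypothetical stand-in for reading a hardware event counting register, and the function names and the 10 ms slice length are assumptions.

```python
import itertools
import threading
import time

def slave_loop(start, stop, read_counter, slice_len, samples):
    """Slave thread: sample once per time slice between start and stop."""
    start.wait()                      # step 4: wait for the start signal
    while not stop.wait(slice_len):   # returns False when one slice has elapsed
        samples.append(read_counter())
    samples.append(read_counter())    # final reading when the stop signal arrives

def monitor(app, read_counter, slice_len=0.01):
    """Main thread: owns the life cycle and runs the monitored application."""
    start, stop = threading.Event(), threading.Event()
    samples = []
    worker = threading.Thread(
        target=slave_loop,
        args=(start, stop, read_counter, slice_len, samples),
    )
    worker.start()   # step 1: create the slave thread
    start.set()      # step 4: monitoring start signal
    app()            # step 5: run the monitored program
    stop.set()       # step 6: stop signal
    worker.join()    # step 8: the slave thread exits
    return samples

# Hypothetical stand-in for a hardware counter that only ever increases.
fake_register = itertools.count(step=1000)
samples = monitor(lambda: time.sleep(0.05), lambda: next(fake_register))
print(samples)
```

Waiting on the stop event with a timeout lets the slave thread both pace itself to the time slice and react to the stop signal without a separate sleep call.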

On the basis of this embodiment, the correctness of the method was verified with the Rodinia Benchmark Suite. Compared with the original MPX, accuracy improved to varying degrees across the different benchmark kernel programs.

This embodiment is implemented as a secondary wrapper developed on top of PAPI. The method is equally applicable to other libraries and software that use MPX technology to monitor and count multiple hardware events, such as Linux Perf, HPCToolkit, Intel VTune, and Gooda. The operating platform of this example is an ordinary rack-mounted Intel x86 server running the 64-bit CentOS 7.6 operating system, equipped with two Intel Xeon Gold 6248 processors and 192 GB of main memory. First, a library based on PAPI is created that intercepts PAPI's intermediate output file descriptors, and the recorded data are then estimated and processed by the method. With 100 ms as the monitoring period, the all, cond, brmis, all, brmis, cond, dtlm, m, l1lh, l1lm, l2lh, l2lm, ldam, ich, icm, uishall, urshall and inst16 hardware events were monitored in five types of Rodinia Benchmark Suite applications, and an accuracy improvement of 5% to 59% was obtained.

Compared with the original MPX post-processing strategy shown in FIG. 3, the present embodiment combines the monitored values on both sides of each interpolation point. This avoids the extreme estimates that arise when two consecutive monitored values differ greatly, and the exponential growth multiple covers a wider range of variation patterns, thereby improving the accuracy of MPX estimation.

Compared with the prior art, the method estimates the unmonitored data from an exponential-growth trend, which yields estimates closer to the real data distribution and thereby improves precision.

The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims

1. A multi-hardware event monitoring count value estimation method based on exponential growth is characterized in that a life cycle of a working process is maintained through a main thread, a related data structure and a slave thread are created and initialized, a slave thread control signal is sent and a monitored application is operated, the slave thread responds to the life cycle signal of the main thread, hardware event scheduling, timing alternate monitoring and post-processing estimation are carried out, and a multi-hardware event monitoring count value is obtained by reading a hardware event counting register built in a CPU.

2. The method for estimating the multi-hardware event monitoring count value based on exponential growth according to claim 1, wherein the post-processing estimation specifically comprises:

step 1) the slave thread performs first-order differencing on all hardware event time-series data to obtain the differenced values, namely: the actual single-time-slice count value C, the single-time-slice running time R of the monitored program, and the single-time-slice monitored duration E of the monitored hardware event;

step 2) the slave thread reads the time-series data of one hardware event, defines the i-th differenced count value $C_i$, the (i+1)-th differenced count value $C_{i+1}$, the i-th differenced running time $R_i$, and the i-th differenced monitored duration $E_i$, and calculates in sequence the ratio of the (i+1)-th count value to the i-th count value, $C_{i+1}/C_i$, the growth multiple of a single time slice, and the number n of unmonitored time slices;

step 3) repeated estimation: the j-th unmonitored count value between the i-th count value and the (i+1)-th count value is estimated for each j in turn, until the count values on all n unmonitored time slices are estimated; all monitored values and estimated values are accumulated to obtain the total count value of the current hardware event;

step 4) steps 2 to 3 are repeated until the total count values of all monitored hardware events are obtained, which completes the post-processing of the hardware event monitoring counts.

3. The method for estimating the multi-hardware event monitoring count value based on exponential growth according to claim 1, wherein the timing alternate monitoring specifically comprises:

step 1: the scheduling system creates as many queues as there are event registers, each queue corresponding one-to-one to an event register;

step 2: each hardware event is stored in the queues of all event registers capable of monitoring that event;

step 3: all queues are randomly ordered;

step 4: the events at the heads of all queues are checked; when a head-of-queue event is repeated, that event is moved to the back of its queue, and the events behind it each move forward by one position;

step 5: step 4 is repeated until no head-of-queue event is repeated; every non-head event that duplicates a current head-of-queue event is moved to the tail of its queue, and the events behind it each move forward by one position;

step 6: each current head-of-queue event is placed on its corresponding event register for monitoring;

step 7: when the current time slice expires, the event is taken off the event register and placed at the tail of the corresponding queue;

step 8: steps 4 to 7 are repeated until the end signal from the main thread is received.

4. The method for estimating the multi-hardware event monitoring count value based on exponential growth according to any one of claims 1 to 3, characterized by comprising:

step 1, a main thread of a PAPI hardware performance acquisition framework initializes the current state and creates a slave thread for monitoring hardware events;

step 2, after the global initialization is completed by the main thread, an event set used for storing the event to be monitored is created, the event set is bound to the slave thread, and the life cycle of the workflow is started;

step 3, the main thread adds the hardware event to be monitored to the event set in the step 2;

step 4, the main thread sends a monitoring start signal to the slave thread, the slave thread starts to read the monitoring result from the hardware event counting register arranged in the CPU regularly according to the time slice length set in the step 1, and the monitoring result is written into the file descriptor corresponding to the current hardware event;

step 5, the monitored program is started in the main thread, the slave thread monitors the hardware events to be monitored added in the step 3 in turn according to the monitoring time slice set by initialization, and the monitoring result is collected in the step 4 and written into the file descriptor to obtain the hardware event time sequence data;

step 6, after the monitored program is finished, the main thread sends a stop signal to the slave thread, the slave thread stops monitoring, and the process goes to step 7;

step 7, the slave thread performs post-processing estimation on the hardware event time series data collected in the step 5, sends the result to the main thread, and the main thread outputs the result.

5. The method of claim 4, wherein the hardware event time series data comprises: the actual hardware event accumulated count value c, the accumulated running time r of the monitored program, and the accumulated monitored time e of the monitored hardware event.

6. The method of claim 5, wherein the state initialization comprises: initializing the PAPI internal global variables, obtaining current operating system information, creating a file descriptor for recording hardware event count values, enabling MPX mode, and setting the monitoring time-slice length.

7. The method as claimed in claim 5, wherein the monitoring result is: the monitored hardware event count value, the program running time length and the current hardware event monitoring time length sequence data.