CN111858243B - Multi-hardware event monitoring count value estimation method based on exponential growth

Info

Publication number: CN111858243B
Application number: CN202010678027.1A
Authority: CN
Inventors: 王一超; 王杰; 文敏华; 韦建文; 林新华
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2020-07-15
Filing date: 2020-07-15
Publication date: 2024-03-19
Anticipated expiration: 2040-07-15
Also published as: CN111858243A

Abstract

A multi-hardware event monitoring count value estimation method based on exponential growth, in which a main thread maintains the life cycle of the workflow: it creates and initializes the related data structures and a slave thread, sends control signals to the slave thread, and runs the monitored application. In response to the life cycle signals of the main thread, the slave thread performs hardware event scheduling, timed alternate monitoring, and post-processing estimation, and the multi-hardware event monitoring count values are obtained by reading the hardware event count registers built into the CPU. By filling in the hardware event count values on unmonitored time slices with an exponential growth estimation algorithm, the invention improves the accuracy of multi-hardware event monitoring count libraries based on the MPX technique and enhances the usability of monitoring results under MPX.

Description

Multi-hardware event monitoring count value estimation method based on exponential growth

Technical Field

The invention relates to a technology in the field of semiconductor performance optimization, and in particular to an exponential-growth-based method for estimating count values when monitoring multiple hardware events.

Background

Hardware event registers support two modes of data collection. The first is OCOE (one counter one event), in which a register records a single hardware event for the entire program run and therefore captures every occurrence of that event. The second is multiplexing (MPX), which uses time-division multiplexing to divide the available time of each register into time slices and alternately monitors different hardware events on different slices. Under MPX, a register can provide data for a hardware event only for the time slices during which that event occupied the register.

Disclosure of Invention

The existing MPX technique leaves missing values on unmonitored time slices, cannot fully cover all hardware event behavior patterns, and therefore yields measurement results of reduced precision. To address these problems, the invention provides an exponential-growth-based multi-hardware event monitoring count value estimation method. The method assumes that, on the unmonitored time slices between two consecutive sampled values, the event count evolves gradually from the start value to the end value by growing by a constant multiple per time slice. Filling in the hardware event count values on the unmonitored time slices with this exponential growth estimation algorithm improves the accuracy of MPX-based multi-hardware event monitoring count libraries and enhances the usability of monitoring results under MPX.

The invention is realized by the following technical scheme:

The invention relates to a multi-hardware event monitoring count value estimation method based on exponential growth. A main thread maintains the life cycle of the workflow: it creates and initializes the related data structures, starts a slave thread, sends control signals to the slave thread, and runs the monitored application. The slave thread responds to the life cycle signals of the main thread by performing hardware event scheduling, timed alternate monitoring, and post-processing estimation. The multi-hardware event monitoring count values are obtained by reading the hardware event count registers built into the CPU.

Technical effects

The invention addresses, as a whole, the insufficient precision of the existing MPX mechanism, which arises because intermittent monitoring of hardware events leaves data missing. Compared with the prior art, the method significantly improves the precision of hardware event monitoring software in MPX mode, and thereby overcomes the reliability problems that insufficient MPX precision causes in actual industrial production.

Drawings

FIG. 1 is a schematic flow chart of the present invention;

FIG. 2 is a schematic diagram of a hardware event scheduling flow;

FIG. 3 is a schematic diagram of an original MPX estimation strategy;

FIG. 4 is a schematic diagram of the MPX estimation strategy of the method.

Detailed Description

As shown in FIG. 1, this embodiment relates to an exponential-growth-based method for estimating multi-hardware event monitoring count values of a central processing unit, which includes the following steps:

Step 1: The main thread of the PAPI hardware performance acquisition framework initializes the current state and creates a slave thread for monitoring hardware events.

The state initialization includes: initializing PAPI internal global variables, obtaining current operating system information, creating file descriptors for recording hardware event count values, enabling MPX mode, and setting a monitoring time slice length.

Step 2: After the global initialization is completed, the main thread creates an event set (eventset) for storing the events to be monitored, binds the event set to the slave thread, and starts the life cycle of the workflow.

Step 3: The main thread adds the hardware events to be monitored to the event set of Step 2.

Step 4: The main thread sends a monitoring start signal to the slave thread. The slave thread starts and, at intervals of the time slice length set in Step 1, reads the monitoring result from the hardware event count registers built into the CPU and writes it to the file descriptor corresponding to the current hardware event.

The monitoring result comprises: the monitored hardware event count value, the program run duration, and the monitored duration of the current hardware event, recorded as sequence data.

Step 5: The monitored program is started on the main thread. The slave thread monitors the hardware events added in Step 3 in turn, according to the monitoring time slice set during initialization, collects the monitoring results as in Step 4, and writes them to the file descriptors to obtain the hardware event time sequence data.

As shown in FIG. 2, the timed alternate monitoring comprises the following steps:

Step 5.1: The scheduling system creates as many queues as there are event registers, with each queue corresponding one-to-one to an event register.

Step 5.2: Each hardware event is stored in the queues of all event registers capable of monitoring that event.

Step 5.3: All the queues are randomly ordered.

Step 5.4: The head events of all queues are checked. When head events are duplicated, the duplicated head event in the lower-ranked queue is moved to the tail of its queue, and the events behind it each move forward by one position.

Step 5.5: Step 5.4 is repeated until no head events are duplicated. All non-head events that duplicate a current head event are then moved to the tail of their queues, and the events behind them each move forward by one position.

Step 5.6: Each current head event is placed on its corresponding event register for monitoring.

Step 5.7: After the current time slice expires, the event on each event register is removed and placed at the tail of the corresponding queue.

Step 5.8: Steps 5.4 to 5.7 are repeated until an end signal from the main thread is received.
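Steps 5.1 to 5.8 can be sketched in Python as follows. This is only an illustrative reading of the scheduling flow, not the embodiment's code: the function names (`resolve_heads`, `demote_duplicates`, `schedule`) are hypothetical, "randomly ordered" is interpreted here as shuffling the events inside each queue, the end signal is replaced by a fixed number of time slices, and a register is left idle (`None`) when its only candidate event is already being monitored elsewhere.

```python
import random
from collections import deque


def resolve_heads(queues, max_rounds=100):
    """Steps 5.4/5.5 (first part): rotate queues until head events are unique."""
    for _ in range(max_rounds):  # bound added for this sketch to guarantee exit
        seen = set()
        changed = False
        for q in queues:
            if not q:
                continue
            if q[0] in seen and len(q) > 1:
                q.rotate(-1)  # duplicated head goes to the tail
                changed = True
            seen.add(q[0])
        if not changed:
            break


def demote_duplicates(queues):
    """Step 5.5 (second part): move non-head copies of head events to the tail."""
    heads = {q[0] for q in queues if q}
    for q in queues:
        if len(q) < 2:
            continue
        head, rest = q[0], list(q)[1:]
        keep = [ev for ev in rest if ev not in heads]
        dup = [ev for ev in rest if ev in heads]
        q.clear()
        q.extend([head] + keep + dup)


def schedule(capable, n_slices, seed=0):
    """capable[k] lists the events that register k is able to monitor."""
    rng = random.Random(seed)
    queues = []
    for events in capable:  # Steps 5.1 and 5.2: one queue per register
        events = list(events)
        rng.shuffle(events)  # Step 5.3 (assumed interpretation)
        queues.append(deque(events))
    timeline = []
    for _ in range(n_slices):  # stands in for "until the end signal"
        resolve_heads(queues)
        demote_duplicates(queues)
        assigned, used = [], set()
        for q in queues:  # Step 5.6: heads go onto their registers
            if q and q[0] not in used:
                assigned.append(q[0])
                used.add(q[0])
            else:
                assigned.append(None)  # unresolved conflict: register idles
        timeline.append(assigned)
        for q, ev in zip(queues, assigned):  # Step 5.7: rotate after the slice
            if ev is not None:
                q.rotate(-1)
    return timeline


if __name__ == "__main__":
    for slot in schedule([["A", "B", "C"], ["A", "B", "C"]], n_slices=4):
        print(slot)
```

Each row of the printed timeline shows which event occupies each register during one time slice; no event appears on two registers in the same slice.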

Step 6: After the monitored program finishes, the main thread sends a stop signal to the slave thread, the slave thread stops monitoring, and the method proceeds to Step 7.

Step 7: The slave thread performs post-processing estimation on the hardware event time sequence data collected in Step 5 and sends the result to the main thread, which outputs the result.

The hardware event time sequence data comprises: an actual hardware event cumulative count value c, a cumulative run time r of the monitored program, and a cumulative monitored time e of the monitored hardware event.

The specific steps of the post-processing estimation are as follows:

Step 7.1) The slave thread takes the first-order difference of all the hardware event time sequence data to obtain the per-time-slice values, namely: the actual single-time-slice count value C, the single-time-slice run duration R of the monitored program, and the single-time-slice monitored duration E of the monitored hardware event.

Step 7.2) The slave thread reads the time sequence data of one hardware event, in which the i-th differential count value is denoted C_i, the (i+1)-th differential count value C_{i+1}, the i-th differential run duration R_i, and the i-th differential monitored duration E_i, and sequentially calculates the ratio of the (i+1)-th count value to the i-th count value, the growth multiple of a single time slice, and the number n of unmonitored time slices.

Step 7.3) Repeated estimation: the j-th unmonitored count value between the i-th count value and the (i+1)-th count value is estimated, until the count values on all n unmonitored time slices have been estimated. All monitored values and estimated values are then accumulated to obtain the total count value of the current hardware event.

Step 7.4) Steps 7.2 to 7.3 are repeated until the total count values of all monitored hardware events have been obtained, at which point the post-processing of the hardware event monitoring counts ends.
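The post-processing in Steps 7.1 to 7.3 can be illustrated with the sketch below. The formulas themselves are not reproduced in this text, so every formula in the code is an assumption chosen to match the verbal description: the ratio is taken as k = C_{i+1} / C_i, the number of unmonitored time slices as n = round(R_i / E_i) - 1, the per-slice growth multiple as q = k^(1/(n+1)), and the j-th estimate as C_i * q^j. The function names and the linear fallback for zero counts are likewise hypothetical.

```python
def first_difference(cumulative):
    """Step 7.1: per-slice values from cumulative readings (first is relative to 0)."""
    prev = 0.0
    out = []
    for value in cumulative:
        out.append(value - prev)
        prev = value
    return out


def estimate_gap(c_start, c_end, n):
    """Step 7.3: estimate n unmonitored counts between two monitored counts.

    ASSUMPTION: constant per-slice multiple q = (c_end / c_start) ** (1 / (n + 1)).
    """
    if n <= 0:
        return []
    if c_start <= 0 or c_end <= 0:
        # Fallback used only in this sketch: a ratio is undefined for zero
        # counts, so interpolate linearly instead.
        step = (c_end - c_start) / (n + 1)
        return [c_start + step * j for j in range(1, n + 1)]
    q = (c_end / c_start) ** (1.0 / (n + 1))
    return [c_start * q ** j for j in range(1, n + 1)]


def estimate_total(c, r, e):
    """c, r, e: cumulative count, run time, and monitored time at each read."""
    C, R, E = (first_difference(x) for x in (c, r, e))
    total = sum(C)  # monitored values
    for i in range(len(C) - 1):
        # ASSUMPTION: unmonitored slices between samples i and i+1.
        n = round(R[i] / E[i]) - 1 if E[i] > 0 else 0
        total += sum(estimate_gap(C[i], C[i + 1], n))  # estimated values
    return total


if __name__ == "__main__":
    # One event monitored on every third slice: per-slice counts 100, 200, 400.
    print(estimate_total(c=[100, 300, 700], r=[3, 6, 9], e=[1, 2, 3]))
```

In this example each gap holds two unmonitored slices, so under the stated assumptions the estimates between 100 and 200 are roughly 126 and 159, and the total comes to about 1554.2. Slices before the first sample or after the last one are not handled by this sketch.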

Step 8: The slave thread stops and exits; the main thread destroys the data structures allocated in memory during the run, then stops and exits.

Step 9: the hardware event monitoring and counting operation ends.
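The main-thread and slave-thread life cycle of Steps 1 to 9 can be sketched with Python's standard `threading` module. This is a minimal illustration under stated assumptions, not the PAPI-based implementation: the hardware counter is replaced by a simulated counter, the post-processing is a placeholder sum, and the names `slave` and `run_monitored` are hypothetical.

```python
import itertools
import queue
import threading


def slave(start, stop, results, slice_s):
    """Slave thread: wait for the start signal, sample once per time slice,
    then post-process and hand the result back to the main thread."""
    start.wait()  # Step 4: monitoring start signal
    fake_counter = itertools.count(1)  # stand-in for a hardware count register
    samples = []
    while True:
        samples.append(next(fake_counter))  # periodic read (simulated)
        if stop.wait(slice_s):  # Step 6: stop signal ends monitoring
            break
    results.put(sum(samples))  # Step 7: placeholder post-processing


def run_monitored(workload, slice_s=0.001):
    """Main thread: create the slave, signal it, run the workload, collect."""
    start, stop = threading.Event(), threading.Event()
    results = queue.Queue()
    worker = threading.Thread(target=slave, args=(start, stop, results, slice_s))
    worker.start()  # Step 1: slave thread created
    start.set()  # Step 4: start monitoring
    output = workload()  # Step 5: monitored program runs on the main thread
    stop.set()  # Step 6: stop monitoring
    total = results.get()  # Step 7: result sent to the main thread
    worker.join()  # Step 8: slave thread exits
    return output, total


if __name__ == "__main__":
    print(run_monitored(lambda: sum(range(100_000))))
```

Two `threading.Event` objects carry the start and stop signals, and a `queue.Queue` returns the post-processed result; the slave always takes at least one sample before checking for the stop signal.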

On the basis of this embodiment, the correctness of the invention was verified with the Rodinia Benchmark Suite; compared with the original MPX, the accuracy improved to varying degrees on the different benchmark kernels.

This embodiment was developed as a secondary wrapper around PAPI; the method is equally applicable to other libraries and software that use the MPX technique for multi-hardware event monitoring and counting, such as Linux perf, HPCToolkit, Intel VTune, and Gooda. The platform used in this example is a standard rack-mounted Intel x86 server running 64-bit CentOS 7.6, with two Intel Xeon Gold 6248 processors and 192 GB of main memory. First, a library based on PAPI is created, the intermediate output file descriptors of PAPI are intercepted, and the recorded data is estimated and processed by the method. With a monitoring period of 100 ms, five Rodinia Benchmark Suite applications (SRAD, BFS, LU, KNN, and LMD) were monitored for the hardware events all, brins, cond, bris, all, bris, cond, dtlm, m, l1lh, l1lm, l2lh, l2lm, ldram, ich, icm, uistall, urstall, and inst, and a precision improvement of 5% to 59% was obtained.

Compared with the original MPX post-processing strategy shown in FIG. 3, this embodiment takes the monitored values in the vicinity of each interpolation point into account, which avoids the extreme estimates that arise when two consecutive monitored values differ greatly. The exponential growth multiple also covers a wider range of variation patterns, which improves the accuracy of the MPX estimates.
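For context, a common baseline for multiplexed counters is proportional scaling: the raw count is multiplied by the ratio of total run time to monitored time, which is how Linux perf scales multiplexed counts (raw count × time enabled / time running). The sketch below expresses that baseline with the symbols c, r, and e defined earlier. It is offered only as a point of comparison; this text does not spell out the strategy depicted in FIG. 3, so equating the two is an assumption, and the function name is hypothetical.

```python
def scaled_count(c, r, e):
    """Proportional scaling baseline: extrapolate the count seen during the
    monitored time e to the full run time r.

    ASSUMPTION: shown as a generic baseline, not as the strategy of FIG. 3.
    """
    if e == 0:
        return 0.0  # the event was never on a register
    return c * r / e


if __name__ == "__main__":
    # 700 occurrences counted while monitored for 3 of 9 time units.
    print(scaled_count(c=700, r=9, e=3))
```

Proportional scaling treats the event rate as uniform over the whole run, whereas the method described here interpolates each gap from the two monitored values on either side of it.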

Compared with the prior art, the method follows an exponential growth trend to produce estimates that are closer to the real data distribution, and thereby improves precision.

The foregoing embodiments may be partially modified in numerous ways by those skilled in the art without departing from the principles and spirit of the invention, the scope of which is defined in the claims and not by the foregoing embodiments, and all such implementations are within the scope of the invention.

Claims

1. A multi-hardware event monitoring count value estimation method based on exponential growth, characterized in that a main thread maintains the life cycle of the workflow, creates and initializes the related data structures and a slave thread, sends slave thread control signals, and runs the monitored application; the slave thread, in response to the life cycle signals of the main thread, performs hardware event scheduling, timed alternate monitoring, and post-processing estimation; and the multi-hardware event monitoring count value is obtained by reading the hardware event count registers built into the CPU;

the post-processing estimation specifically comprises the following steps:

step 1) the slave thread takes the first-order difference of all the hardware event time sequence data to obtain the per-time-slice values, namely: the actual single-time-slice count value C, the single-time-slice run duration R of the monitored program, and the single-time-slice monitored duration E of the monitored hardware event;

step 2) the slave thread reads the time sequence data of one hardware event, in which the i-th differential count value is denoted C_i, the (i+1)-th differential count value C_{i+1}, the i-th differential run duration R_i, and the i-th differential monitored duration E_i, and sequentially calculates the ratio of the (i+1)-th count value to the i-th count value, the growth multiple of a single time slice, and the number n of unmonitored time slices;

step 3) repeated estimation: the j-th unmonitored count value between the i-th count value and the (i+1)-th count value is estimated, until the count values on all n unmonitored time slices have been estimated; all monitored values and estimated values are accumulated to obtain the total count value of the current hardware event;

step 4) steps 2 to 3 are repeated until the total count values of all monitored hardware events have been obtained, at which point the post-processing of the hardware event monitoring counts ends;

the timing alternate monitoring specifically comprises the following steps:

step 1: the scheduling system creates as many queues as there are event registers, with each queue corresponding one-to-one to an event register;

step 2: each hardware event is stored in the queues of all event registers capable of monitoring that event;

step 3: all the queues are randomly ordered;

step 4: the head events of all queues are checked; when head events are duplicated, the duplicated head event in the lower-ranked queue is moved to the tail of its queue, and the events behind it each move forward by one position;

step 5: step 4 is repeated until no head events are duplicated; all non-head events that duplicate a current head event are moved to the tail of their queues, and the events behind them each move forward by one position;

step 6: each current head event is placed on its corresponding event register for monitoring;

step 7: after the current time slice expires, the event on each event register is removed and placed at the tail of the corresponding queue;

step 8: steps 4 to 7 are repeated until an end signal from the main thread is received.

2. The multi-hardware event monitoring count value estimation method based on exponential growth according to claim 1, characterized by comprising the following specific steps:

step 1, initializing the current state of a main thread of a PAPI hardware performance acquisition framework and creating a slave thread for monitoring hardware events;

step 2, creating an event set for storing events to be monitored after global initialization of the master thread is completed, binding the event set to the slave thread, and starting a life cycle of a workflow;

step 3, adding hardware events to be monitored to the event set in the step 2 by the main thread;

step 4, the main thread sends a monitoring start signal to the slave thread; the slave thread starts and, at intervals of the time slice length set in step 1, reads the monitoring result from the hardware event count registers built into the CPU and writes it to the file descriptor corresponding to the current hardware event;

step 5, the monitored program is started on the main thread; the slave thread periodically and alternately monitors the hardware events added in step 3 according to the monitoring time slice set during initialization, collects the monitoring results as in step 4, and writes them to the file descriptors to obtain the hardware event time sequence data;

step 6, after the monitored program finishes, the main thread sends a stop signal to the slave thread, the slave thread stops monitoring, and the method proceeds to step 7;

step 7, the slave thread performs post-processing estimation on the hardware event time sequence data collected in step 5 and sends the result to the main thread, which outputs the result.

3. The multi-hardware event monitoring count value estimation method based on exponential growth according to claim 2, wherein the hardware event time sequence data comprises: an actual hardware event cumulative count value c, a cumulative run time r of the monitored program, and a cumulative monitored time e of the monitored hardware event.

4. The multi-hardware event monitoring count value estimation method based on exponential growth according to claim 3, wherein the state initialization comprises: initializing PAPI internal global variables, obtaining current operating system information, creating file descriptors for recording hardware event count values, enabling MPX mode, and setting a monitoring time slice length.

5. The multi-hardware event monitoring count value estimation method based on exponential growth according to claim 3, wherein the monitoring result comprises: the monitored hardware event count value, the program run duration, and the monitored duration of the current hardware event, recorded as sequence data.