CN107153604A - PMU-based parallel program performance monitoring and analysis method
- Publication number: CN107153604A
- Application number: CN201710346738.7A
- Authority: CN (China)
- Prior art keywords: performance, PMU, counter, sampling, program
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
- G06F11/3476—Data logging
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
The present invention relates to a PMU-based method for monitoring and analyzing the performance of parallel programs, and belongs to the technical field of computer software. Based on performance events, the invention provides program developers with the microarchitectural performance-event characteristics produced when the target program runs. By combining sampling with compilation techniques, it can also map the extracted data characteristics back to their positions in the application code, helping developers examine problems in their own program design. The invention does not involve any direct information about the algorithm itself, and therefore causes almost no noticeable interference with the execution of the program. The method provides technical assurance and application support for performance monitoring of parallel programs.
Description
Technical field
The present invention relates to the technical field of computer software, and in particular to a PMU-based parallel program performance monitoring and analysis method.
Background art
With the development of VLSI design technology, improving single-core processor performance has long been the goal of conventional microprocessor architecture design, and for many years performance was raised by increasing the number of transistors on the chip. However, while more transistors improve performance, they also increase the processor's power consumption, and clock frequency has reached its limit. Semiconductor technology has almost reached its physical limits, so it is difficult to improve processor performance further by raising the clock frequency. Meanwhile, the development of mobile communications, embedded systems, and aerospace places new demands on processor architecture, and increasingly complex application fields such as multimedia and scientific computing call for computers with greater computing power. At the same time, the design of parallel programs is becoming more and more important. Because hardware structures and software platforms differ, however, parallel programs differ considerably in debugging techniques, performance efficiency, and so on when run on different platforms. In parallel programming practice, obtaining high performance in practice requires more than traditional analysis based on algorithmic complexity, so methods that analyze performance by monitoring the actual running program online have become particularly important.
Currently, instrumentation is generally used for online monitoring of a program's running state. This technique statically or dynamically inserts extra code into the program to observe how it actually runs, which helps developers understand the program's execution trace and its interaction with the system.
Existing parallel program debugging and performance analysis tools fall into three main categories: those based on the PVM parallel platform, those based on the MPI parallel platform, and cross-platform tools. Well-known tools from abroad include XPVM, Paradyn, XMPI, SCALEA, and TotalView; well-known domestic tools include ParaVision, the parallel program runtime visualization tool of the Dawning series, and DCDB.
Although instrumentation can observe and monitor how a parallel program actually runs, it inserts extra code into the original program, which can interfere significantly with the program's execution and make the monitoring results unstable. In addition, existing parallel program debugging and performance analysis tools, both domestic and foreign, depend strongly on the parallel environment and are limited in platform portability, functional extensibility, and robustness. For example, TotalView adds multithreading windows to symbolic debugging and can visualize arrays, but arrays can be inspected only for a single process. Likewise, Guide-View helps users understand the performance of OpenMP programs but lacks automatic performance bottleneck analysis.
Summary of the invention
(1) Technical problem to be solved
The technical problem to be solved by the present invention is how to design a parallel program performance monitoring and analysis method that does not interfere with the running program itself and is simple to implement.
(2) Technical solution
To solve the above technical problem, the present invention provides a PMU-based parallel program performance monitoring and analysis method, comprising the following steps:
Step 1: Design a performance driver and a performance analyzer. The performance driver samples a specified process based on the PMU performance counters. The performance analyzer parses the commands and parameters entered by the user, determines the PMU parameters from them, packs the PMU parameters into a data structure, passes the PMU parameters to the performance driver through a system call, and then calls the performance driver to enable the PMU. When the system call returns, the performance analyzer also reads the sampling result data saved by the performance driver.
Step 2: Run the performance analyzer, which parses the commands and parameters entered by the user, determines the PMU parameters from them, packs the PMU parameters into a data structure, passes the PMU parameters to the performance driver through a system call, and then calls the performance driver to enable the PMU.
Step 3: Run the performance driver to sample the specified process based on the PMU performance counters.
Step 4: The performance driver transfers the sampling result data to the performance analyzer.
Preferably, in Step 3, the performance driver samples the specified process based on the PMU performance counters through the following steps:
S31: Register a PMU interrupt handler, which processes the sampling result data when the counter overflows.
S32: Configure the control register according to the PMU parameters to select the performance events to be monitored for the specified process, and initialize the sampling period of the counter by setting the range of PMCter to 0~SAV-1, where SAV is the sampling period.
S33: Enable the counter and run the parallel program. The counter starts counting, and its value is incremented by 1 each time the monitored event occurs.
S34: When the counter reaches the sampling period, the interrupt handler is triggered and saves the counter value as sampling result data.
S35: After interrupt handling completes, reset the counter to the range 0~SAV-1 and jump back to step S33 so that the counter starts counting again.
Preferably, in step S33, the counter counts monitored events using the precise event-based sampling (PEBS) mode.
Preferably, in Step 4, the performance driver transfers the sampling result data to the performance analyzer in one of the following ways: parameter passing or memory mapping.
(3) Beneficial effects
Based on performance events, the present invention provides program developers with the microarchitectural performance-event characteristics produced when the target program runs. By combining sampling with compilation techniques, it can also map the extracted data characteristics back to their positions in the application code, helping developers examine problems in their own program design. The invention does not involve any direct information about the algorithm itself, and therefore causes almost no noticeable interference with the execution of the program. The method provides technical assurance and application support for performance monitoring of parallel programs.
Brief description of the drawings
Fig. 1 is the PMU workflow diagram of the present invention;
Fig. 2 is the workflow diagram of the event-based sampling mode of the present invention.
Detailed description of the embodiments
To make the purpose, content, and advantages of the present invention clearer, specific embodiments of the invention are described in further detail below with reference to the accompanying drawings and examples.
To address the deficiencies of existing code debugging and performance analysis tools, the present invention takes parallel programs as its subject and provides a PMU-based parallel program performance monitoring and analysis method.
Taking the Godson 3A processor as an example, the PMU-based parallel program performance monitoring and analysis method provided by the present invention analyzes the PMU-based performance monitoring process running on a Godson 3A multi-core platform and comprises the following steps:
Step 1: Design a performance driver and a performance analyzer. The performance driver samples a specified process based on the PMU performance counters. The performance analyzer parses the commands and parameters entered by the user, determines the PMU parameters from them, packs the PMU parameters into a data structure, passes the PMU parameters to the performance driver through a system call, and then calls the performance driver to enable the PMU. When the system call returns, the performance analyzer also reads and analyzes the sampling result data saved by the performance driver and presents it in a user-readable form, so that the user can locate performance hotspots.
The performance driver runs in the system kernel. It can be divided into system-dependent code and system-independent code. The system-dependent code manipulates the PMU, for example enabling and disabling performance counters, initializing them, and reading and writing them. The system-independent code does not manipulate hardware directly: it passes user-level parameters to the system-dependent code to set up the performance counters, obtains the sampled information through the system-dependent code, and delivers it to user space. The performance driver supports a sampling mode for a specified process. Compared with system-wide sampling, sampling a specified process samples only that process, so its results are more accurate.
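The split between system-dependent and system-independent driver code can be sketched as follows. This is only an illustrative user-space model, not the kernel driver described by the patent; the names `PmuBackend`, `FakeBackend`, and `SamplingDriver` are hypothetical, and the fake backend stands in for real PMU registers.

```python
from abc import ABC, abstractmethod


class PmuBackend(ABC):
    """System-dependent layer: the only code that touches PMU registers."""

    @abstractmethod
    def init_counter(self, event: str, start_value: int) -> None: ...

    @abstractmethod
    def enable(self) -> None: ...

    @abstractmethod
    def disable(self) -> None: ...

    @abstractmethod
    def read_counter(self) -> int: ...


class FakeBackend(PmuBackend):
    """Hypothetical stand-in for real hardware so the sketch runs anywhere."""

    def __init__(self) -> None:
        self.value = 0
        self.enabled = False

    def init_counter(self, event: str, start_value: int) -> None:
        self.value = start_value

    def enable(self) -> None:
        self.enabled = True

    def disable(self) -> None:
        self.enabled = False

    def read_counter(self) -> int:
        return self.value

    def tick(self, n: int = 1) -> None:
        # Simulates n occurrences of the monitored event.
        if self.enabled:
            self.value += n


class SamplingDriver:
    """System-independent layer: forwards user parameters, returns results."""

    def __init__(self, backend: PmuBackend) -> None:
        self.backend = backend
        self.target_pid = None
        self.event = None

    def start(self, pid: int, event: str) -> None:
        self.target_pid, self.event = pid, event
        self.backend.init_counter(event, 0)
        self.backend.enable()

    def stop(self) -> dict:
        self.backend.disable()
        return {"pid": self.target_pid, "event": self.event,
                "count": self.backend.read_counter()}
```

With this layering, porting to another processor would in principle only require a new `PmuBackend` subclass, while `SamplingDriver` stays unchanged.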
The PMU consists of paired PMCter (Performance Monitoring Counter) and PMCtrl (Performance Monitoring Control) registers. PMCtrl is the control register, used to configure the performance events to be monitored, i.e., the monitored events (such as execution cycles, instruction count, cache miss rate, and branch misprediction rate). PMCter is the counter, which records how many times the monitored event has occurred: it is incremented by 1 each time the event occurs, and when its highest bit becomes 1, the counter has overflowed and triggers an interrupt.
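The overflow rule described above (an interrupt when the highest bit of PMCter becomes 1) can be modeled in a few lines. This is a minimal sketch under assumptions: the 32-bit width and the `CounterPair` class are chosen for illustration only and are not taken from the patent or from the processor manual.

```python
from dataclasses import dataclass

COUNTER_BITS = 32                        # assumed width, illustration only
OVERFLOW_BIT = 1 << (COUNTER_BITS - 1)   # the highest bit of the counter
MASK = (1 << COUNTER_BITS) - 1


@dataclass
class CounterPair:
    event: str           # what PMCtrl is configured to monitor
    pmcter: int = 0      # current value of the counter register
    interrupts: int = 0  # number of overflow interrupts raised so far

    def on_event(self) -> bool:
        """Add 1 to the counter; return True if this raised an interrupt."""
        before = self.pmcter
        self.pmcter = (self.pmcter + 1) & MASK
        # Overflow is signalled when the highest bit changes from 0 to 1.
        if not (before & OVERFLOW_BIT) and (self.pmcter & OVERFLOW_BIT):
            self.interrupts += 1
            return True
        return False
```

Under this model, preloading the counter with `OVERFLOW_BIT - n` makes the interrupt fire after exactly `n` events, which is one common way a sampling period can be realized.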
Step 2: Run the performance analyzer, which parses the commands and parameters entered by the user, determines the PMU parameters from them, packs the PMU parameters into a data structure, passes the PMU parameters to the performance driver through a system call, and then calls the performance driver to enable the PMU.
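As a rough illustration of Step 2, the sketch below parses user options and packs them into a fixed-layout binary structure of the kind that could be handed to a driver. The option names, the event codes, and the `<iIQ` layout are all assumptions made for this example; the patent does not specify them, and the actual system call is not shown.

```python
import argparse
import struct

# Hypothetical event codes; real codes depend on the processor.
EVENT_CODES = {"cycles": 0, "instructions": 1,
               "cache-misses": 2, "branch-misses": 3}

# Assumed layout: pid (int32), event code (uint32), sampling period (uint64).
PARAM_FORMAT = "<iIQ"


def parse_args(argv):
    """Parse the user's command line into a namespace of PMU parameters."""
    parser = argparse.ArgumentParser(prog="perf-analyzer")
    parser.add_argument("--pid", type=int, required=True)
    parser.add_argument("--event", choices=sorted(EVENT_CODES),
                        default="cycles")
    parser.add_argument("--sav", type=int, default=100000)
    return parser.parse_args(argv)


def pack_params(args) -> bytes:
    """Pack the PMU parameters into the binary structure for the driver."""
    if args.sav <= 0:
        raise ValueError("sampling period must be positive")
    return struct.pack(PARAM_FORMAT, args.pid,
                       EVENT_CODES[args.event], args.sav)


def unpack_params(blob: bytes):
    """Inverse of pack_params, as the receiving side would decode it."""
    return struct.unpack(PARAM_FORMAT, blob)
```

For example, `pack_params(parse_args(["--pid", "42", "--sav", "5000"]))` yields a 16-byte blob that `unpack_params` decodes back into the process ID, event code, and sampling period.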
Step 3: Run the performance driver to sample the specified process based on the PMU performance counters.
As shown in Fig. 1, in Step 3 the performance driver samples the specified process based on the PMU performance counters through the following steps:
S31: Register a PMU interrupt handler and its interrupt number. The interrupt handler processes the sampling result data when the counter overflows.
S32: Configure the control register according to the PMU parameters to select the performance events to be monitored for the specified process, and initialize the sampling period of the counter (so that when an event the performance counter can monitor occurs, the counter PMCter starts counting according to the given sampling period) by setting the range of PMCter to 0~SAV-1, where SAV (Sampling After Value) is the sampling period.
S33: Enable the counter and run the parallel program. The counter starts counting, and each time the monitored event occurs, that is, when the context of the specified process (the monitored event) is switched in or out, the counter value is incremented by 1.
The present invention uses event-based sampling to count monitored events directly: the specific events to be monitored are configured in the PMU, and information about them is obtained while the parallel program runs by collecting the counter values. When the counter reaches the sampling period, the interrupt handler is triggered and collects the system state at that moment. The workflow of the event-based sampling mode is shown in Fig. 2.
In step S33, the counter counts monitored events using the precise event-based sampling (PEBS) mode.
The accuracy of the sampling results is key to later code analysis. Therefore, to improve the accuracy of the performance monitoring results, all monitored events are set to the precise event-based sampling (PEBS) mode. The PEBS hardware functions are defined as follows:
When the highest bit of the counter register becomes 1, the counter value is reset;
The system state is saved to the PEBS buffer in memory;
A PEBS buffer-full interrupt is registered, and the corresponding interrupt handling function saves the contents of the PEBS buffer to a specified file.
Unlike the conventional sampling mode, the PEBS mode has the following advantages. First, the system state is saved promptly, so the PC value deviates by at most 1. Second, PEBS no longer enters interrupt handling on every counter overflow as the conventional sampling mode does, so its overhead is lower. Third, after the counter register overflows, its value is reloaded with the initial value and counting continues; an interrupt is triggered only when the PEBS buffer is full, which reduces the number of interrupts, further reduces the interference that enabling the PMU causes to the program's runtime behavior, and ensures the validity of the sampled information.
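The reduction in interrupts can be illustrated with a small simulation that compares one interrupt per overflow against one interrupt per full buffer. This is a sketch of the buffering idea only, not of real PEBS hardware; the function `simulate` and the use of a loop index as a stand-in for the saved system state are assumptions made for illustration.

```python
def simulate(events, sav, buffer_records=None):
    """Return (interrupts, records_flushed, records_still_buffered).

    buffer_records=None models conventional sampling (interrupt on every
    overflow); an integer models buffered sampling (interrupt when full).
    """
    counter = 0
    buffer = []
    flushed = []
    interrupts = 0
    for pc in range(events):          # pc stands in for saved system state
        counter += 1
        if counter < sav:
            continue
        counter = 0                   # counter is reloaded, counting continues
        if buffer_records is None:    # conventional mode
            interrupts += 1
            flushed.append(pc)
        else:                         # buffered mode
            buffer.append(pc)
            if len(buffer) == buffer_records:
                interrupts += 1       # buffer-full interrupt
                flushed.extend(buffer)
                buffer.clear()
    return interrupts, len(flushed), len(buffer)
```

With 1000 events and a sampling period of 10, the conventional mode takes 100 interrupts, while a 32-record buffer takes 3 and leaves 4 records waiting in the buffer, so a real implementation would also need to drain the buffer when sampling stops.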
S34: When the counter reaches the sampling period, i.e., as soon as the counter overflows, the system calls the interrupt handling function: the interrupt handler is triggered and saves the counter value as sampling result data, and the interrupt handling function saves the system state.
S35: After interrupt handling completes, reset the counter to the range 0~SAV-1 and jump back to step S33 so that the counter starts counting again.
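Steps S31 to S35 can be sketched as a simple software model of the sampling loop. The class `SampledCounter` and the helper `collect` are hypothetical names, the handler is an ordinary Python callable rather than a real interrupt handler, and the `context` argument stands in for whatever state the handler would save.

```python
from typing import Callable, Iterable, List, Tuple


class SampledCounter:
    def __init__(self, sav: int, handler: Callable[[int], None]) -> None:
        if sav <= 0:
            raise ValueError("sampling period must be positive")
        self.sav = sav           # S32: sampling period (SAV)
        self.handler = handler   # S31: registered interrupt handler
        self.value = 0           # S32: counter covers 0 .. SAV-1
        self.enabled = False

    def enable(self) -> None:    # S33: enable the counter
        self.enabled = True

    def event(self, context: int) -> None:
        if not self.enabled:
            return
        self.value += 1          # S33: +1 per monitored event
        if self.value == self.sav:
            self.handler(context)  # S34: period reached, take a sample
            self.value = 0         # S35: reset and keep counting


def collect(sav: int, contexts: Iterable[int]) -> Tuple[List[int], int]:
    """Run the loop over a stream of events; return samples and leftover."""
    samples: List[int] = []
    counter = SampledCounter(sav, samples.append)
    counter.enable()
    for ctx in contexts:
        counter.event(ctx)
    return samples, counter.value
```

Because one sample is taken every SAV events, the total event count can be estimated as the number of samples multiplied by SAV, plus whatever remains in the counter when sampling stops.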
Step 4: The performance driver transfers the sampling result data to the performance analyzer.
In Step 4, the performance driver transfers the sampling result data to the performance analyzer in one of the following ways: parameter passing or memory mapping. When complex information such as a call graph must be returned to the performance analyzer, memory mapping is used: a memory-mapping function maps the file contents onto a block of memory, and reading or modifying that block of memory reads or modifies the mapped file. Memory mapping suits the transfer of more complex sampling results, while simple, regular sampling results are packed into a data structure and transferred by parameter passing.
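The memory-mapping path can be sketched in user space with Python's standard `mmap` module: one side writes fixed-size records to a file, and the other maps the file and decodes the records directly from the mapped memory. The record layout (`<QI`, a 64-bit program counter and a 32-bit process ID) and the file name are assumptions for illustration; the patent does not define the record format.

```python
import mmap
import os
import struct
import tempfile

# Assumed record: program counter (uint64), process ID (uint32).
RECORD = struct.Struct("<QI")


def write_samples(path, samples):
    """Plays the role of the driver: store fixed-size records in a file."""
    with open(path, "wb") as f:
        for pc, pid in samples:
            f.write(RECORD.pack(pc, pid))


def read_samples(path):
    """Plays the role of the analyzer: map the file and decode records.

    Note: mmap cannot map an empty file, so at least one record is expected.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return [RECORD.unpack_from(view, offset)
                    for offset in range(0, len(view), RECORD.size)]


def round_trip(samples):
    """Write samples to a temporary file and read them back via mmap."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "samples.bin")  # hypothetical file name
        write_samples(path, samples)
        return read_samples(path)
```

Reading through the mapping avoids copying the whole file into a separate buffer first, which is the property that makes this approach attractive for larger results such as call graphs.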
As can be seen, the PMU-based performance monitoring and analysis method of the present invention obtains information about a program's runtime behavior through PMU sampling and, after comprehensive analysis of the sampling results, locates the hotspot code that affects program performance so that the program can be optimized. The method helps programmers modify and optimize hotspot code according to the performance analysis results, improving the program's runtime behavior and thereby the performance of the whole system. The present invention is simple and effective to implement and meets the requirements of practical application.
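Locating hotspot code from sampled results can be sketched as attributing each sampled program counter to the function whose address range contains it and then ranking functions by sample count. The symbol table format (function start address and name) and the function names below are assumptions for illustration; a real analyzer would obtain them from the compiler's debug or symbol information.

```python
from bisect import bisect_right
from collections import Counter


def build_index(symbols):
    """symbols: iterable of (start_address, name); returns parallel lists."""
    ordered = sorted(symbols)
    return [addr for addr, _ in ordered], [name for _, name in ordered]


def hotspots(pcs, symbols, top=3):
    """Rank functions by how many sampled program counters fall in them."""
    starts, names = build_index(symbols)
    hits = Counter()
    for pc in pcs:
        # Index of the last function whose start address is <= pc.
        i = bisect_right(starts, pc) - 1
        hits[names[i] if i >= 0 else "<unknown>"] += 1
    return hits.most_common(top)
```

For instance, with functions starting at `0x1000` (`main`), `0x1100` (`compute`), and `0x1400` (`reduce`), a sample at `0x1200` is attributed to `compute`, and the function collecting the most samples is reported first as the likely hotspot.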
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art can make improvements and variations without departing from the technical principles of the present invention, and such improvements and variations shall also be regarded as falling within the scope of protection of the present invention.
Claims (4)
1. A PMU-based parallel program performance monitoring and analysis method, characterized by comprising the following steps:
Step 1: Design a performance driver and a performance analyzer. The performance driver samples a specified process based on the PMU performance counters. The performance analyzer parses the commands and parameters entered by the user, determines the PMU parameters from them, packs the PMU parameters into a data structure, passes the PMU parameters to the performance driver through a system call, and then calls the performance driver to enable the PMU. When the system call returns, the performance analyzer also reads the sampling result data saved by the performance driver.
Step 2: Run the performance analyzer, which parses the commands and parameters entered by the user, determines the PMU parameters from them, packs the PMU parameters into a data structure, passes the PMU parameters to the performance driver through a system call, and then calls the performance driver to enable the PMU.
Step 3: Run the performance driver to sample the specified process based on the PMU performance counters.
Step 4: The performance driver transfers the sampling result data to the performance analyzer.
2. The method according to claim 1, characterized in that, in Step 3, the performance driver samples the specified process based on the PMU performance counters through the following steps:
S31: Register a PMU interrupt handler, which processes the sampling result data when the counter overflows.
S32: Configure the control register according to the PMU parameters to select the performance events to be monitored for the specified process, and initialize the sampling period of the counter by setting the range of PMCter to 0~SAV-1, where SAV is the sampling period.
S33: Enable the counter and run the parallel program. The counter starts counting, and its value is incremented by 1 each time the monitored event occurs.
S34: When the counter reaches the sampling period, the interrupt handler is triggered and saves the counter value as sampling result data.
S35: After interrupt handling completes, reset the counter to the range 0~SAV-1 and jump back to step S33 so that the counter starts counting again.
3. The method according to claim 2, characterized in that, in step S33, the counter counts monitored events using the precise event-based sampling (PEBS) mode.
4. The method according to claim 1, 2, or 3, characterized in that, in Step 4, the performance driver transfers the sampling result data to the performance analyzer in one of the following ways: parameter passing or memory mapping.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710346738.7A CN107153604B (en) | 2017-05-17 | 2017-05-17 | PMU-based parallel program performance monitoring and analyzing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107153604A (en) | 2017-09-12 |
CN107153604B (en) | 2020-02-07 |
Family
ID=59794084
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710346738.7A Active CN107153604B (en) | 2017-05-17 | 2017-05-17 | PMU-based parallel program performance monitoring and analyzing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107153604B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112069029A (en) * | 2020-09-04 | 2020-12-11 | 北京计算机技术及应用研究所 | Performance acquisition monitoring system of domestic platform PMU self-adaptation |
CN112540899A (en) * | 2019-09-20 | 2021-03-23 | 无锡江南计算技术研究所 | Analysis device based on performance data space-time characteristics |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090007134A1 (en) * | 2007-06-26 | 2009-01-01 | International Business Machines Corporation | Shared performance monitor in a multiprocessor system |
CN105700998A (en) * | 2016-01-13 | 2016-06-22 | 浪潮(北京)电子信息产业有限公司 | Method and device for monitoring and analyzing performance of parallel programs |
CN106126384A (en) * | 2016-06-12 | 2016-11-16 | 华为技术有限公司 | A kind of method and device of acquisition performance monitor unit PMU event |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112540899A (en) * | 2019-09-20 | 2021-03-23 | 无锡江南计算技术研究所 | Analysis device based on performance data space-time characteristics |
CN112540899B (en) * | 2019-09-20 | 2022-10-04 | 无锡江南计算技术研究所 | Analysis device based on performance data space-time characteristics |
CN112069029A (en) * | 2020-09-04 | 2020-12-11 | 北京计算机技术及应用研究所 | Performance acquisition monitoring system of domestic platform PMU self-adaptation |
CN112069029B (en) * | 2020-09-04 | 2023-11-14 | 北京计算机技术及应用研究所 | Domestic platform PMU self-adaptive performance acquisition monitoring system |
Also Published As
Publication number | Publication date |
---|---|
CN107153604B (en) | 2020-02-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||