CN107153604A - PMU-based parallel program performance monitoring and analysis method - Google Patents

Info

Publication number
CN107153604A
Authority
CN
China
Prior art keywords
performance
pmu
counter
sampling
program
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710346738.7A
Other languages
Chinese (zh)
Other versions
CN107153604B (en)
Inventor
蒋欣欣
瞿秋薏
张记强
张杨
孟庆磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Computer Technology and Applications
Original Assignee
Beijing Institute of Computer Technology and Applications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Computer Technology and Applications
Priority to CN201710346738.7A
Publication of CN107153604A
Application granted
Publication of CN107153604B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present invention relates to a PMU-based method for monitoring and analyzing the performance of parallel programs, and belongs to the technical field of computer software. Based on performance events, the invention provides program developers with the microarchitectural performance event characteristics produced when a target program runs. In addition, by combining sampling methods with compilation techniques, it can map the extracted data characteristics to their positions in the application code, helping developers examine problems in their own program design. The invention does not touch any direct information about the algorithm itself, and therefore causes almost no noticeable interference with program execution. The method provides technical assurance and application support for performance monitoring of parallel programs.

Description

PMU-based parallel program performance monitoring and analysis method
Technical field
The present invention relates to the technical field of computer software, and in particular to a PMU-based parallel program performance monitoring and analysis method.
Background
With the development of VLSI design technology, improving single-core processor performance has long been the goal of conventional microprocessor architecture design, and for many years performance gains were achieved by increasing the number of transistors on a chip. However, while adding transistors improves performance, it also increases processor power consumption, and clock frequency has reached its limit. Semiconductor technology has nearly reached its physical limits, so it is difficult to improve processor performance further by raising the clock frequency. Meanwhile, the development of mobile communications, embedded systems, and aerospace places new requirements on processor architecture, and increasingly complex application fields such as multimedia and scientific computing call for computers with greater computing power. At the same time, the design of parallel programs is becoming more and more important. However, because hardware structures and software platforms differ, parallel programs differ considerably in debugging techniques, performance efficiency, and other respects when run on different platforms. In parallel programming practice, how to obtain real high performance is a key question: beyond traditional analysis based on algorithmic complexity, methods that analyze performance by monitoring the actually running program online have become particularly important.
At present, online monitoring of program execution generally relies on instrumentation. This technique observes the actual running of a program by statically or dynamically inserting extra code into it, which helps developers understand the program's execution trace and its interaction with the system.
Existing parallel program debugging and performance analysis tools fall into three main categories: those based on the PVM parallel platform, those based on the MPI parallel platform, and cross-platform tools. Well-known tools from outside China include XPVM, Paradyn, XMPI, SCALEA, and TotalView; relatively well-known domestic tools include ParaVision, the parallel program runtime visualization tool of the Dawning series, and DCDB.
Although instrumentation can observe and monitor the actual running of a parallel program, inserting extra code into the original program interferes considerably with its execution and makes the monitoring results unstable. In addition, existing parallel program debugging and performance analysis tools, both domestic and foreign, depend heavily on the parallel environment and are limited in platform portability, functional extensibility, and robustness. For example, TotalView adds multithreading windows to symbolic debugging and can visualize arrays, but arrays can be inspected only for a single process. Likewise, Guide-View helps users understand the performance of OpenMP programs but lacks automatic performance bottleneck analysis.
Summary of the invention
(1) Technical problem to be solved
The technical problem to be solved by the present invention is how to design a parallel program performance monitoring and analysis method that does not interfere with program execution and is simple to implement.
(2) Technical solution
To solve the above technical problem, the present invention provides a PMU-based parallel program performance monitoring and analysis method comprising the following steps:
Step 1: Design a performance driver and a performance analyzer. The performance driver implements sampling of a specified process based on the performance counter (PMU). The performance analyzer parses the command and parameters entered by the user, determines the PMU parameters from them, packs the PMU parameters into a data structure, passes the PMU parameters to the performance driver by means of a system call, and then invokes the performance driver to enable the PMU; when the system call returns, it also reads the sampled result data saved by the performance driver;
Step 2: Run the performance analyzer: parse the command and parameters entered by the user, determine the PMU parameters from them, pack the PMU parameters into a data structure, pass them to the performance driver by means of a system call, and then invoke the performance driver to enable the PMU;
Step 3: Run the performance driver to sample the specified process based on the performance counter (PMU);
Step 4: The performance driver passes the sampled result data to the performance analyzer.
Preferably, in step 3, the performance driver samples the specified process based on the performance counter (PMU) as follows:
S31: Register a PMU interrupt handler, which processes the sampled result data when the counter overflows;
S32: According to the PMU parameters, configure the control register with the performance event to be monitored for the specified process, and initialize the sampling period of the counter, setting the range of PMCter to 0 to SAV-1, where SAV is the sampling period;
S33: Enable the counter and run the parallel program. The counter starts counting, and each time the monitored event occurs the counter value is incremented by 1;
S34: When the counter reaches the sampling period, trigger the interrupt handler and save the counter value as sampled result data;
S35: After interrupt handling completes, reset the counter value to the range 0 to SAV-1 and jump to step S33 so that the counter starts counting again.
Preferably, in step S33, the counter counts the monitored events using precise event-based sampling (PEBS).
Preferably, in step 4, the performance driver passes the sampled result data to the performance analyzer in one of the following ways: parameter passing or memory mapping.
(3) Beneficial effects
Based on performance events, the present invention provides program developers with the microarchitectural performance event characteristics produced when a target program runs. In addition, by combining sampling methods with compilation techniques, it can map the extracted data characteristics to their positions in the application code, helping developers examine problems in their own program design. The invention does not touch any direct information about the algorithm itself, and therefore causes almost no noticeable interference with program execution. The method provides technical assurance and application support for performance monitoring of parallel programs.
Brief description of the drawings
Fig. 1 is a workflow diagram of the PMU in the present invention;
Fig. 2 is a workflow diagram of the event-based sampling mode in the present invention.
Detailed description
To make the purpose, content, and advantages of the present invention clearer, specific embodiments of the invention are described in further detail below with reference to the drawings and examples.
To address the deficiencies of existing code debugging and performance analysis tools, the present invention takes parallel programs as its object of study and provides a PMU-based parallel program performance monitoring and analysis method.
Taking the Godson 3A processor as an example, the method analyzes the PMU-based performance monitoring process running on a Godson 3A multi-core platform and comprises the following steps:
Step 1: Design a performance driver and a performance analyzer. The performance driver implements sampling of a specified process based on the performance counter (PMU). The performance analyzer parses the command and parameters entered by the user, determines the PMU parameters from them, packs the PMU parameters into a data structure, passes the PMU parameters to the performance driver by means of a system call, and then invokes the performance driver to enable the PMU; when the system call returns, it also reads and analyzes the sampled result data saved by the performance driver and presents it in a user-readable form so that the user can locate performance hot spots;
The performance driver runs in the system kernel. It can be divided into system-dependent code and system-independent code. The system-dependent code manipulates the PMU, for example enabling and disabling performance counters, initializing them, and reading or writing them. The system-independent code does not manipulate hardware directly; it passes user-level parameters to the system-dependent code to set up the performance counters, obtains sample information through the system-dependent code, and then delivers it to user space. The performance driver supports a sampling mode for a specified process. Compared with system-wide sampling, specified-process sampling samples only the specified process, so its results are more precise.
The PMU consists of paired PMCter (Performance Monitoring Counter) and PMCtrl (Performance Monitoring Control) registers. PMCtrl is the control register, used to configure the performance event to be monitored, i.e., the monitored event (such as execution cycles, instruction count, cache miss rate, and branch misprediction rate). PMCter is the counter, which records how many times the monitored event has occurred: it increments by 1 each time the event occurs, and when its most significant bit becomes 1, the counter has overflowed and triggers an interrupt.
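The counter and control register pairing described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name `PerfCounterPair`, the event names, the 8-bit width, and the clearing of the counter after overflow are all hypothetical choices for illustration, and nothing here touches real PMU hardware.

```python
from dataclasses import dataclass


@dataclass
class PerfCounterPair:
    """Toy model of one PMCtrl/PMCter pair; not real hardware access."""
    event: str            # event selected through PMCtrl (hypothetical name)
    width: int = 8        # counter width in bits, kept small for illustration
    value: int = 0        # current PMCter value
    interrupts: int = 0   # overflow interrupts raised so far

    def on_event(self, name: str) -> bool:
        """Count one occurrence; return True if it raised an overflow interrupt."""
        if name != self.event:
            return False  # PMCtrl selects a different event, so nothing is counted
        self.value = (self.value + 1) & ((1 << self.width) - 1)
        if self.value >> (self.width - 1):  # most significant bit is now 1
            self.interrupts += 1
            self.value = 0  # cleared so counting can resume (assumption)
            return True
        return False


pair = PerfCounterPair(event="cache_miss")
fired = [pair.on_event("cache_miss") for _ in range(300)]
```

With an 8-bit counter the most significant bit is set once the value reaches 128, so 300 events raise two interrupts in this model.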
Step 2: Run the performance analyzer: parse the command and parameters entered by the user, determine the PMU parameters from them, pack the PMU parameters into a data structure, pass them to the performance driver by means of a system call, and then invoke the performance driver to enable the PMU;
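As a rough illustration of the parsing and packing part of this step, the following sketch turns a command line into a fixed-layout parameter block. The option names, the event codes, and the record layout are assumptions made for the example; the patent does not specify them, and the system call that would hand the block to the driver is deliberately left out.

```python
import argparse
import struct

# Hypothetical event codes; real codes are processor-specific.
EVENT_CODES = {"cycles": 0, "instructions": 1, "cache-misses": 2, "branch-misses": 3}

# Assumed layout: little-endian pid (int32), event code (uint32), sampling period (uint64).
PMU_PARAMS = struct.Struct("<iIQ")


def parse_command(argv):
    """Parse the user's command line into a namespace of PMU settings."""
    parser = argparse.ArgumentParser(prog="perf-analyzer")
    parser.add_argument("--pid", type=int, required=True)
    parser.add_argument("--event", choices=sorted(EVENT_CODES), required=True)
    parser.add_argument("--period", type=int, default=100000)
    return parser.parse_args(argv)


def pack_params(args) -> bytes:
    """Pack the PMU parameters into the byte block a driver could receive."""
    return PMU_PARAMS.pack(args.pid, EVENT_CODES[args.event], args.period)


args = parse_command(["--pid", "4242", "--event", "cache-misses", "--period", "5000"])
blob = pack_params(args)
```

A fixed binary layout is one simple way to keep the user-space analyzer and the kernel-side driver in agreement about the parameter block; other encodings would serve equally well.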
Step 3: Run the performance driver to sample the specified process based on the performance counter (PMU);
As shown in Fig. 1, in step 3 the performance driver samples the specified process based on the performance counter (PMU) as follows:
S31: Register the PMU interrupt handler and the interrupt number; the interrupt handler processes the sampled result data when the counter overflows;
S32: According to the PMU parameters, configure the control register with the performance event to be monitored for the specified process, and initialize the sampling period of the counter (so that when an event the performance counter can monitor occurs, the counter PMCter counts according to the given sampling period), setting the range of PMCter to (0 to SAV-1), where SAV (Sampling After Value) is the sampling period;
S33: Enable the counter and run the parallel program. The counter starts counting, and each time the monitored event occurs, that is, when the context of the specified process (the monitored event) is switched in or out, the counter value is incremented by 1;
The present invention counts monitored events directly using event-based sampling: the specific events to be monitored are configured in the PMU, and information about them is obtained while the parallel program runs by collecting the counter values. When the counter reaches the sampling period, the interrupt handler is triggered and collects the system state at that moment. The workflow of event-based sampling is shown in Fig. 2.
In step S33, the counter counts the monitored events using precise event-based sampling (PEBS).
The accuracy of the sampled results is key to later code analysis. Therefore, to improve the accuracy of the performance monitoring results, the mode of all monitored events is set to precise event-based sampling (PEBS). The PEBS hardware functions are defined as follows:
When the most significant bit of the counter register becomes 1, reset the counter value;
Save the system state to the PEBS buffer in memory;
Register a PEBS-buffer-full interrupt whose handler saves the PEBS buffer contents to a specified file.
PEBS mode differs from the conventional sampling mode and has the following advantages. First, the system state is saved promptly, so the PC value deviates by at most 1. Second, because PEBS mode no longer enters interrupt handling on every counter overflow as conventional sampling does, overhead is also reduced. Third, after the counter register overflows, its value is set back to the initial value and counting continues; an interrupt is triggered only when the PEBS buffer is full, which reduces the number of interrupts, further reduces the interference that enabling the PMU causes to the program's runtime behavior, and ensures the validity of the sample information.
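The reduction in interrupts can be made concrete with a small simulation. This is a sketch under assumptions: the function names, the buffer size of 16 records, and the use of the event index as a stand-in for the saved system state are all illustrative, and real PEBS behavior depends on the processor.

```python
def conventional_interrupts(events: int, sav: int) -> int:
    """Conventional sampling: one interrupt per counter overflow."""
    return events // sav


def pebs_interrupts(events: int, sav: int, buffer_slots: int):
    """PEBS-style sampling: records go to a buffer; interrupt only when it is full."""
    buffer, flushed, interrupts = [], [], 0
    count = 0
    for i in range(events):
        count += 1
        if count == sav:              # overflow: the counter is re-armed without an interrupt
            count = 0
            buffer.append(i)          # stand-in for the saved system state
            if len(buffer) == buffer_slots:
                interrupts += 1       # buffer-full interrupt
                flushed.extend(buffer)  # handler saves the contents, e.g. to a file
                buffer.clear()
    return interrupts, flushed, buffer


plain = conventional_interrupts(10_000, sav=100)
pebs, saved, pending = pebs_interrupts(10_000, sav=100, buffer_slots=16)
```

In this model 10,000 events with a sampling period of 100 cause 100 interrupts under conventional sampling but only 6 buffer-full interrupts under the buffered scheme, with 4 records still waiting in the buffer at the end.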
S34: When the counter reaches the sampling period, i.e., once the counter overflows, the system calls the interrupt handling function, triggering the interrupt handler; the counter value is saved as sampled result data, and the interrupt handling function saves the system state;
S35: After interrupt handling completes, reset the counter value to the range 0 to SAV-1 and jump to step S33 so that the counter starts counting again.
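Steps S31 to S35 can be walked through with a user-space model of the driver loop. This is a minimal sketch, not kernel code: the class `SamplingDriver`, its method names, and the dictionary used as the "system state" are hypothetical, and the counter is modeled as a plain integer that counts from 0 up to SAV.

```python
class SamplingDriver:
    """Toy model of steps S31 to S35; names are illustrative, not a kernel API."""

    def __init__(self):
        self.handler = None
        self.event = None
        self.sav = None
        self.counter = 0
        self.enabled = False
        self.samples = []

    def register_handler(self, handler):   # S31: register the interrupt handler
        self.handler = handler

    def configure(self, event, sav):       # S32: select the event, set the period
        self.event, self.sav, self.counter = event, sav, 0

    def start(self):                       # S33: enable the counter
        self.enabled = True

    def on_event(self, event, state):
        if not self.enabled or event != self.event:
            return
        self.counter += 1                  # S33: one more occurrence
        if self.counter == self.sav:       # S34: sampling period reached
            self.handler(self, state)
            self.counter = 0               # S35: reset and keep counting


def save_sample(driver, state):
    """Interrupt handler stand-in: keep the counter value and the system state."""
    driver.samples.append((driver.counter, state))


driver = SamplingDriver()
driver.register_handler(save_sample)
driver.configure("instructions", sav=4)
driver.start()
for pc in range(0x1000, 0x1000 + 10):
    driver.on_event("instructions", state={"pc": pc})
```

Ten events with a sampling period of 4 yield two samples here, taken at the fourth and eighth events, and leave the counter at 2.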
Step 4: The performance driver passes the sampled result data to the performance analyzer.
In step 4, the performance driver passes the sampled result data to the performance analyzer in one of the following ways: parameter passing or memory mapping. When complex information such as a call graph must be returned to the performance analyzer, memory mapping is used: a memory-mapping function maps the file contents onto a block of memory, and reading and modifying that block reads and modifies the mapped file. Memory mapping suits the transfer of more complex sampled results, whereas simple, regular sampled results are packed into a data structure and passed as parameters.
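The memory-mapping path can be sketched from the analyzer's side with Python's standard `mmap` module. The record layout (a program counter and a counter value, both 64-bit) and the helper names are assumptions for illustration; a temporary file stands in for whatever file the driver would actually expose.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<QQ")  # assumed record: program counter, counter value


def write_samples(path, samples):
    """Stand-in for the driver side: append fixed-size sample records to a file."""
    with open(path, "wb") as f:
        for pc, count in samples:
            f.write(RECORD.pack(pc, count))


def read_samples_mmap(path):
    """Analyzer side: map the file into memory and decode records in place."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return [RECORD.unpack_from(view, off)
                    for off in range(0, len(view), RECORD.size)]


samples = [(0x400123, 1000), (0x400456, 1000), (0x400123, 1000)]
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    write_samples(path, samples)
    recovered = read_samples_mmap(path)
finally:
    os.remove(path)
```

Mapping the file avoids copying a large result set through a call interface record by record, which is why it suits the more complex results, while a handful of values fits comfortably in a packed parameter structure.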
As can be seen, with the PMU-based performance monitoring and analysis method of the present invention, characteristic information about the program's runtime behavior is obtained through PMU sampling; after comprehensive analysis of the sampled results, the hot-spot code that affects program performance is located so that the program can be optimized. The method helps programmers modify and optimize hot-spot code according to the performance analysis results, improving the program's runtime performance and thereby the performance of the whole system. The invention is simple and effective to implement and meets the application requirements.
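One common way to locate hot-spot code from sampled program counter values is to attribute each sample to the function containing it and rank functions by sample count. The sketch below shows that idea only; the symbol table, the addresses, and the function names are invented for the example, and the patent does not prescribe this particular analysis.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (start address, function name), sorted by address.
SYMBOLS = [(0x1000, "init"), (0x1400, "compute_kernel"), (0x2000, "reduce")]
STARTS = [start for start, _ in SYMBOLS]


def symbolize(pc):
    """Map a sampled program counter to the function whose range contains it."""
    idx = bisect_right(STARTS, pc) - 1
    return SYMBOLS[idx][1] if idx >= 0 else "<unknown>"


def hot_spots(sampled_pcs, top=3):
    """Rank functions by sample count; return (name, samples, share of total)."""
    counts = Counter(symbolize(pc) for pc in sampled_pcs)
    total = sum(counts.values())
    return [(name, n, n / total) for name, n in counts.most_common(top)]


pcs = [0x1410, 0x1500, 0x1FF0, 0x2004, 0x1450, 0x1008, 0x1600, 0x0800]
report = hot_spots(pcs)
```

In this made-up data set five of the eight samples fall inside `compute_kernel`, so it heads the report as the most likely optimization target.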
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make improvements and modifications without departing from the technical principles of the invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the invention.

Claims (4)

1. A PMU-based parallel program performance monitoring and analysis method, characterized by comprising the following steps:
Step 1: Design a performance driver and a performance analyzer. The performance driver implements sampling of a specified process based on the performance counter (PMU). The performance analyzer parses the command and parameters entered by the user, determines the PMU parameters from them, packs the PMU parameters into a data structure, passes the PMU parameters to the performance driver by means of a system call, and then invokes the performance driver to enable the PMU; when the system call returns, it also reads the sampled result data saved by the performance driver;
Step 2: Run the performance analyzer: parse the command and parameters entered by the user, determine the PMU parameters from them, pack the PMU parameters into a data structure, pass them to the performance driver by means of a system call, and then invoke the performance driver to enable the PMU;
Step 3: Run the performance driver to sample the specified process based on the performance counter (PMU);
Step 4: The performance driver passes the sampled result data to the performance analyzer.
2. The method according to claim 1, characterized in that, in step 3, the performance driver samples the specified process based on the performance counter (PMU) as follows:
S31: Register a PMU interrupt handler, which processes the sampled result data when the counter overflows;
S32: According to the PMU parameters, configure the control register with the performance event to be monitored for the specified process, and initialize the sampling period of the counter, setting the range of PMCter to 0 to SAV-1, where SAV is the sampling period;
S33: Enable the counter and run the parallel program. The counter starts counting, and each time the monitored event occurs the counter value is incremented by 1;
S34: When the counter reaches the sampling period, trigger the interrupt handler and save the counter value as sampled result data;
S35: After interrupt handling completes, reset the counter value to the range 0 to SAV-1 and jump to step S33 so that the counter starts counting again.
3. The method according to claim 2, characterized in that, in step S33, the counter counts the monitored events using precise event-based sampling (PEBS).
4. The method according to claim 1, 2, or 3, characterized in that, in step 4, the performance driver passes the sampled result data to the performance analyzer in one of the following ways: parameter passing or memory mapping.
CN201710346738.7A 2017-05-17 2017-05-17 PMU-based parallel program performance monitoring and analyzing method Active CN107153604B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710346738.7A CN107153604B (en) 2017-05-17 2017-05-17 PMU-based parallel program performance monitoring and analyzing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710346738.7A CN107153604B (en) 2017-05-17 2017-05-17 PMU-based parallel program performance monitoring and analyzing method

Publications (2)

Publication Number Publication Date
CN107153604A true CN107153604A (en) 2017-09-12
CN107153604B CN107153604B (en) 2020-02-07

Family

ID=59794084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710346738.7A Active CN107153604B (en) 2017-05-17 2017-05-17 PMU-based parallel program performance monitoring and analyzing method

Country Status (1)

Country Link
CN (1) CN107153604B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090007134A1 (en) * 2007-06-26 2009-01-01 International Business Machines Corporation Shared performance monitor in a multiprocessor system
CN105700998A (en) * 2016-01-13 2016-06-22 浪潮(北京)电子信息产业有限公司 Method and device for monitoring and analyzing performance of parallel programs
CN106126384A (en) * 2016-06-12 2016-11-16 华为技术有限公司 A kind of method and device of acquisition performance monitor unit PMU event

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112540899A (en) * 2019-09-20 2021-03-23 无锡江南计算技术研究所 Analysis device based on performance data space-time characteristics
CN112540899B (en) * 2019-09-20 2022-10-04 无锡江南计算技术研究所 Analysis device based on performance data space-time characteristics
CN112069029A (en) * 2020-09-04 2020-12-11 北京计算机技术及应用研究所 Performance acquisition monitoring system of domestic platform PMU self-adaptation
CN112069029B (en) * 2020-09-04 2023-11-14 北京计算机技术及应用研究所 Domestic platform PMU self-adaptive performance acquisition monitoring system

Also Published As

Publication number Publication date
CN107153604B (en) 2020-02-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant