CN105426296B - Inter-core cooperative multithreaded PMU event monitoring method based on inserted labels

Publication number
CN105426296B
CN105426296B CN201510826916.7A CN201510826916A CN105426296B CN 105426296 B CN105426296 B CN 105426296B CN 201510826916 A CN201510826916 A CN 201510826916A CN 105426296 B CN105426296 B CN 105426296B
Authority
CN
China
Prior art keywords
core
label
pmu
performance
event
Legal status
Active
Application number
CN201510826916.7A
Other languages
Chinese (zh)
Other versions
CN105426296A (en)
Inventor
刘勇
彭超
陈华蓉
王敬宇
冯赟龙
王雯霞
Current Assignee
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date
2015-11-24
Filing date
2015-11-24
Publication date
2018-04-10
Application filed by Wuxi Jiangnan Computing Technology Institute
Priority to CN201510826916.7A
Publication of CN105426296A
Application granted
Publication of CN105426296B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466: Performance evaluation by tracing or monitoring

Abstract

The invention provides an inter-core cooperative multithreaded PMU event monitoring method based on inserted labels, for use on a heterogeneous many-core processor. The heterogeneous many-core processor includes arithmetic cores for performing computation and an operation control core for performing control and service operations. In the method, the operation control core sets the performance events of interest for the threads running on each arithmetic core; the PMU of each thread running on each arithmetic core is initialized; labels are inserted into the threads running on each arithmetic core; the operation control core transparently collects, in the background, the data returned in real time by the labels inserted into those threads; and the returned data are collated and analyzed centrally on the operation control core to produce performance monitoring records, yielding unified performance monitoring of the whole processor.

Description

Inter-core cooperative multithreaded PMU event monitoring method based on inserted labels
Technical field
The present invention relates to the field of computer technology and, more specifically, to an inter-core cooperative multithreaded PMU event monitoring method based on inserted labels.
Background art
To evaluate their designs, hardware architects embed many hardware performance counters in the processor; this also makes it possible to use hardware support to analyze the factors that affect program performance. As processor designs have been updated and performance has improved, most modern processors have come to integrate a dedicated hardware performance monitoring unit, the PMU (performance monitoring unit, also called a "hardware performance counter"), to collect performance events in the processor.
For example, each time an instruction cache miss occurs, the corresponding PMU register is incremented by one to record the event. The PMU's monitoring function exposes the performance events that actually occur in the processor. By collecting statistics on and analyzing these events, programmers can see what low-level hardware behavior different coding styles produce. From this behavior they can further determine which hardware events affect program performance; this guides algorithm-level improvements by the programmer, informs code optimization by the compiler, and helps the operating system manage resources more efficiently.
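As a rough illustration of the counting behavior described above, the following minimal Python sketch models a PMU as a set of software counters. The class name `ToyPMU`, the event names, and the event trace are hypothetical stand-ins for hardware registers and signals; this is not any real PMU interface.

```python
from collections import Counter


class ToyPMU:
    """Software stand-in for a PMU: one counter per monitored event (hypothetical)."""

    def __init__(self, monitored_events):
        self.monitored = set(monitored_events)
        self.counters = Counter()

    def observe(self, event):
        # Mirrors "the corresponding register is incremented by one":
        # only events selected for monitoring are counted.
        if event in self.monitored:
            self.counters[event] += 1


# Hypothetical event trace; in hardware these would be signalled by the pipeline.
trace = ["icache_access", "icache_miss", "icache_access", "icache_access",
         "branch", "icache_access", "icache_miss"]

pmu = ToyPMU(["icache_access", "icache_miss"])
for event in trace:
    pmu.observe(event)

# A derived metric of the kind a programmer might analyze.
miss_ratio = pmu.counters["icache_miss"] / pmu.counters["icache_access"]
print(dict(pmu.counters), miss_ratio)
```

The `branch` event is ignored because it was not selected for monitoring, which loosely mirrors how a PMU counts only the events it has been programmed to watch.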
In practice, many system optimization strategies rely on PMU monitoring data. Through the PMU's performance monitoring function, the system can give users fairly comprehensive low-level runtime information. Because PMU data truthfully reflect how a program actually runs on a particular hardware platform, using the PMU mechanism to optimize program performance has many natural advantages.
As many-core processors become the mainstay of high-performance computing, performance monitoring techniques play an increasingly important role in realizing the hardware potential of these processors. Taking the Godson 3A platform as an example, the prior art uses performance counters to implement Tprofiler, a single-process sampling performance analysis tool. Its implementation is divided into two modules, a front end and a back end. The front end runs at the user level and analyzes the performance information collected by the back end to guide programmers in optimizing code; the back end runs at the kernel level and controls the performance counters, collecting the hardware event information generated while the program runs.
However, existing cooperative performance monitoring techniques on many-core processors have the following problems. First, the monitoring data are distributed across the individual monitors, so cooperation requires complex communication. Second, each monitoring point carries some overhead, and monitoring a large number of threads at the same time also incurs overhead.
Specifically, traditional performance monitoring on single-core and multi-core processors generally has each processor core monitor itself independently using its own PMU. This reflects only the performance efficiency of an individual core, not the performance usage of the processor as a whole. To reflect the performance of the whole processor comprehensively, an on-chip data interaction mechanism is needed to fuse these monitoring data into a unified overall view that truly and effectively reflects the processor's overall performance efficiency.
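The need for data fusion can be illustrated with a small Python sketch. The function names, event names, and sample counts below are assumptions made for illustration; the patent does not specify a data format for the per-core monitoring data.

```python
from collections import Counter


def fuse_core_counters(per_core):
    """Sum per-core event counts into one whole-processor view."""
    total = Counter()
    for counts in per_core.values():
        total.update(counts)  # Counter.update adds counts rather than replacing them
    return dict(total)


def miss_ratio(counts):
    """Cache miss ratio from a dict holding 'l1_access' and 'l1_miss' counts."""
    return counts["l1_miss"] / counts["l1_access"]


# Hypothetical readings taken from the PMUs of two cores.
per_core = {
    0: {"l1_access": 100, "l1_miss": 10},
    1: {"l1_access": 300, "l1_miss": 10},
}

chip = fuse_core_counters(per_core)
per_core_ratios = {core: miss_ratio(c) for core, c in per_core.items()}
chip_ratio = miss_ratio(chip)
print(per_core_ratios, chip_ratio)
```

In this made-up data set, neither per-core ratio (0.1 and roughly 0.033) matches the chip-wide ratio of 0.05, which is the kind of discrepancy that motivates a fused, whole-processor view.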
Summary of the invention
The technical problem addressed by the invention is to remedy the above defects in the prior art by providing, for a heterogeneous many-core processor, a low-overhead, lightweight multithreaded cooperative performance monitoring technique.
To achieve this technical objective, the invention provides an inter-core cooperative multithreaded PMU event monitoring method based on inserted labels. The method is used on a heterogeneous many-core processor, which includes arithmetic cores for performing computation and an operation control core for performing control and service operations.
The inter-core cooperative multithreaded PMU event monitoring method includes: the operation control core sets the performance events of interest for the threads running on each arithmetic core; the PMU of each thread running on each arithmetic core is initialized; labels are inserted into the threads running on each arithmetic core; the operation control core transparently collects, in the background, the data returned in real time by the labels inserted into those threads; and the returned data are collated and analyzed centrally on the operation control core to perform performance monitoring.
Preferably, the method further includes: forming performance counting event records from the results of the data analysis.
Preferably, the labels are inserted at predetermined positions in each thread.
Preferably, the inserted labels are used to register the configuration information of performance counting events.
Preferably, the inserted labels are also used to sense the execution trace of the arithmetic core program.
The invention provides an inter-core cooperative multithreaded PMU event monitoring method based on inserted labels that uses PMU monitoring data to monitor program performance across the whole chip of a heterogeneous many-core processor accurately and efficiently.
Brief description of the drawings
A more complete understanding of the invention, and of its attendant advantages and features, will be gained more readily by reference to the following detailed description in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic diagram of the inter-core cooperative multithreaded PMU event monitoring method based on inserted labels according to a preferred embodiment of the invention.
Fig. 2 is a flow chart of the inter-core cooperative multithreaded PMU event monitoring method based on inserted labels according to a preferred embodiment of the invention.
It should be noted that the drawings illustrate the invention and do not limit it. Drawings showing structure are not necessarily drawn to scale. In the drawings, the same or similar elements are indicated by the same or similar reference numerals.
Detailed description of the embodiments
To make the content of the invention clearer and easier to understand, it is described in detail below with reference to specific embodiments and the accompanying drawings.
A heterogeneous many-core processor generally has two kinds of processor cores with different functions on the chip. One kind focuses on computation: its logic design is simple, it is numerous, and it has high peak computing capability; it is generally used to accelerate compute-intensive work and is called an arithmetic core. The other kind focuses on control and service: its logic design is complex and it is few in number; it is generally used to implement various control and service operations and is called an operation control core.
The inter-core cooperative multithreaded PMU event monitoring technique based on inserted labels uses a timed sensing mechanism on the control core. Based on the labels inserted in the arithmetic core programs, it uses a thread-level PMU monitoring service on the operation control core to record arithmetic core PMU events from the control core while the arithmetic cores are computing. On the one hand, the technique achieves arithmetic core PMU event monitoring that can be mapped accurately to the user program; on the other hand, it reduces the interference of the insertion mechanism with the execution of the arithmetic core program.
As shown in Fig. 1, using a compiler-based insertion method, the performance monitoring module inserts positioning labels at specified locations in the arithmetic core programs of multiple tasks; the labels register the configuration information of performance counting events and sense the execution trace of the arithmetic core program. Meanwhile, a lightweight monitoring scan service is established on the operation control core, whose computational load is relatively light, and "touches" the positioning labels in the arithmetic core programs at suitably configured intervals. Based on the information recorded in the positioning labels, the monitoring scan service on the control core configures and reads/writes the arithmetic core performance counters of the multiple tasks, and a monitoring data processing service performs statistical processing on the performance counter values obtained, finally forming accurate performance counting event records.
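The interplay between positioning labels and the scan service can be sketched in Python as a plain simulation. Everything here is an assumption made for illustration: the `Label` and `ArithmeticCore` classes, their fields, and the `scan_once` function are hypothetical, the patent does not describe the label layout, and timing is omitted (a real service would presumably invoke the scan at its configured interval).

```python
from dataclasses import dataclass, field


@dataclass
class Label:
    """Positioning label at a fixed point in an arithmetic core program (hypothetical layout)."""
    label_id: str
    events: tuple   # performance counting events registered by this label
    hits: int = 0   # times execution has passed the label (a crude execution trace)


@dataclass
class ArithmeticCore:
    core_id: int
    labels: dict = field(default_factory=dict)
    counters: dict = field(default_factory=dict)  # stands in for PMU counter registers
    current_label: str = ""                       # last label reached; "" means none yet

    def reach_label(self, label_id, activity):
        """Called where the compiler would have inserted the label."""
        self.labels[label_id].hits += 1
        self.current_label = label_id
        # Simulated hardware activity that follows the label.
        for event, n in activity.items():
            self.counters[event] = self.counters.get(event, 0) + n


def scan_once(cores):
    """One pass of the control-core scan service: touch labels, read the counters they register."""
    samples = []
    for core in cores:
        if not core.current_label:
            continue  # this core has not reached any label yet
        label = core.labels[core.current_label]
        values = {e: core.counters.get(e, 0) for e in label.events}
        samples.append((core.core_id, label.label_id, values))
    return samples


def make_core(core_id):
    label = Label("loop_start", ("cycles", "l1_miss"))
    return ArithmeticCore(core_id, labels={label.label_id: label})


cores = [make_core(0), make_core(1)]
cores[0].reach_label("loop_start", {"cycles": 50, "l1_miss": 3, "branch": 7})
samples = scan_once(cores)
print(samples)
```

In this sketch the arithmetic core only updates its own state when it passes a label, and all reading and collection happens in `scan_once`, which is one way to picture how the collection work is kept off the arithmetic cores.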
Fig. 2 is a flow chart of the inter-core cooperative multithreaded PMU event monitoring method based on inserted labels according to a preferred embodiment of the invention. The method is used on a heterogeneous many-core processor that includes arithmetic cores for performing computation and an operation control core for performing control and service operations.
As shown in Fig. 2, the inter-core cooperative multithreaded PMU event monitoring method based on inserted labels according to a preferred embodiment of the invention includes:
First step S1: the operation control core sets the performance events of interest for the threads running on each arithmetic core;
Second step S2: the PMU of each thread running on each arithmetic core is initialized;
Third step S3: labels are inserted into the threads running on each arithmetic core. Preferably, the labels are inserted at predetermined positions in each thread; preferably, the inserted labels are used to register the configuration information of performance counting events and to sense the execution trace of the arithmetic core program;
Fourth step S4: the operation control core transparently collects, in the background, the data returned in real time by the labels inserted into the threads running on each arithmetic core. For example, the returned data include, but are not limited to, the configuration information of performance counting events (including PMU-related information) and the execution trace of the arithmetic core program;
Fifth step S5: the returned data are collated and analyzed centrally on the operation control core to perform performance monitoring, and performance counting event records are formed from the results of the data analysis. Unified performance monitoring of the whole processor is thereby achieved.
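Steps S1 to S5 can be walked through with a short Python sketch over simulated threads. The function `run_monitoring`, its input format, and the event names are hypothetical; the sketch only mirrors the order of the steps and makes no claim about how the patented method is actually implemented.

```python
def run_monitoring(events, thread_segments):
    """Walk through S1-S5 for simulated threads.

    thread_segments maps a thread id to a list of (label, event_deltas) pairs,
    one pair per labelled program segment (hypothetical input format).
    """
    # S1: the control core selects the events of interest for each thread.
    config = {tid: tuple(events) for tid in thread_segments}
    # S2: initialize each thread's PMU counters to zero.
    pmu = {tid: dict.fromkeys(evs, 0) for tid, evs in config.items()}
    # S3 is represented by the labels already present in thread_segments.
    # S4: collect the data returned at each label.
    returned = []
    for tid, segments in thread_segments.items():
        for label, deltas in segments:
            for ev in config[tid]:
                pmu[tid][ev] += deltas.get(ev, 0)
            returned.append((tid, label, dict(pmu[tid])))
    # S5: collate into whole-processor performance counting event records.
    records = {ev: sum(pmu[tid][ev] for tid in pmu) for ev in events}
    return returned, records


thread_segments = {
    0: [("A", {"cycles": 10, "l1_miss": 1}), ("B", {"cycles": 5})],
    1: [("A", {"cycles": 20, "l1_miss": 2})],
}
returned, records = run_monitoring(["cycles", "l1_miss"], thread_segments)
print(returned)
print(records)
```

Each entry in `returned` pairs a thread and label with a snapshot of that thread's counters, so the final records can be traced back to specific labelled positions in the program.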
In summary, the inter-core cooperative multithreaded PMU event monitoring technique of the invention inserts labels on the arithmetic cores and establishes a PMU monitoring service on the control core, which gives flexible control over performance monitoring and improves the monitoring efficiency of PMU events in parallel programs.
The invention inserts labels on the arithmetic cores, establishes a thread-level PMU monitoring service, and adopts a master-slave cooperative performance data monitoring scheme. It uses the idle control core to provide cooperative performance monitoring of the whole chip; the data communication overhead is small, and the impact on the performance of applications running on the arithmetic cores is essentially negligible.
In addition, it should be noted that, unless otherwise stated or indicated, terms such as "first", "second", and "third" in the specification are used only to distinguish the components, elements, steps, and so on in the specification, and are not intended to indicate a logical or sequential relationship among them.
It should be understood that although the invention has been disclosed above by way of preferred embodiments, these embodiments are not intended to limit the invention. Any person skilled in the art may, without departing from the scope of the technical solution of the invention, use the technical content disclosed above to make many possible changes and modifications to the technical solution of the invention, or to modify it into equivalent embodiments. Therefore, any simple modification, equivalent change, or adaptation made to the above embodiments in accordance with the technical essence of the invention, without departing from the content of its technical solution, still falls within the scope of protection of the technical solution of the invention.

Claims (1)

1. An inter-core cooperative multithreaded PMU event monitoring method based on inserted labels, the method being used to monitor program performance across the whole chip of a heterogeneous many-core processor using PMU monitoring data, the heterogeneous many-core processor including arithmetic cores for performing computation and an operation control core for performing control and service operations, characterized in that the method includes:
inserting positioning labels at specified locations in the arithmetic core programs of multiple tasks, the labels being used to register the configuration information of performance counting events and to sense the execution trace of the arithmetic core program; touching, on the operation control core and within a predetermined period, the positioning labels in the arithmetic core programs; according to the information recorded in the positioning labels, having the monitoring scan service on the control core configure and read/write the arithmetic core performance counters of the multiple tasks; and having a monitoring data processing service perform statistical processing on the performance counter values obtained, to form performance counting event records;
the operation control core setting the performance events of interest for the threads running on each arithmetic core;
initializing the PMU of each thread running on each arithmetic core;
inserting labels into the threads running on each arithmetic core, wherein the labels are inserted at predetermined positions in each thread, and the inserted labels are used to register the configuration information of performance counting events and to sense the execution trace of the arithmetic core program;
the operation control core transparently collecting, in the background, the data returned in real time by the labels inserted into the threads running on each arithmetic core;
collating and analyzing the returned data centrally on the operation control core to perform performance monitoring; and
forming performance counting event records from the results of the data analysis.

Publications (2)

Publication Number | Publication Date
CN105426296A | 2016-03-23
CN105426296B | 2018-04-10

Family

ID=55504514

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN201510826916.7A | Active | CN105426296B (en) | 2015-11-24 | 2015-11-24 | Inter-core cooperative multithreaded PMU event monitoring method based on inserted labels

Country Status (1)

Country Link
CN (1) CN105426296B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant