CN105426296B

CN105426296B - Inter-core cooperative multithreaded PMU event monitoring method based on instrumentation labels

Info

Publication number: CN105426296B
Application number: CN201510826916.7A
Authority: CN
Inventors: 刘勇; 彭超; 陈华蓉; 王敬宇; 冯赟龙; 王雯霞
Original assignee: Wuxi Jiangnan Computing Technology Institute
Current assignee: Wuxi Jiangnan Computing Technology Institute
Priority date: 2015-11-24
Filing date: 2015-11-24
Publication date: 2018-04-10
Anticipated expiration: 2035-11-24
Also published as: CN105426296A

Abstract

The invention provides an inter-core cooperative multithreaded PMU event monitoring method based on instrumentation labels, for use on a heterogeneous many-core processor. The heterogeneous many-core processor includes arithmetic cores for performing computation and an operation control core for performing control and service operations. In the method, the operation control core sets the performance events of interest for the threads running on each arithmetic core; the PMU of each thread running on each arithmetic core is initialized; labels are inserted into the threads running on each arithmetic core; the operation control core transparently collects, in the background, the data returned in real time by the labels inserted into the threads running on each arithmetic core; and the returned data are collated and analyzed centrally on the operation control core to perform performance monitoring and recording, thereby forming unified performance monitoring of the whole processor.

Description

Inter-core cooperative multithreaded PMU event monitoring method based on instrumentation labels

Technical field

The present invention relates to the field of computer technology and, more particularly, to an inter-core cooperative multithreaded PMU event monitoring method based on instrumentation labels.

Background art

To evaluate their structural designs, designers of hardware architectures embed many hardware performance counters in processors. This also makes it possible to use hardware support to analyze the factors that affect program performance. As processor designs have been updated and performance has improved, most modern processors integrate a dedicated hardware performance monitoring unit, the PMU (performance monitoring unit, also called a "hardware performance counter"), to collect performance events in the processor.

For example, each time an instruction cache miss occurs, the corresponding PMU register is incremented by one to record the event. The PMU's monitoring function exposes the performance events that actually occur in the processor. By collecting statistics on and analyzing these events, programmers can learn what low-level hardware behavior different coding styles produce. These behaviors can then be analyzed further to determine which hardware events affect program performance. This guides programmers in improving programs at the algorithm level, prompts the compiler to optimize code, and helps the operating system manage resources more efficiently.
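The counting behavior described above can be illustrated with a toy software model. This is only a sketch: the class `ToyPMU` and the event names `icache_miss` and `branch_miss` are hypothetical and do not correspond to any real PMU interface.

```python
class ToyPMU:
    """Toy software model of a PMU: one counter register per configured event."""

    def __init__(self, events):
        # One register per event of interest, starting at zero.
        self.registers = {event: 0 for event in events}

    def record(self, event):
        # An occurrence of a configured event adds one to its register;
        # events that were not configured are ignored.
        if event in self.registers:
            self.registers[event] += 1

    def read(self, event):
        return self.registers[event]


pmu = ToyPMU(["icache_miss"])      # hypothetical event name
for _ in range(3):
    pmu.record("icache_miss")      # three instruction cache misses
pmu.record("branch_miss")          # not configured, so not counted
print(pmu.read("icache_miss"))     # prints 3
```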

In actual development, many system optimization strategies use PMU monitoring data. Through the PMU's performance monitoring function, a system can give users fairly comprehensive low-level runtime information. Because PMU monitoring data truly reflect how a program actually runs on a particular hardware platform, optimizing programs with the PMU mechanism has many natural advantages.

As many-core processors become the mainstay of high-performance computing, performance monitoring technology plays an increasingly important role in exploiting their hardware potential. Taking the Godson 3A platform as an example, the prior art uses performance counters to implement Tprofiler, a single-process sampling performance analysis tool. Its implementation is divided into two modules: a front end and a back end. The front end runs in the user layer and is responsible for analyzing the performance information collected by the back end and guiding the programmer in optimizing code. The back end runs in the kernel layer and is responsible for controlling the performance counters and collecting the hardware event information generated while the program runs.

However, the existing cooperative performance monitoring technology on many-core processors described above has the following problems. First, because each core is monitored separately and the data are distributed, cooperation requires complicated communication. Second, each monitoring point incurs some overhead, so monitoring a large number of threads simultaneously also incurs overhead.

Specifically, traditional performance monitoring for single-core and multi-core processors generally has each processor core monitor itself independently using its own PMU. This reflects only the performance efficiency of an individual core, not the performance usage of the whole processor. To reflect the performance of the whole processor comprehensively, an on-chip data interaction mechanism is needed to fuse these monitoring data into overall, unified performance monitoring that truly and effectively reflects the overall performance efficiency of the processor.
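The simplest form of such information fusion is to sum each event's count across cores. The following sketch assumes that each core's counter values have already been gathered into a dictionary; the function name `fuse_counters`, the event names, and the sample values are illustrative only.

```python
def fuse_counters(per_core):
    """Sum each event's count over all cores to get a chip-wide view."""
    chip_wide = {}
    for counters in per_core.values():
        for event, value in counters.items():
            chip_wide[event] = chip_wide.get(event, 0) + value
    return chip_wide


# Made-up per-core samples, keyed by core number.
samples = {
    0: {"icache_miss": 120, "cycles": 9000},
    1: {"icache_miss": 80, "cycles": 9500},
}
print(fuse_counters(samples))  # {'icache_miss': 200, 'cycles': 18500}
```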

Summary of the invention

The technical problem to be solved by the invention is, in view of the above defects in the prior art, to provide a low-overhead, lightweight multithreaded cooperative performance monitoring technique for a heterogeneous many-core processor.

To achieve the above technical purpose, the invention provides an inter-core cooperative multithreaded PMU event monitoring method based on instrumentation labels. The method is used for a heterogeneous many-core processor that includes arithmetic cores for performing computation and an operation control core for performing control and service operations.

The inter-core cooperative multithreaded PMU event monitoring method includes: the operation control core sets the performance events of interest for the threads running on each arithmetic core; the PMU of each thread running on each arithmetic core is initialized; labels are inserted into the threads running on each arithmetic core; the operation control core transparently collects, in the background, the data returned in real time by the labels inserted into the threads running on each arithmetic core; and the returned data are collated and analyzed centrally on the operation control core to perform performance monitoring.

Preferably, the inter-core cooperative multithreaded PMU event monitoring method based on instrumentation labels further includes: forming performance counting event records from the data analysis results.

Preferably, the label is inserted at a predetermined position in each thread.

Preferably, the inserted label is used to register the configuration information of performance counting events.

Preferably, the inserted label is also used to sense the execution trace of the arithmetic core program.

The invention provides an inter-core cooperative multithreaded PMU event monitoring method based on instrumentation labels that can, on a heterogeneous many-core processor, use PMU monitoring data to monitor the program performance of the whole chip accurately and efficiently.

Brief description of the drawings

A more complete understanding of the invention, and of its attendant advantages and features, will be more readily obtained by reference to the following detailed description taken in conjunction with the accompanying drawings, in which:

Fig. 1 is a schematic diagram of the inter-core cooperative multithreaded PMU event monitoring method based on instrumentation labels according to a preferred embodiment of the invention.

Fig. 2 is a flow chart of the inter-core cooperative multithreaded PMU event monitoring method based on instrumentation labels according to a preferred embodiment of the invention.

It should be noted that the drawings are intended to illustrate the invention, not to limit it. Drawings showing structure may not be drawn to scale. In the drawings, the same or similar elements are denoted by the same or similar reference numerals.

Detailed description of the embodiments

To make the content of the invention clearer and easier to understand, the invention is described in detail below with reference to specific embodiments and the drawings.

A heterogeneous many-core processor generally has two kinds of processor cores with different functions on the chip. One kind focuses on computation: its logic design is simple, it is present in large numbers, it has high peak computing capability, and it is generally used to accelerate compute-intensive work. These are called arithmetic cores. The other kind focuses on control and service: its logic design is complex, it is present in small numbers, and it is generally used to implement various control and service operations. These are called operation control cores.

The inter-core cooperative multithreaded PMU event monitoring technique based on instrumentation labels uses a mechanism in which the control core senses at timed intervals. Based on the labels inserted into the arithmetic core programs, and using a thread-level PMU monitoring service on the operation control core, the control core records arithmetic core PMU events while the arithmetic cores are computing. On the one hand, the technique achieves arithmetic core PMU event monitoring that can be mapped accurately to the user program; on the other hand, it reduces the interference of the instrumentation mechanism with the execution of arithmetic core programs.

As shown in Fig. 1, using a compiler-based insertion method, the performance monitoring function module inserts positioning labels at specified locations in the arithmetic core programs of multiple tasks. The labels register the configuration information of performance counting events and sense the execution trace of the arithmetic core program. Meanwhile, a lightweight monitoring scan service is established on the operation control core, whose computational load is relatively light, and "touches" the positioning labels in the arithmetic core programs at a reasonably configured interval. According to the information recorded by the positioning labels, the monitoring scan service on the control core configures, reads, and writes the arithmetic core performance counters of the multiple tasks, and a monitoring data processing service performs statistical processing on the performance counter values obtained, finally forming accurate performance counting event records.
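The interaction between positioning labels and the scan service can be sketched as a small deterministic simulation. Everything here is an assumption made for illustration: the names `Label`, `ArithmeticCore`, and `scan`, the `reached` flag, and the dictionary standing in for PMU registers are hypothetical, and real labels would be placed by the compiler rather than constructed by hand.

```python
from dataclasses import dataclass, field


@dataclass
class Label:
    """Hypothetical positioning label at a specified program location."""
    name: str              # identifies the program location (execution trace)
    event: str             # performance counting event registered by the label
    reached: bool = False  # set when execution passes the label


@dataclass
class ArithmeticCore:
    core_id: int
    labels: list
    counters: dict = field(default_factory=dict)  # stand-in for PMU registers

    def run_region(self, label, event_count):
        """Simulate executing the code region that starts at `label`."""
        label.reached = True
        self.counters[label.event] = self.counters.get(label.event, 0) + event_count


def scan(cores, records):
    """One pass of the control-core scan service: touch every label and,
    for each label that was reached, read the counter it registered."""
    for core in cores:
        for label in core.labels:
            if label.reached:
                records.append((core.core_id, label.name, label.event,
                                core.counters.get(label.event, 0)))
                label.reached = False  # wait for the next pass through the label


cores = [ArithmeticCore(0, [Label("loop_start", "cache_miss")]),
         ArithmeticCore(1, [Label("loop_start", "cache_miss")])]
records = []
cores[0].run_region(cores[0].labels[0], event_count=5)
scan(cores, records)                      # only core 0 has passed its label
cores[1].run_region(cores[1].labels[0], event_count=7)
cores[0].run_region(cores[0].labels[0], event_count=2)
scan(cores, records)                      # both cores report this time
```

In this sketch the arithmetic cores only set a flag and never wait for the control core, which loosely mirrors the low-interference goal stated in the text.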

Fig. 2 is a flow chart of the inter-core cooperative multithreaded PMU event monitoring method based on instrumentation labels according to a preferred embodiment of the invention. The method is used for a heterogeneous many-core processor that includes arithmetic cores for performing computation and an operation control core for performing control and service operations.

As shown in Fig. 2, the inter-core cooperative multithreaded PMU event monitoring method based on instrumentation labels according to a preferred embodiment of the invention includes:

First step S1: the operation control core sets the performance events of interest for the threads running on each arithmetic core;

Second step S2: the PMU of each thread running on each arithmetic core is initialized;

Third step S3: labels are inserted into the threads running on each arithmetic core; preferably, the label is inserted at a predetermined position in each thread; preferably, the inserted label is used to register the configuration information of performance counting events and to sense the execution trace of the arithmetic core program;

Fourth step S4: the operation control core transparently collects, in the background, the data returned in real time by the labels inserted into the threads running on each arithmetic core; for example, the returned data include, but are not limited to, the configuration information of performance counting events (including PMU-related information and the like) and the execution trace of the arithmetic core program;

Fifth step S5: the returned data are collated and analyzed centrally on the operation control core to perform performance monitoring, and performance counting event records are formed from the data analysis results; unified performance monitoring of the whole processor is thereby formed.
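Steps S1 to S5 can be sketched end to end with ordinary threads standing in for the cores. This is a minimal sketch, not the patented implementation: the event name `cache_miss`, the functions `arithmetic_thread` and `monitor`, the use of a queue as the return path, and the simulated miss counts are all assumptions.

```python
import queue
import threading

EVENTS = ["cache_miss"]  # S1: events of interest (hypothetical event name)


def arithmetic_thread(core_id, work, out):
    pmu = {event: 0 for event in EVENTS}      # S2: initialise the simulated PMU
    for step, misses in enumerate(work):
        pmu["cache_miss"] += misses           # simulated hardware counting
        out.put((core_id, step, dict(pmu)))   # S3: inserted label returns data
    out.put(None)                             # tell the collector this thread is done


def monitor(workloads):
    out = queue.Queue()
    records = []

    def collector():                          # S4: background collection
        remaining = len(workloads)
        while remaining:
            item = out.get()
            if item is None:
                remaining -= 1
            else:
                records.append(item)

    control = threading.Thread(target=collector)
    control.start()
    workers = [threading.Thread(target=arithmetic_thread, args=(cid, work, out))
               for cid, work in workloads.items()]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    control.join()

    # S5: keep the latest sample per core, then sum over cores.
    latest = {}
    for core_id, step, pmu in records:
        if core_id not in latest or step >= latest[core_id][0]:
            latest[core_id] = (step, pmu)
    return {e: sum(pmu[e] for _, pmu in latest.values()) for e in EVENTS}


print(monitor({0: [1, 2], 1: [3]}))  # {'cache_miss': 6}
```

The arithmetic threads only enqueue samples and never block on analysis, so all collation happens on the collecting side.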

In summary, the inter-core cooperative multithreaded PMU event monitoring device based on instrumentation labels of the invention inserts labels on the arithmetic cores and establishes a PMU monitoring service on the control core, thereby achieving flexible control of performance monitoring and improving the efficiency of monitoring PMU events of parallel programs.

The invention inserts labels on the arithmetic cores, establishes a thread-level PMU monitoring service, and adopts a master-slave cooperative performance data monitoring scheme. The inter-core cooperative multithreaded PMU event monitoring device based on instrumentation labels of the invention uses an idle control core to provide cooperative performance monitoring of the whole chip; the data communication overhead is small, and the effect on the performance of applications on the arithmetic cores is essentially negligible.

In addition, it should be noted that, unless otherwise stated or indicated, terms such as "first", "second", and "third" in the specification are used only to distinguish components, elements, steps, and the like in the specification, and are not intended to indicate a logical or sequential relationship between them.

It should be understood that, although the invention has been disclosed above by way of preferred embodiments, the embodiments are not intended to limit the invention. Any person skilled in the art may, without departing from the scope of the technical solution of the invention, use the technical content disclosed above to make many possible changes and modifications to the technical solution, or modify it into equivalent embodiments. Therefore, any simple modification, equivalent change, or refinement made to the above embodiments according to the technical essence of the invention, without departing from the content of its technical solution, still falls within the scope of protection of the technical solution of the invention.

Claims

1. An inter-core cooperative multithreaded PMU event monitoring method based on instrumentation labels, the method being used to monitor the program performance of a whole chip on a heterogeneous many-core processor using PMU monitoring data, the heterogeneous many-core processor including arithmetic cores for performing computation and an operation control core for performing control and service operations; characterized in that the method includes:

inserting positioning labels at specified locations in the arithmetic core programs of multiple tasks, the labels being used to register the configuration information of performance counting events and to sense the execution trace of the arithmetic core program; touching, on the operation control core, the positioning labels in the arithmetic core programs within a predetermined time period; according to the information recorded by the positioning labels, configuring, reading, and writing the arithmetic core performance counters of the multiple tasks by a monitoring scan service on the control core; and performing statistical processing on the obtained performance counter values by a monitoring data processing service to form performance counting event records;

setting, by the operation control core, the performance events of interest for the threads running on each arithmetic core;

initializing the PMU of each thread running on each arithmetic core;

inserting labels into the threads running on each arithmetic core, wherein the label is inserted at a predetermined position in each thread, and the inserted label is used to register the configuration information of performance counting events and to sense the execution trace of the arithmetic core program;

transparently collecting in the background, by the operation control core, the data returned in real time by the labels inserted into the threads running on each arithmetic core;

collating and analyzing the returned data centrally on the operation control core to perform performance monitoring; and

forming performance counting event records from the data analysis results.