CN105426296A - Tag insertion based inter-core cooperation multi-thread PMU event monitoring method - Google Patents

Info

Publication number: CN105426296A
Application number: CN201510826916.7A
Authority: CN (China)
Prior art keywords: core, pmu, plug, thread, monitoring method
Legal status: Granted, active (as listed by Google Patents; an assumption, not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN105426296B (en)
Inventors: 刘勇, 彭超, 陈华蓉, 王敬宇, 冯赟龙, 王雯霞
Current assignee: Wuxi Jiangnan Computing Technology Institute (as listed by Google Patents; may be inaccurate)
Original assignee: Wuxi Jiangnan Computing Technology Institute
Priority date: 2015-11-24 (an assumption, not a legal conclusion)
Filing date: 2015-11-24
Publication date: 2016-03-23 (CN105426296A); 2018-04-10 (CN105426296B, granted)
Application filed by: Wuxi Jiangnan Computing Technology Institute

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466: Performance evaluation by tracing or monitoring

Abstract

The invention provides a tag insertion based inter-core cooperative multi-thread PMU event monitoring method for a heterogeneous many-core processor. The processor comprises calculation cores for executing calculation operations and a calculation control core for executing control and service operations. In the method, the calculation control core sets the performance events of interest for the thread running on each calculation core; the PMU of the thread running on each calculation core is initialized; a tag is inserted in the thread running on each calculation core; the calculation control core transparently collects, in the background, the data returned in real time by the inserted tags; and the calculation control core centrally collates and analyzes the returned data to perform performance monitoring and recording, forming unified performance monitoring of the whole processor.

Description

Tag insertion based inter-core cooperative multi-thread PMU event monitoring method
Technical field
The present invention relates to the field of computer technology and, more particularly, to an inter-core cooperative multi-thread PMU event monitoring method based on tag insertion.
Background
To evaluate their designs, hardware architects build many hardware performance counters into processors, which also makes it possible to use hardware support to analyze the factors that affect program performance. As CPU designs have evolved and performance has risen, most modern processors have come to integrate a dedicated hardware performance monitoring unit, the PMU (performance monitoring unit, also called a "hardware performance counter"), to collect performance events inside the processor.
For example, when an instruction cache miss occurs, the corresponding PMU register records the event by incrementing its count. The PMU's monitoring function exposes the performance events that actually occur in the processor. By collecting statistics on these events and analyzing them, programmers can learn what low-level hardware behavior different coding styles produce. From this behavior they can further determine which hardware events affect program performance. That analysis guides algorithm-level improvements by the programmer, prompts code optimization by the compiler, and helps the operating system manage resources more efficiently.
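The counting behavior described above can be illustrated with a toy software model. This is only a sketch: a real PMU is a set of hardware registers, and the class name `ToyPMU` and the event names `icache_access` and `icache_miss` are assumptions made for illustration, not names from the patent or from any real processor.

```python
class ToyPMU:
    """Toy model of a PMU: one counter register per configured event."""

    def __init__(self, events):
        # Only events configured up front get a counter register.
        self.registers = {name: 0 for name in events}

    def record(self, event):
        # Real hardware performs this increment itself when the event fires.
        if event in self.registers:
            self.registers[event] += 1

    def read(self, event):
        return self.registers[event]


pmu = ToyPMU(["icache_access", "icache_miss"])

# Simulated instruction fetches: True means the fetch missed the instruction cache.
for missed in [False, True, False, False, True, False, False, False]:
    pmu.record("icache_access")
    if missed:
        pmu.record("icache_miss")

miss_rate = pmu.read("icache_miss") / pmu.read("icache_access")
print(f"I-cache miss rate: {miss_rate:.2%}")
```

A derived statistic such as the miss rate is the kind of figure a programmer or compiler could act on when deciding where to optimize.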
In practice, many system optimization strategies rely on PMU monitoring data. Through the PMU's performance monitoring function, a system can give users fairly comprehensive information about low-level runtime behavior. Because PMU monitoring data reflects how a program actually runs on a specific hardware platform, PMU-based program optimization has many natural advantages.
As many-core processors gradually become the main platform for high-performance computing, performance monitoring plays an increasingly important role in realizing their hardware potential. On the Godson 3A platform, the prior art uses performance counters to implement Tprofiler, a single-process sampling performance analysis tool. Its implementation is divided into two modules, a front end and a back end. The front end runs in user space and analyzes the performance information collected by the back end to guide programmers in optimizing code. The back end runs in kernel space and controls the performance counters, collecting the hardware event information produced while the program runs.
However, existing cooperative performance monitoring techniques on many-core processors have the following problems. First, because each core is monitored separately, the data is distributed, and cooperation requires complex communication. Second, every monitoring point carries some overhead, so monitoring a large number of threads at once is costly.
Specifically, in traditional performance monitoring for single-core and multi-core processors, each processor core typically uses its own PMU to monitor performance independently. This reflects only how efficiently a particular core is used, not how the processor as a whole is used. To reflect the performance of the whole processor, an on-chip data interaction mechanism is needed to fuse the monitoring data into a unified global view that accurately reflects the processor's overall utilization.
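The fusion step mentioned above can be sketched as a simple merge of per-core counts. The core numbers, event names, and values below are invented for illustration, and the on-chip mechanism that would actually transport the data is not modeled.

```python
from collections import Counter


def merge_core_counters(per_core):
    """Sum per-core event counts into one whole-processor view."""
    total = Counter()
    for counts in per_core.values():
        total.update(counts)  # Counter.update adds counts rather than replacing them
    return total


# Hypothetical PMU readings, one dict per processor core.
per_core = {
    0: {"cycles": 1000, "icache_miss": 12},
    1: {"cycles": 950, "icache_miss": 30},
    2: {"cycles": 1020, "icache_miss": 7},
}

chip = merge_core_counters(per_core)
worst_core = max(per_core, key=lambda core: per_core[core]["icache_miss"])

print(dict(chip))
print("core with the most I-cache misses:", worst_core)
```

Looking at a single core's counters would show only that core's behavior; the merged totals and the per-core comparison are only available once the data is in one place.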
Summary of the invention
The technical problem addressed by the present invention is the above deficiency in the prior art: for a heterogeneous many-core processor, the invention provides a low-overhead, lightweight multi-thread cooperative performance monitoring technique.
To achieve this, the present invention provides an inter-core cooperative multi-thread PMU event monitoring method based on tag insertion. The method is used for a heterogeneous many-core processor comprising calculation cores for executing calculation operations and a calculation control core for executing control and service operations.
The method comprises: the calculation control core sets the performance events of interest for the thread running on each calculation core; the PMU of the thread running on each calculation core is initialized; a tag is inserted in the thread running on each calculation core; the calculation control core transparently collects, in the background, the data returned in real time by the tags inserted in the threads running on the calculation cores; and the calculation control core centrally collates and analyzes the returned data to perform performance monitoring.
Preferably, the method further comprises forming performance counting event records from the data analysis results.
Preferably, the tag is inserted at a predetermined position in each thread.
Preferably, the inserted tag is used to register the configuration information of performance counting events.
Preferably, the inserted tag is also used to sense the execution trace of the calculation core program.
The invention thus provides a tag insertion based inter-core cooperative multi-thread PMU event monitoring method that, on a heterogeneous many-core processor, uses PMU monitoring data to monitor program performance across the whole chip accurately and efficiently.
Brief description of the drawings
The invention and its attendant advantages and features will be more fully and readily understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic diagram of the tag insertion based inter-core cooperative multi-thread PMU event monitoring method according to a preferred embodiment of the invention.
Fig. 2 is a flowchart of the tag insertion based inter-core cooperative multi-thread PMU event monitoring method according to a preferred embodiment of the invention.
Note that the drawings illustrate the invention and do not limit it. Drawings that show structure may not be drawn to scale. Identical or similar elements are given identical or similar reference numerals in the drawings.
Detailed description
To make the content of the invention clearer and easier to understand, it is described in detail below with reference to specific embodiments and the drawings.
A heterogeneous many-core processor generally refers to a chip with processor cores of two different functions. One kind focuses on computation: its logic design is simple, the cores are numerous, and their peak computing capability is high. These cores are generally used to accelerate compute-intensive work and are called calculation cores. The other kind focuses on control and services: its logic design is complex and the cores are few. These cores generally implement various control and service operations and are called calculation control cores.
The tag insertion based inter-core cooperative multi-thread PMU event monitoring technique uses a timed sensing mechanism on the control core. Based on tags inserted in the calculation core program, it uses a thread-level PMU monitoring service on the calculation control core so that the control core records calculation core PMU events while the calculation cores compute. On the one hand, this achieves calculation core PMU event monitoring that corresponds fairly accurately to the user program; on the other hand, it reduces the interference of the instrumentation mechanism with the execution of the calculation core program.
As shown in Fig. 1, using compiler-based insertion, the performance monitoring module inserts positioning tags at specified positions in the calculation core programs of multiple tasks. The tags register the configuration information of performance counting events and sense the execution trace of the calculation core program. Meanwhile, a lightweight monitoring scan service is set up on the calculation control core, whose computing load is relatively light, and "touches" the positioning tags in the calculation core programs at suitably configured time intervals. Based on the information recorded by the positioning tags, the monitoring scan service on the control core configures, reads, and writes the calculation core performance counters of the multiple tasks. A monitoring data processing service then statistically processes the counter values obtained, finally forming fairly accurate performance counting event records.
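The timed "touch" mechanism can be sketched as a polling loop over per-core tag slots. Everything in this sketch is an assumption made for illustration: the patent does not specify a tag layout, so the `TagSlot` fields, the `pass_tag` helper, and the `ScanService` class are hypothetical, and the counters are plain dictionaries rather than hardware registers.

```python
from dataclasses import dataclass, field


@dataclass
class TagSlot:
    """What one calculation-core thread exposes to the control core (hypothetical layout)."""
    tag_id: str = ""                               # most recently passed tag
    seq: int = 0                                   # incremented every time a tag is passed
    counters: dict = field(default_factory=dict)   # simulated PMU registers


def pass_tag(slot, tag_id):
    """Stand-in for the code a compiler would insert at a tag position."""
    slot.tag_id = tag_id
    slot.seq += 1


class ScanService:
    """Lightweight scan service running on the control core."""

    def __init__(self, slots):
        self.slots = slots
        # Per core: (last sequence number, last tag, counter snapshot at that scan).
        self.last_seen = {core: (0, "", {}) for core in slots}
        self.records = []

    def touch_all(self):
        """One timed scan: record counter deltas for cores that reached a new tag."""
        for core, slot in self.slots.items():
            seq, prev_tag, prev_counts = self.last_seen[core]
            if slot.seq == seq:
                continue  # no new tag since the previous scan
            now = dict(slot.counters)
            if prev_tag:
                delta = {e: now[e] - prev_counts.get(e, 0) for e in now}
                self.records.append((core, prev_tag, slot.tag_id, delta))
            self.last_seen[core] = (slot.seq, slot.tag_id, now)


slot = TagSlot(counters={"cycles": 100})
service = ScanService({0: slot})

pass_tag(slot, "loop_begin")
service.touch_all()            # first tag seen: snapshot only, nothing to compare with yet
slot.counters["cycles"] = 450  # the core keeps computing between scans
service.touch_all()            # no new tag, so the scan records nothing
slot.counters["cycles"] = 600
pass_tag(slot, "loop_end")
service.touch_all()            # new tag: record the delta since "loop_begin"

print(service.records)
```

In this model the calculation core only writes a tag identifier and a sequence number, while the reading and bookkeeping happen on the control core, which is one way to keep the interference with the calculation core program small. Because counters are sampled at scan time rather than at the exact tag position, the deltas are approximate, and a tag passed and overwritten between two scans is missed; the scan interval trades accuracy against overhead.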
Fig. 2 is a flowchart of the tag insertion based inter-core cooperative multi-thread PMU event monitoring method according to a preferred embodiment of the invention. The method is used for a heterogeneous many-core processor comprising calculation cores for executing calculation operations and a calculation control core for executing control and service operations.
As shown in Fig. 2, the method according to the preferred embodiment comprises:
First step S1: the calculation control core sets the performance events of interest for the thread running on each calculation core;
Second step S2: the PMU of the thread running on each calculation core is initialized;
Third step S3: a tag is inserted in the thread running on each calculation core; preferably, the tag is inserted at a predetermined position in each thread; preferably, the inserted tag is used to register the configuration information of performance counting events and to sense the execution trace of the calculation core program;
Fourth step S4: the calculation control core transparently collects, in the background, the data returned in real time by the tags inserted in the threads running on the calculation cores; for example, the returned data includes, but is not limited to, the configuration information of performance counting events (including PMU-related information) and the execution trace of the calculation core program;
Fifth step S5: the calculation control core centrally collates and analyzes the returned data to perform performance monitoring, and forms performance counting event records from the data analysis results; unified performance monitoring of the whole processor is thereby achieved.
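Steps S1 to S5 can be sketched end to end with ordinary threads standing in for cores. This is a simulation under stated assumptions, not the patented implementation: the event names, the `(core, region, kind, snapshot)` tag record, the queue used as the return channel, and all function names are hypothetical, and the counters are incremented by hand instead of being read from hardware.

```python
import queue
import threading
from collections import defaultdict

EVENTS = ("cycles", "cache_miss")  # S1: events of interest (hypothetical names)


def calc_thread(core_id, work, out):
    """Simulated calculation-core thread with tags at region boundaries."""
    pmu = dict.fromkeys(EVENTS, 0)                       # S2: initialize the simulated PMU
    for region, cycles, misses in work:
        out.put((core_id, region, "enter", dict(pmu)))   # S3: inserted tag
        pmu["cycles"] += cycles                          # stand-in for real computation
        pmu["cache_miss"] += misses
        out.put((core_id, region, "exit", dict(pmu)))    # S3: inserted tag


def collect(out, sink):
    """S4: background collection on the control core; None is a stop sentinel."""
    while (item := out.get()) is not None:
        sink.append(item)


def analyze(samples):
    """S5: pair enter/exit tags and sum counter deltas per region across cores."""
    open_tags = {}
    totals = defaultdict(lambda: dict.fromkeys(EVENTS, 0))
    for core, region, kind, snapshot in samples:
        if kind == "enter":
            open_tags[(core, region)] = snapshot
        else:
            start = open_tags.pop((core, region))
            for event in EVENTS:
                totals[region][event] += snapshot[event] - start[event]
    return dict(totals)


def run(workloads):
    out, samples = queue.Queue(), []
    collector = threading.Thread(target=collect, args=(out, samples))
    collector.start()
    workers = [threading.Thread(target=calc_thread, args=(core, work, out))
               for core, work in workloads.items()]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    out.put(None)      # all calculation threads are done: stop the collector
    collector.join()
    return analyze(samples)


records = run({
    0: [("init", 10, 1), ("solve", 100, 8)],
    1: [("init", 12, 0), ("solve", 90, 5)],
})
print(records)
```

The calculation threads only emit tag records, while collection and analysis run elsewhere, which mirrors the division of labor between calculation cores and the control core. Each thread's records arrive in program order even though records from different threads interleave, so pairing by `(core, region)` is enough to recover the per-region deltas.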
In summary, the tag insertion based inter-core cooperative multi-thread PMU event monitoring of the present invention inserts tags on the calculation cores and sets up a PMU monitoring service on the control core. This achieves flexible control of performance monitoring and improves the efficiency of monitoring PMU events in parallel programs.
The invention inserts tags on the calculation cores, sets up a thread-level PMU monitoring service, and adopts a master-slave cooperative performance data monitoring scheme. It uses the idle control core to provide cooperative performance monitoring of the whole chip, with low data communication overhead and an essentially negligible impact on the performance of applications running on the calculation cores.
In addition, unless otherwise stated or indicated, terms such as "first", "second", and "third" in the specification are used only to distinguish components, elements, steps, and the like, and not to indicate a logical or ordinal relationship between them.
It will be understood that although the invention has been disclosed above by way of preferred embodiments, these embodiments are not intended to limit it. Any person of ordinary skill in the art may, without departing from the scope of the technical solution of the invention, use the technical content disclosed above to make many possible variations and modifications to the technical solution, or modify it into equivalent embodiments. Therefore, any simple modification, equivalent variation, or refinement made to the above embodiments according to the technical essence of the invention, without departing from the content of its technical solution, still falls within the scope of protection of the technical solution of the invention.

Claims (5)

1. A tag insertion based inter-core cooperative multi-thread PMU event monitoring method for a heterogeneous many-core processor, the heterogeneous many-core processor comprising calculation cores for executing calculation operations and a calculation control core for executing control and service operations, characterized in that the method comprises:
the calculation control core setting the performance events of interest for the thread running on each calculation core;
initializing the PMU of the thread running on each calculation core;
inserting a tag in the thread running on each calculation core;
the calculation control core transparently collecting, in the background, the data returned in real time by the tags inserted in the threads running on the calculation cores; and
the calculation control core centrally collating and analyzing the returned data to perform performance monitoring.
2. The tag insertion based inter-core cooperative multi-thread PMU event monitoring method according to claim 1, characterized by further comprising: forming performance counting event records from the data analysis results.
3. The tag insertion based inter-core cooperative multi-thread PMU event monitoring method according to claim 1 or 2, characterized in that the tag is inserted at a predetermined position in each thread.
4. The tag insertion based inter-core cooperative multi-thread PMU event monitoring method according to claim 1 or 2, characterized in that the inserted tag is used to register the configuration information of performance counting events.
5. The tag insertion based inter-core cooperative multi-thread PMU event monitoring method according to claim 1 or 2, characterized in that the inserted tag is also used to sense the execution trace of the calculation core program.
CN201510826916.7A 2015-11-24 2015-11-24 Tag insertion based inter-core cooperation multi-thread PMU event monitoring method Active CN105426296B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510826916.7A CN105426296B (en) 2015-11-24 2015-11-24 Tag insertion based inter-core cooperation multi-thread PMU event monitoring method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510826916.7A CN105426296B (en) 2015-11-24 2015-11-24 Tag insertion based inter-core cooperation multi-thread PMU event monitoring method

Publications (2)

Publication Number Publication Date
CN105426296A (en) 2016-03-23
CN105426296B (en) 2018-04-10

Family

ID=55504514

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510826916.7A Active CN105426296B (en) 2015-11-24 2015-11-24 Tag insertion based inter-core cooperation multi-thread PMU event monitoring method

Country Status (1)

Country Link
CN (1) CN105426296B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112445547A (en) * 2019-09-02 2021-03-05 无锡江南计算技术研究所 Low-disturbance performance data acquisition method for heterogeneous many-core processor

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012133682A (en) * 2010-12-22 2012-07-12 Nec Corp Computer, core allocation method and program
US20130305359A1 (en) * 2012-05-14 2013-11-14 Qualcomm Incorporated Adaptive Observation of Behavioral Features on a Heterogeneous Platform
CN104303156A (en) * 2012-05-14 2015-01-21 高通股份有限公司 Monitoring behavioral features in mobile multiprocessor platform
US20140006751A1 (en) * 2012-07-02 2014-01-02 Lsi Corporation Source Code Level Multistage Scheduling Approach for Software Development and Testing for Multi-Processor Environments
CN103226487A (en) * 2013-04-25 2013-07-31 中国人民解放军信息工程大学 Data distribution and local optimization method for heterogeneous many-core architecture multi-level storage structure

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王敬宇 等: "一种面向通用众核CPU的软件调试器设计", 《计算机工程与科学》 *

Also Published As

Publication number Publication date
CN105426296B (en) 2018-04-10

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant