CN111209155B - Performance detection method convenient for expansion and configuration - Google Patents

Info

Publication number
CN111209155B
Authority
CN
China
Prior art keywords: event, monitoring, counting, instruction, detection method
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811389865.6A
Other languages
Chinese (zh)
Other versions
CN111209155A (en)
Inventor
费晓龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaxiaxin Beijing General Processor Technology Co ltd
Original Assignee
Huaxiaxin Beijing General Processor Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaxiaxin Beijing General Processor Technology Co ltd
Priority to CN201811389865.6A
Publication of CN111209155A
Application granted
Publication of CN111209155B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention belongs to the technical field of software monitoring and particularly relates to a performance detection method that is convenient to expand and configure. The method comprises the following steps. S1: a monitoring write instruction is executed and sent to the execution unit. S2: after the write instruction reaches the execution unit, the allocation logic checks the category to which the monitored event belongs and whether all counting units under that category are occupied; if they are not all occupied, an idle counting unit is selected. The method is reasonably designed: it makes the performance monitoring unit easy to expand, reuses the existing special-register read and write instruction formats, allocates and releases counting units flexibly, is convenient for software to use, readily supports event statistics over discontinuous periods, and reduces power consumption by controlling the clock of the event bus according to the working state of the counting units.

Description

Performance detection method convenient for expansion and configuration
Technical Field
The invention relates to the technical field of software monitoring, in particular to a performance detection method convenient for expansion and configuration.
Background
The processor is a very-large-scale integrated circuit and is the computing core and control core of a computer. Its main function is to interpret computer instructions and process the data in computer software. The processor mainly comprises an arithmetic logic unit (ALU), a cache, and the data, control, and status buses that connect them. Together with internal memory and input/output (I/O) devices, it is one of the three core components of an electronic computer. Software programs run on the processor, and their performance is commonly monitored. However, existing software performance monitoring is hard to extend, inconvenient for software to use, can only count events over a continuous period, and consumes considerable power.
Disclosure of Invention
The invention aims to provide a performance detection method that is convenient to expand and configure, so as to solve the problems described in the Background: existing software performance monitoring is hard to extend, inconvenient for software to use, limited to statistics over a continuous period, and power-hungry.
In order to achieve the purpose, the invention provides the following technical scheme: a performance detection method convenient for expansion and configuration comprises the following specific steps:
S1: executing a monitoring write instruction so that the write instruction is sent to an execution unit;
S2: after the write instruction reaches the execution unit, the allocation logic checks the category to which the monitored event belongs and whether all counting units under that category are occupied. If not, an idle counting unit is selected, the event and an initial value are assigned to it, and its counter is activated. If all counters are occupied, the earliest monitoring event is released. At any given moment, several counting units in a category may be working simultaneously. The performance monitoring module receives events from other modules, and each working counting unit checks them: if an event in its category is triggered and is the event being monitored, the counter increments;
S3: when all counting units of a given event class are detected to be idle, the pipeline clock that delivers that class's events to the monitoring module is turned off.
Preferably, in step S1, the instruction is a special register write instruction that fully reuses the write path and instruction format of the existing special registers. The special register number in the instruction denotes a specific monitoring event, and the register number is divided into an event class code and an event code.
Preferably, in step S2, once activated, the counting unit remains in the working state and is ready to increment at any time.
Compared with the prior art, the invention has the following beneficial effects: the performance detection method is reasonably designed, makes the performance monitoring unit easy to expand, reuses the existing special-register read and write instruction formats, allocates and releases counting units flexibly, is convenient for software to use, readily supports event statistics over discontinuous periods, and reduces power consumption by controlling the clock of the event bus according to the working state of the counting units.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below, and it should be apparent that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides the following technical scheme. The events monitored by the processor fall into two categories, microarchitectural and architectural. A microarchitectural event is recorded as soon as its trigger condition is met, whereas an architectural event is recorded only if it is successfully committed after being triggered. The monitored events are grouped as follows: instruction cache related events (instruction TLB access, miss, and backfill; instruction cache access, miss, and backfill); instruction fetch and dispatch related events (dispatch of N instructions, N = 1, 2, 3, ...; dispatch stalls caused by front-end or back-end resources; pipeline cancel operations); memory access related events (Load/Store cache overflow; Store-bypass-Load scenarios; pipeline stalls caused by bank conflicts; data cache access, miss, and backfill); and commit stage related events (speculatively executed instructions; committed instructions; unaligned address accesses; page-crossing accesses; exception triggers; interrupt responses; CPU cycle count).
To let software monitor several different events at the same time, each event class contains three hardware counting units for counting events. Each counting unit has a valid bit, an event field, and a counter. Because the counting units are limited, software must use them sensibly: once a monitoring task for a given period is complete, the monitored value should be read promptly from the counting unit into a general register so that the resource is released.
The event field records which specific event the counting unit is currently monitoring and is used to match read requests. A read request succeeds only when it matches the event field of some counting unit; if no event field matches, an all-zero value is returned. Software must therefore activate a counting unit for an event before reading that event, and writes and reads must occur in pairs so that the software's intent is carried out completely.
Conventionally, the counter is simply cleared to 0 and counting starts from 0. Suppose software needs to monitor how often a certain event is triggered over two discontinuous periods. If the counter is cleared each time, the counting unit must be set up twice and the two readings added together after they are read out. If an initial value may be set, then when the counting unit is set up the second time, the previous result is assigned directly as the initial value and counting continues from there. If the counter reaches its upper limit, it remains unchanged.
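The benefit of a settable initial value can be illustrated with a minimal Python sketch. The 32-bit counter width and the function name `run_window` are assumptions made purely for illustration; the source does not state the counter width.

```python
# Assumed counter width; the source does not specify it.
COUNTER_BITS = 32
COUNTER_MAX = (1 << COUNTER_BITS) - 1


def run_window(initial_value: int, triggers: int) -> int:
    """Return the counter value after one monitoring window.

    The counter starts at initial_value, increments once per trigger,
    and stays at COUNTER_MAX once it reaches the upper limit.
    """
    if initial_value < 0 or triggers < 0:
        raise ValueError("initial_value and triggers must be non-negative")
    return min(initial_value + triggers, COUNTER_MAX)


# Two discontinuous monitoring windows for the same event.
first = run_window(0, 5)       # first window: 5 triggers, starting from 0
total = run_window(first, 7)   # second window resumes from the first result
```

With the initial value carried over, software reads the accumulated total directly instead of adding two separate readings.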
The software implementation monitoring process is specifically as follows:
A monitoring write instruction is executed. The instruction is a special register write instruction that fully reuses the write path and instruction format of the existing special registers. The special register number in the instruction denotes a specific monitoring event, and the register number is divided into two parts: an event class code and an event code. Taking the classification above as an example, the instruction cache related events have class code 000; within that class, instruction TLB access is 000 and TLB miss is 001. The resulting six-bit code uniquely identifies an event.
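The six-bit code described above can be sketched as follows. The three-bit widths follow the example codes in the text; placing the class code in the upper bits and the event code in the lower bits is an assumption, as are the function names.

```python
CLASS_BITS = 3  # class codes such as 000, per the text
EVENT_BITS = 3  # event codes such as 000 and 001, per the text


def encode_event(event_class: int, event: int) -> int:
    """Pack a class code and an event code into one six-bit code.

    Assumption: the class code occupies the upper three bits.
    """
    if not 0 <= event_class < (1 << CLASS_BITS):
        raise ValueError("event_class out of range")
    if not 0 <= event < (1 << EVENT_BITS):
        raise ValueError("event out of range")
    return (event_class << EVENT_BITS) | event


def decode_event(code: int) -> tuple:
    """Split a six-bit code back into (event_class, event)."""
    if not 0 <= code < (1 << (CLASS_BITS + EVENT_BITS)):
        raise ValueError("code out of range")
    return code >> EVENT_BITS, code & ((1 << EVENT_BITS) - 1)


# Instruction cache class (000): TLB access is 000, TLB miss is 001.
tlb_access = encode_event(0b000, 0b000)
tlb_miss = encode_event(0b000, 0b001)
```

Under this layout, the allocation logic can pick the counting-unit group from the upper bits alone and compare the full code against each unit's event field.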
After the write instruction reaches the execution unit, the allocation logic checks the class to which the event belongs and whether all counting units under that class are occupied. If not, an idle counting unit is selected, the event and the initial value are assigned to it, and its counter is activated. If all counters are occupied, the earliest monitoring event is released first. An activated counting unit is not affected by other monitoring events.
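A minimal Python sketch of this allocation policy for one event class follows. The three units per class come from the text; the `age` stamp used to find the earliest allocation, and all class and method names, are hypothetical bookkeeping chosen for illustration rather than the patented hardware structure.

```python
from dataclasses import dataclass

UNITS_PER_CLASS = 3  # three counting units per event class, per the text


@dataclass
class CountingUnit:
    valid: bool = False
    event: int = 0
    count: int = 0
    age: int = 0  # hypothetical allocation-order stamp


class ClassCounters:
    """Counting units belonging to one event class."""

    def __init__(self) -> None:
        self.units = [CountingUnit() for _ in range(UNITS_PER_CLASS)]
        self._stamp = 0

    def allocate(self, event: int, initial: int = 0) -> int:
        """Activate a unit for event and return its index.

        An idle unit is preferred; if none is idle, the unit holding
        the earliest monitoring event is released and reused.
        """
        idle = [i for i, u in enumerate(self.units) if not u.valid]
        if idle:
            index = idle[0]
        else:
            index = min(range(UNITS_PER_CLASS),
                        key=lambda i: self.units[i].age)
        self._stamp += 1
        self.units[index] = CountingUnit(True, event, initial, self._stamp)
        return index


counters = ClassCounters()
slots = [counters.allocate(e) for e in (1, 2, 3)]  # fills all three units
reused = counters.allocate(4, initial=10)          # evicts the earliest (event 1)
```

The sketch does not address what happens when the same event is written twice, since the source does not specify it.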
At any given moment, several counting units in a category may be working simultaneously. The performance monitoring module as a whole may receive hundreds of events from other modules. Each working counting unit checks them: if an event in its category is triggered and is the event that the unit is monitoring, its counter increments.
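The per-cycle matching can be sketched in Python as below. Representing the events triggered in one cycle as a set of six-bit codes is an assumption made for illustration, as are the names; saturation at the counter's upper limit is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class CountingUnit:
    valid: bool
    event: int      # six-bit code of the event this unit monitors
    count: int = 0


def on_cycle(units: Iterable[CountingUnit], triggered: Set[int]) -> None:
    """Increment every active unit whose monitored event fired this cycle."""
    for unit in units:
        if unit.valid and unit.event in triggered:
            unit.count += 1


units = [
    CountingUnit(True, 0b000001),   # active, monitoring TLB miss
    CountingUnit(True, 0b000010),   # active, monitoring another event
    CountingUnit(False, 0b000001),  # idle unit: never counts
]
on_cycle(units, {0b000001, 0b001000})  # only the first unit matches
on_cycle(units, {0b000001, 0b000010})  # first and second units match
```

Events that no active unit is monitoring, such as `0b001000` above, are simply ignored.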
Once activated, a counting unit remains in the working state and is ready to increment at any time. The counter is released only by a read request, which likewise reuses the format of the special register read instruction. The order of the write requests for monitoring events need not match the order of the read requests; the scheduling queue selects the corresponding counting unit to release (releasing simply clears the valid bit).
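The read-and-release behaviour can be sketched in Python as follows. The function name `read_event` and the list-based representation of the counting units are assumptions for illustration; the all-zero return on a mismatch and the clearing of only the valid bit follow the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CountingUnit:
    valid: bool = False
    event: int = 0
    count: int = 0


def read_event(units: List[CountingUnit], event: int) -> int:
    """Return the count for event and release the matching unit.

    If no active unit is monitoring the event, return 0 (all-zero value).
    """
    for unit in units:
        if unit.valid and unit.event == event:
            unit.valid = False  # release clears only the valid bit
            return unit.count
    return 0


units = [
    CountingUnit(True, 0b000001, 40),
    CountingUnit(True, 0b000010, 7),
    CountingUnit(),
]
# Reads may arrive in a different order from the writes that set them up.
second = read_event(units, 0b000010)
first = read_event(units, 0b000001)
missing = read_event(units, 0b000011)  # never activated
```

Because a read with no matching unit also yields 0, software cannot distinguish "not activated" from "zero triggers", which is why writes and reads should be paired.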
When all counting units of a given event class are detected to be idle, the pipeline clock that delivers that class's events to the monitoring module is turned off to reduce power consumption.
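The gating condition amounts to a per-class OR over the valid bits. A minimal Python sketch, with a hypothetical dictionary mapping each class code to the valid bits of its three counting units:

```python
from typing import Dict, List


def clock_enables(valid_bits: Dict[int, List[bool]]) -> Dict[int, bool]:
    """Return, per event class, whether its event clock should run.

    A class keeps its clock only while at least one unit is active.
    """
    return {cls: any(bits) for cls, bits in valid_bits.items()}


state = {
    0b000: [True, False, False],   # one active unit: clock stays on
    0b001: [False, False, False],  # all idle: clock can be gated off
}
enables = clock_enables(state)
```

In hardware this would be combinational logic feeding a clock gate; the dictionary form is used here only to make the condition explicit.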
Because monitoring uses the special-register read and write instruction encodings, each monitoring event can correspond to a virtual architectural register number, so a small number of hardware registers suffices to monitor many different events simultaneously. The codes can therefore be assigned flexibly to extend the set of monitoring events.
Each event class has a fixed maximum number of events that can be monitored at once. When dividing events into classes, frequently monitored events should therefore be spread across different classes as far as possible, and this must be taken into account in the design.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (2)

1. A performance detection method that facilitates expansion and configuration, characterized by: the performance detection method convenient for expansion and configuration comprises the following specific steps:
S1: executing a monitoring write instruction so that the write instruction is sent to an execution unit, wherein the instruction is a special register write instruction that fully reuses the write path and instruction format of the existing special registers, the special register number in the instruction denotes a specific monitoring event, and the register number is divided into two parts: an event class code and an event code;
S2: after the write instruction reaches the execution unit, the allocation logic checks the category to which the monitored event belongs and whether all counting units under that category are occupied; if not, an idle counting unit is selected, the event and an initial value are assigned to the counting unit, and the counter is activated; if all counters are occupied, the earliest monitoring event is released; at any given moment, several counting units in a category may work simultaneously; the performance monitoring module receives events from other modules, and each currently working counting unit checks them, and if an event in the category is triggered and is the event being monitored, the counter increments;
S3: when all counting units of a given event class are detected to be idle, turning off the pipeline clock that delivers that class's events to the monitoring module.
2. A performance detection method facilitating expansion and configuration according to claim 1, characterized in that: in step S2, once activated, the counting unit remains in the working state and is ready to increment at any time.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811389865.6A CN111209155B (en) 2018-11-21 2018-11-21 Performance detection method convenient for expansion and configuration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811389865.6A CN111209155B (en) 2018-11-21 2018-11-21 Performance detection method convenient for expansion and configuration

Publications (2)

Publication Number Publication Date
CN111209155A CN111209155A (en) 2020-05-29
CN111209155B true CN111209155B (en) 2022-09-23

Family

ID=70781871

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811389865.6A Active CN111209155B (en) 2018-11-21 2018-11-21 Performance detection method convenient for expansion and configuration

Country Status (1)

Country Link
CN (1) CN111209155B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0221779D0 (en) * 2002-09-19 2002-10-30 Advanced Risc Mach Ltd Controlling performance counters within a data processing system
CN1500248A (en) * 2000-12-29 2004-05-26 Qualification of event detection by thread ID and thread privilege level
CN1648871A (en) * 2004-01-14 2005-08-03 国际商业机器公司 Method and apparatus for qualifying collection of performance monitoring events by types of interrupt when interrupt occurs
CN101192188A (en) * 2006-11-30 2008-06-04 国际商业机器公司 Weighted event counting system and method for processor performance measurements
CN103838539A (en) * 2012-11-23 2014-06-04 三星电子株式会社 Performance measurement unit, processor core comprising thereof and process profiling method
CN104216812A (en) * 2014-08-29 2014-12-17 杭州华为数字技术有限公司 Method and device for carrying out multi-event statistics on performance monitoring unit
CN105487958A (en) * 2015-11-24 2016-04-13 无锡江南计算技术研究所 Processor internal behavior monitoring method
CN107451038A (en) * 2016-05-30 2017-12-08 龙芯中科技术有限公司 Hardware event acquisition method, processor and computing system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7200522B2 (en) * 2005-01-27 2007-04-03 International Business Machines Corporation Method, apparatus, and computer program product in a performance monitor for sampling all performance events generated by a processor

Also Published As

Publication number Publication date
CN111209155A (en) 2020-05-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant