CN115269309A - Processor micro-architecture monitoring method and device - Google Patents
Processor micro-architecture monitoring method and device
- Publication number
- CN115269309A, CN202210752152.1A
- Authority
- CN
- China
- Prior art keywords
- monitoring
- processor
- data
- debugging
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3024—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a central processing unit [CPU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/22—Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
- G06F11/2205—Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
- G06F11/2236—Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested to test CPU or processors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/22—Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
- G06F11/2273—Test methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention provides a processor micro-architecture monitoring method and device. The method comprises the following steps: controlling the debugging state of a processor to be tested based on a component control module, and acquiring direct monitoring data of the processor to be tested based on a data acquisition module, where the component control module and the data acquisition module are located in the kernel space of a debugging processor; and transmitting the direct monitoring data into a memory space created in advance by a data transmission device driving module, and reading the direct monitoring data from the memory space with a monitoring data analysis module for analysis, where the data transmission device driving module is located in the kernel space of the debugging processor and the monitoring data analysis module is located in the user space of the debugging processor. The method can effectively improve the reliability, stability and expandability of processor micro-architecture monitoring, thereby improving the monitoring effect.
Description
Technical Field
The invention relates to the technical field of computer applications, and in particular to a processor micro-architecture monitoring method and device. It also relates to an electronic device and a processor-readable storage medium.
Background
In the technical documentation of ARM (Advanced RISC Machines) processors, the processor state is divided into a normal state and a debug state. In the normal state, the processor executes processes using an efficient modern pipeline; in the debug state, the pipeline is halted and the system state and micro-architectural state of the processor are frozen. To improve debuggability, the ARM processor provides a set of debug registers. These registers are primarily oriented to the debug state and provide basic single-processor debug functions, including bringing the processor into the debug state, single-stepping instructions in the debug state, and obtaining and transferring processor data in the debug state. Among the debug registers, the breakpoint registers DBGBCR and DBGBVR allow the processor to enter the debug state: the former controls the breakpoint attributes and the latter sets the breakpoint address. When the instruction address of a process matches the value in DBGBVR, the processor flushes the pipeline and enters the debug state. In the debug state, the processor is taken over by the debugger, which obtains the micro-architectural state of the processor and exchanges data with it through the DBGDTRTX and DBGDTRRX registers. Through the EDITR and EDSCR registers, the debugger single-steps selected instructions on the processor under test. Because the ARM processor exposes an interface to these debug registers, another processor on the same chip can act as the debugger for a specific processor, which makes an on-chip cross-core debugging model possible.
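As an illustration of the breakpoint registers described above, the following minimal sketch composes a DBGBCR control value and a DBGBVR address value. The bit positions follow the ARMv8-A DBGBCR layout (E at bit 0, PMC at bits [2:1], BAS at bits [8:5]); the helper names and the simplified match model are assumptions for illustration and do not reproduce any implementation from this document.

```python
# Field positions follow the ARMv8-A DBGBCR<n>_EL1 layout.
# All function names below are hypothetical helpers.
DBGBCR_E = 1 << 0        # breakpoint enable
DBGBCR_PMC_SHIFT = 1     # privilege mode control, bits [2:1]
DBGBCR_BAS_SHIFT = 5     # byte address select, bits [8:5]


def make_dbgbcr(enable=True, pmc=0b11, bas=0b1111):
    """Compose a control value for an unlinked address-match breakpoint."""
    value = (pmc & 0b11) << DBGBCR_PMC_SHIFT
    value |= (bas & 0b1111) << DBGBCR_BAS_SHIFT
    if enable:
        value |= DBGBCR_E
    return value


def make_dbgbvr(instruction_address):
    """Breakpoint address value: bits [1:0] are cleared (word aligned)."""
    return instruction_address & ~0b11


def breakpoint_hits(pc, dbgbcr, dbgbvr):
    """Simplified match model (assumption): the breakpoint is enabled and the
    word-aligned program counter equals DBGBVR. Privilege and security
    filtering performed by real hardware are ignored here."""
    return bool(dbgbcr & DBGBCR_E) and (pc & ~0b11) == dbgbvr
```

With the default arguments, `make_dbgbcr()` yields `0x1E7`. Deleting a breakpoint in this model corresponds to writing a control value with the E bit cleared.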
To improve the testability of the processor, the ARM processor also provides the debug architecture CoreSight. CoreSight is a debug IP core integrated in ARM processors and includes an Advanced Peripheral Bus (APB), a Cross Trigger Interface (CTI), an Embedded Trace Macrocell (ETM), an Embedded Trace Buffer (ETB), and the like. The APB provides the access interface of the debug architecture, through which components such as the CTI register set can be accessed at fixed physical addresses. The CTI component sends and receives debug signals directly between the debugger and the debugged processor, and the technical documentation of the ARM processor gives the detailed steps for using the CTI component to make the processor to be tested enter and exit the debug state. Using this interface, processors on the same chip can exchange debug signals directly through the CTI registers without relying on a JTAG (Joint Test Action Group) interface or an external debugger, which makes an on-chip cross-core debugging model possible. However, most existing CoreSight-based ARM debuggers and monitoring tools use external debuggers, including the industry tools DS-5 and ADS, and Ninja, among others. External debuggers mostly offer good user interfaces and rich debugging functions, but their monitoring effect is insufficient. Therefore, how to provide a more efficient processor micro-architecture monitoring solution is an urgent problem to be solved.
Disclosure of Invention
Therefore, the invention provides a processor micro-architecture monitoring method and device to overcome the poor monitoring effect caused by the considerable limitations of prior-art processor micro-architecture monitoring schemes.
In a first aspect, the present invention provides a method for monitoring a micro-architecture of a processor, comprising: controlling the debugging state of a processor to be tested based on a preset component control module, and acquiring direct monitoring data of the processor to be tested based on a data acquisition module; the component control module and the data acquisition module are positioned in a kernel space of the debugging processor;
transmitting the direct monitoring data into a memory space which is created in advance based on a data transmission equipment driving module, and reading the direct monitoring data in the memory space by using a monitoring data analysis module for analysis; the data transmission device driving module is located in a kernel space of the debugging processor, and the monitoring data analysis module is located in a user space of the debugging processor.
Further, the component control module includes: a debugging register control module, a cross trigger interface component control module and a performance monitoring component control module;
controlling the debugging state of the processor to be tested based on the preset component control module specifically includes: reading and writing the content of the debugging registers of the processor to be tested through an interface of the processor to be tested, based on the debugging register control module; reading and writing the content of the cross trigger interface component through the interface, based on the cross trigger interface component control module; and reading and writing the content of the performance monitoring component through the interface, based on the performance monitoring component control module; so as to control the processor to be tested to enter and exit the debugging state, control the processor to be tested to execute the corresponding monitoring instructions in the debugging state, and control the data acquisition module to acquire the monitoring data and the data transmission module to transmit the monitoring data.
Further, the processor micro-architecture monitoring method further includes:
acquiring indirect monitoring data of the processor to be tested based on the data acquisition module; the indirect monitoring data comprises monitoring data corresponding to a branch predictor, an instruction page table cache, a data page table cache and a secondary cache in the processor to be tested;
writing the indirect monitoring data into the memory space on line through a preset performance monitoring component, and reading the indirect monitoring data in the memory space by using a monitoring data analysis module for analysis; the performance monitoring component comprises a micro-architecture event corresponding to the processor to be tested, and the combination of the micro-architecture events can indirectly reflect the key attributes of the micro-architecture components in the processor to be tested.
Further, before the direct monitoring data is transmitted into the memory space created in advance based on the data transmission device driver module, the method further includes: acquiring a monitoring command sent by a user process based on a preset cross-domain communication protocol, and checking the validity of the monitoring command according to the current state of a monitoring system by using a monitoring system state monitor of the kernel space;
the cross-domain communication protocol is obtained by dividing the component control module and the data acquisition module into independent sub-monitoring function modules, packaging the sub-monitoring function modules into functions, writing the functions into the kernel space, and mapping the sub-monitoring function modules into monitoring commands and packaging the monitoring commands.
Further, the direct monitoring data includes: monitoring data corresponding to the first-level cache and the joint page table cache.
Further, the processor micro-architecture monitoring method further includes: acquiring the content of the corresponding micro-architecture component based on the data acquisition module; wherein the content of the microarchitectural component comprises the content of an event counter in the performance monitoring component and the content of a clock counter in the performance monitoring component;
the reading of the indirect monitoring data in the memory space by using the monitoring data analysis module for analysis specifically includes:
and after the content of the micro-architecture component is obtained by the monitoring data analysis module, respectively analyzing the monitoring data corresponding to the branch predictor, the instruction page table cache, the data page table cache and the secondary cache through corresponding target algorithms so as to recover the key attribute of the micro-architecture component.
Further, before acquiring a monitoring command sent by a user process based on a preset cross-domain communication protocol, the processor micro-architecture monitoring method further includes:
dividing the monitoring function of the monitoring system in the debugging processor into a plurality of independent sub-monitoring function modules, and packaging the sub-monitoring function modules into independent functions in the kernel space to realize the monitoring function; packaging the monitoring function into a monitoring command so that the user process of the user space can call the function of the kernel space through the monitoring command to realize the corresponding monitoring function; using a finite state machine in the kernel space, and checking whether a monitoring command corresponding to the user process is legal or not according to the current state of the monitoring system; and realizing cross-domain communication between the user space and the kernel space based on a Netlink mechanism of a Linux operating system, and encapsulating the monitoring command into the cross-domain communication protocol.
In a second aspect, the present invention further provides a processor micro-architecture monitoring device, including:
the control unit is used for controlling the debugging state of the processor to be tested based on a preset component control module and acquiring direct monitoring data of the processor to be tested based on a data acquisition module; the component control module and the data acquisition module are positioned in a kernel space of the debugging processor;
the first monitoring data analysis unit is used for transmitting the direct monitoring data into a memory space which is created in advance based on a data transmission equipment driving module, and reading the direct monitoring data in the memory space by using a monitoring data analysis module for analysis; the data transmission device driving module is located in a kernel space of the debugging processor, and the monitoring data analysis module is located in a user space of the debugging processor.
Further, the component control module includes: a debugging register control module, a cross trigger interface component control module and a performance monitoring component control module;
the control unit is specifically configured to: read and write the content of the debugging registers of the processor to be tested through an interface of the processor to be tested, based on the debugging register control module; read and write the content of the cross trigger interface component through the interface, based on the cross trigger interface component control module; and read and write the content of the performance monitoring component through the interface, based on the performance monitoring component control module; so as to control the processor to be tested to enter and exit the debugging state, control the processor to be tested to execute the corresponding monitoring instructions in the debugging state, and control the data acquisition module to acquire the monitoring data and the data transmission module to transmit the monitoring data.
Further, the processor microarchitecture monitoring device further includes:
the second monitoring data analysis unit is used for analyzing indirect monitoring data of the processor to be tested, which is acquired based on the data acquisition module; the indirect monitoring data comprises monitoring data corresponding to a branch predictor, an instruction page table cache, a data page table cache and a secondary cache in the processor to be tested;
writing the indirect monitoring data into the memory space on line through a preset performance monitoring component, and reading the indirect monitoring data in the memory space by using a monitoring data analysis module for analysis; the performance monitoring component comprises a micro-architecture event corresponding to the processor to be tested, and the combination of the micro-architecture events can indirectly reflect the key attributes of the micro-architecture components in the processor to be tested.
Further, the processor micro-architecture monitoring device further includes: a monitoring command analysis unit, configured to, before the direct monitoring data is transmitted into the memory space created in advance based on the data transmission device driving module, analyze a monitoring command that is sent by a user process and acquired based on a preset cross-domain communication protocol, and check the validity of the monitoring command according to the current state of the monitoring system by using a monitoring system state monitor of the kernel space;
the cross-domain communication protocol is obtained by dividing the component control module and the data acquisition module into independent sub-monitoring function modules, packaging the sub-monitoring function modules into functions, writing the functions into the kernel space, and mapping the sub-monitoring function modules into monitoring commands and packaging the monitoring commands.
Further, the direct monitoring data includes: monitoring data corresponding to the first-level cache and the joint page table cache.
Further, the processor microarchitecture monitoring device further includes: the data content acquisition unit is used for acquiring the content of the corresponding micro-architecture component based on the data acquisition module; wherein the content of the microarchitectural component comprises the content of an event counter in the performance monitoring component and the content of a clock counter in the performance monitoring component;
the second monitoring data analysis unit is specifically configured to:
and after the content of the micro-architecture component is obtained by the monitoring data analysis module, respectively analyzing the monitoring data corresponding to the branch predictor, the instruction page table cache, the data page table cache and the secondary cache through corresponding target algorithms so as to recover the key attribute of the micro-architecture component.
Further, before acquiring the monitoring command sent by the user process based on the preset cross-domain communication protocol, the processor micro-architecture monitoring device further includes:
the interaction unit is used for dividing the monitoring function of the monitoring system in the debugging processor into a plurality of independent sub-monitoring function modules and packaging the sub-monitoring function modules into independent functions in the kernel space so as to realize the monitoring function; packaging the monitoring function into a monitoring command so that the user process of the user space can call the function of the kernel space through the monitoring command to realize the corresponding monitoring function; using a finite state machine in the kernel space, and checking whether a monitoring command corresponding to the user process is legal or not according to the current state of the monitoring system; and realizing cross-domain communication between the user space and the kernel space based on a Netlink mechanism of a Linux operating system, and encapsulating the monitoring command into the cross-domain communication protocol.
In a third aspect, the present invention also provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the processor micro-architecture monitoring method as described in any one of the above when executing the computer program.
In a fourth aspect, the present invention further provides a processor-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the processor micro-architecture monitoring method according to any one of the above.
In the processor micro-architecture monitoring method provided by the invention, a component control module located in the kernel space of a debugging processor controls the debugging state of a processor to be tested, and a data acquisition module acquires direct monitoring data of the processor to be tested; the direct monitoring data is transmitted into a memory space created in advance by a data transmission device driving module located in the kernel space of the debugging processor, and is read from that memory space for analysis by a monitoring data analysis module located in the user space of the debugging processor. The method has a wide monitoring coverage and can effectively improve the reliability, stability and expandability of processor micro-architecture monitoring, thereby improving the monitoring of both the system structure and the micro-architecture of the processor.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart illustrating a method for monitoring a micro-architecture of a processor according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating the coverage of a processor micro-architectural monitoring method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a monitoring system corresponding to the micro-architectural monitoring method for a processor according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a monitoring system runtime state machine provided by an embodiment of the present invention;
FIG. 5 is a basic flowchart of user process instruction instrumentation according to an embodiment of the present invention;
FIG. 6 is a block diagram of a micro-architectural monitoring device for a processor according to an embodiment of the present invention;
FIG. 7 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments of the present invention, belong to the protection scope of the present invention.
The invention provides a processor micro-architecture monitoring method. Based on the debug registers, the performance monitoring module and the CoreSight debug architecture of the ARM processor, the method implements a simple and efficient on-chip cross-core debugging model and thereby monitors the micro-architecture in real time while the processor is running. The debug registers provide the basic interface for debugging a single processor, the performance monitoring module provides a way to monitor the micro-architecture components of the processor indirectly, and the CoreSight debug architecture provides the mechanism by which an on-chip processor exits the debug state. The direct monitoring range of the method comprises the first-level cache (L1 Cache) and the combined page table cache (Unified Translation Lookaside Buffer, Unified TLB); the indirect monitoring range comprises partial attributes of the Branch Predictor (BP), the instruction page table cache (Instruction TLB), the data page table cache (Data TLB) and the second-level cache (L2 Cache). The method supports not only hardware-breakpoint-based micro-architecture monitoring while an application is running, but also instrumentation at instruction granularity, and can trace the instruction flow and data flow at the system structure level of the ARM processor.
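To make the idea of indirect monitoring concrete, the sketch below combines raw PMU event counts into derived attributes of the branch predictor, the TLBs and the L2 cache. The event numbers are the ARMv8 common PMU event encodings; the chosen ratios are only an assumed example of how event combinations can reflect component attributes, and are not the target algorithms of this document, which are not specified here.

```python
# ARMv8 common PMU event numbers.
L1I_TLB_REFILL = 0x02
L1D_TLB_REFILL = 0x05
INST_RETIRED = 0x08
BR_MIS_PRED = 0x10
BR_PRED = 0x12
L2D_CACHE = 0x16
L2D_CACHE_REFILL = 0x17


def ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def derive_metrics(counts):
    """Combine raw event counts (a dict keyed by event number) into
    illustrative indicators. The metric names are hypothetical."""
    instructions = counts.get(INST_RETIRED, 0)
    return {
        "branch_mispredict_rate": ratio(counts.get(BR_MIS_PRED, 0),
                                        counts.get(BR_PRED, 0)),
        "l2_miss_rate": ratio(counts.get(L2D_CACHE_REFILL, 0),
                              counts.get(L2D_CACHE, 0)),
        "itlb_refills_per_kilo_inst": 1000 * ratio(counts.get(L1I_TLB_REFILL, 0),
                                                   instructions),
        "dtlb_refills_per_kilo_inst": 1000 * ratio(counts.get(L1D_TLB_REFILL, 0),
                                                   instructions),
    }
```

A user-space analysis step could call `derive_metrics` on the counter values read back from the shared memory space.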
An embodiment of the processor micro-architecture monitoring method of the present invention is described in detail below. FIG. 1 is a schematic flow chart of the processor micro-architecture monitoring method according to an embodiment of the present invention; the specific implementation process includes the following steps:
Step 101: controlling the debugging state of the processor to be tested based on a preset component control module, and acquiring direct monitoring data of the processor to be tested based on a data acquisition module. The component control module and the data acquisition module are located in the kernel space of the debugging processor. The direct monitoring data includes monitoring data corresponding to the first-level cache and the joint page table cache. The processor to be tested is the ARM processor being monitored, such as CPU1 in FIG. 3. The debugging processor is an ARM processor acting as the debugger, such as CPU2 in FIG. 3.
The component control module specifically comprises: a debugging register control module, a Cross Trigger Interface (CTI) component control module and a performance monitoring component control module (i.e. a Performance Monitor Unit (PMU) component control module), all of which can be implemented based on the technical documentation of the ARM processor. Correspondingly, controlling the debugging state of the processor to be tested based on the preset component control module comprises: reading and writing the content of the debugging registers of the processor to be tested through an interface of the processor to be tested, based on the debugging register control module; reading and writing the content of the cross trigger interface component through the interface, based on the cross trigger interface component control module; and reading and writing the content of the performance monitoring component through the interface, based on the performance monitoring component control module; so as to control the processor to be tested to enter and exit the debugging state, control the processor to be tested to execute the corresponding monitoring instructions in the debugging state, and control the data acquisition module to acquire the monitoring data and the data transmission module to transmit the monitoring data.
Specifically, based on the debugging register control module, hardware breakpoints are written, deleted and configured through the DBGBCR and DBGBVR registers; single-step execution of instructions in the debugging state is controlled through the EDITR register and the ITE field of the EDSCR register; the acquisition mode of register and memory data in the debugging state is set through the MA field of the EDSCR register; and data acquisition is synchronized through the TXfull and RXfull fields of the EDSCR register. Based on the cross trigger interface component control module, a debug exit signal is sent by controlling registers such as CTICONTROL, CTIGATE, CTIOUTEN, CTIAPPPULSE, CTIINTACK and CTITRIGOUTSTATUS, so that the processor to be tested is controlled to exit the debugging state. Based on the PMU component control module, the PMU is turned on and off through the PMCR and PMCNTENSET registers, the system structure and micro-architecture events to be monitored are configured through the PMEVTYPER registers, and the access permission of the PMU component is configured through the PMUSERENR register. Further, based on the data acquisition and transmission module, the debug registers DBGDTRRX and DBGDTRTX are used to acquire the register and memory data of the processor to be tested (i.e. the monitoring data of the system structure), with at most 64 bits acquired per transfer; the access interfaces that the ARM processor provides for the first-level cache (L1 Cache) and the combined page table cache (Unified TLB) are used to obtain the content of those micro-architecture components; and the PMEVCNTR registers of the PMU are used to obtain the PMU event counter content and the PMCCNTR register to obtain the PMU clock (cycle) counter content, i.e. the monitoring data of micro-architecture events.
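The synchronization through the TXfull field can be illustrated with a small simulation. The sketch below is a toy software model, not hardware access code: the class and method names are hypothetical, the register contents are plain Python integers, and only the TXfull bit position (bit 29 of EDSCR) is taken from the ARMv8-A architecture. It shows the processor to be tested publishing one 64-bit value and the debugging processor polling TXfull before consuming it.

```python
EDSCR_TXFULL = 1 << 29   # EDSCR.TXfull bit position in ARMv8-A


class SimulatedDcc:
    """Toy model of the data transfer registers of one processor under test."""

    def __init__(self):
        self.edscr = 0
        self.dbgdtrtx = 0   # low 32 bits of the transferred value
        self.dbgdtrrx = 0   # high 32 bits of the transferred value

    def target_write64(self, value):
        """Processor under test side: publish one 64-bit value."""
        if self.edscr & EDSCR_TXFULL:
            raise RuntimeError("previous value has not been consumed yet")
        self.dbgdtrtx = value & 0xFFFFFFFF
        self.dbgdtrrx = (value >> 32) & 0xFFFFFFFF
        self.edscr |= EDSCR_TXFULL

    def debugger_read64(self, max_polls=1000):
        """Debugging processor side: poll TXfull, then read and acknowledge."""
        for _ in range(max_polls):
            if self.edscr & EDSCR_TXFULL:
                value = (self.dbgdtrrx << 32) | self.dbgdtrtx
                self.edscr &= ~EDSCR_TXFULL   # reading frees the channel
                return value
        raise TimeoutError("TXfull was never set")
```

Because each transfer carries at most 64 bits, a larger structure such as a cache line would be moved as a sequence of such handshakes.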
After the monitoring data is acquired, the data acquisition and transmission module stores it using the Linux Virtual File System (VFS) and the page table management mechanism. Specifically, the data acquisition and transmission module creates a device file, mounts it on the Linux system, and binds it to a VFS node. The data acquisition and transmission module comprises the data acquisition module and the data transmission module. In a specific implementation, to achieve automatic and reliable monitoring, the component control module and the data acquisition and transmission module are divided in advance into independent monitoring function modules, which are packaged as functions and written into the kernel space; the monitoring function modules are then mapped to monitoring commands and encapsulated in a cross-domain communication protocol, which realizes the interaction between the user and the monitoring system.
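The file-backed storage path above can be sketched in user space. In the following minimal example an ordinary file stands in for the device file, and `mmap` stands in for the page-table-based mapping; the record layout (a 32-bit source identifier followed by a 64-bit value) and all function names are assumptions made for illustration only.

```python
import mmap
import struct

HEADER = struct.Struct("<I")    # number of valid records
RECORD = struct.Struct("<IQ")   # hypothetical record: source id, 64-bit value


def create_buffer(path, capacity):
    """Create a zero-filled backing file large enough for `capacity` records."""
    size = HEADER.size + capacity * RECORD.size
    with open(path, "wb") as f:
        f.truncate(size)
    return size


def write_records(path, records):
    """Producer side: map the file and store the monitoring records."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as buf:
        if HEADER.size + len(records) * RECORD.size > len(buf):
            raise ValueError("buffer too small for the given records")
        HEADER.pack_into(buf, 0, len(records))
        for i, (source, value) in enumerate(records):
            RECORD.pack_into(buf, HEADER.size + i * RECORD.size, source, value)
        buf.flush()


def read_records(path):
    """Consumer side: map the file read-only and decode the records."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
        (count,) = HEADER.unpack_from(buf, 0)
        return [RECORD.unpack_from(buf, HEADER.size + i * RECORD.size)
                for i in range(count)]
```

Keeping the producer and the consumer on opposite sides of a mapped file is one way to let the analysis run independently of data collection.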
Step 102: transmitting the direct monitoring data into a memory space created in advance by a data transmission device driving module, and reading the direct monitoring data from the memory space with a monitoring data analysis module for analysis. The data transmission device driving module is located in the kernel space of the debugging processor, and the monitoring data analysis module is located in the user space of the debugging processor. The direct monitoring data includes monitoring data corresponding to the first-level cache (L1 Cache) and the combined page table cache (Unified TLB). The first-level cache includes the first-level instruction cache (L1 Instruction Cache) and the first-level data cache (L1 Data Cache) shown in FIG. 2.
In an embodiment of the present invention, before the direct monitoring data is transmitted into the memory space created in advance by the data transmission device driving module, the method further includes: acquiring a monitoring command sent by a user process based on a preset cross-domain communication protocol, and checking the validity of the monitoring command according to the current state of the monitoring system by using a monitoring system state monitor in the kernel space. The cross-domain communication protocol is obtained by dividing the component control module and the data acquisition module into independent sub-monitoring function modules, packaging the sub-monitoring function modules as functions written into the kernel space, mapping the sub-monitoring function modules to monitoring commands, and encapsulating the monitoring commands. In an actual implementation, before the monitoring command sent by the user process is acquired based on the preset cross-domain communication protocol, the method further includes: dividing the monitoring function of the monitoring system in the debugging processor into a plurality of independent sub-monitoring function modules, and packaging the sub-monitoring function modules as independent functions in the kernel space to implement the monitoring function; packaging the monitoring function as monitoring commands, so that a user process in the user space can call the functions of the kernel space through the monitoring commands to implement the corresponding monitoring function; using a finite state machine in the kernel space to check, according to the current state of the monitoring system, whether a monitoring command from the user process is legal; and implementing cross-domain communication between the user space and the kernel space based on the Netlink mechanism of the Linux operating system, and encapsulating the monitoring commands in the cross-domain communication protocol.
Fig. 3 shows a monitoring system for a processor. CPU1 is the processor to be tested, and CPU2 is the processor acting as the debugger. In CPU2, the control module (comprising a debugging register control module, a cross trigger interface (CTI) component control module and a PMU component control module), the monitoring system state machine, the monitoring system state monitor and the data transmission device driver are located in the kernel space, while the Netlink-based communication interface and the monitoring data analysis module are located in the user space. The control module directly reads and writes the debugging registers, the CTI component and the PMU component of the processor to be tested (CPU1) through the memory-mapped interface of the ARM processor, so as to control the processor to be tested to enter and exit the debugging state and, in the debugging state, to execute instructions, acquire data and transmit data. The user process sends commands to the monitoring system using the cross-domain communication protocol of the present invention, and the monitoring system state monitor in the kernel space checks the validity of each command against the current state of the monitoring system. The direct monitoring data of the processor to be tested is transferred through the debugging registers into the controlled register and then transferred online into the memory space created by the device driver; the indirect monitoring data of the processor to be tested is written online into the on-chip memory space through the PMU. The analysis module in the user space can read the data in the memory space offline and analyze it. This loosely coupled architecture improves the reliability, stability and extensibility of the monitoring system.
In the embodiment of the invention, the first-level cache and the unified TLB are accessed directly through an interface provided by the ARM processor. Taking the AArch64 instruction set architecture as an example, the S3_3_c15_c2_0 register provides access to the tag field of the first-level data cache (L1 Data Cache), the S3_3_c15_c4_0 register provides access to the data field of the first-level data cache, the S3_3_c15_c2_1 and S3_3_c15_c4_1 registers provide access to the tag field and the data field of the first-level instruction cache (L1 Instruction Cache), respectively, and the S3_3_c15_c4_2 register provides access to the unified page table cache (Unified TLB). After such an access has been made, the data can be transferred to general-purpose registers through the four registers S3_3_c15_c0_0 to S3_3_c15_c0_3. In debug mode, the contents of the first-level cache and the Unified TLB can be monitored directly through this interface.
In the embodiment of the present invention, a set of PMU (Performance Monitor Unit) events provided by the ARM processor is used to reflect the processor state by counting the PMU events corresponding to the processor micro-architecture. Table 1 lists micro-architectural events (i.e., PMU events) for a processor, the combination of which can reflect key attributes of the processor micro-architectural components.
Table 1 ARM supplied PMU events
In a specific implementation, an on-chip cross-core debugging model (as shown in fig. 3), built on the ARM processor's debugging registers, performance monitoring module and CoreSight debugging architecture, monitors components of the ARM processor such as the caches, the page table caches and the branch predictor. During monitoring with this system, the hardware breakpoint function provided by the debugging registers is used so that, when an instruction executed on the processor reaches a specified address, the processor is halted and enters the debugging state with its architectural and micro-architectural state preserved. The debugging registers also provide a single-step instruction execution mechanism, a data transfer mechanism and a memory read/write mechanism in the debugging state, which together implement most of the basic functions of cross-core debugging. The CTI component of the CoreSight debugging architecture is used to make the processor exit the debugging state and resume normal execution. The cache data acquisition interface provided by the ARM processor is used to directly monitor all data in the first-level instruction cache, the first-level data cache and the unified page table cache (Unified TLB). The architectural and micro-architectural event counting function provided by the performance monitoring module is used to indirectly monitor key attributes of those micro-architectural components whose data cannot be acquired directly, specifically some attributes of the Branch Predictor (BP), the instruction page table cache (Instruction TLB), the data page table cache (Data TLB) and the second-level cache (L2 Cache) in fig. 2.
It should be noted that some operations can be performed only when the processor is in a specific state; for example, the processor cache can be flushed only in the normal state. Improper operation may in some cases crash the processor, for example when the target core is not restarted for a long time after a breakpoint is enabled.
Furthermore, to achieve reliable and stable monitoring, the invention loosely couples the monitoring functions and the monitoring commands across domains, which ensures the stability and reliability of the monitoring process, and uses a cross-domain communication protocol, which makes the monitoring process automatic and efficient. The specific implementation is as follows. The monitoring function of the monitoring system is divided into a plurality of submodules, which are encapsulated as independent functions in the kernel space to implement the monitoring function. The monitoring function is further encapsulated as monitoring commands, and a user-space process is allowed to call the monitoring function implemented in the kernel space through these commands. Meanwhile, a finite state machine (the monitoring system state machine) is used in the kernel space to check whether a user's monitoring command is legal in the current state of the monitoring system, which prevents system errors caused by incorrect monitoring steps on the user's part. Finally, cross-domain communication between the user space and the kernel space is implemented based on the Netlink mechanism of the Linux operating system, and the monitoring commands are encapsulated into a cross-domain communication protocol, thereby achieving efficient automatic monitoring.
It should be noted that the present invention isolates commands from monitoring functions across the user space and the kernel space by dividing the monitoring function into independent modules, encapsulating the modules as kernel-space functions, and mapping the function modules to monitoring commands. The invention implements a finite state machine (the monitoring system state machine) and a listener (the monitoring system state monitor) in the kernel space. The state machine has five states: the monitoring system is not started (V); the listener is not started (I); the listener is started and the target core is in normal mode (L); the target core is in debugging mode (D); and the monitoring system has completed one round of monitoring and the listener is suspended (E). The listener obtains the state of the processor to be tested in real time by polling the STATUS field of that processor's EDSCR register. Specifically, as shown in fig. 4, inserting the kernel driver moves the state from V to I, and activating the listener moves the state from I to L. The listener determines whether the processor to be tested is in the debugging state by continuously reading the STATUS field of its EDSCR register; if so, the state of the monitoring system is updated to D, and otherwise it returns to L. When the processor enters the debugging state at an instruction address that is the tail node of the breakpoint address linked list provided by the user, or when the user actively suspends the listener, the monitoring system enters the E state. The kernel driver can be removed, returning the system to the V state, only from the E state. To start monitoring the next set of test cases, the monitoring system must return from the E state to the I state by resetting the monitoring system settings.
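The five-state machine described above can be sketched as a transition table. This is only an illustrative model, not the kernel implementation: the event names (`insert_driver`, `target_halted` and so on) are hypothetical labels, and the assumption that the user may suspend the listener from either L or D is ours, since the text does not say from which states suspension is possible.

```python
from enum import Enum


class State(Enum):
    V = "monitoring system not started"
    I = "listener not started"
    L = "listener started, target core in normal mode"
    D = "target core in debugging mode"
    E = "round finished, listener suspended"


# Transitions named in the text. Event names are hypothetical labels.
TRANSITIONS = {
    (State.V, "insert_driver"): State.I,
    (State.I, "start_listener"): State.L,
    (State.L, "target_halted"): State.D,      # EDSCR.STATUS reports debug state
    (State.D, "target_resumed"): State.L,     # EDSCR.STATUS reports normal state
    (State.D, "last_breakpoint"): State.E,    # tail node of the breakpoint list
    (State.L, "suspend_listener"): State.E,   # assumption: allowed from L
    (State.D, "suspend_listener"): State.E,   # assumption: allowed from D
    (State.E, "reset_settings"): State.I,
    (State.E, "remove_driver"): State.V,
}


def step(state, event):
    """Return the next state, or raise ValueError if the event is illegal here."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state.name}") from None
```

Because every legal move is a table entry, a request such as removing the driver while the listener is running is rejected by construction, which mirrors the rule that only the E state can return to V.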
Table 2 lists the command operation code corresponding to each monitoring function of the monitoring system, together with the monitoring system states in which the function is allowed to execute; there are 16 functions in total. The lower 8 bits of the command opcode give the sequence number of the function, and the upper 8 bits give the states in which the function is allowed to execute. It should be noted that this encoding leaves ample room for extension.
Table 2 Command encoding and permitted execution states of the monitoring functions of the monitoring system
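As a rough illustration of the 16-bit opcode layout, the sketch below packs a function sequence number into the lower 8 bits and a set of permitted states into the upper 8 bits. The text does not say how states are encoded in the upper byte; the one-bit-per-state mask and the bit assignment in `STATE_BITS` are assumptions made for this example only.

```python
# Assumed bit assignment for the five monitoring system states (hypothetical).
STATE_BITS = {"V": 0x01, "I": 0x02, "L": 0x04, "D": 0x08, "E": 0x10}


def encode_opcode(function_no, allowed_states):
    """Build a 16-bit opcode: upper byte = state mask, lower byte = function number."""
    if not 0 <= function_no <= 0xFF:
        raise ValueError("function number must fit in 8 bits")
    mask = 0
    for name in allowed_states:
        mask |= STATE_BITS[name]
    return (mask << 8) | function_no


def function_number(opcode):
    """Extract the function sequence number from the lower 8 bits."""
    return opcode & 0xFF


def is_allowed(opcode, current_state):
    """Check the upper byte against the current monitoring system state."""
    return bool((opcode >> 8) & STATE_BITS[current_state])
```

With 8 bits for the sequence number there are 256 possible values, of which the 16 functions occupy only a small part.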
The user and the monitoring system communicate across domains in real time through the Netlink mechanism of the Linux operating system. By analogy with network communication, the present invention designs a cross-domain communication protocol with the following fields. The field len indicates the packet length. get/reply indicates the direction of transmission: get denotes data sent from the test case to the monitoring system, and reply denotes data sent from the monitoring system to the test case. checksum is used to verify the correctness of the packet. op indicates the operation requested by the test case. ret_state is defined only in reply packets and indicates the result of transmitting, checking and executing the packet. state is defined only in reply packets and indicates the state of the monitoring system when the packet was sent. addr_n indicates the length of the list data when the packet carries list data. <list> is a pointer to the array list. The remaining positions are reserved and filled with 0.
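To make the field list concrete, the following sketch packs and checks such a packet with Python's `struct` module. The field widths, field order, little-endian byte order, additive checksum and the choice to carry the address list inline (rather than through a pointer, as <list> does in the text) are all assumptions for illustration; the text names the fields but does not give their layout.

```python
import struct

# Assumed layout: len, direction, ret_state, op, state, pad byte, addr_n, checksum.
HEADER = struct.Struct("<HBBHBxHH")
GET, REPLY = 0, 1  # assumed encoding of the get/reply direction field


def _checksum(data):
    """Assumed checksum: byte sum modulo 2**16, computed with the field zeroed."""
    return sum(data) & 0xFFFF


def pack_packet(direction, op, addrs=(), ret_state=0, state=0):
    """Serialize one packet; addresses follow the header as 64-bit values."""
    body = struct.pack(f"<{len(addrs)}Q", *addrs)
    total = HEADER.size + len(body)
    head = HEADER.pack(total, direction, ret_state, op, state, len(addrs), 0)
    csum = _checksum(head + body)
    return HEADER.pack(total, direction, ret_state, op, state, len(addrs), csum) + body


def unpack_packet(packet):
    """Parse a packet, rejecting length or checksum mismatches."""
    total, direction, ret_state, op, state, addr_n, csum = HEADER.unpack_from(packet)
    if total != len(packet):
        raise ValueError("length mismatch")
    zeroed = packet[:HEADER.size - 2] + b"\x00\x00" + packet[HEADER.size:]
    if _checksum(zeroed) != csum:
        raise ValueError("checksum mismatch")
    addrs = struct.unpack_from(f"<{addr_n}Q", packet, HEADER.size)
    return {"direction": direction, "ret_state": ret_state, "op": op,
            "state": state, "addrs": addrs}
```

In a get packet, ret_state and state are simply left at 0, in line with the rule that unused positions are zero-filled.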
In a specific application of the invention, to further improve the automation and extensibility of monitoring when instrumenting user processes on the ARM processor at instruction granularity, the invention provides a method that lets a user execute specific code in a specific context of the processor, thereby achieving instruction-granularity instrumentation. The key to this method is supporting an arbitrary number of hardware breakpoints. For example, suppose the user constructs in advance a linked list of addresses of arbitrary length and requires the monitoring system to bring the processor to be tested into the debugging state at these addresses, so as to automatically execute monitoring functions or other user-defined operations. Because the number of hardware breakpoint registers on the ARM processor is limited, the invention adopts a step-by-step debugging mode to support an arbitrary number of hardware breakpoints: a breakpoint is triggered at every instruction, and the user's instrumentation code is executed when the instruction address matches an entry in the user's address linked list. To keep the breakpoints continuous, on each entry into the debugging state the next instruction address is computed dynamically from the contents of the general-purpose registers of the processor to be tested, and the hardware breakpoint register is updated to that address. Fig. 5 shows the basic flow of instruction-granularity instrumentation of a user process. After receiving the breakpoint address linked list provided by the user, the monitoring system first writes the entry address of the program (e.g., the address of the first instruction of the main function) into the hardware breakpoint register DBGBVR. Then, each time the processor to be tested enters the debugging state because of the hardware breakpoint, the monitoring system clears the hardware breakpoint, dynamically computes the next instruction address, and writes that address into the hardware breakpoint register. If the current breakpoint address is in the user's breakpoint address linked list, the user's instrumentation code is called; if the instrumentation code contains a monitoring command, the kernel likewise checks the validity of the command using the state machine.
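The step-by-step breakpoint flow can be modelled in a few lines. The sketch below is a pure simulation under stated assumptions: `next_address` is a hypothetical callback standing in for the dynamic computation of the next instruction address from the target's general-purpose registers, `hook` stands in for the user's instrumentation code, and a plain variable stands in for the DBGBVR register. Stopping at the tail of the list follows the rule that the monitoring round ends there.

```python
def run_instrumented(entry, next_address, breakpoints, hook, max_steps=10_000):
    """Simulate single-register stepping and return the breakpoint hits in order."""
    targets = list(breakpoints)   # the user's breakpoint address list
    wanted = set(targets)
    bvr = entry                   # stands in for the hardware breakpoint register
    hits = []
    for _ in range(max_steps):
        pc = bvr                  # the target halts on reaching the breakpoint
        if pc in wanted:
            hook(pc)              # run the user's instrumentation code
            hits.append(pc)
            if pc == targets[-1]:
                break             # tail node reached: this round is over
        nxt = next_address(pc)    # compute the next instruction address
        if nxt is None:
            break                 # program finished
        bvr = nxt                 # re-arm the single breakpoint register
    return hits
```

Only one breakpoint register is ever in use, yet any number of user addresses can be served, because the membership test against the list happens in software at every step.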
In the implementation process of the invention, the method also comprises the steps of acquiring indirect monitoring data of the processor to be tested based on the data acquisition module; the indirect monitoring data comprises monitoring data corresponding to a branch predictor, an instruction page table cache, a data page table cache and a secondary cache in the processor to be tested; writing the indirect monitoring data into the memory space on line through a preset performance monitoring component, and reading the indirect monitoring data in the memory space by using a monitoring data analysis module for analysis; the performance monitoring component comprises a micro-architecture event corresponding to the processor to be tested, and the combination of the micro-architecture events can indirectly reflect the key attributes of the micro-architecture component in the processor to be tested. Further, the method also comprises the following steps: acquiring the content of the corresponding micro-architecture component based on the data acquisition module; wherein the content of the microarchitectural component comprises content of an event counter in the performance monitoring component and content of a clock counter in the performance monitoring component. Correspondingly, the indirect monitoring data in the memory space is read by using the monitoring data analysis module for analysis, and the corresponding implementation process includes: and after the content of the micro-architecture component is obtained by the monitoring data analysis module, respectively analyzing the monitoring data corresponding to the branch predictor, the instruction page table cache, the data page table cache and the secondary cache through corresponding target algorithms so as to recover the key attribute of the micro-architecture component.
In the embodiment of the present invention, the micro-architectural events provided by the performance monitoring module are used to recover part of the content of the corresponding components through corresponding target algorithms (i.e., indirect monitoring algorithms). For example: part of the content of the Branch Target Buffer (BTB) and the return address stack (RSB) in the BP is monitored based on a combination of the events BR_MIS_PRED, BR_PRED and BR_RETURN_RETIRED; part of the content of the second-level cache is monitored based on a combination of events such as L2D_CACHE and BUS_ACCESS; and part of the content of the instruction page table cache and the data page table cache is monitored based on a combination of events such as L1I_TLB_REFILL, L1D_TLB_REFILL and INST_RETIRED. The indirect monitoring algorithms comprise Algorithm 1, Algorithm 2, Algorithm 3, Algorithm 4 and Algorithm 5.
It should be noted that, based on the PMU events listed in Table 1, the present invention designs algorithms for monitoring the branch target buffer and the return address stack, which are components of the ARM processor branch predictor. Algorithm 1 uses 6 PMU events to determine whether the content of the branch target buffer entry indexed by an address is a specific address. The PMU counters are sampled before and after executing an indirect branch instruction I at address v. If the immediate-branch or return-branch count changes, measurement noise is present. Otherwise, the predicted-branch and mispredicted-branch counts are examined: if the branch was predicted but mispredicted, the predicted target address in the branch target buffer entry indexed by v is not the actual target address tv of I. The branch target buffer entry is thereby monitored indirectly. Similarly, Algorithm 2 indirectly infers the address at the top of the return address stack by executing a return instruction and observing the branch prediction counts. Because this indirect monitoring approach destroys the program context, the test case on the processor to be tested is rolled back after each call of an indirect monitoring algorithm, so that the processor environment is the same before each indirect measurement.
Algorithm 1: branch target buffer indirect monitoring algorithm
Input: an indirect branch instruction I; the virtual address v of the instruction; a known jump address tv; the list of PMU events INST_RETIRED, BR_MIS_PRED, BR_PRED, BR_IMMED_RETIRED, BR_RETURN_RETIRED and INST_SPEC, e ← [e8, e16, e18, e13, e14, e27]
Output: a judgment J of whether the branch target buffer entry indexed by v is tv
Begin
    SE ← [0, 0, 0, 0, 0, 0]
    For i ∈ [0, len(e)-1] Do
        SE[i] ← SAMPLE(e[i])
    End For
    Execute(I)
    For i ∈ [0, len(e)-1] Do
        SE[i] ← SAMPLE(e[i]) - SE[i]
    End For
    If SE[3] > 0 or SE[4] > 0 or SE[0] = 0 or SE[5] = 0 or SE[0] < SE[6]
        J ← 0
    Else
        J ← SE[1] = 0 and SE[2] = 1
    End If
    Return J
End
Algorithm 2: return address stack indirect monitoring algorithm
Input: a return instruction I; the virtual address v of the instruction; a known jump address tv; the list of PMU events INST_RETIRED, BR_MIS_PRED, BR_PRED, BR_RETURN_RETIRED and INST_SPEC, e ← [e8, e16, e18, e14, e27]
Output: a judgment J of whether the top of the return address stack is tv
Begin
    SE ← [0, 0, 0, 0, 0]
    For i ∈ [0, len(e)-1] Do
        SE[i] ← SAMPLE(e[i])
    End For
    Execute(I)
    For i ∈ [0, len(e)-1] Do
        SE[i] ← SAMPLE(e[i]) - SE[i]
    End For
    If SE[3] = 0 or SE[4] = 0 or SE[0] = 0 or SE[0] < SE[5]
        J ← 0
    Else
        J ← SE[1] = 0 and SE[2] = 1
    End If
    Return J
End
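The sample, execute, sample-again pattern of Algorithm 1 can be sketched as follows. The `sample` and `execute` callbacks are hypothetical stand-ins for reading a PMU counter and for running the indirect branch on the target; the event constants are the standard Armv8 PMU event numbers, which match the indices e8, e16, e18, e13, e14 and e27 in the algorithm. The final noise comparison in the pseudocode refers to an index beyond the six sampled events, so this sketch leaves that comparison out rather than guess its intent.

```python
INST_RETIRED, BR_MIS_PRED, BR_PRED = 0x08, 0x10, 0x12
BR_IMMED_RETIRED, BR_RETURN_RETIRED, INST_SPEC = 0x0D, 0x0E, 0x1B
EVENTS = [INST_RETIRED, BR_MIS_PRED, BR_PRED,
          BR_IMMED_RETIRED, BR_RETURN_RETIRED, INST_SPEC]


def btb_entry_matches(sample, execute):
    """Return True if the observed counts say the BTB entry predicted the known target.

    sample(event) reads one PMU counter; execute() runs the indirect branch I.
    Both are hypothetical callbacks supplied by the caller.
    """
    before = [sample(e) for e in EVENTS]
    execute()
    deltas = [sample(e) - b for e, b in zip(EVENTS, before)]
    inst, mis, pred, immed, ret, spec = deltas
    # Any immediate or return branch, or no retired/speculated instruction: noise.
    if immed > 0 or ret > 0 or inst == 0 or spec == 0:
        return False
    # Exactly one predicted branch and no misprediction: the entry held the target.
    return mis == 0 and pred == 1
```

Algorithm 2 follows the same pattern with the return-branch count required to be non-zero instead of zero.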
In the indirect monitoring process, the instruction page table cache and the data page table cache are monitored using the PMU events INST_RETIRED, L1I_TLB_REFILL and L1D_TLB_REFILL, together with the clock counter of the PMU. Algorithm 3 executes an arbitrary instruction, at a known address v, that neither accesses memory nor jumps, and observes whether an instruction page table cache miss occurs and whether the number of clock cycles taken to execute the instruction exceeds the threshold that characterizes a page table cache miss; from this it determines whether the instruction page table cache holds an entry for the address of the instruction. Similarly, Algorithm 4 determines whether the data page table cache holds an entry for address v by executing a memory access instruction whose target address is v and observing whether a page table cache miss occurs.
Algorithm 3: instruction page table cache indirect monitoring algorithm
Input: an instruction I at virtual address v that neither accesses memory nor jumps; a known page table cache miss clock cycle threshold tm; the list of PMU events INST_RETIRED and L1I_TLB_REFILL, e ← [e8, e2]; the PMU clock counter c
Output: a judgment J of whether the instruction page table cache holds an entry for address v
Begin
    SE ← [0, 0, 0]
    For i ∈ [0, len(e)-1] Do
        SE[i] ← SAMPLE(e[i])
    End For
    SE[2] ← SAMPLE(c)
    Execute(I)
    For i ∈ [0, len(e)-1] Do
        SE[i] ← SAMPLE(e[i]) - SE[i]
    End For
    SE[2] ← SAMPLE(c) - SE[2]
    If SE[0] = 0
        J ← 0
    Else
        J ← SE[2] ≤ tm and SE[1] = 0
    End If
    Return J
End
Algorithm 4: data page table cache indirect monitoring algorithm
Input: a memory access instruction I whose access address is v; a known page table cache miss clock cycle threshold tm; the list of PMU events INST_RETIRED and L1D_TLB_REFILL, e ← [e8, e5]; the PMU clock counter c
Output: a judgment J of whether the data page table cache holds an entry for address v
Begin
    SE ← [0, 0, 0]
    For i ∈ [0, len(e)-1] Do
        SE[i] ← SAMPLE(e[i])
    End For
    SE[2] ← SAMPLE(c)
    Execute(I)
    For i ∈ [0, len(e)-1] Do
        SE[i] ← SAMPLE(e[i]) - SE[i]
    End For
    SE[2] ← SAMPLE(c) - SE[2]
    If SE[0] = 0
        J ← 0
    Else
        J ← SE[2] ≤ tm and SE[1] = 0
    End If
    Return J
End
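Algorithms 3 and 4 differ only in the refill event they watch, so one function can illustrate both. In this sketch `sample`, `cycles` and `execute` are hypothetical callbacks for reading a PMU event counter, reading the PMU clock counter and running instruction I; the event constants are the standard Armv8 PMU event numbers corresponding to e8, e2 and e5.

```python
INST_RETIRED, L1I_TLB_REFILL, L1D_TLB_REFILL = 0x08, 0x02, 0x05


def tlb_entry_present(sample, cycles, execute, refill_event, tm):
    """Return True if executing I caused no TLB refill and stayed within tm cycles.

    refill_event is L1I_TLB_REFILL for Algorithm 3 or L1D_TLB_REFILL for Algorithm 4.
    """
    events = [INST_RETIRED, refill_event]
    before = [sample(e) for e in events]
    start = cycles()
    execute()
    inst, refill = (sample(e) - b for e, b in zip(events, before))
    elapsed = cycles() - start
    if inst == 0:
        return False  # the instruction did not retire, so nothing can be inferred
    return elapsed <= tm and refill == 0
```

The cycle threshold acts as a second witness next to the refill counter: both must agree before an entry is reported as present.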
In addition, in the indirect monitoring process, the PMU events INST_RETIRED, MEM_ACCESS, L2D_CACHE and BUS_ACCESS are used to monitor the second-level cache. Algorithm 5 first computes, for the virtual address v, its set number in the first-level cache, and then empties that first-level cache set using the DC ISW instruction. A memory access instruction I is then executed. If the L2D_CACHE event count increases by 1, the second-level cache was accessed; if the BUS_ACCESS event count does not increase by 1 at the same time, the third-level cache (L3 Cache) was not accessed, which indicates that valid data corresponding to the virtual address v exists in the second-level cache.
Algorithm 5: second-level cache indirect monitoring algorithm
Input: a memory access instruction I; the virtual address v of the target memory; the list of PMU events INST_RETIRED, MEM_ACCESS, L2D_CACHE and BUS_ACCESS, e ← [e8, e19, e22, e25]
Output: a judgment J of whether valid data corresponding to the virtual address v exists in the second-level cache
Begin
    SE ← [0, 0, 0, 0]
    set ← GETSET(v)
    Execute(DC ISW(set))
    For i ∈ [0, len(e)-1] Do
        SE[i] ← SAMPLE(e[i])
    End For
    Execute(I)
    SE ← [0, 0, 0, 0]
    set ← GETSET(v)
    Execute(DC ISW(set))
    For i ∈ [0, len(e)-1] Do
        SE[i] ← SAMPLE(e[i])
    End For
    Execute(I)
    For i ∈ [0, len(e)-1] Do
        SE[i] ← SAMPLE(e[i]) - SE[i]
    End For
    If SE[0] = 0 or SE[1] = 0 or SE[1] < SE[2] or SE[0] < SE[1]
        J ← 0
    Else
        J ← SE[2] = 1 and SE[3] = 0
    End If
    Return J
End
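A compact model of Algorithm 5 is given below. It performs the flush, sample and execute sequence once, as in the prose description. The cache geometry used by `l1_set_index` (64-byte lines, 128 sets) is an assumed example, not a value from the text, and `sample`, `execute` and `flush_l1_set` are hypothetical callbacks; in particular `flush_l1_set` stands in for issuing DC ISW over the ways of the given set.

```python
INST_RETIRED, MEM_ACCESS, L2D_CACHE, BUS_ACCESS = 0x08, 0x13, 0x16, 0x19
EVENTS = [INST_RETIRED, MEM_ACCESS, L2D_CACHE, BUS_ACCESS]


def l1_set_index(v, line_size=64, num_sets=128):
    """Set number of address v in the L1 cache (assumed example geometry)."""
    return (v // line_size) % num_sets


def l2_holds(sample, execute, flush_l1_set, v):
    """Return True if the access to v was served by L2 without going to the bus."""
    flush_l1_set(l1_set_index(v))          # force the access past the L1 cache
    before = [sample(e) for e in EVENTS]
    execute()
    inst, mem, l2d, bus = (sample(e) - b for e, b in zip(EVENTS, before))
    # Inconsistent counts are treated as noise, following the pseudocode.
    if inst == 0 or mem == 0 or mem < l2d or inst < mem:
        return False
    return l2d == 1 and bus == 0
```

Flushing only the one L1 set that v maps to keeps the disturbance to the rest of the cache state small, which matters because the test case is rolled back and measured repeatedly.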
The processor micro-architecture monitoring method of the embodiment of the invention controls the debugging state of a processor to be tested through a component control module positioned in the kernel space of the debugging processor, acquires direct monitoring data of the processor to be tested based on a data acquisition module, transmits the direct monitoring data into a memory space created by a data transmission equipment driving module which is positioned in the kernel space of the debugging processor in advance, and reads the direct monitoring data in the memory space for analysis by utilizing a monitoring data analysis module positioned in the user space of the debugging processor; the monitoring coverage range is wide, and the reliability, stability and expandability of the micro-architecture monitoring of the processor can be effectively improved, so that the system structure and the micro-architecture monitoring effect of the processor are improved.
Corresponding to the processor micro-architecture monitoring method, the invention also provides a processor micro-architecture monitoring device. Since the device embodiment is similar to the method embodiment above, it is described only briefly; for related details, refer to the description of the method embodiment. The embodiments of the processor micro-architecture monitoring device described below are merely illustrative. Fig. 6 is a schematic structural diagram of a processor micro-architecture monitoring device according to an embodiment of the present invention.
The invention relates to a processor micro-architecture monitoring device, which comprises the following parts:
the control unit 601 is configured to control a debugging state of a processor to be tested based on a preset component control module, and acquire direct monitoring data of the processor to be tested based on a data acquisition module; the component control module and the data acquisition module are positioned in a kernel space of a debugging processor;
a first monitoring data analysis unit 602, configured to transmit the direct monitoring data into a memory space created in advance based on a data transmission device driver module, and read the direct monitoring data in the memory space by using a monitoring data analysis module for analysis; the data transmission device driving module is located in a kernel space of the debugging processor, and the monitoring data analysis module is located in a user space of the debugging processor.
Further, the component control module includes: the system comprises a debugging register control module, a cross trigger interface component control module and a performance monitoring component control module;
the control unit is specifically configured to: based on the debugging register control module, read and write the content of the debugging registers of the processor to be tested through an interface of the processor to be tested; based on the cross trigger interface component control module, read and write the content of the cross trigger interface component through the interface; and based on the performance monitoring component control module, read and write the content of the performance monitoring component through the interface; so as to control the processor to be tested to enter and exit the debugging state, control the processor to be tested to execute the corresponding monitoring instructions in the debugging state, control the data acquisition module to acquire monitoring data, and control the data transmission module to transmit the monitoring data.
Further, the processor microarchitecture monitoring device further includes:
the second monitoring data analysis unit is used for acquiring indirect monitoring data of the processor to be tested based on the data acquisition module; the indirect monitoring data comprises monitoring data corresponding to a branch predictor, an instruction page table cache, a data page table cache and a secondary cache in the processor to be tested;
writing the indirect monitoring data into the memory space on line through a preset performance monitoring component, and reading the indirect monitoring data in the memory space by using a monitoring data analysis module for analysis; the performance monitoring component comprises a micro-architecture event corresponding to the processor to be tested, and the combination of the micro-architecture events can indirectly reflect the key attributes of the micro-architecture components in the processor to be tested.
Further, the processor micro-architecture monitoring device further includes: a monitoring command analysis unit, configured to, before the direct monitoring data is transmitted into the memory space created in advance by the data transmission device driver module, acquire a monitoring command sent by a user process based on a preset cross-domain communication protocol, and check the validity of the monitoring command against the current state of the monitoring system by using the monitoring system state monitor in the kernel space;
the cross-domain communication protocol is obtained by dividing the component control module and the data acquisition module into independent sub-monitoring function modules, packaging the sub-monitoring function modules into functions, writing the functions into the kernel space, and mapping the sub-monitoring function modules into monitoring commands and packaging the monitoring commands.
Further, the direct monitoring data includes: the monitoring data corresponding to the first-level cache and the unified page table cache (Unified TLB).
Further, the processor microarchitecture monitoring device further includes: the data content acquisition unit is used for acquiring the content of the corresponding micro-architecture component based on the data acquisition module; wherein the content of the microarchitectural component comprises the content of an event counter in the performance monitoring component and the content of a clock counter in the performance monitoring component;
the second monitoring data analysis unit is specifically configured to:
and after the content of the micro-architecture component is obtained by the monitoring data analysis module, respectively analyzing the monitoring data corresponding to the branch predictor, the instruction page table cache, the data page table cache and the secondary cache through corresponding target algorithms so as to recover the key attribute of the micro-architecture component.
Further, before acquiring a monitoring command sent by a user process based on a preset cross-domain communication protocol, the processor micro-architecture monitoring device further includes:
the interaction unit is used for dividing the monitoring function of the monitoring system in the debugging processor into a plurality of independent sub-monitoring function modules and packaging the sub-monitoring function modules into independent functions in the kernel space so as to realize the monitoring function; packaging the monitoring function into a monitoring command so that the user process of the user space can call the function of the kernel space through the monitoring command to realize the corresponding monitoring function; using a finite state machine in the kernel space, and checking whether a monitoring command corresponding to the user process is legal or not according to the current state of the monitoring system; and realizing cross-domain communication between the user space and the kernel space based on a Netlink mechanism of a Linux operating system, and encapsulating the monitoring command into the cross-domain communication protocol.
The processor micro-architecture monitoring device provided by the embodiment of the invention controls the debugging state of the processor to be tested through the component control module positioned in the kernel space of the debugging processor, acquires the direct monitoring data of the processor to be tested based on the data acquisition module, transmits the direct monitoring data into the memory space created by the data transmission equipment driving module which is positioned in the kernel space of the debugging processor in advance, reads the direct monitoring data in the memory space for analysis by utilizing the monitoring data analysis module positioned in the user space of the debugging processor, has wide monitoring coverage range, and can effectively improve the reliability, stability and expandability of the processor micro-architecture monitoring, thereby improving the system structure and the micro-architecture monitoring effect of the processor.
Corresponding to the processor micro-architecture monitoring method, the invention also provides electronic equipment. Since the embodiment of the electronic device is similar to the above method embodiment, the description is simple, and please refer to the description of the above method embodiment, and the electronic device described below is only schematic. Fig. 7 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention. The electronic device may include: a processor (processor) 701, a memory (memory) 702, a communication bus 703 (i.e. the device bus), and a lookup engine 705, wherein the processor 701 and the memory 702 communicate with each other through the communication bus 703 and communicate with the outside through a communication interface 704. The processor 701 may invoke logic instructions in the memory 702 to perform a processor micro-architectural monitoring method comprising: controlling the debugging state of a processor to be tested based on a preset component control module, and acquiring direct monitoring data of the processor to be tested based on a data acquisition module; the component control module and the data acquisition module are positioned in a kernel space of the debugging processor; transmitting the direct monitoring data into a memory space which is created in advance based on a data transmission equipment driving module, and reading the direct monitoring data in the memory space by using a monitoring data analysis module for analysis; the data transmission device driving module is located in a kernel space of the debugging processor, and the monitoring data analysis module is located in a user space of the debugging processor.
Furthermore, the logic instructions in the memory 702 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a memory chip, a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, an embodiment of the present invention further provides a computer program product. The computer program product includes a computer program stored on a processor-readable storage medium, and the computer program includes program instructions which, when executed by a computer, enable the computer to execute the processor micro-architecture monitoring method provided by the above method embodiments. The method comprises: controlling the debugging state of a processor to be tested based on a preset component control module, and acquiring direct monitoring data of the processor to be tested based on a data acquisition module, wherein the component control module and the data acquisition module are located in a kernel space of the debugging processor; and transmitting the direct monitoring data into a memory space created in advance based on a data transmission device driving module, and reading the direct monitoring data in the memory space by using a monitoring data analysis module for analysis, wherein the data transmission device driving module is located in the kernel space of the debugging processor, and the monitoring data analysis module is located in a user space of the debugging processor.
In still another aspect, an embodiment of the present invention further provides a processor-readable storage medium on which a computer program is stored. When executed by a processor, the computer program performs the processor micro-architecture monitoring method provided in the foregoing embodiments. The method comprises: controlling the debugging state of a processor to be tested based on a preset component control module, and acquiring direct monitoring data of the processor to be tested based on a data acquisition module, wherein the component control module and the data acquisition module are located in a kernel space of the debugging processor; and transmitting the direct monitoring data into a memory space created in advance based on a data transmission device driving module, and reading the direct monitoring data in the memory space by using a monitoring data analysis module for analysis, wherein the data transmission device driving module is located in the kernel space of the debugging processor, and the monitoring data analysis module is located in a user space of the debugging processor.
The processor-readable storage medium may be any available medium or data storage device that can be accessed by a processor, including, but not limited to, magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROMs, EPROMs, EEPROMs, non-volatile memory (NAND flash), solid-state disks (SSDs), etc.).
The above-described embodiments of the apparatus are merely illustrative. The units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk, or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments or in some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A method for monitoring a processor micro-architecture, comprising:
controlling the debugging state of a processor to be tested based on a preset component control module, and acquiring direct monitoring data of the processor to be tested based on a data acquisition module; the component control module and the data acquisition module are positioned in a kernel space of the debugging processor;
transmitting the direct monitoring data into a memory space created in advance based on a data transmission equipment driving module, and reading the direct monitoring data in the memory space by using a monitoring data analysis module for analysis; the data transmission device driving module is located in a kernel space of the debugging processor, and the monitoring data analysis module is located in a user space of the debugging processor.
2. The processor micro-architecture monitoring method of claim 1, wherein the component control module comprises: a debugging register control module, a cross trigger interface component control module and a performance monitoring component control module;
wherein controlling the debugging state of the processor to be tested based on the preset component control module specifically comprises: reading and writing the content of a debugging register of the processor to be tested through an interface of the processor to be tested based on the debugging register control module; reading and writing the content of the cross trigger interface component through the interface based on the cross trigger interface component control module; reading and writing the content of the performance monitoring component through the interface based on the performance monitoring component control module; and controlling the processor to be tested to enter and exit a debugging state, controlling the processor to be tested to execute a corresponding monitoring instruction in the debugging state, and controlling the data acquisition module to acquire monitoring data and the data transmission module to transmit the monitoring data.
3. The method of claim 1, further comprising: acquiring indirect monitoring data of the processor to be tested based on the data acquisition module; the indirect monitoring data comprises monitoring data corresponding to a branch predictor, an instruction page table cache, a data page table cache and a secondary cache in the processor to be tested;
writing the indirect monitoring data into the memory space on line through a preset performance monitoring component, and reading the indirect monitoring data in the memory space by using a monitoring data analysis module for analysis; the performance monitoring component comprises a micro-architecture event corresponding to the processor to be tested, and the combination of the micro-architecture events can indirectly reflect the key attributes of the micro-architecture components in the processor to be tested.
4. The method according to claim 1, further comprising, before the transmitting the direct monitoring data into the memory space created based on the data transmission device driver module in advance: acquiring a monitoring command sent by a user process based on a preset cross-domain communication protocol, and checking the validity of the monitoring command according to the current state of a monitoring system by using a monitoring system state monitor of the kernel space;
the cross-domain communication protocol is obtained by dividing the component control module and the data acquisition module into independent sub-monitoring function modules, encapsulating the sub-monitoring function modules into functions, writing the functions into the kernel space, mapping the sub-monitoring function modules into monitoring commands and encapsulating the monitoring commands.
5. The processor micro-architecture monitoring method of claim 3, further comprising: acquiring the content of the corresponding micro-architecture component based on the data acquisition module; wherein the content of the microarchitectural component comprises the content of an event counter in the performance monitoring component and the content of a clock counter in the performance monitoring component;
the reading of the indirect monitoring data in the memory space by using the monitoring data analysis module for analysis specifically includes:
after the content of the micro-architecture component is obtained based on the monitoring data analysis module, the monitoring data corresponding to the branch predictor, the instruction page table cache, the data page table cache and the secondary cache are respectively analyzed through corresponding target algorithms, so that the key attribute of the micro-architecture component is recovered.
6. The method for monitoring the micro-architecture of the processor according to claim 4, wherein before acquiring the monitoring command sent by the user process based on the predetermined cross-domain communication protocol, the method further comprises: dividing the monitoring function of the monitoring system in the debugging processor into a plurality of independent sub-monitoring function modules, and packaging the sub-monitoring function modules into independent functions in the kernel space to realize the monitoring function; packaging the monitoring function into a monitoring command so that the user process of the user space can call the function of the kernel space through the monitoring command to realize the corresponding monitoring function; using a finite state machine in the kernel space, and checking whether a monitoring command corresponding to a user process is legal or not according to the current state of the monitoring system; and realizing cross-domain communication between the user space and the kernel space based on a Netlink mechanism of a Linux operating system, and encapsulating the monitoring command into the cross-domain communication protocol.
7. The processor micro-architecture monitoring method of claim 1, wherein the direct monitoring data comprises: monitoring data corresponding to the first-level cache and the joint page table cache.
8. A processor micro-architectural monitoring device, comprising:
the control unit is used for controlling the debugging state of the processor to be tested based on a preset component control module and acquiring direct monitoring data of the processor to be tested based on a data acquisition module; the component control module and the data acquisition module are positioned in a kernel space of the debugging processor;
the first monitoring data analysis unit is used for transmitting the direct monitoring data into a memory space which is created in advance based on a data transmission equipment driving module, and reading the direct monitoring data in the memory space by using a monitoring data analysis module for analysis; the data transmission equipment driving module is located in a kernel space of the debugging processor, and the monitoring data analysis module is located in a user space of the debugging processor.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the processor micro-architecture monitoring method according to any of claims 1 to 7 when executing the computer program.
10. A processor-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the processor micro-architecture monitoring method according to any one of claims 1 to 7.
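The legality check recited in claims 4 and 6 (a finite state machine in the kernel space that accepts or rejects a monitoring command according to the current state of the monitoring system) can be sketched as follows. This is a minimal illustration only: the state names, the command names, and the transition table are hypothetical and do not come from the patent, and the sketch runs as ordinary Python rather than as kernel code behind a Netlink socket.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()        # processor to be tested is running normally
    DEBUG = auto()       # processor to be tested is in the debugging state
    COLLECTING = auto()  # data acquisition is in progress


# Hypothetical monitoring commands and the transitions they are allowed to make.
TRANSITIONS = {
    (State.IDLE, "enter_debug"): State.DEBUG,
    (State.DEBUG, "start_collect"): State.COLLECTING,
    (State.COLLECTING, "stop_collect"): State.DEBUG,
    (State.DEBUG, "exit_debug"): State.IDLE,
}


class MonitorStateChecker:
    """Accepts a command only if it is legal in the current state."""

    def __init__(self):
        self.state = State.IDLE

    def submit(self, command):
        next_state = TRANSITIONS.get((self.state, command))
        if next_state is None:
            return False  # illegal in the current state: reject, keep state
        self.state = next_state
        return True


checker = MonitorStateChecker()
commands = ["start_collect", "enter_debug", "start_collect",
            "exit_debug", "stop_collect", "exit_debug"]
results = [checker.submit(c) for c in commands]
```

Keeping the transitions in a single table means that a new monitoring command only needs new table entries, which fits the claim's idea of mapping independent sub-monitoring functions to commands.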
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210752152.1A CN115269309A (en) | 2022-06-28 | 2022-06-28 | Processor micro-architecture monitoring method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115269309A (en) | 2022-11-01 |
Family
ID=83763233
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210752152.1A Pending CN115269309A (en) | 2022-06-28 | 2022-06-28 | Processor micro-architecture monitoring method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115269309A (en) |
- 2022-06-28 CN CN202210752152.1A patent/CN115269309A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US12085612B2 (en) | On-chip debugging device and method | |
US7533302B2 (en) | Trace and debug method and system for a processor | |
EP0849670B1 (en) | Integrated computer providing an instruction trace | |
JP4094724B2 (en) | Apparatus and method for identifying exceptions when debugging software | |
EP2686772B1 (en) | Diagnosing code using single step execution | |
US7506205B2 (en) | Debugging system and method for use with software breakpoint | |
US7555605B2 (en) | Data processing system having cache memory debugging support and method therefor | |
KR20080022181A (en) | Mechanism for storing and extracting trace information using internal memory in microcontrollers | |
US7562258B2 (en) | Generation of trace elements within a data processing apparatus | |
CN109254883B (en) | Debugging device and method for on-chip memory | |
US7594140B2 (en) | Task based debugger (transaction-event-job-trigger) | |
JP2010044747A (en) | Message logging for software application | |
US20100011250A1 (en) | Microcontroller information extraction system and method | |
JP6360665B2 (en) | Data processor device and method for handling watchpoints | |
JP2007257441A (en) | Processor and processor control method | |
CN117707969B (en) | ARMv 8-based operation system adjustment and measurement system | |
US9348723B2 (en) | Method, system, and computer program product | |
JPH11110255A (en) | Software debugging device and method | |
US10754743B2 (en) | Apparatus and method using debug status storage element | |
CN115269309A (en) | Processor micro-architecture monitoring method and device | |
US7836283B2 (en) | Data acquisition messaging using special purpose registers | |
US20230088780A1 (en) | Profiling of sampled operations processed by processing circuitry | |
Carretero et al. | Hardware/software-based diagnosis of load-store queues using expandable activity logs | |
CN100533401C (en) | Emulation and debug interfaces for testing an integrated circuit with an asynchronous microcontroller | |
US8027829B2 (en) | System and method for integrated circuit emulation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||