CN112905474A - Hardware-based advanced program dynamic control flow tracking method and device


Info

Publication number: CN112905474A
Authority: CN (China)
Prior art keywords: execution, data, information, program, tracking
Legal status: Granted
Application number: CN202110253236.6A
Other languages: Chinese (zh)
Other versions: CN112905474B (en)
Inventors: 左志强 (Zuo Zhiqiang), 吉凯 (Ji Kai), 王乙飞 (Wang Yifei), 陶威 (Tao Wei), 王林章 (Wang Linzhang), 李宣东 (Li Xuandong)
Current assignee: Nanjing University
Original assignee: Nanjing University
Application filed by Nanjing University
Priority to CN202110253236.6A (patent CN112905474B)
Publication of CN112905474A
Application granted
Publication of CN112905474B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/362 Software debugging
    • G06F11/3636 Software debugging by tracing the execution of the program
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45591 Monitoring or debugging support

Abstract

The invention discloses a hardware-based method and device for tracing the dynamic control flow of high-level language programs. The method first statically analyzes the program to be traced to generate a control flow graph, a class inheritance graph and a call graph. It then executes the program on a virtual machine to collect a bytecode instruction template, execution information and trace data. The trace data are decoded and matched per thread and compared against the control flow graph to generate an execution flow graph. Finally, the method checks whether any execution flow data are missing. If so, it restores the missing parts of the execution flow graph as far as possible before outputting it. The invention uses a hardware trace module to trace the control flow of high-level language programs. Completing the execution flow data mitigates the mismatch between the rate at which the hardware outputs trace data and the rate at which the data can be stored to disk.

Description

Hardware-based advanced program dynamic control flow tracking method and device
Technical Field
The invention relates to hardware-based dynamic control flow tracing of high-level language program execution.
Background
Most modern CPUs are equipped with hardware trace modules, such as Intel Processor Trace (PT) and the ARM Embedded Trace Macrocell (ETM), that provide efficient control flow tracing. Control flow tracing plays an important role in a wide range of software engineering activities, including testing, debugging and performance analysis. For example, once the complete control flow trace of a program has been obtained, execution information such as method and statement coverage, path coverage and call information can easily be computed.
Existing tracing techniques fall into two categories: software tracing and hardware tracing. Software tracing typically instruments the user's source code, usually depends on the compiler infrastructure, and incurs high runtime overhead. Hardware tracing, by contrast, has lower runtime overhead and better generality, and it requires no instrumentation of the source code.
So far, hardware-based tracing has applied only to native programs that run directly on the hardware. This is because the processor executes only machine instructions. For native programs, these instructions can easily be mapped back to the source code using the debug information generated during compilation. However, high-level languages such as Java, Go and Scala play an increasingly important role in modern computing. There is therefore an urgent need to extend hardware trace modules to provide efficient control flow tracing for high-level languages.
Tracing high-level language programs with hardware is challenging, mainly because high-level language runtimes are complex. For example, a Java Virtual Machine (JVM) executing a Java program switches between interpreted execution and just-in-time (JIT) compiled execution. The JVM starts by interpreting the program, and once a piece of code or a method becomes hot, it switches to JIT compilation and executes the compiled code directly. Moreover, for such languages there is a large gap between the instructions the CPU actually executes and the user's high-level source code. This gap appears not only as structural differences between machine code and bytecode, but also as differences between the code generated under different compilation strategies. In the JVM, for example, interpreted execution generates code directly from templates, while JIT compilation optimizes the compiled code multiple times. Worse still, the runtime may insert various checks into the code (such as read/write barriers and bounds checks). The result is a large structural difference between the code written by the developer and what the CPU actually executes.
Disclosure of Invention
The problem solved by the invention is to implement hardware tracing for high-level language programs whose execution is based on interpretation.
To solve this problem, the invention adopts the following scheme.
The hardware-based high-level program dynamic control flow tracing method of the invention comprises the following steps:
S1: acquiring the program to be traced;
S2: statically analyzing the program to be traced to generate a control flow graph, a class inheritance graph and a call graph;
S3: collecting the bytecode instruction template, execution information and trace data while the virtual machine executes the program to be traced;
S4: generating an execution flow graph from the generated control flow graph and the collected bytecode instruction template, trace data and execution information;
S5: outputting the execution flow graph.
Step S3 comprises the following steps:
S31: starting the virtual machine and then initializing processor hardware tracing for the started virtual machine process. The processor control flow trace data produced by hardware tracing of the virtual machine process are written to a designated trace data buffer, from which a trace data dump module transfers them to disk;
S32: exporting the bytecode instruction template when the virtual machine process initializes the program to be traced;
S33: initializing execution information recording for the virtual machine process when the virtual machine initializes the program to be traced. Whenever the virtual machine process generates machine code instructions while executing the program, the mapping between bytecode instructions and machine code instructions is written to a designated execution information buffer, from which an execution information dump module transfers it to disk. The execution information is this mapping between bytecode instructions and machine code instructions;
S34: executing the program to be traced in the virtual machine process with processor hardware tracing enabled, and storing the trace data and execution information produced during execution to disk through the trace data dump module and the execution information dump module.
Step S4 comprises the following steps:
S41: extracting the trace data belonging to the program to be traced according to the thread switch information and timestamp information in the trace data;
S42: determining from memory address information whether the extracted trace data are interpreted execution data or just-in-time (JIT) execution data. For interpreted execution data, the corresponding bytecode instructions are found from the bytecode instruction template. JIT execution data are first decoded, and the corresponding bytecode instructions are then obtained from the mapping between bytecode instructions and machine code instructions;
S43: splicing the bytecode instructions into bytecode instruction streams, finding the corresponding nodes of the control flow graph for these streams, and recording execution flow data to generate the nodes of the execution flow graph.
The execution flow data comprise an execution order label and an execution timestamp.
Further, in the hardware-based high-level program dynamic control flow tracing method of the invention, before the execution flow graph is output in step S5, the method determines whether any execution flow data are missing from the execution flow graph. If so, it completes them as far as possible through the following steps:
S51: using the thread switch information and timestamp information of the trace data, sorting the nodes of the execution flow graph per thread by the timestamps of their execution flow data;
S52: checking whether each thread's sorted nodes are continuous in time and, where they are not, marking a missing point;
S53: checking whether the nodes immediately before and after the missing point carry execution flow data and, if they do, filling the missing point with execution flow data derived from them.
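Steps S51 to S53 do not spell out how a discontinuity is detected or what is filled in. The Python sketch below shows one possible reading; it is not the patented procedure. It assumes two consecutive nodes of a thread are discontinuous when the second is not a control flow graph successor of the first. A gap is filled only when the graph admits exactly one shortest path between the two nodes, and the recovered nodes borrow the preceding node's timestamp. The `Visit` record, the function names and the example graph are hypothetical. The edges of FIG. 3 are not given in the text, so `EXAMPLE_CFG` is only chosen so that the B01, B03, B06 case discussed in the detailed description recovers B04.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Visit:
    node: str        # control flow graph node, e.g. "B03"
    thread: int      # thread id taken from the thread switch information
    timestamp: int   # timestamp of the execution flow data


# Hypothetical edges loosely modelled on FIG. 3 (the real edges are not given).
EXAMPLE_CFG: Dict[str, List[str]] = {
    "B01": ["B02", "B03"], "B02": ["B05"], "B03": ["B04"],
    "B04": ["B06"], "B05": ["B06"], "B06": [],
}


def unique_shortest_path(cfg, src, dst) -> Optional[List[str]]:
    """Intermediate nodes of the single shortest src->dst path; None if absent or ambiguous."""
    if src == dst:
        return None
    dist, count, parent = {src: 0}, {src: 1}, {}
    queue = deque([src])
    while queue:                      # breadth-first search that also counts shortest paths
        u = queue.popleft()
        for w in cfg.get(u, ()):
            if w not in dist:
                dist[w], count[w], parent[w] = dist[u] + 1, count[u], u
                queue.append(w)
            elif dist[w] == dist[u] + 1:
                count[w] += count[u]  # another shortest path reaches w
    if dst not in dist or count[dst] != 1:
        return None
    path, node = [], parent[dst]
    while node != src:
        path.append(node)
        node = parent[node]
    return path[::-1]


def complete_missing(visits, cfg):
    """Return {thread: [(node, order, timestamp), ...]} with recoverable gaps filled."""
    by_thread: Dict[int, List[Visit]] = {}
    for v in visits:
        by_thread.setdefault(v.thread, []).append(v)
    result = {}
    for tid, seq in by_thread.items():
        seq.sort(key=lambda v: v.timestamp)                   # S51: per-thread time order
        filled = seq[:1]
        for prev, nxt in zip(seq, seq[1:]):
            if nxt.node not in cfg.get(prev.node, ()):        # S52: missing point
                for name in unique_shortest_path(cfg, prev.node, nxt.node) or []:
                    filled.append(Visit(name, tid, prev.timestamp))  # S53: borrow neighbour data
            filled.append(nxt)
        result[tid] = [(v.node, i + 1, v.timestamp) for i, v in enumerate(filled)]
    return result
```

For the trace B01 (t=10), B03 (t=20), B06 (t=40) on one thread, the sketch inserts B04 with order 3 and timestamp 20. A jump from B01 straight to B06 is left unfilled because two equally short paths exist. Refusing to guess in ambiguous cases is a design choice of this sketch only.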
The hardware-based high-level program dynamic control flow tracing device of the invention comprises the following modules:
M1, configured to acquire the program to be traced;
M2, configured to statically analyze the program to be traced to generate a control flow graph, a class inheritance graph and a call graph;
M3, configured to collect the bytecode instruction template, execution information and trace data while the virtual machine executes the program to be traced;
M4, configured to generate an execution flow graph from the generated control flow graph and the collected bytecode instruction template, trace data and execution information;
M5, configured to output the execution flow graph.
Module M3 comprises the following modules:
M31, configured to start the virtual machine and then initialize processor hardware tracing for the started virtual machine process. The processor control flow trace data produced by hardware tracing of the virtual machine process are written to a designated trace data buffer, from which a trace data dump module transfers them to disk;
M32, configured to export the bytecode instruction template when the virtual machine process initializes the program to be traced;
M33, configured to initialize execution information recording for the virtual machine process when the virtual machine initializes the program to be traced. Whenever the virtual machine process generates machine code instructions while executing the program, the mapping between bytecode instructions and machine code instructions is written to a designated execution information buffer, from which an execution information dump module transfers it to disk. The execution information is this mapping between bytecode instructions and machine code instructions;
M34, configured to execute the program to be traced in the virtual machine process with processor hardware tracing enabled, and to store the trace data and execution information produced during execution to disk through the trace data dump module and the execution information dump module.
Module M4 comprises the following modules:
M41, configured to extract the trace data belonging to the program to be traced according to the thread switch information and timestamp information in the trace data;
M42, configured to determine from memory address information whether the extracted trace data are interpreted execution data or just-in-time (JIT) execution data. For interpreted execution data, the corresponding bytecode instructions are found from the bytecode instruction template. JIT execution data are first decoded, and the corresponding bytecode instructions are then obtained from the mapping between bytecode instructions and machine code instructions;
M43, configured to splice the bytecode instructions into bytecode instruction streams, find the corresponding nodes of the control flow graph for these streams, and record execution flow data to generate the nodes of the execution flow graph.
The execution flow data comprise an execution order label and an execution timestamp.
Further, in the hardware-based high-level program dynamic control flow tracing device of the invention, module M5 first determines whether any execution flow data are missing from the execution flow graph. It does this before outputting the graph. If data are missing, M5 completes them as far as possible and, for this purpose, comprises the following modules:
M51, configured to sort the nodes of the execution flow graph per thread by the timestamps of their execution flow data, using the thread switch information and timestamp information of the trace data;
M52, configured to check whether each thread's sorted nodes are continuous in time and, where they are not, to mark a missing point;
M53, configured to check whether the nodes immediately before and after the missing point carry execution flow data and, if they do, to fill the missing point with execution flow data derived from them.
The invention has the following technical effects:
1. The invention executes the program to be traced on a virtual machine to collect the bytecode instruction template, execution information and trace data. It then generates the execution flow graph by combining these with the control flow graph, thereby achieving hardware tracing of high-level language programs.
2. Completing missing execution flow data reduces the impact on the execution flow graph of data loss that occurs when trace data cannot be written to disk fast enough.
Drawings
FIG. 1 is a flow and data flow diagram of an embodiment of the present invention.
FIG. 2 shows an example program to be traced in an embodiment of the invention.
FIG. 3 is the control flow graph obtained by statically analyzing the program in FIG. 2.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
In this embodiment, the hardware-based high-level program dynamic control flow tracing method traces the dynamic control flow of Java programs. It is implemented as software running on a computer with an Intel processor. Accordingly, hardware tracing is based on Intel Processor Trace (Intel PT), and the virtual machine is the Java virtual machine. Referring to FIG. 1, the method of this embodiment comprises the following steps:
S1: acquiring the program to be traced;
S2: statically analyzing the program to be traced to generate a control flow graph, a class inheritance graph and a call graph;
S3: collecting the bytecode instruction template, execution information and trace data while the virtual machine executes the program to be traced;
S4: generating an execution flow graph by fusing dynamic and static data, namely the generated control flow graph and the collected bytecode instruction template, trace data and execution information;
S5: completing any missing execution flow data in the execution flow graph, then outputting the execution flow graph.
Note that step S4 takes the outputs of steps S2 and S3 as its input, while S2 and S3 do not depend on each other's output, so their order can be swapped. The virtual machine can first execute the program to collect the bytecode instruction template, execution information and trace data, with static analysis performed afterwards. This ordering has no effect on the invention.
This embodiment targets Java programs, so the program to be traced must be a Java program package consisting of class files. A program given as .java source files must first be compiled into class files, and a jar package must first be extracted. "Acquiring" in step S1 means only that the program to be traced is the input to the invention. How .java or .jar files are converted into class files is therefore outside the scope of the invention and is not described further.
In step S2, the class files in the Java program package are analyzed one by one. For each class, its inheritance relationships are collected. The bytecode instructions of each method in the class are analyzed to generate that method's control flow graph, and the methods and functions it calls are recorded. The overall control flow graph is finally obtained from the control flow graphs of all methods and functions and the call relationships among them. A control flow graph consists of nodes and the relationships between them, where each node is a block made up of a sequence of bytecode instructions. FIG. 3 shows the control flow graph obtained by statically analyzing the program in FIG. 2. Each block in FIG. 3 is a node corresponding to a basic block of the program, and each basic block consists of bytecode instructions in sequence. Obtaining a control flow graph by static analysis, as in step S2, is familiar to those skilled in the art and is not described in detail here.
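A per-method control flow graph is usually built by splitting the bytecode into basic blocks at "leaders". Leaders are the first instruction, every branch target, and every instruction that follows a branch or return. The blocks are then connected by fall-through and jump edges. The Python sketch below illustrates only that textbook construction; it is not the patent's analysis. It assumes a simplified, hypothetical `Insn` representation (byte offset, mnemonic, optional branch target). It also ignores `tableswitch`/`lookupswitch`, exception handlers and subroutines, which a real analysis built on a bytecode library such as ASM would have to handle.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# JVM conditional branches, unconditional jumps and block terminators (mnemonics).
COND_BRANCHES = {
    "ifeq", "ifne", "iflt", "ifge", "ifgt", "ifle",
    "if_icmpeq", "if_icmpne", "if_icmplt", "if_icmpge", "if_icmpgt", "if_icmple",
    "if_acmpeq", "if_acmpne", "ifnull", "ifnonnull",
}
GOTOS = {"goto", "goto_w"}
TERMINATORS = {"return", "ireturn", "lreturn", "freturn", "dreturn", "areturn", "athrow"}


@dataclass
class Insn:
    pc: int                        # bytecode offset
    op: str                        # mnemonic
    target: Optional[int] = None   # branch target offset, if any


def build_cfg(code: List[Insn]):
    """Split one method's bytecode into basic blocks and return (blocks, edges)."""
    leaders = {code[0].pc}
    for k, insn in enumerate(code):
        if insn.target is not None:
            leaders.add(insn.target)                     # every branch target starts a block
        ends_block = insn.op in COND_BRANCHES or insn.op in GOTOS or insn.op in TERMINATORS
        if ends_block and k + 1 < len(code):
            leaders.add(code[k + 1].pc)                  # so does the instruction after it
    starts = sorted(leaders)
    blocks: Dict[int, List[Insn]] = {}
    for insn in code:
        if insn.pc in leaders:
            current = insn.pc
            blocks[current] = []
        blocks[current].append(insn)
    edges: Dict[int, List[int]] = {}
    for k, start in enumerate(starts):
        last = blocks[start][-1]
        fallthrough = starts[k + 1] if k + 1 < len(starts) else None
        succ: List[int] = []
        if last.op in GOTOS:
            succ.append(last.target)
        elif last.op not in TERMINATORS:
            if fallthrough is not None:
                succ.append(fallthrough)                 # fall through to the next block
            if last.op in COND_BRANCHES and last.target not in succ:
                succ.append(last.target)                 # taken branch
        edges[start] = succ
    return blocks, edges
```

As a usage example, hand-written bytecode of the kind javac emits for `static int abs(int x) { if (x < 0) x = -x; return x; }` yields three blocks starting at offsets 0, 4 and 7. The edges are 0 to 4 and 7, 4 to 7, and none out of block 7.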
In step S3, the output information includes the bytecode instruction template, the execution information, and the trace data. Step S3 includes the following steps:
S31: start the virtual machine, then initialize processor hardware tracing for the started virtual machine process. The processor control flow trace data produced by hardware tracing of the virtual machine process are written to a designated trace data buffer, from which a trace data dump module transfers them to disk.
S32: when the virtual machine process initializes the program to be traced, the bytecode instruction template is exported.
S33: when the virtual machine initializes the program to be traced, execution information recording is initialized for the virtual machine process. Whenever the virtual machine process generates machine code instructions while executing the program, the mapping between bytecode instructions and machine code instructions is written to a designated execution information buffer, from which an execution information dump module transfers it to disk. The execution information is this mapping between bytecode instructions and machine code instructions.
S34: the program to be traced is executed in the virtual machine process with processor hardware tracing enabled, and the trace data and execution information produced during execution are stored to disk through the trace data dump module and the execution information dump module.
In the above steps S31 to S34, the virtual machine is a Java virtual machine in this embodiment.
In step S31, processor hardware tracing of the started virtual machine process is initialized by calling the Linux perf_event_open system call for an Intel PT event. The perf_event_open system call is defined as follows:
int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu, int group_fd, unsigned long flags);
The pid and cpu parameters select the process and the CPU core to be monitored; in this embodiment, pid is the process ID of the Java virtual machine process. Setting the inherit attribute in the perf_event_attr parameter extends monitoring to all threads forked from the main thread of the Java virtual machine process. The perf_event_attr configuration also sets up a control flow data buffer in memory. While the Java virtual machine executes the threads of the Java program, the control flow information of every thread forked by the Java virtual machine process is stored in this buffer.
Because the buffer occupies only a limited amount of memory, the invention stores the control flow data in the buffer to disk through the trace data dump module. The trace data here comprise thread switch information and control flow data. The control flow data comprise function call information and the jump information of jump instructions.
Note that with perf_event_open, the thread switch information is stored in a separate buffer. That is, the thread switch information buffer and the control flow data buffer described above together form the trace data buffer of the invention.
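The configuration described above can be sketched in Python through `ctypes`. This is a minimal sketch that has not been validated on Intel PT hardware, and it rests on several assumptions. It targets x86-64 Linux, where the syscall number is 298. It uses the 112-byte `PERF_ATTR_SIZE_VER5` layout of `perf_event_attr`. It reads the intel_pt PMU type from sysfs. glibc has no `perf_event_open` wrapper, so the sketch goes through `syscall(2)`. The `context_switch` bit requests thread switch records, which the kernel writes to the regular perf ring buffer. Intel PT packets go to the separate AUX area, which matches the two buffers described above. Mapping those buffers with `mmap` is omitted. So is choosing Intel PT `config` bits such as timestamp packets, which are listed under `/sys/bus/event_source/devices/intel_pt/format/`. The helper names are hypothetical.

```python
import ctypes
import fcntl
import os
import struct

# Assumption: x86-64 Linux; the syscall number differs on other architectures.
SYS_PERF_EVENT_OPEN = 298
PERF_ATTR_SIZE_VER5 = 112          # sizeof(struct perf_event_attr), layout version 5
PERF_EVENT_IOC_ENABLE = 0x2400     # _IO('$', 0)
PERF_EVENT_IOC_DISABLE = 0x2401    # _IO('$', 1)

# Bit positions in the u64 flag word of perf_event_attr (linux/perf_event.h).
DISABLED = 1 << 0          # start stopped; enabled later with ioctl
INHERIT = 1 << 1           # also trace threads forked from the target thread
EXCLUDE_KERNEL = 1 << 5    # user-space control flow only
EXCLUDE_HV = 1 << 6
CONTEXT_SWITCH = 1 << 26   # emit thread switch records into the ring buffer


def intel_pt_pmu_type(path="/sys/bus/event_source/devices/intel_pt/type"):
    """Read the dynamic PMU type number the kernel assigned to Intel PT."""
    with open(path) as f:
        return int(f.read().strip())


def build_pt_attr(pmu_type, config=0):
    """Pack a perf_event_attr for an Intel PT event; all other fields stay zero."""
    buf = bytearray(PERF_ATTR_SIZE_VER5)
    # type (u32), size (u32) and config (u64) sit at offsets 0, 4 and 8.
    struct.pack_into("=IIQ", buf, 0, pmu_type, PERF_ATTR_SIZE_VER5, config)
    # The flag word follows sample_period, sample_type and read_format (offset 40).
    flags = DISABLED | INHERIT | EXCLUDE_KERNEL | EXCLUDE_HV | CONTEXT_SWITCH
    struct.pack_into("=Q", buf, 40, flags)
    return bytes(buf)


def perf_event_open(attr, pid, cpu=-1, group_fd=-1, flags=0):
    """Invoke the raw syscall; returns a file descriptor or raises OSError."""
    libc = ctypes.CDLL(None, use_errno=True)
    attr_buf = (ctypes.c_char * len(attr)).from_buffer_copy(attr)
    fd = libc.syscall(ctypes.c_long(SYS_PERF_EVENT_OPEN), ctypes.byref(attr_buf),
                      ctypes.c_int(pid), ctypes.c_int(cpu),
                      ctypes.c_int(group_fd), ctypes.c_ulong(flags))
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return fd


def set_tracing(fd, enabled):
    """Start or stop tracing, as done around program execution in step S34."""
    fcntl.ioctl(fd, PERF_EVENT_IOC_ENABLE if enabled else PERF_EVENT_IOC_DISABLE, 0)
```

A typical call would be `fd = perf_event_open(build_pt_attr(intel_pt_pmu_type()), pid=jvm_pid)`, where `jvm_pid` is a placeholder for the JVM's process ID. The event starts disabled, so no trace is produced until `set_tracing(fd, True)` runs.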
In step S32, when the Java virtual machine process is initialized, it allocates a fixed-size region of memory to store the bytecode instruction template. For a given Java virtual machine, the generated template is always the same, so this embodiment exports the template when the Java virtual machine process is initialized. Later, the Java virtual machine loads the Java application to be traced. At that point, the start address of the corresponding bytecode instruction template is exported, which yields the template for that application. In addition, in this embodiment, the bytecode instruction template is passed to the Intel PT side when the Java virtual machine process is initialized. The corresponding start address is passed to it when the application to be traced is loaded.
In interpreted mode, the Java virtual machine process uses a code template to generate a short machine code sequence for each bytecode instruction. After initialization, it keeps these sequences in memory; together they form the bytecode instruction template. When executing a bytecode, the Java virtual machine process jumps directly to the address of the corresponding machine code and executes it. In other words, the bytecode instruction template is a fast lookup table between bytecode instructions and machine code instructions.
While the Java virtual machine process executes a Java application, some code becomes hot. This happens when a method or function has been invoked a certain number of times, or when a loop in a code segment has iterated a certain number of times. The Java virtual machine process then compiles that hot code directly into machine code instructions. The next time the code runs, it jumps directly to the compiled machine code, which makes the Java virtual machine execute the Java program more efficiently. In step S33, the mapping between bytecode instructions and machine code instructions refers to the mapping exported when the Java virtual machine process compiles hot code into machine code instructions.
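The hot-code rule above can be illustrated with a simple counter. The Python sketch below is hypothetical and does not reproduce HotSpot's actual policy. The class name and default thresholds are invented for illustration, and real JVMs use tiered, tunable thresholds. It reports each method exactly once, at the moment a JIT compiler would be invoked. In this invention, that is also the moment the bytecode-to-machine-code mapping would be written to the execution information buffer.

```python
from collections import Counter


class HotnessTracker:
    """Hypothetical hot-code detector; thresholds are illustrative, not a real JVM's."""

    def __init__(self, invoke_threshold=1000, backedge_threshold=10000):
        self.invoke_threshold = invoke_threshold
        self.backedge_threshold = backedge_threshold
        self.invocations = Counter()   # calls per method
        self.backedges = Counter()     # taken loop back-edges per method
        self.compiled = set()          # methods already handed to the JIT

    def _became_hot(self, method):
        if method in self.compiled:
            return False
        if (self.invocations[method] >= self.invoke_threshold
                or self.backedges[method] >= self.backedge_threshold):
            # A JIT compiler would run here; in the invention this is also where the
            # bytecode-to-machine-code mapping is exported (step S33).
            self.compiled.add(method)
            return True
        return False

    def on_invoke(self, method):
        self.invocations[method] += 1
        return self._became_hot(method)

    def on_backedge(self, method):
        self.backedges[method] += 1
        return self._became_hot(method)
```

With `invoke_threshold=3`, five calls to the same method return `False, False, True, False, False`. Compilation, and hence mapping export, is triggered once.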
Steps S31, S32 and S33 form the initialization stage of step S3. In step S34, the Java virtual machine process actually executes the Java application. Because of the initialization performed in S31 to S33, the processor hardware trace data and the execution information are exported while the Java virtual machine process executes the application in S34. In addition, step S34 typically enables Intel PT hardware tracing by calling the ioctl function. When the Java application finishes, ioctl must be called again to disable Intel PT hardware tracing.
It should also be noted that the virtual machine process produces a large amount of trace data while executing the program to be traced. The trace data dump module may be unable to keep up with the rate at which trace data are generated. Data in the trace data buffer may then be overwritten before being stored to disk, leaving gaps in the trace data saved by the dump module.
In this embodiment, the trace data dump module and the execution information dump module are each implemented as a process, and the trace data buffer and the execution information buffer are implemented as shared memory.
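The loss mechanism just described can be reproduced with a toy simulation. It is a hypothetical Python model, not the embodiment's dump module. It assumes a fixed-capacity FIFO buffer that overwrites its oldest unsaved record when full, as described above. It also assumes a producer (the hardware) and a dump module running at constant per-tick rates. It reports which records reach the disk and how many were overwritten first.

```python
from collections import deque


def simulate_dump(n_records, capacity, produce_per_tick, dump_per_tick):
    """Return (records saved to disk, number of records overwritten before saving)."""
    if min(capacity, produce_per_tick, dump_per_tick) < 1:
        raise ValueError("capacity and rates must be positive")
    buffer, saved = deque(), []
    produced = lost = 0
    while produced < n_records or buffer:
        for _ in range(produce_per_tick):          # hardware writes trace records
            if produced == n_records:
                break
            if len(buffer) == capacity:            # full: oldest unsaved record is overwritten
                buffer.popleft()
                lost += 1
            buffer.append(produced)
            produced += 1
        for _ in range(dump_per_tick):             # dump module copies records to disk
            if not buffer:
                break
            saved.append(buffer.popleft())
    return saved, lost
```

When the dump module keeps pace (`simulate_dump(100, 8, 4, 4)`), nothing is lost. Halving the dump rate makes the buffer overflow after a few ticks, and the saved sequence develops holes. The execution flow data completion of step S5 is intended to patch those holes.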
Step S4 is a process of analyzing the data output from steps S2 and S3, and includes the steps of:
S41: extracting the trace data belonging to the program to be traced according to the thread switch information and timestamp information in the trace data;
S42: determining from memory address information whether the extracted trace data are interpreted execution data or JIT execution data. For interpreted execution data, the corresponding bytecode instructions are found from the bytecode instruction template. JIT execution data are first decoded, and the corresponding bytecode instructions are then obtained from the mapping between bytecode instructions and machine code instructions;
S43: splicing the bytecode instructions into bytecode instruction streams, finding the corresponding nodes of the control flow graph for these streams, and recording execution flow data to generate the nodes of the execution flow graph.
In step S31, Intel PT is configured through perf_event_open to trace the control flow of all threads in the Java virtual machine process. The trace data output by step S3 therefore contain processor control flow trace data for every thread in that process. Step S41 must extract, from these threads' trace data, the data belonging to the program to be traced. The extraction relies on the thread switch information in the trace data and the entry bytecode instruction address of each thread recorded there. The thread entry is compared with the start address of the bytecode instruction template to decide whether it lies within the template address space of the program to be traced. Step S41 loads the trace data from disk.
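The extraction in step S41 might look like the following Python sketch. The data model is a hypothetical simplification. Thread switch records are taken as `(timestamp, tid, entry_addr)` tuples sorted by time, and each record opens a segment that lasts until the next switch. Control flow records are `(timestamp, address)` pairs. A segment is kept when its thread entry address falls inside the template address range `[template_start, template_end)` of the program to be traced. Function names, the record layout and the half-open range are assumptions.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Tuple

Packet = Tuple[int, int]   # (timestamp, machine code address) of one control flow record


@dataclass
class Segment:
    tid: int
    entry_addr: int
    packets: List[Packet] = field(default_factory=list)


def split_by_switches(switches, packets):
    """Assign each control flow record to the thread segment active at its timestamp."""
    segments = [Segment(tid, entry) for _, tid, entry in switches]
    starts = [ts for ts, _, _ in switches]
    for ts, addr in packets:
        k = bisect.bisect_right(starts, ts) - 1   # last switch at or before ts
        if k >= 0:                                # records before the first switch are dropped
            segments[k].packets.append((ts, addr))
    return segments


def extract_target(segments, template_start, template_end):
    """Keep segments whose thread entry lies in the traced program's template range."""
    return [s for s in segments if template_start <= s.entry_addr < template_end]
```

For example, with thread 1 entering at 0x1000 and 0x1010 and thread 2 entering at 0x9000, a template range of 0x1000 to 0x2000 keeps only thread 1's two segments.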
In step S42, the trace data are first split into segments according to thread switch and timestamp information. The data in each segment are then divided into interpreted execution data and JIT execution data according to memory address information. The control flow data in the trace data are themselves machine code instructions. Suppose the memory address of a machine code instruction lies within the address space of the bytecode instruction template and belongs to the machine code of some bytecode-to-machine-code entry. Such control flow data are interpreted execution data; all other control flow data are JIT execution data. For interpreted execution data, the corresponding bytecode instruction is taken directly from the bytecode-to-machine-code entry found through the bytecode instruction template. JIT execution data are decoded with libipt, Intel's open-source PT decoder library. The corresponding bytecode instruction is then obtained from the execution information, that is, the mapping between bytecode instructions and machine code instructions.
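The interpreted/JIT split of step S42 can be sketched with two address-range tables. In this hypothetical Python sketch, the bytecode instruction template is a list of `[start, end)` machine code ranges, one per opcode. The execution information is a similar table mapping compiled machine code ranges to `(method, bytecode index)` pairs. The input addresses stand in for what libipt decoding would produce; the decoding step itself is not shown. The class and function names and all addresses are assumptions. Addresses found in neither table (for example, runtime stubs) are simply skipped here.

```python
import bisect


class RangeMap:
    """Sorted, non-overlapping [start, end) address ranges mapped to values."""

    def __init__(self, entries):
        self.entries = sorted(entries)            # (start, end, value) tuples
        self.starts = [e[0] for e in self.entries]

    def get(self, addr):
        k = bisect.bisect_right(self.starts, addr) - 1
        if k >= 0 and addr < self.entries[k][1]:
            return self.entries[k][2]
        return None


def to_bytecode(addrs, template, jit_map):
    """Classify each executed address and recover its bytecode instruction."""
    out = []
    for addr in addrs:
        opcode = template.get(addr)
        if opcode is not None:
            out.append(("interp", opcode))        # interpreted execution data
            continue
        site = jit_map.get(addr)
        if site is not None:
            out.append(("jit", site))             # JIT execution data: (method, bci)
    return out
```

Because both tables are kept sorted, each lookup is a binary search. This matters when a trace contains millions of branch targets.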
In step S43, the bytecode instructions are spliced into a bytecode instruction stream, which is first divided into functions at function call and return instructions. Because trace data can be lost, some bytecode instructions may be missing when functions are divided. In that case, the corresponding function is found by traversing each function in the call relation graph and matching the context of the function or method. The bytecode instruction stream assembled for each function or method is then matched against the nodes of that function's or method's control flow graph. Each node is labeled with its execution order, taken from the sequence of the bytecode instruction stream, and the timestamp of the corresponding bytecode instructions is attached to it. A control flow graph whose nodes carry these execution order labels and timestamps forms the execution flow graph. Compared with the control flow graph, each node of the execution flow graph additionally holds an execution order label and an execution timestamp; together these form the execution flow data. For example, Fig. 3 shows six nodes, B01 to B06. If an execution visits nodes B01, B03, B04 and B06 in that order, their execution order labels are 1, 2, 3 and 4 respectively. Nodes B02 and B05, which are not executed, receive the label -1.
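The labeling rule can be sketched in a few lines using the Fig. 3 example. This is a minimal illustration, not the patented implementation. The timestamps are made up for illustration, and the sketch assumes each node appears at most once in a single execution, so loops are not handled.

```python
def label_execution(cfg_nodes, executed):
    """Attach execution flow data (order label, timestamp) to CFG nodes.

    `executed` lists (node, timestamp) pairs in bytecode-stream order for one
    execution. Nodes never reached keep the label -1 and no timestamp.
    Assumes each node appears at most once per execution (no loops).
    """
    flow = {node: (-1, None) for node in cfg_nodes}
    for order, (node, timestamp) in enumerate(executed, start=1):
        if node not in flow:
            raise KeyError(f"{node} is not a node of this control flow graph")
        flow[node] = (order, timestamp)
    return flow


if __name__ == "__main__":
    nodes = ["B01", "B02", "B03", "B04", "B05", "B06"]
    run = [("B01", 100), ("B03", 105), ("B04", 110), ("B06", 120)]  # illustrative timestamps
    for node, (order, ts) in label_execution(nodes, run).items():
        print(node, order, ts)
```

With the path B01, B03, B04, B06, the sketch yields labels 1, 2, 3 and 4, and it assigns -1 to B02 and B05, matching the example in the text.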
Note that a Java application may run multiple threads, and the control flow graph of a function or method may be executed many times. Each node in the execution flow graph can therefore carry several execution order labels and execution timestamps, and these are grouped into a set.
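The set of execution flow data per node can be sketched as a merge over several executions. This is a hedged illustration. Tagging each entry with its thread id is an assumption of this sketch; the text only states that the labels and timestamps are grouped into a set.

```python
from collections import defaultdict


def merge_executions(executions):
    """Group execution flow data from several executions into per-node sets.

    `executions` is a list of (thread_id, run) pairs, where `run` lists
    (node, timestamp) in execution order. Each node maps to a set of
    (thread_id, order_label, timestamp) tuples.
    """
    merged = defaultdict(set)
    for thread_id, run in executions:
        for order, (node, timestamp) in enumerate(run, start=1):
            merged[node].add((thread_id, order, timestamp))
    return dict(merged)


if __name__ == "__main__":
    runs = [
        (1, [("B01", 100), ("B03", 105)]),
        (2, [("B01", 200), ("B02", 210)]),
    ]
    for node, entries in sorted(merge_executions(runs).items()):
        print(node, sorted(entries))
```

Node B01 in this example ends up with two entries, one per thread, which is the situation the paragraph describes.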
The execution flow data loss handled in step S5 is caused by the trace data loss described above. For example, in Fig. 3, suppose that matching the bytecode instruction stream recovered from the trace data of one execution yields the execution order B01, B03, B06. This sequence necessarily lacks an execution of node B04. In this case, the execution flow data of the neighboring node B03 or B06 can be used as a reference and copied directly into node B04. Because each node in the execution flow graph may hold multiple execution flow data entries, this embodiment uses the following method:
S51: using the thread-switch information and timestamp information of the trace data, group the nodes of the execution flow graph by thread and sort each group by the timestamps in its execution flow data;
S52: check whether the sorted nodes of each thread are continuous in time, and where they are not, mark the gap as a missing point;
S53: check whether the nodes immediately before and after each missing point have execution flow data, and if they do, fill that execution flow data into the missing point.
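Steps S51 to S53 can be sketched as follows. This is a hedged sketch under explicit assumptions, not the patented implementation. The text does not say how timestamp "continuity" is decided or how the missing node is identified. This sketch therefore substitutes a control flow graph adjacency check: two consecutive nodes are treated as discontinuous when the second is not a successor of the first. The edges in FIG3_EDGES are hypothetical, since the text does not give the edges of Fig. 3. The filled entry copies the preceding node's timestamp.

```python
from collections import defaultdict

# Hypothetical CFG edges for Fig. 3 (assumed for illustration only).
FIG3_EDGES = {
    "B01": {"B02", "B03"},
    "B02": {"B05"},
    "B03": {"B04"},
    "B04": {"B05", "B06"},
    "B05": {"B06"},
    "B06": set(),
}


def fill_missing(records, edges):
    """Return (thread_id, node, timestamp) entries filled into missing points.

    S51: group records by thread and sort each group by timestamp.
    S52: mark a missing point between consecutive nodes a, b when b is not a
         CFG successor of a (an assumed stand-in for the timestamp check).
    S53: if exactly one node m satisfies a -> m -> b, fill m with a copy of
         a's timestamp; otherwise the gap is left unfilled.
    """
    by_thread = defaultdict(list)
    for thread_id, node, timestamp in records:
        by_thread[thread_id].append((timestamp, node))
    filled = []
    for thread_id, run in by_thread.items():
        run.sort()
        for (ts_a, a), (_ts_b, b) in zip(run, run[1:]):
            if b in edges.get(a, set()):
                continue  # consecutive on the graph: no missing point
            candidates = [m for m in edges.get(a, set()) if b in edges.get(m, set())]
            if len(candidates) == 1:
                filled.append((thread_id, candidates[0], ts_a))
    return filled


if __name__ == "__main__":
    observed = [(1, "B01", 100), (1, "B03", 105), (1, "B06", 120)]
    print(fill_missing(observed, FIG3_EDGES))  # prints [(1, 'B04', 105)]
```

Under these assumptions, the observed order B01, B03, B06 from the example yields one filled entry for B04. A gap with zero or several candidate nodes is left alone, which reflects the note that not every loss can be recovered.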
Note that during this completion of missing execution flow data, so much trace data may be lost that it cannot be compensated. The process above therefore fills in as much missing data as possible but cannot guarantee full recovery.
In addition, if the trace data dump module transfers data quickly enough and the trace data buffer is large enough, no trace data is lost. In that case, the execution flow graph can be output directly without the trace data completion process described above.

Claims (4)

1. A hardware-based high-level program dynamic control flow tracking method, characterized by comprising the following steps:
S1: acquiring a program to be traced;
S2: performing static analysis on the program to be traced to generate a control flow graph, a class inheritance graph and a call relation graph;
S3: collecting a bytecode instruction template, execution information and trace data while the virtual machine executes the program to be traced;
S4: generating an execution flow graph from the generated control flow graph and the collected bytecode instruction template, trace data and execution information;
S5: outputting the execution flow graph;
wherein step S3 comprises the following steps:
S31: starting a virtual machine and then initializing processor hardware tracing on the started virtual machine process, so that the processor control flow trace data obtained by the processor hardware module tracing the virtual machine process is written to a specified trace data buffer, and a trace data dump module dumps the trace data in the trace data buffer to disk;
S32: exporting a bytecode instruction template when the virtual machine process initializes the program to be traced;
S33: when the virtual machine initializes the program to be traced, initializing execution information for the virtual machine process, so that when the virtual machine process generates machine code instructions while executing the program, mapping information between bytecode instructions and machine code instructions is written to a specified execution information buffer, and an execution information dump module dumps the execution information in the execution information buffer to disk; the execution information is the mapping information between bytecode instructions and machine code instructions;
S34: executing the program to be traced through the virtual machine process with processor hardware tracing started, and storing the trace data and execution information produced while the program to be traced executes to disk through the trace data dump module and the execution information dump module;
step S4 comprises the following steps:
S41: extracting the trace data belonging to the program to be traced according to the thread-switch information and timestamp information of the trace data;
S42: determining from memory address information whether the extracted trace data is interpreted execution data or real-time execution data; for interpreted execution data, finding the corresponding bytecode instruction according to the bytecode instruction template; for real-time execution data, decoding it and then obtaining the corresponding bytecode instruction according to the mapping information between bytecode instructions and machine code instructions;
S43: splicing the bytecode instructions into bytecode instruction streams, finding the nodes in the control flow graph that correspond to the bytecode instruction streams, and recording execution flow data to generate the nodes of the execution flow graph;
the execution flow data comprises an execution order label and an execution timestamp.
2. The hardware-based high-level program dynamic control flow tracking method according to claim 1, wherein in step S5, before the execution flow graph is output, it is determined whether execution flow data is missing, and if so, the missing data is filled in as far as possible through the following steps:
S51: using the thread-switch information and timestamp information of the trace data, grouping the nodes of the execution flow graph by thread and sorting each group by the timestamps in its execution flow data;
S52: checking whether the sorted nodes of each thread are continuous in time, and where they are not, marking the gap as a missing point;
S53: checking whether the nodes immediately before and after each missing point have execution flow data, and if they do, filling that execution flow data into the missing point.
3. A hardware-based high-level program dynamic control flow tracking device, characterized by comprising the following modules:
M1, configured to: acquire a program to be traced;
M2, configured to: perform static analysis on the program to be traced to generate a control flow graph, a class inheritance graph and a call relation graph;
M3, configured to: collect a bytecode instruction template, execution information and trace data while the virtual machine executes the program to be traced;
M4, configured to: generate an execution flow graph from the generated control flow graph and the collected bytecode instruction template, trace data and execution information;
M5, configured to: output the execution flow graph;
wherein module M3 comprises the following modules:
M31, configured to: start a virtual machine and then initialize processor hardware tracing on the started virtual machine process, so that the processor control flow trace data obtained by the processor hardware tracing the virtual machine process is written to a specified trace data buffer, and a trace data dump module dumps the trace data in the trace data buffer to disk;
M32, configured to: export a bytecode instruction template when the virtual machine process initializes the program to be traced;
M33, configured to: when the virtual machine initializes the program to be traced, initialize execution information for the virtual machine process, so that when the virtual machine process generates machine code instructions while executing the program, mapping information between bytecode instructions and machine code instructions is written to a specified execution information buffer, and an execution information dump module dumps the execution information in the execution information buffer to disk; the execution information is the mapping information between bytecode instructions and machine code instructions;
M34, configured to: execute the program to be traced through the virtual machine process with processor hardware tracing started, and store the trace data and execution information produced while the program to be traced executes to disk through the trace data dump module and the execution information dump module;
module M4 comprises the following modules:
M41, configured to: extract the trace data belonging to the program to be traced according to the thread-switch information and timestamp information of the trace data;
M42, configured to: determine from memory address information whether the extracted trace data is interpreted execution data or real-time execution data; for interpreted execution data, find the corresponding bytecode instruction according to the bytecode instruction template; for real-time execution data, decode it and then obtain the corresponding bytecode instruction according to the mapping information between bytecode instructions and machine code instructions;
M43, configured to: splice the bytecode instructions into bytecode instruction streams, find the nodes in the control flow graph that correspond to the bytecode instruction streams, and record execution flow data to generate the nodes of the execution flow graph;
the execution flow data comprises an execution order label and an execution timestamp.
4. The hardware-based high-level program dynamic control flow tracking device according to claim 3, wherein module M5, before outputting the execution flow graph, determines whether execution flow data is missing from the execution flow graph, and if so, fills in the missing data as far as possible, comprising the following modules:
M51, configured to: using the thread-switch information and timestamp information of the trace data, group the nodes of the execution flow graph by thread and sort each group by the timestamps in its execution flow data;
M52, configured to: check whether the sorted nodes of each thread are continuous in time, and where they are not, mark the gap as a missing point;
M53, configured to: check whether the nodes immediately before and after each missing point have execution flow data, and if they do, fill that execution flow data into the missing point.
CN202110253236.6A 2021-03-09 2021-03-09 Hardware-based advanced program dynamic control flow tracking method and device Active CN112905474B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110253236.6A CN112905474B (en) 2021-03-09 2021-03-09 Hardware-based advanced program dynamic control flow tracking method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110253236.6A CN112905474B (en) 2021-03-09 2021-03-09 Hardware-based advanced program dynamic control flow tracking method and device

Publications (2)

Publication Number Publication Date
CN112905474A CN112905474A (en) 2021-06-04
CN112905474B CN112905474B (en) 2022-04-22

Family

ID=76108052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110253236.6A Active CN112905474B (en) 2021-03-09 2021-03-09 Hardware-based advanced program dynamic control flow tracking method and device

Country Status (1)

Country Link
CN (1) CN112905474B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117591436A (en) * 2024-01-18 2024-02-23 Nanjing Yanli Technology Co., Ltd. Observability data acquisition method and device for Go source code

Citations (4)

Publication number Priority date Publication date Assignee Title
CN1614555A (en) * 2003-11-06 2005-05-11 International Business Machines Corporation Apparatus and method for autonomic hardware assisted thread stack tracking
CN103365702A (en) * 2013-07-11 2013-10-23 Hefei Institutes of Physical Science, Chinese Academy of Sciences System and method for tracking processes of a lightweight virtual machine in an IaaS cloud environment
CN108459965A (en) * 2018-03-06 2018-08-28 Nanjing University Software traceability generation method combining user feedback and code dependence
CN110858410A (en) * 2018-08-06 2020-03-03 Intel Corporation Programmable ray tracing with hardware acceleration on a graphics processor

Non-Patent Citations (1)

Title
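Tan Cheng: "Research on Malicious Behavior Detection and Provenance Tracking Inside Virtual Machines in Cloud Environments", China Excellent Doctoral and Master's Theses Full-text Database (Doctoral), Information Science and Technology Series *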

Also Published As

Publication number Publication date
CN112905474B (en) 2022-04-22

Similar Documents

Publication Publication Date Title
US8938729B2 (en) Two pass automated application instrumentation
US20170255545A1 (en) Methods and systems of function-specific tracing
RU2668973C2 (en) Debugging native code by transitioning from execution in native mode to execution in interpreted mode
US6662362B1 (en) Method and system for improving performance of applications that employ a cross-language interface
US9632909B2 (en) Transforming user script code for debugging
US20090172664A1 (en) Adding a profiling agent to a virtual machine to permit performance and memory consumption analysis within unit tests
US20100115494A1 (en) System for dynamic program profiling
Bebenita et al. Trace-based compilation in execution environments without interpreters
US20120131559A1 (en) Automatic Program Partition For Targeted Replay
US20130125096A1 (en) Systems and Methods for Dynamic Collection of Probe Call Sites
EP3895022B1 (en) Improving emulation and tracing performance using compiler-generated emulation optimization metadata
EP3652648B1 (en) Replaying time-travel traces relying on processor undefined behavior
CN112905474B (en) Hardware-based advanced program dynamic control flow tracking method and device
US7684971B1 (en) Method and system for improving simulation performance
US7624381B1 (en) Portable detection of start and completion of object construction
Williams et al. Dyninst and mrnet: Foundational infrastructure for parallel tools
KR100597414B1 (en) Data processing device and register allocation method using data processing device
CN113778838A (en) Binary program dynamic taint analysis method and device
WO2014131319A1 (en) Methods and apparatuses for identifying and tracking process of operating system, and for obtaining information
Cesati et al. A memory access detection methodology for accurate workload characterization
CN113849397A (en) Execution engine, virtual machine, related apparatus and related methods
CN113535545A (en) Binary instrumentation method for program dynamic analysis
Sartor et al. Androprof: A profiling tool for the android platform
RU2390821C1 (en) Dynamic instrumentation technique
US11106522B1 (en) Process memory resurrection: running code in-process after death

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant