CN115617687A - Program instrumentation method, apparatus, device and storage medium - Google Patents


Info

Publication number
CN115617687A
Authority
CN (China)
Prior art keywords
program
function
analyzed
instruction
redundant
Legal status (an assumption, not a legal conclusion)
Granted
Application number
CN202211350276.3A
Other languages
Chinese (zh)
Other versions
CN115617687B (en)
Inventors
张超 (Zhang Chao)
殷婷婷 (Yin Tingting)
Current assignee
Tsinghua University
Original assignee
Tsinghua University
Events
Application CN202211350276.3A filed by Tsinghua University
Priority to CN202211350276.3A
Publication of CN115617687A
Application granted
Publication of CN115617687B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033 Test or assess software
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides a program instrumentation method, apparatus, device and storage medium, and relates to the field of computer technology. The method comprises: acquiring a program to be analyzed, the program to be analyzed being a binary program; identifying redundant instructions in the program to be analyzed based on an analysis requirement, and determining a stub function to be inserted, wherein a redundant instruction is code unrelated to the original semantics of the program to be analyzed and/or to the analysis requirement, and the position of the redundant instruction is the instrumentation point of the stub function; setting the address offset in a function call instruction according to the offset between the location of the stub function and the instrumentation point; and replacing the redundant instruction with the function call instruction. Removing redundant instructions from the program to be analyzed reserves instrumentation space without damaging the program's original structure or instruction offsets, so new code can be injected with high stability. The method therefore suits not only small and medium-sized programs but also large, complex programs such as operating system kernels.

Description

Program instrumentation method, apparatus, device and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a program instrumentation method, apparatus, device, and storage medium.
Background
Cyberspace is an important part of modern society and is widely used in critical scenarios such as industrial production. As programs, a key component of cyberspace, become more diverse and complex in function, the demand for program testing grows: problems must be found promptly through testing so that program faults or security vulnerabilities do not disrupt users' daily use.
As an auxiliary means of program analysis, program instrumentation is widely used in dynamic program testing. It injects new code into a program either to obtain the program's internal state during execution, providing feedback for dynamic testing, or to change the program's original execution flow to meet special analysis requirements.
A representative application of program instrumentation is code coverage feedback instrumentation in fuzz testing. Fuzz testing is currently regarded as an effective automated vulnerability discovery technique: it tries to trigger vulnerabilities in a program by generating a large number of test cases and repeatedly executing the test target. A fuzzer generally judges the quality of a test case by how much code the test case triggers; however, programs do not normally record or report which code they execute, so instrumentation is needed to add instructions to the program's basic blocks that record information whenever a block executes. Beyond dynamic testing, program instrumentation is also useful in scenarios such as program patching, for example adding condition checks before critical functions or adding log statements in bulk to related functions. Overall, program instrumentation makes the whole cycle of program development, testing and maintenance considerably more convenient and flexible.
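To make the coverage-feedback idea concrete, the following Python sketch models the per-basic-block bookkeeping that an AFL-style fuzzer's instrumentation performs. The edge-hashing scheme (XOR of the current block ID with the previous ID shifted right by one) is borrowed from AFL as an illustrative assumption; the class name, block IDs and map size are hypothetical and are not taken from this application.

```python
MAP_SIZE = 1 << 16  # 64 KiB bitmap, AFL's default size


class EdgeCoverage:
    """Toy model of AFL-style edge coverage recorded by per-block instrumentation."""

    def __init__(self, map_size: int = MAP_SIZE) -> None:
        self.map_size = map_size
        self.bitmap = bytearray(map_size)  # one 8-bit hit counter per edge slot
        self.prev_location = 0

    def reset_trace(self) -> None:
        # Called at the start of each execution so traces do not bleed together.
        self.prev_location = 0

    def on_basic_block(self, cur_location: int) -> None:
        # cur_location is a per-block ID chosen when the block is instrumented.
        edge = (cur_location ^ self.prev_location) % self.map_size
        self.bitmap[edge] = (self.bitmap[edge] + 1) & 0xFF  # counter wraps at 256
        # Shifting makes A->B and B->A land in different slots.
        self.prev_location = cur_location >> 1

    def covered_edges(self) -> int:
        return sum(1 for count in self.bitmap if count)


cov = EdgeCoverage()
for block_id in (0x1234, 0x5678, 0x9ABC):  # hypothetical IDs for basic blocks 1, 2, 3
    cov.on_basic_block(block_id)
print(cov.covered_edges())  # 3
```

A test case that reaches new slots in the bitmap would be considered interesting by the fuzzer; the instrumentation's only job is to keep the bitmap up to date.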
In the related art, program instrumentation generally supports only source code instrumentation, which injects code into a program during compilation. For example, the Spring Boot development framework lets developers define advice in a program so that custom code is called before and after a specified function executes; and compilers such as the GNU Compiler Collection (GCC) and Clang support automated code coverage instrumentation of source code, adding code to each basic block during compilation to record whether the block is triggered. However, source code is often hard to obtain: much commercial software, including operating systems, is not open-sourced for commercial reasons, and the source code of some older software has been lost. In these circumstances there is a need to analyze large numbers of closed-source programs (i.e., binary programs), so an effective program instrumentation scheme suited to closed-source programs is urgently needed to assist dynamic testing.
Disclosure of Invention
The application provides a program instrumentation method, apparatus, device and storage medium, which provide an effective program instrumentation scheme suitable for closed-source programs to assist dynamic testing.
In a first aspect, the present application provides a program instrumentation method, comprising: acquiring a program to be analyzed, the program to be analyzed being a binary program; identifying redundant instructions in the program to be analyzed based on an analysis requirement, and determining a stub function to be inserted, wherein a redundant instruction is code unrelated to the original semantics of the program to be analyzed and/or to the analysis requirement, the position of the redundant instruction is the instrumentation point of the stub function, and the stub function adds the functionality corresponding to the analysis requirement to the program to be analyzed, saves the execution context of the program to be analyzed, and restores that context so that the program to be analyzed resumes execution; setting the address offset in a function call instruction according to the offset between the location of the stub function and the instrumentation point; and replacing the redundant instruction with the function call instruction.
In one possible embodiment, identifying redundant instructions in the program to be analyzed based on the analysis requirement includes: determining the redundant instructions based on the analysis requirement; and scanning for and recording the redundant instructions in the program to be analyzed.
In one possible implementation, scanning for redundant instructions in the program to be analyzed includes: for a fixed-length instruction set, scanning for redundant instructions by their instruction byte codes; for a variable-length instruction set, scanning for redundant instructions with a disassembly tool.
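A minimal sketch of byte-code scanning over a fixed-length instruction set, assuming AArch64 code and taking the pointer-authentication hints PACIASP and AUTIASP (one possible form of the PA instructions later named as redundant) as target patterns. The pattern table and function name are hypothetical. Because every AArch64 instruction is one aligned 32-bit word, a plain word comparison suffices; a variable-length set such as x86-64 would need a disassembler instead, which this sketch does not cover.

```python
import struct
from typing import Dict, List, Tuple

# AArch64 encodings (32-bit words, stored little-endian in the binary).
PACIASP = 0xD503233F  # sign LR with key A, SP as modifier
AUTIASP = 0xD50323BF  # authenticate LR with key A, SP as modifier
NOP = 0xD503201F
RET = 0xD65F03C0

# Hypothetical catalogue of redundant-instruction patterns; a pattern may span
# several consecutive instructions.
PATTERNS: Dict[str, Tuple[int, ...]] = {
    "paciasp": (PACIASP,),
    "autiasp": (AUTIASP,),
}


def scan_fixed_length(code: bytes, base: int,
                      patterns: Dict[str, Tuple[int, ...]]) -> List[Tuple[int, str]]:
    """Return (address, pattern name) for every match at a 4-byte boundary."""
    if len(code) % 4:
        raise ValueError("AArch64 code must be a multiple of 4 bytes")
    words = struct.unpack(f"<{len(code) // 4}I", code)
    hits = []
    for i in range(len(words)):
        for name, seq in patterns.items():
            if words[i:i + len(seq)] == seq:
                hits.append((base + 4 * i, name))
    return hits


func = struct.pack("<4I", PACIASP, NOP, AUTIASP, RET)
print([(hex(a), n) for a, n in scan_fixed_length(func, 0x1000, PATTERNS)])
# [('0x1000', 'paciasp'), ('0x1008', 'autiasp')]
```

In practice a PACIASP/AUTIASP pair would have to be replaced together: removing only the signing instruction would leave the authentication step operating on an unsigned return address.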
In one possible implementation, if the stub function is linked with the program to be analyzed as a dynamic link library, the function call instruction calls the stub function through a position-independent jump table, such as the procedure linkage table or a stub function table; if the file structure of the program to be analyzed does not allow a position-independent jump table to be added, the function call instruction calls the stub function through the table entry of a non-critical function, such as a logging function.
In one possible implementation, after the redundant instruction is replaced with the function call instruction, the method further includes: determining whether a first space occupied by the redundant instruction is larger than a second space occupied by the function call instruction; and, in response to the first space being larger than the second space, filling the excess space with no-op (NOP) instructions before and/or after the function call instruction.
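The padding step can be sketched as follows, assuming the call instruction has already been encoded. The function name and the choice of where the NOPs go (controlled by a hypothetical nops_before parameter) are illustrative; the x86-64 one-byte NOP (0x90) and the AArch64 NOP word (0xD503201F) are standard encodings.

```python
X86_NOP = b"\x90"                              # one-byte x86 NOP
A64_NOP = (0xD503201F).to_bytes(4, "little")   # four-byte AArch64 NOP


def fill_slot(call: bytes, slot_len: int, nop: bytes, nops_before: int = 0) -> bytes:
    """Lay out `call` inside the slot freed by a redundant instruction.

    Excess bytes become NOPs: `nops_before` of them ahead of the call,
    the remainder after it.
    """
    excess = slot_len - len(call)
    if excess < 0:
        raise ValueError("call instruction does not fit in the redundant slot")
    if excess % len(nop):
        raise ValueError("excess space is not a whole number of NOPs")
    units = excess // len(nop)
    if not 0 <= nops_before <= units:
        raise ValueError("nops_before exceeds the available padding")
    return nop * nops_before + call + nop * (units - nops_before)


# A 9-byte redundant slot holding a 5-byte call (placeholder displacement).
print(fill_slot(b"\xe8\x00\x00\x00\x00", 9, X86_NOP, nops_before=1).hex())
# 90e800000000909090
```

Because a PC-relative call's displacement depends on its own address, the displacement should be computed only after deciding how many NOPs precede the call.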
In one possible embodiment, the context information includes register values and memory states.
In a second aspect, the present application provides a program instrumentation apparatus, comprising: an acquisition module, configured to acquire a program to be analyzed, the program to be analyzed being a binary program; an identification module, configured to identify redundant instructions in the program to be analyzed based on an analysis requirement and to determine a stub function to be inserted, wherein a redundant instruction is code unrelated to the original semantics of the program to be analyzed and/or to the analysis requirement, the position of the redundant instruction is the instrumentation point of the stub function, and the stub function adds the functionality corresponding to the analysis requirement to the program to be analyzed, saves the execution context of the program to be analyzed, and restores that context so that the program to be analyzed resumes execution; an assignment module, configured to set the address offset in a function call instruction according to the offset between the location of the stub function and the instrumentation point; and a replacement module, configured to replace the redundant instruction with the function call instruction.
In a possible implementation, the identification module is specifically configured to identify redundant instructions in the program to be analyzed based on the analysis requirement by: determining the redundant instructions based on the analysis requirement; and scanning for and recording the redundant instructions in the program to be analyzed.
In a possible implementation, the identification module is further configured to: for a fixed-length instruction set, scan for redundant instructions by their instruction byte codes; for a variable-length instruction set, scan for redundant instructions with a disassembly tool.
In a possible implementation, when the stub function is linked with the program to be analyzed as a dynamic link library, the function call instruction calls the stub function through a position-independent jump table, such as the procedure linkage table or a stub function table; if the file structure of the program to be analyzed does not allow a position-independent jump table to be added, the function call instruction calls the stub function through the table entry of a non-critical function, such as a logging function.
In one possible implementation, the replacement module is further configured to: determine whether a first space occupied by the redundant instruction is larger than a second space occupied by the function call instruction; and, in response to the first space being larger than the second space, fill the excess space with NOP instructions before and/or after the function call instruction.
In a possible implementation, the context information of the program to be analyzed that is saved by the stub function determined in the identification module includes register values and memory state.
In a third aspect, the present application provides an electronic device, comprising: a memory and a processor. The memory is used for storing program instructions; the processor is configured to invoke program instructions in the memory to perform the program instrumentation method of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon computer-executable instructions that, when executed, implement the program instrumentation method of the first aspect.
In a fifth aspect, the present application provides a computer program product comprising a computer program that, when executed, is adapted to implement the program instrumentation method of the first aspect.
With the program instrumentation method, apparatus, device and storage medium provided by the application, a program to be analyzed, which is a binary program, is acquired; redundant instructions in the program to be analyzed and the stub function to be inserted are identified based on the analysis requirement, wherein a redundant instruction is code unrelated to the original semantics of the program to be analyzed and/or to the analysis requirement, the position of the redundant instruction is the instrumentation point of the stub function, and the stub function adds the functionality corresponding to the analysis requirement to the program to be analyzed, saves the execution context of the program to be analyzed, and restores that context so that the program resumes execution; the address offset in a function call instruction is set according to the offset between the location of the stub function and the instrumentation point; and the redundant instruction is replaced with the function call instruction. Because instrumentation is performed by removing redundant instructions, whose removal does not affect the normal operation of the program to be analyzed, instrumentation space is reserved and new code is injected without damaging the program's original structure or instruction offsets. The method and apparatus are therefore highly stable: they suit not only small and medium-sized user-mode programs but also large, complex programs such as operating system kernels, as well as other programs with high stability requirements, giving them a wide range of application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic illustration of a program before and after instrumentation according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a method for program instrumentation according to an embodiment of the present application;
FIG. 3 is an operational diagram illustrating implementation of code coverage feedback instrumentation in a macOS driver according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a program instrumentation apparatus according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
The above figures show specific embodiments of the present application, which are described in more detail below. These drawings and the written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terms first, second and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged under appropriate circumstances such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, article, or apparatus.
In the related art, instrumentation of closed-source programs is still rare, and existing tools suffer from stability and extensibility problems. Closed-source instrumentation techniques are either static or dynamic. Static instrumentation often breaks the offsets between instructions and thereby introduces reference ambiguity, so the inserted code causes the program to raise exceptions and the instrumented program becomes less stable. Static tools also support only a limited range of program types: for example, the static binary rewriter E9Patch makes large changes to the program layout, is usually applicable only to user-mode programs, and supports only the x86-64 instruction set; and the binary rewriting tool RetroWrite cannot handle binaries written in common languages such as C++. Dynamic instrumentation typically relies on virtualization, instruction translation and similar techniques, which impose stricter requirements on the runtime environment and incur large runtime overhead, making it hard to apply widely across applications.
To solve the above problems, the present application provides a program instrumentation method that removes redundant instructions from a program to reserve instrumentation space, and injects code into the program without destroying its original structure or instruction offsets. The method can be applied to scenarios such as coverage collection for fuzz testing and program patching. The scheme is highly stable and does not depend on complex static analysis, so it can be applied to large, complex programs such as operating system kernels.
FIG. 1 is a schematic diagram of a program before and after instrumentation according to an embodiment of the present application. As shown in FIG. 1, the figure has two parts: the program before instrumentation and the program after instrumentation. Before instrumentation, the program to be analyzed executes in its original order: as shown in the figure, basic block 1 executes first, then basic block 2, then basic block 3, and so on in sequence.
The program is instrumented by removing redundant instructions from the program to be analyzed; for example, the redundant instructions contained in a basic block serve as the part that can be replaced by an instrumentation call. After instrumentation, each redundant instruction identified in the original program has been replaced with a stub function call instruction. Execution still begins with basic block 1, but when the stub function call instruction is reached, control first jumps to the called stub function; once the stub function completes, execution resumes in basic block 1 at the code that followed the original redundant instruction. After all of basic block 1 has executed, basic block 2 executes, in the same way as basic block 1.
For example, the stub function call instruction is bl _funcA, meaning that the stub function called is funcA. The stub function may be stored in a code segment, dynamic link library or driver dedicated to stub functions. A stub function must perform two tasks: 1. save the context information at the function entry and restore it at the function exit (here, the context information specifically means the current register values); 2. implement the additional functionality that the user of the instrumentation tool wants to add to the program being analyzed, i.e., the functionality corresponding to the analysis requirement.
A program instrumentation method according to an exemplary embodiment of the present application is described below with reference to FIG. 2, in conjunction with the example of FIG. 1. Note that the above application scenario is shown only to aid understanding of the spirit and principles of the present application; the embodiments of the present application are not limited by the application scenario shown in FIG. 1.
Fig. 2 is a flowchart illustrating a program instrumentation method according to an embodiment of the present application. As shown in fig. 2, the method for program instrumentation in the embodiment of the present application includes the following steps:
S201: Acquire a program to be analyzed, the program to be analyzed being a binary program.
In this step, the program to be analyzed is a program that its developer provides to users. For an open-source program, a user can obtain both the source code and the binary program; for a closed-source program, a user can obtain only the binary program, not the source code. So that the method applies to both open-source and closed-source programs, the program to be analyzed acquired by the present application is the binary program.
For example, when the analysis scenario is specifically a test scenario, the program to be analyzed is a program to be tested.
S202: Based on the analysis requirement, identify redundant instructions in the program to be analyzed and determine the stub function to be inserted, wherein a redundant instruction is code unrelated to the original semantics of the program to be analyzed and/or to the analysis requirement, the position of the redundant instruction is the instrumentation point of the stub function, and the stub function adds the functionality corresponding to the analysis requirement to the program to be analyzed, saves the execution context of the program to be analyzed, and restores that context so that the program to be analyzed resumes execution.
In this step, the analysis requirement describes what is to be verified in the program to be analyzed: for example, each function of the program is tested item by item, and its execution is checked or recorded. Example test functions include collecting code coverage (i.e., which code has been executed) and recording the values of particular variables in the program.
For example, the stub function to be inserted is developed in advance according to the analysis requirement and is called in the subsequent steps.
The redundant instructions can be chosen according to actual needs. In fuzz testing, for example, common redundant instructions come from runtime protection mechanisms, such as stack canary instructions and pointer authentication (PA) instructions. These mainly protect a program from exploitation by attackers during normal use; fuzz testing aimed at vulnerability discovery has no need for such protection, so these instructions can be deleted. Other common redundant instructions include logging instructions and instructions that could be merged or deleted but remain because of imperfect compiler optimization.
After executing the stub function, the program to be analyzed must jump back to its original execution flow. In this step, the stub function is therefore also responsible for saving the execution context of the program to be analyzed and restoring it afterwards, so that the program can continue executing its original code normally once the stub function has run. For example, the context information includes register values and memory state. Further, the context information may include information used to monitor the running state of the program to be analyzed.
S203: Set the address offset in the function call instruction according to the offset between the location of the stub function and the instrumentation point.
In this step, the stub function to be inserted into the program to be analyzed is determined from the analysis requirement in step S202, and its location is then obtained. There is a distance, i.e., an offset, between the location of the stub function and the instrumentation point of the function call instruction. The same kind of function call instruction is used throughout, and its address offset is set according to the offset between the stub function and the instrumentation point; because different stub functions reside at different addresses, different offsets call different stub functions.
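For AArch64, where the example call is bl _funcA, setting the address offset amounts to filling the 26-bit immediate of a BL instruction with (stub address − call-site address) / 4. The sketch below assumes that encoding (opcode bits 0x94000000, reach of ±128 MiB); the addresses and function names are hypothetical. It shows that one call site reaches different stubs purely by changing the offset field.

```python
def encode_bl(call_site: int, stub_addr: int) -> int:
    """Encode AArch64 `BL stub_addr` placed at call_site; returns the 32-bit word."""
    offset = stub_addr - call_site          # PC-relative, measured from the BL itself
    if offset % 4:
        raise ValueError("offset must be a multiple of 4")
    imm26 = offset >> 2                     # arithmetic shift keeps the sign
    if not -(1 << 25) <= imm26 < (1 << 25):
        raise ValueError("stub is outside the +/-128 MiB BL range")
    return 0x94000000 | (imm26 & 0x03FFFFFF)


def bl_target(call_site: int, word: int) -> int:
    """Inverse of encode_bl: recover the absolute target of a BL word."""
    imm26 = word & 0x03FFFFFF
    if imm26 & (1 << 25):                   # sign-extend the 26-bit field
        imm26 -= 1 << 26
    return call_site + (imm26 << 2)


# One call site, two hypothetical stubs: only the offset field differs.
site = 0x2000
for stub in (0x3000, 0x1000):
    word = encode_bl(site, stub)
    print(hex(word), hex(bl_target(site, word)))
# 0x94000400 0x3000
# 0x97fffc00 0x1000
```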
For example, the stub function implements additional functionality that the user wishes to add to the program to be analyzed, such as collecting code coverage or recording variable values. The stub function may be defined by the user of the present application, but the execution context of the program to be analyzed must be saved before the user-defined code runs and restored after it, so that the program can resume from that context. For example, with the context information comprising register values and memory state, the stub function may allocate stack space of an appropriate size at its entry and temporarily store the register values of the original control flow on the stack, so that later code in the stub function cannot overwrite them. Inside the stub function, unless there is a special requirement, the stack data of the program to be analyzed should not be modified, to avoid causing the program to malfunction. After the functionality has executed and before control returns to the original control flow of the program to be analyzed, the stub function must release any heap space it allocated, restore the register values saved on the stack, and release its current stack space.
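The save/run/restore discipline can be modelled in Python, with a dict standing in for the register file and a list for the stack. This is a behavioural sketch only, not machine code: the register names, call_stub and count_hits are hypothetical, and a real stub would do the same with store and load instructions in its prologue and epilogue.

```python
from typing import Callable, Dict, List

Registers = Dict[str, int]


def call_stub(regs: Registers, stack: List[int],
              payload: Callable[[Registers], None],
              saved=("r0", "r1", "r2", "flags")) -> None:
    """Model of a stub: spill context on entry, run the payload, reload on exit."""
    depth = len(stack)
    for name in saved:                      # prologue: push onto a new frame
        stack.append(regs[name])
    payload(regs)                           # analysis logic; may clobber registers
    if len(stack) != depth + len(saved):    # payload must leave the stack balanced
        raise RuntimeError("payload left the stack unbalanced")
    for name in reversed(saved):            # epilogue: pop in reverse order
        regs[name] = stack.pop()


hits = {"count": 0}


def count_hits(regs: Registers) -> None:
    hits["count"] += 1                      # e.g. record that this point executed
    regs["r0"] = regs["r1"] = 0             # scratch use that would corrupt the caller


regs = {"r0": 7, "r1": 8, "r2": 9, "flags": 0b0100}
before = dict(regs)
stack = [0xDEAD]                            # caller's existing stack data
call_stub(regs, stack, count_hits)
print(regs == before, stack, hits["count"])  # True [57005] 1
```

The caller's stack entry is never touched, mirroring the requirement that the stub should not modify the analyzed program's stack data.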
S204: Replace the redundant instruction with the function call instruction.
In this step, because the space occupied by a redundant instruction is usually small and not enough to hold complex instrumentation code, the present application implements the new functionality in a stub function and fills the space occupied by the original redundant instruction with a stub function call instruction (i.e., replaces the redundant instruction with it). The function call instruction acts as a trampoline between the original logic of the program to be analyzed and the new functionality.
Illustratively, the form of the function call instruction differs between instruction sets, but it typically requires only one or two instructions.
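For comparison with the fixed-width AArch64 `BL`, on x64 a direct near call is a single 5-byte instruction: opcode `E8` followed by a signed 32-bit displacement measured from the end of the instruction. The sketch below assumes that encoding; `encode_call_rel32` is a hypothetical helper name, not part of any tool mentioned in the source.

```python
import struct

CALL_REL32_LEN = 5  # 1 opcode byte + 4 displacement bytes


def encode_call_rel32(insn_addr: int, stub_addr: int) -> bytes:
    """Encode an x64 `call rel32` placed at insn_addr that targets stub_addr."""
    rel = stub_addr - (insn_addr + CALL_REL32_LEN)  # relative to the next instruction
    if not -(1 << 31) <= rel < (1 << 31):
        raise ValueError("stub is out of rel32 range (+/-2 GiB)")
    return b"\xe8" + struct.pack("<i", rel)         # little-endian signed displacement
```

Because the displacement is relative to the end of the instruction, moving the call within the reserved space (for example, behind leading padding) changes its encoding, so the call must be encoded for the exact address it will occupy.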
In this step, the redundant instruction is replaced with the function call instruction: removing or compressing the redundant instruction frees space in the program to be analyzed for the function call instruction, which is then placed in that space. When the program to be analyzed reaches the function call instruction, it successfully calls the stub function, which performs the functionality the user wants to achieve through instrumentation.
In the program instrumentation method provided by the embodiments of the present application, redundant instructions whose removal does not affect the normal operation of the program to be analyzed are removed from it. Instrumentation space is thereby reserved, and new code is injected into the program, without damaging its original structure or shifting the offsets between instructions or between instructions and data.
On the basis of the foregoing embodiment, optionally, identifying redundant instructions in the program to be analyzed based on the analysis requirement includes: determining the redundant instructions based on the analysis requirement; and scanning for and recording the redundant instructions in the program to be analyzed.
For example, depending on the analysis requirement, the user of the present application may designate one or more types of redundant instructions as replacement targets. Deleting or modifying a selected redundant instruction must not affect the normal operation of the program; a redundant instruction may be a single instruction or a combination of several instructions. Once the redundant instructions are determined, their occurrences in the program to be analyzed are scanned for and recorded.
Further, scanning for redundant instructions in the program to be analyzed includes: for a fixed-length instruction set, scanning for redundant instructions by matching instruction byte codes; for a variable-length instruction set, scanning for redundant instructions with a disassembly tool.
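A minimal sketch of the fixed-length case, assuming AArch64 (4-byte, little-endian instructions) and assuming the code section bytes have already been extracted from the binary. The redundant-instruction encodings are passed in as 32-bit words; which instructions count as redundant depends on the analysis requirement, so the NOP word used here is only a placeholder. Plain byte-code matching can also hit data embedded in a code section (such as literal pools), so results may need filtering; variable-length sets such as x64 still need a disassembler. `scan_fixed_length` is a hypothetical name.

```python
import struct
from typing import Iterable

AARCH64_NOP = 0xD503201F  # placeholder "redundant" encoding for the example


def scan_fixed_length(code: bytes, base_addr: int, redundant: Iterable[int]) -> list[int]:
    """Return the addresses of 4-byte words in `code` that match a redundant encoding."""
    targets = set(redundant)
    usable = len(code) - len(code) % 4        # ignore a trailing partial word
    hits = []
    for index, (word,) in enumerate(struct.iter_unpack("<I", code[:usable])):
        if word in targets:
            hits.append(base_addr + 4 * index)  # record the candidate insertion point
    return hits
```

For example, scanning `struct.pack("<4I", AARCH64_NOP, 0x94000400, AARCH64_NOP, 0xD65F03C0)` loaded at `0x1000` for `[AARCH64_NOP]` returns `[0x1000, 0x1008]`, the recorded candidate insertion points.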
Illustratively, fixed-length instruction sets include the ARM (Advanced RISC Machines) instruction set, and variable-length instruction sets include x64. The method is based on static binary rewriting: it completes instrumentation before the program is loaded and does not depend on complex static analysis. This avoids the runtime overhead of dynamically modifying function call instructions and adjusting control flow while the program to be analyzed runs, and allows large programs to be instrumented.
In a possible implementation, if the stub function in the program instrumentation method is linked with the program to be analyzed as a dynamic link library, the function call instruction calls the stub function through a jump table of position-independent functions, the jump table being a procedure linkage table or a stub function table; if such a jump table cannot be added to the file structure of the program to be analyzed, the function call instruction calls the stub function through the table entry of a non-critical function, such as a log function.
For example, when the stub function and the program to be analyzed are linked as a dynamic link library, the relative address offset between them cannot be determined in advance, so the function call instruction must call the stub function through a jump table of position-independent functions, such as the Procedure Linkage Table (PLT) or the stub table (stub). For file structures to which a PLT or stub entry cannot be added, the entry of a non-critical function that does not affect the main functionality of the program to be analyzed, such as a log function, may be repurposed to call the stub function. The PLT is used in Executable and Linkable Format (ELF) files on Linux, and the stub table is used in Mach-O binary executables on macOS.
Illustratively, in user mode, the stub function code may be combined with the program to be analyzed as a dynamic link library; in kernel mode, it may be provided as a driver and called by other kernel code; and for file structures that support adding code segments, such as ELF, the stub functions may also be placed in a newly added code segment.
In one possible implementation, after the redundant instruction is replaced with the function call instruction, the method further includes: determining whether a first space occupied by the redundant instruction is larger than a second space occupied by the function call instruction; and, in response to the first space being larger than the second space, filling the excess space with null instructions before and/or after the function call instruction.
Illustratively, the redundant instruction may be replaced with the stub function call instruction by modifying the binary file of the program to be analyzed. When the redundant instruction occupies more space than the function call instruction requires, the extra space may be filled with null instructions (nop), which perform no operation but each occupy an instruction slot.
For example, the program instrumentation method provided in the present application is also applicable to open-source programs whose developers provide the source code; the implementation is similar to the above embodiments and is not repeated here.
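The padding step can be sketched as building the replacement bytes for one insertion point. The sketch is ISA-agnostic under two assumptions: the call instruction has already been encoded for the address where it will sit (after any leading padding), and the nop is a fixed unit, such as `0x90` on x64 or the 4-byte word `D503201F` on AArch64. `build_patch` and its parameters are hypothetical.

```python
def build_patch(site_len: int, call_insn: bytes, nop: bytes, nops_before: int = 0) -> bytes:
    """Fill a redundant-instruction site of site_len bytes with nops + call + nops."""
    spare = site_len - len(call_insn)               # first space minus second space
    if spare < 0:
        raise ValueError("redundant instruction is too small for the call")
    if spare % len(nop):
        raise ValueError("spare space is not a whole number of nops")
    total_nops = spare // len(nop)
    if not 0 <= nops_before <= total_nops:
        raise ValueError("nops_before exceeds the available padding")
    # Nops before and/or after the call, as in the padding step described above
    return nop * nops_before + call_insn + nop * (total_nops - nops_before)
```

Replacing a 7-byte x64 redundant sequence with the 5-byte call `bytes.fromhex("e8fb0f0000")` and `nop=b"\x90"` yields `e8fb0f00009090`; with `nops_before=1` the nop comes first, and the call's displacement would then have to be encoded for its new address.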
In summary, the present application has at least the following advantages:
1. High stability: the method replaces redundant instructions with function call instructions and calls the stub function to be inserted through them, so the offsets between instructions, and between instructions and data, in the original program to be analyzed are not shifted. The instrumented program therefore runs stably and does not crash because of ambiguous address references, which makes the method suitable for instrumenting programs with high stability requirements, such as operating system kernels.
2. Extensibility: the method provided by the present application is not limited to a specific development language or instruction set, does not affect the original structure, size or basic functionality of the program to be analyzed, does not depend on complex static analysis, and can support instrumentation of large programs.
3. High efficiency: the method provided by the present application is based on static binary rewriting and completes instrumentation before the program is loaded. It does not depend on complex static analysis, and it avoids the overhead that dynamic rewriting incurs from reading and writing program memory and modifying program instructions while the program runs.
Fig. 3 is an operation diagram illustrating code coverage feedback instrumentation in a macOS driver according to an embodiment of the present application. As shown in fig. 3, the binary file format of the macOS driver includes LC_TEXT, LC_SYMTAB, LC_DYSYMTAB, TEXT, stub and String Table. All basic blocks (bb) are located in the TEXT component, and redundant instructions in the TEXT component are replaced with BL offset_x, BL offset_y and BL offset_z: the address offset between function call point 1 (bb_1) and the stub function is x, so its function call instruction is BL offset_x; the address offset between function call point 2 (bb_2) and the stub function is y, so its function call instruction is BL offset_y; and the address offset between function call point 3 (bb_3) and the stub function is z, so its function call instruction is BL offset_z. The address offsets x, y and z differ from one another.
For example, in a macOS driver, when the stub function and the program to be analyzed are linked as a dynamic link library, the relative address offset between them cannot be determined in advance, so the function call instruction must call the stub function through the stub jump table of position-independent functions. For file structures to which a stub table entry cannot be added, the entry of a non-critical function that does not affect the main functionality of the program to be analyzed, such as the log function _IOLog, may be repurposed to call the stub function. In fig. 3, the stub table is reused to call the stub function: the _IOLog function is replaced with a _COVPC function, and the system's own linking tool (e.g., kmutil on macOS) then automatically fills in the address of the _COVPC function as the target of BL _COVPC.
For example, a macOS driver is a difficult program to instrument: it runs in kernel mode, has high stability requirements, and has a highly compact file format and a strict verification mechanism, all of which make it hard to add extra instructions directly, and conventional instrumentation techniques cannot modify it. The method provided by the present application removes pointer verification instructions from the macOS driver to reserve instrumentation space, achieving stable coverage instrumentation without damaging the driver's functionality or structure. Specifically, the pointer verification instructions in the macOS driver are treated as redundant instructions and replaced with function call instructions that call a stub function responsible for recording code coverage. The replacement process is as follows:
Figure BDA0003919382840000111
Illustratively, in lines 7-9 of the listing, the content to the left of "=>" is the redundant instruction, and the content after "=>" is the function call instruction that replaces it.
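Because every instrumented site in Fig. 3 calls the same coverage stub, the stub has to tell the sites apart. One plausible way, assumed here rather than taken from the source, is to use the return address: an AArch64 `BL` writes the address of the following instruction into the link register, so the call site is that value minus 4. The class below only models this bookkeeping in Python; a real `_COVPC` would run in the kernel and save and restore context as described above, none of which is shown. `CoverageRecorder` and its methods are hypothetical.

```python
BL_SIZE = 4  # every AArch64 instruction, including BL, is 4 bytes


class CoverageRecorder:
    """Hypothetical model of a coverage stub that identifies call sites by return address."""

    def __init__(self) -> None:
        self.hits: dict[int, int] = {}  # call-site address -> number of executions

    def on_call(self, link_register: int) -> int:
        site = link_register - BL_SIZE  # BL stored the address just after itself
        self.hits[site] = self.hits.get(site, 0) + 1
        return site

    def covered_sites(self) -> list[int]:
        return sorted(self.hits)
```

Simulating calls from hypothetical sites at `0x1000` (twice) and `0x2000` with link-register values `0x1004` and `0x2004` gives `covered_sites() == [0x1000, 0x2000]` and a hit count of 2 for `0x1000`.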
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Fig. 4 is a schematic structural diagram of a program instrumentation apparatus according to an embodiment of the present application. For ease of explanation, only the parts related to the embodiments of the present application are shown. As shown in fig. 4, the program instrumentation apparatus 40 includes an acquisition module 41, an identification module 42, an assignment module 43 and a replacement module 44, where:
an acquisition module 41, configured to acquire a program to be analyzed, where the program to be analyzed is a binary program;
the identification module 42, configured to identify a redundant instruction in the program to be analyzed based on the analysis requirement and determine a stub function to be inserted, where the redundant instruction is code unrelated to the original semantics of the program to be analyzed and/or to the analysis requirement, the position of the redundant instruction is the stub insertion point of the stub function, and the stub function is configured to add functionality corresponding to the analysis requirement to the program to be analyzed, save the context information of the program's execution, and resume execution of the program to be analyzed based on the context information;
the assignment module 43 is configured to assign an address offset in the function call instruction according to an offset between a position where the stub function is located and the stub insertion point;
and a replacement module 44 for replacing the redundant instruction with a function call instruction.
In a possible implementation, the identification module 42 may be configured to identify redundant instructions in the program to be analyzed based on the analysis requirement, specifically: determining a redundant instruction based on the analysis requirement; and scanning and recording redundant instructions in the program to be analyzed.
In a possible implementation, the identification module 42 is configured to scan for redundant instructions in the program to be analyzed, specifically: for a fixed-length instruction set, scanning for redundant instructions by matching instruction byte codes; for a variable-length instruction set, scanning for redundant instructions with a disassembly tool.
In one possible implementation, when the stub function is linked with the program to be analyzed as a dynamic link library, the function call instruction calls the stub function through a jump table of position-independent functions, the jump table being a procedure linkage table or a stub function table; if such a jump table cannot be added to the file structure of the program to be analyzed, the function call instruction calls the stub function through the table entry of a non-critical function, such as a log function.
In one possible implementation, the replacement module 44 may further be configured to: determine whether a first space occupied by the redundant instruction is larger than a second space occupied by the function call instruction; and, in response to the first space being larger than the second space, fill the excess space with null instructions before and/or after the function call instruction.
In one possible implementation, the context information of the program to be analyzed that is saved by the stub function determined in the identification module 42 includes register values and memory state.
The implementation principle and technical effects of the program instrumentation apparatus provided in this embodiment are similar to those of the above method embodiments; refer to them for details, which are not repeated here.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the application. As shown in fig. 5, the electronic device 500 includes: at least one processor 510, a memory 520, a communication interface 530, and a system bus 540. The memory 520 and the communication interface 530 are connected to the processor 510 through the system bus 540 and communicate with one another; the memory 520 is used for storing instructions, the communication interface 530 is used for communicating with other devices, and the processor 510 is used for calling the instructions in the memory to execute the solution of the above program instrumentation method embodiments.
The processor 510 in fig. 5 may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP); or it may be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The memory 520 may include random access memory (RAM), static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disc, for example at least one disk storage device.
The communication interface 530 is used to enable communication between the program instrumentation apparatus and other devices (e.g., clients).
The system bus 540 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus 540 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
Those skilled in the art will appreciate that the electronic device illustrated in fig. 5 does not constitute a limitation of the electronic device and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components.
Embodiments of the present application further provide a computer-readable storage medium, in which computer-executable instructions are stored, and when the computer-executable instructions are executed, the above program instrumentation method is implemented.
Embodiments of the present application also provide a computer program product, which includes a computer program, and when executed, the computer program implements the above program instrumentation method.
The embodiment of the present application further provides a chip for executing the instruction, where the chip is used to execute the program instrumentation method according to any one of the above method embodiments.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A program instrumentation method, comprising:
acquiring a program to be analyzed, wherein the program to be analyzed is a binary program;
identifying a redundant instruction in the program to be analyzed based on an analysis requirement, and determining a stub function to be inserted, wherein the redundant instruction is code unrelated to the original semantics of the program to be analyzed and/or to the analysis requirement, the position of the redundant instruction is a stub insertion point of the stub function, and the stub function is used for adding functionality meeting the analysis requirement to the program to be analyzed, saving context information of the execution of the program to be analyzed, and restoring execution of the program to be analyzed based on the context information;
assigning an address offset in a function call instruction according to the offset between the position of the stub function and the stub insertion point;
and replacing the redundant instruction with the function call instruction.
2. The program instrumentation method of claim 1, wherein said identifying redundant instructions in said program to be analyzed based on analysis requirements comprises:
determining the redundant instructions based on the analysis requirements;
and scanning and recording the redundant instructions in the program to be analyzed.
3. The program instrumentation method of claim 2, wherein scanning for redundant instructions in the program to be analyzed comprises:
for a fixed-length instruction set, scanning the redundant instruction through an instruction byte code;
for a variable-length instruction set, scanning the redundant instructions by a disassembly tool.
4. The program instrumentation method of any one of claims 1 to 3, further comprising:
if the stub function is linked with the program to be analyzed as a dynamic link library, the function call instruction calls the stub function based on a jump table of position-independent functions, and the jump table comprises a procedure linkage table or a stub function table;
if the jump table of position-independent functions cannot be added to the file structure of the program to be analyzed, the function call instruction calls the stub function based on the table entry of a non-critical function, and the non-critical function comprises a log function.
5. The program instrumentation method of any one of claims 1 to 3, wherein replacing the redundant instruction with a function call instruction further comprises:
determining whether a first space occupied by the redundant instruction is larger than a second space occupied by the function call instruction;
in response to the first space being larger than the second space, filling null instructions into the spare space before the function call instruction and/or after the function call instruction.
6. The program instrumentation method of any one of claims 1 to 3, wherein the context information comprises register values and memory states.
7. A program instrumentation apparatus, comprising:
an acquisition module, used for acquiring a program to be analyzed, wherein the program to be analyzed is a binary program;
the identification module is used for identifying a redundant instruction in the program to be analyzed based on an analysis requirement and determining a stub function to be inserted, wherein the redundant instruction is a code unrelated to the original semantics of the program to be analyzed and/or the analysis requirement, the position of the redundant instruction is a stub insertion point of the stub function, and the stub function is used for adding a function corresponding to the analysis requirement into the program to be analyzed, saving context information executed by the program to be analyzed and restoring and executing the program to be analyzed based on the context information;
the assignment module is used for assigning the address offset in the function call instruction according to the offset between the position of the stub function and the stub insertion point;
and a replacement module, used for replacing the redundant instruction with the function call instruction.
8. The program instrumentation device of claim 7, wherein the identification module is specifically configured to:
determining the redundant instructions based on the analysis requirements;
and scanning and recording the redundant instructions in the program to be analyzed.
9. An electronic device, comprising: a memory and a processor;
the memory is to store program instructions;
the processor is configured to call program instructions in the memory to perform the program instrumentation method of any one of claims 1 to 6.
10. A computer-readable storage medium having computer-executable instructions stored therein, the computer-executable instructions when executed for implementing the program instrumentation method of any one of claims 1 to 6.
CN202211350276.3A 2022-10-31 2022-10-31 Program instrumentation method, device, equipment and storage medium Active CN115617687B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211350276.3A CN115617687B (en) 2022-10-31 2022-10-31 Program instrumentation method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115617687A true CN115617687A (en) 2023-01-17
CN115617687B CN115617687B (en) 2023-08-25

Family

ID=84876254

Country Status (1)

Country Link
CN (1) CN115617687B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1710547A (en) * 2004-06-16 2005-12-21 华为技术有限公司 Software detection method and system
CN103955424A (en) * 2014-03-25 2014-07-30 杭州电子科技大学 Virtualized embedded type binary software defect detection system
CN105159835A (en) * 2015-10-24 2015-12-16 北京航空航天大学 Pile inserting position obtaining method based on global superblock domination graph
CN105183642A (en) * 2015-08-18 2015-12-23 中国人民解放军信息工程大学 Instrumentation based program behavior acquisition and structural analysis method
CN110414218A (en) * 2018-11-13 2019-11-05 腾讯科技(深圳)有限公司 Kernel detection method, device, electronic equipment and storage medium
CN111913878A (en) * 2020-07-13 2020-11-10 苏州洞察云信息技术有限公司 Program analysis result-based bytecode instrumentation method, device and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116502239A (en) * 2023-06-27 2023-07-28 清华大学 Memory vulnerability detection method, device, equipment and medium for binary program
CN116502239B (en) * 2023-06-27 2023-09-19 清华大学 Memory vulnerability detection method, device, equipment and medium for binary program
CN117171043A (en) * 2023-09-18 2023-12-05 盐城柒壹网络科技有限公司 Intelligent source code detection method, system, equipment and storage medium
CN117171043B (en) * 2023-09-18 2024-03-26 盐城柒壹网络科技有限公司 Intelligent source code detection method, system, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant