CN113836023B - Compiler security testing method based on architecture cross check - Google Patents


Info

Publication number: CN113836023B
Authority: CN (China)
Prior art keywords: code, compiler, instruction, compiling, binary
Legal status: Active (as listed by Google Patents; an assumption, not a legal conclusion)
Application number: CN202111128084.3A
Other languages: Chinese (zh)
Other versions: CN113836023A (en)
Inventors: 徐坚皓, 丁柱, 茅兵
Current assignee: Nanjing University
Original assignee: Nanjing University
Events: application filed by Nanjing University; priority to CN202111128084.3A; publication of CN113836023A; application granted; publication of CN113836023B


Classifications

    • G06F11/3624 Software debugging by performing operations on the source code, e.g. via a compiler
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites
    • G06F8/52 Binary to binary
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a compiler security testing method based on architecture cross-checking. It aims to detect software vulnerabilities that a compiler introduces when it compiles common open-source software into binary code for different architectures. The changes that security-related code undergoes during compilation should be consistent across architectures. Compiler-introduced vulnerabilities can therefore be detected by comparing these changes in an architecture cross-check. The method comprises the following:

- modeling and locating security-related instructions;
- mapping IR (Intermediate Representation) code to binary code;
- determining changes in the semantic state of security-related instructions;
- cross-checking these changes across architectures.

The method locates compiler-introduced software vulnerabilities efficiently and accurately.

Description

Compiler security testing method based on architecture cross check
Technical Field
The invention relates to a compiler security testing method based on architecture cross checking.
Background
Software vulnerabilities are flaws in the security of a computer system that threaten the confidentiality, integrity, availability, access control, etc. of the system or its application data. A compiler is a computer program widely used to translate source code written in a programming language into the binary program files that are actually executed.
Software vulnerabilities usually stem from problems in the source program itself and can be traced to errors in the software's source code. In recent years, however, a growing number of security vulnerabilities have been introduced by the compiler during compilation. For these vulnerabilities, the vulnerable point does not exist directly in the source code.
Each architecture may suffer from problems introduced by its own compiler backend. The compiler backends of different architectures are designed and implemented independently. They contain many architecture-specific code generation and optimization strategies, and in software as complex as a compiler, design and implementation flaws in these modules are inevitable. Such architecture-related problems have an increasingly prominent real-world impact. With the spread of the mobile internet, the Internet of Things and embedded devices, a large number of systems and applications must run on different architecture platforms. As key software infrastructure, compilers support more and more architectures. For example, GCC and Clang each support more than 40 architectures, and the backends for these architectures are widely used. System software that supports many architectures is also increasingly important. For example, the Linux kernel, the most popular open-source operating system kernel, runs on more than 25 architecture platforms.
Traditional vulnerability discovery relies mainly on dynamic and static techniques. Dynamic vulnerability discovery finds vulnerabilities while executing a program on specific inputs. Depending on how execution is driven, it divides into fuzzing and dynamic symbolic execution. Static vulnerability discovery detects possible vulnerability points by summarizing how vulnerabilities manifest in programs, aided by program analysis. Depending on how vulnerabilities are summarized, static approaches use either manually defined vulnerability models or vulnerability models learned by machine learning. Depending on the object analyzed, they divide into source-code and binary static vulnerability mining.
The conventional work above generally considers only the security of the software program itself. It does not specifically address security problems that the compiler introduces into the binary program. Recently, some vulnerability detection work has taken the compiler into account and specifically targets compiler-introduced security problems. This work falls mainly into two categories.
The first category detects compiler-introduced vulnerabilities by static analysis based on their semantic features.
The second category is compiler correctness testing, which uses a series of methods to check that the compiler behaves correctly during compilation (i.e., conforms to the language specification). A core problem for this approach is the test oracle, i.e., how to decide whether the compiler's behavior is correct. There are generally two kinds of solutions: differential testing (Differential Testing) and metamorphic testing (Metamorphic Testing).
These approaches have found many compiler security issues. Their reports of source-program vulnerabilities introduced by compilers have received positive responses from developers and had considerable impact. However, existing detection methods share some common problems:
1. They cannot be both general and security-focused. On the one hand, general detection methods such as CSmith and EMI target compiler correctness rather than security. As a result, the potential security issues they reveal tend to be overlooked: compiler developers may not fix them promptly, and compiler users may not apply the corresponding compiler patches in time. On the other hand, the targeted methods require prior knowledge and cannot discover problems of unknown types.
2. They have difficulty with architecture-related problems. On the one hand, existing targeted static methods do not consider architecture-dependent compiler optimizations and cannot find architecture-related problems at all. On the other hand, dynamic approaches suffer from two problems:
(1) multi-architecture deployment is difficult: many methods depend heavily on architecture-specific infrastructure and are hard to migrate to multi-architecture environments;
(2) the repeated detection effort is too large: binaries of different architectures built from the same source code must be analyzed dynamically and independently. Considering multiple architectures therefore greatly increases the (largely duplicated) analysis workload and worsens the coverage problem inherent in dynamic methods.
Disclosure of Invention
Purpose: to address the shortcomings of the prior art, the invention provides a compiler security testing method based on architecture cross-checking. The method can detect software vulnerabilities that a given compiler (supporting multiple architectures) introduces when generating binary code.
The core idea of the invention is as follows: the changes that security-related code undergoes during compilation are consistent across architectures. Software vulnerabilities introduced by the compiler can therefore be detected by comparing these changes in an architecture cross-check.
Security-related code changes are consistent across architectures because security-related semantic fragments in system programs each serve a specific function that is generally architecture-independent. This architecture independence of security functions means that these fragments should be treated consistently on every architecture. If an inconsistency occurs, the binary code for some architecture is highly likely to have had a security problem introduced.
Specifically, the invention comprises the following steps:
step 1, compile the test program into architecture-independent IR (intermediate representation) code using the target compiler (the compiler under test);
step 2, instrument the target compiler so that it collects new compile-time information for the code correspondence performed in step 5;
step 3, define security-related code using error-handling code and dedicated models, and analyze and locate the security-related code in the IR code;
step 4, continue compiling the IR code with the instrumented target compiler to obtain binary code for different architectures (i.e., different CPU instruction set architectures, commonly x86_64, arm64, arm, riscv64, etc.);
step 5, based on the new compile-time information, map the IR-level security-related code to the binary code;
step 6, cross-check the semantic state changes of the security-related code across architectures.
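The build side of steps 1 and 4 can be sketched as a plan of command lines. This is a minimal sketch that assumes an LLVM toolchain. It emits textual LLVM IR once with `clang -S -emit-llvm` and lowers that IR for each architecture with `llc -mtriple=... -filetype=obj`. The target triples and output names are illustrative assumptions, and the instrumentation of step 2 is not modeled. The commands are only constructed, not run. Note that IR produced by Clang is not fully target-neutral: it records a target triple and data layout, and some ABI decisions are already made. A practical setup therefore has to account for this when reusing one IR file for several backends.

```python
from dataclasses import dataclass

# Illustrative target triples for the architectures named in step 4
# (assumed values, not taken from the patent).
TARGETS = {
    "x86_64": "x86_64-linux-gnu",
    "arm64": "aarch64-linux-gnu",
    "arm": "arm-linux-gnueabi",
    "riscv64": "riscv64-linux-gnu",
}


@dataclass
class BuildPlan:
    ir_cmd: list        # step 1: source -> LLVM IR
    backend_cmds: dict  # step 4: architecture -> command lowering IR to an object file


def plan_build(src: str, ir_path: str, cc: str = "clang", llc: str = "llc") -> BuildPlan:
    """Construct (but do not run) the command lines for steps 1 and 4."""
    # Step 1: emit textual LLVM IR once (an instrumented clang would be used here).
    ir_cmd = [cc, "-O2", "-S", "-emit-llvm", src, "-o", ir_path]
    # Step 4: lower the same IR file to an object file for every architecture.
    backend_cmds = {
        arch: [llc, f"-mtriple={triple}", "-filetype=obj", ir_path, "-o", f"{ir_path}.{arch}.o"]
        for arch, triple in TARGETS.items()
    }
    return BuildPlan(ir_cmd, backend_cmds)
```

To execute the plan, pass each command to `subprocess.run(cmd, check=True)` with the corresponding LLVM tools on `PATH`.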
Step 2 comprises the following:
instrument the compilation flow of the compiler so that it collects new compile-time information during compilation. This information comprises:
- representative instruction information (instruction type, instruction operands, number of the basic block containing the instruction, and position within that block);
- IR-level control flow information (predecessor and successor basic block numbers of each basic block);
- the basic-block-level correspondence between IR and binary code (the IR basic block number corresponding to each machine code basic block).
The collected information is stored in the debug information of the relevant instructions, as custom debug information newly created by the invention. The representative instructions are of three kinds: compare-and-branch instructions, function call instructions and memory access instructions.
In step 2, a new data structure is created in the target compiler for the intermediate code at each level of compilation. It stores the custom debug information during compilation, separately from the original debug information maintained by the compiler;
the compiler is instrumented at the stage just before its architecture-dependent compilation begins. At that stage it acquires the required IR-level compile-time information, which is available from the compiler's existing analyses, and stores it in the data structure. This IR-level information must be guaranteed to reach the binary level;
the architecture-dependent compilation process of the compiler is tracked. Wherever the compiler handles its original debug information, the custom debug information is handled in the same way. That is, its propagation is tracked when the code is lowered between intermediate representations and when compiler optimizations combine or generate instructions. In this way the IR-level compile-time information is finally transferred to the corresponding binary-level data structure;
the compiler's binary code emission is instrumented to supplement the binary-level compile-time information. The resulting binary-level compile-time information is stored as custom debug information in a data section of the binary file, in a uniform format (such as Protobuf). The specified data section of the binary file is then read back in that format to obtain the compile-time information of each instruction in the binary code.
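The compile-time information collected in step 2 can be pictured as one record per representative instruction plus per-block tables. The sketch below is illustrative only and makes two assumptions. First, the field names are hypothetical. Second, JSON stands in for the Protobuf format mentioned above, purely to keep the example dependency-free. In a real instrumented compiler, the serialized bytes would be written into a custom data section of the object file and read back after disassembly.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class InstrInfo:
    """Hypothetical record for one representative instruction."""
    inst_id: int         # ID assigned at the IR stage and propagated to machine code
    kind: str            # "cmp_br", "call" or "mem"
    operands: tuple      # instruction operands (strings/ints)
    ir_block: int        # number of the IR basic block containing it
    index_in_block: int  # position within that basic block


@dataclass
class CompileTimeInfo:
    instrs: list        # InstrInfo records
    ir_cfg: dict        # IR block -> (predecessor blocks, successor blocks)
    mblock_to_ir: dict  # machine basic block -> IR basic block


def serialize(info: CompileTimeInfo) -> bytes:
    """Encode the info as bytes suitable for a custom data section."""
    payload = {
        "instrs": [asdict(i) for i in info.instrs],
        # JSON object keys must be strings, so block numbers are stringified.
        "ir_cfg": {str(b): [list(p), list(s)] for b, (p, s) in info.ir_cfg.items()},
        "mblock_to_ir": {str(m): b for m, b in info.mblock_to_ir.items()},
    }
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def deserialize(blob: bytes) -> CompileTimeInfo:
    """Decode bytes read back from the data section of a binary."""
    p = json.loads(blob.decode("utf-8"))
    instrs = [InstrInfo(**{**d, "operands": tuple(d["operands"])}) for d in p["instrs"]]
    ir_cfg = {int(b): (tuple(ps), tuple(ss)) for b, (ps, ss) in p["ir_cfg"].items()}
    mblock_to_ir = {int(m): b for m, b in p["mblock_to_ir"].items()}
    return CompileTimeInfo(instrs, ir_cfg, mblock_to_ir)
```

The round trip restores tuples and integer keys explicitly, because JSON turns them into lists and strings.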
Step 3 comprises the following steps:
step 3-1, for the source code under test, collect the error-handling interfaces according to common programming conventions, so as to locate the error-handling code;
step 3-2, identify security checks based on the error-handling code: a conditional check is determined to be a security check when exactly one of its branches leads to error handling;
step 3-3, take the variable examined by the security check as the key variable (such as the pointer in a null-pointer check or the divisor in a divide-by-zero check). Traverse the IR code to construct definition-use chains, and mark instructions that use the key variable as security operations, thereby locating the security-related code.
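Steps 3-1 to 3-3 can be illustrated on a toy IR. This sketch is not the patent's implementation. It makes three simplifying choices:

- a block counts as error handling if it calls a function from a caller-supplied set of error interfaces, or returns a negative constant (the Linux `-Exxx` convention);
- a conditional branch is a security check if exactly one of its two successors is such a block;
- security operations are the instructions reached from the key variable through def-use chains.

The `Inst`/`Block` types and opcode names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Inst:
    op: str                     # hypothetical opcode, e.g. "icmp_eq", "br", "load", "ret"
    dest: Optional[str] = None  # value defined by the instruction, if any
    args: tuple = ()            # operands: value names or constants


@dataclass
class Block:
    name: str
    insts: list
    succs: tuple = ()  # successor block names (two for a conditional branch)


def is_error_block(block: Block, error_funcs: set) -> bool:
    """Step 3-1: error handling = calls an error interface or returns a negative constant."""
    for inst in block.insts:
        if inst.op == "call" and inst.args and inst.args[0] in error_funcs:
            return True
        if inst.op == "ret" and inst.args and isinstance(inst.args[0], int) and inst.args[0] < 0:
            return True
    return False


def security_checks(blocks: dict, error_funcs: set) -> list:
    """Step 3-2: conditional branches with exactly one successor leading to error handling."""
    err = {n for n, b in blocks.items() if is_error_block(b, error_funcs)}
    checks = []
    for name, b in blocks.items():
        if b.insts and b.insts[-1].op == "br" and len(b.succs) == 2:
            if sum(s in err for s in b.succs) == 1:
                checks.append(name)
    return checks


def security_operations(blocks: dict, key_var: str) -> list:
    """Step 3-3: follow def-use chains from the key variable and mark every user."""
    values = {key_var}
    marked = set()
    changed = True
    while changed:  # iterate to a fixpoint so block order does not matter
        changed = False
        for b in blocks.values():
            for i, inst in enumerate(b.insts):
                if (b.name, i) not in marked and any(a in values for a in inst.args):
                    marked.add((b.name, i))
                    if inst.dest:
                        values.add(inst.dest)
                    changed = True
    return sorted(marked)
```

Consider a block `entry` that compares `%p` with 0 and branches either to `err` (which returns -22, i.e. `-EINVAL`) or to `ok`. Here `entry` is reported as a security check. With `%p` as the key variable, the load of `%p` in `ok` is marked as a security operation.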
The detection framework of the invention can also define various dedicated models of security-related code according to actual requirements, which can be used to detect a specific class of security problems. Two example semantic models and their localization methods are given below (these are the "dedicated models" mentioned in step 3):
Racy memory access (Racy Memory Access). Racy memory access is a common kind of sensitive operation. Because compilers model concurrency inadequately, a compiler may introduce new racy memory accesses and cause concurrency problems. Following the lockset analysis of RacerX and RELAY, racy objects (memory objects that are accessed with races) can be located in the system software under test.
Dedicated security-related function calls. Common system software provides well-developed interfaces of security-related functions, which encapsulate measures against specific security problems. The definitions of these functions can be found in fixed code modules (such as errno.h and errno-base.h in the Linux kernel). On this basis, a function call graph is constructed. Calls to these functions and to their wrapper functions are located in the program according to the graph, and these function calls are treated as security-related instructions.
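The lockset idea referenced above can be sketched in its simplest, Eraser-style form. For each memory object, intersect the sets of locks held at every access. An object written by more than one thread or execution context, whose intersection is empty, is reported as racy. This is a simplified illustration, not the RacerX or RELAY algorithms, which compute locksets statically with interprocedural analysis. Here the accesses are assumed to be given as a flat list.

```python
from collections import defaultdict
from typing import NamedTuple


class Access(NamedTuple):
    context: str           # thread or execution context performing the access
    obj: str               # memory object (e.g. a global or a struct field)
    is_write: bool
    locks_held: frozenset  # locks held at the access


def racy_objects(accesses) -> list:
    """Objects written by several contexts whose candidate lockset becomes empty."""
    candidate = {}               # obj -> intersection of locks held so far
    contexts = defaultdict(set)  # obj -> contexts that accessed it
    written = set()              # objects with at least one write
    for a in accesses:
        if a.obj in candidate:
            candidate[a.obj] &= a.locks_held
        else:
            candidate[a.obj] = set(a.locks_held)
        contexts[a.obj].add(a.context)
        if a.is_write:
            written.add(a.obj)
    return sorted(
        o for o, locks in candidate.items()
        if not locks and len(contexts[o]) > 1 and o in written
    )
```

Requiring at least one write avoids flagging objects that are only read concurrently.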
In step 4, while the instrumented target compiler continues to compile the IR code, the compile-time information is preserved as custom debug information.
Step 5 comprises the following steps:
step 5-1, analyze the binary code obtained in step 4. Using its custom debug information, obtain the correspondence between the IR code and the three kinds of representative instructions (i.e., which binary instruction each IR instruction corresponds to). An IR instruction and a binary instruction are judged to correspond one-to-one if their custom debug information matches;
step 5-2, map the remaining instructions. Use general static analysis (implemented independently, or through the analysis interfaces provided by the compiler's framework) to analyze two kinds of relations between each instruction to be mapped and the already-mapped representative instructions:
- control flow relations: the basic block number containing each instruction, relative position within the same basic block, dominance relations between instructions, etc.;
- data flow relations: use-definition and definition-use chains of the related data, etc.
These relations are then combined with the correspondence of the representative instructions from step 5-1 and with type information. Type information can be obtained in several ways, e.g., IR instruction types through the compiler's analysis interfaces and binary instruction types through a common disassembler such as objdump. An IR instruction and a binary instruction are judged to correspond one-to-one if their control flow and data flow relations are consistent and their types match. This yields the correspondence between the target binary code and the IR code.
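The two-stage correspondence of steps 5-1 and 5-2 can be sketched in a much-simplified form:

- representative instructions are matched exactly on their custom debug tags;
- the remaining instructions are matched greedily, in address order, within the same IR basic block when their types are compatible.

The record types, the tag layout and the IR-opcode-to-mnemonic compatibility table are assumptions. For brevity, the data-flow (def-use) consistency check described above is omitted.

```python
from typing import NamedTuple, Optional


class IRInst(NamedTuple):
    ident: int
    kind: str             # IR opcode, e.g. "icmp", "load", "call"
    block: int            # IR basic block number
    tag: Optional[tuple]  # custom debug info; set only for representative instructions


class BinInst(NamedTuple):
    addr: int
    kind: str             # mnemonic from a disassembler such as objdump
    ir_block: int         # IR block recovered from the machine-block mapping
    tag: Optional[tuple]  # custom debug info read from the data section


def correspond(ir_insts, bin_insts, compatible: dict) -> dict:
    """Map IR instruction ids to binary addresses (bin_insts sorted by address)."""
    mapping, used = {}, set()
    # Step 5-1: identical custom debug info => one-to-one correspondence.
    by_tag = {b.tag: b for b in bin_insts if b.tag is not None}
    for ir in ir_insts:
        if ir.tag is not None and ir.tag in by_tag:
            mapping[ir.ident] = by_tag[ir.tag].addr
            used.add(by_tag[ir.tag].addr)
    # Step 5-2 (simplified): same IR block and compatible type, first unused match.
    for ir in ir_insts:
        if ir.ident in mapping:
            continue
        for b in bin_insts:
            if b.addr not in used and b.ir_block == ir.block and b.kind in compatible.get(ir.kind, ()):
                mapping[ir.ident] = b.addr
                used.add(b.addr)
                break
    return mapping
```

IR instructions that end up without an entry in the mapping are the candidates for "removed" in step 6-1.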
Step 6 comprises the following steps:
step 6-1, based on the correspondence between the target binary instructions and the IR code, determine whether the security-related code located in step 3 underwent a semantic state change when compiled in step 4. Semantic state changes comprise the removal of security-related instructions and the reordering of security-related instructions. They are determined as follows:
- if a security-related IR instruction has no corresponding binary instruction, it is judged to have been removed;
- if the control flow order of the binary instructions differs from the order of the corresponding IR instructions, their order is judged to have been swapped;
step 6-2, compare the binary code of the different architectures compiled in step 4 and determine whether the semantic state changes of the security-related instructions are consistent. If they are not, report a suspected security problem and submit a report to the open-source community of the software under test.
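Steps 6-1 and 6-2 can be sketched as follows:

- removal is detected as a security-related IR instruction with no binary counterpart;
- reordering is detected as a pair whose binary order disagrees with the IR order;
- the cross-check reports every change that is not shared by all architectures, together with the architectures that exhibit it.

As simplifying assumptions, binary addresses stand in for control-flow order and IR order is given as a total order. A fuller implementation would compare positions along the control flow graph.

```python
def state_changes(security_ids, ir_order: dict, mapping: dict) -> frozenset:
    """Step 6-1 for one architecture.

    security_ids: ids of security-related IR instructions
    ir_order: IR id -> position in IR order (assumed total order)
    mapping: IR id -> binary address (address order stands in for control-flow order)
    """
    changes = {("removed", i) for i in security_ids if i not in mapping}
    kept = sorted((i for i in security_ids if i in mapping), key=lambda i: ir_order[i])
    for pos, a in enumerate(kept):
        for b in kept[pos + 1:]:
            if mapping[a] > mapping[b]:  # a precedes b in IR but follows it in the binary
                changes.add(("swapped", a, b))
    return frozenset(changes)


def cross_check(changes_by_arch: dict) -> dict:
    """Step 6-2: report each change that is not shared by every architecture."""
    all_changes = set().union(*changes_by_arch.values())
    report = {}
    for change in sorted(all_changes):
        archs = sorted(a for a, chs in changes_by_arch.items() if change in chs)
        if len(archs) != len(changes_by_arch):
            report[change] = archs  # suspected compiler-introduced security problem
    return report
```

Consider an example with three architectures:

- x86_64 keeps security-related instructions 1, 2 and 3 in IR order;
- arm64 drops instruction 2;
- riscv64 places instruction 1 after instruction 2.

Both changes are then reported, each attributed only to the architecture that exhibits it. A change that occurred on every architecture would be consistent and would not be reported.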
To cross-check security-related semantic state changes in binary code across architectures, the invention uses IR code as a bridge for comparing the binary code of different architectures. It also provides a method for mapping IR code to binary code.
IR code can serve as this bridge for two reasons. As the common intermediate form of modern compilers, it is architecture-independent, so it does not bias the architecture cross-check. It is also easier to analyze than source code.
The principle that makes the architecture cross-check feasible is as follows. When the security-related semantic fragments in the same source code are compiled into binary code for different architectures, the state changes of these fragments during compilation should be consistent. If there is a discrepancy, some architecture exhibits a suspected security problem.
Beneficial effects: the invention automatically and efficiently finds compiler-introduced security vulnerabilities in important system software, without dynamically running the compiled binaries. Such vulnerabilities are widespread, have serious security impact and are hard to find. The invention is the first to introduce general modeling of security-related code into compiler testing. This overcomes the limitation that existing methods either do not target security problems or target only specific security problems. The invention can also specifically detect security problems introduced by the compiler on different architectures. Existing solutions either cannot detect these problems, or can detect them only inefficiently through repeated per-architecture analysis.
Drawings
The foregoing and/or other advantages of the invention will become more apparent from the following detailed description, taken in conjunction with the accompanying drawings.
FIG. 1 is a schematic diagram of the instrumented compiler.
FIG. 2 is a schematic diagram of determining semantic state changes of security-related instructions and performing the architecture cross-check.
FIG. 3 is a flow chart of the method of the invention.
Detailed Description
The invention provides a compiler security testing method based on architecture cross-checking, comprising steps 1 to 6 and their sub-steps as set out under Disclosure of Invention above.
Examples
As shown in Fig. 1, during compilation the instrumented compiler collects representative instruction information, IR-level control-flow information, and basic-block-level correspondence between the IR and the binary code. It stores this information in custom debug information attached to the related instructions.
As shown in Fig. 2, changes in the semantic state of security-related instructions are identified and the architecture cross-check is performed. Here, a BISF (back-end independent semantic fragment) is a security-related instruction modeled and located according to the method of the present invention. After the security-related code has been located, it is matched to its counterparts in the binary code of each architecture, and the changes in the semantic state of the security-related instructions are determined. Finally, these semantic state changes are cross-checked across architectures to analyze and confirm genuine security problems.
The test cases for the compiler under test are the source code of widely used large-scale open-source software, such as the Linux kernel and Chromium. Such projects typically follow well-defined error-handling code conventions, to which the security-related code modeling of the present invention can be applied.
The compilers tested by the invention are open-source optimizing compilers that support multiple architectures and commonly cover several languages, such as GCC, Clang, G++, and Clang++.
The architectures referred to herein are central processing unit instruction set architectures, commonly x86_64, arm64, arm, riscv64, and so on.
IR (Intermediate Representation) code refers to an intermediate compilation file expressed in a compiler's intermediate representation language, typically LLVM IR or GCC IR.
In this embodiment, clang is used to compile the Linux kernel source code into binaries for several different architectures (x86_64, arm64, arm, riscv64, mips, ppc64).
Referring to Fig. 3, the implementation flow of this example is as follows:
Preparation stage. The Linux kernel source code is compiled into IR code (here, LLVM IR) using clang, with the "allyesconfig" configuration selected. Because the Linux kernel source contains a small amount of architecture-specific assembly code, that source code is excluded.
Instrumentation stage. See Fig. 1. During compilation, representative instruction information, IR-level control-flow information, and basic-block-level correspondence between the IR and the binary code are collected and stored in the custom debug information of the related instructions. Representative instructions include compare-and-branch instructions, function call instructions, and memory access instructions. Their instruction information includes the instruction type, the instruction operands, the number of the basic block containing the instruction, and its position within that block. The IR-level control-flow information includes the numbers of each basic block's predecessor and successor blocks. The basic-block-level correspondence between the IR and the binary code includes the correspondence between Machine IR basic blocks and IR basic blocks recorded by LLVM.
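The claims state that this compile-time information is written into a data section of the binary file in a unified format and read back from that section. The Python sketch below shows one hypothetical fixed-width record layout for representative-instruction information, together with a round trip through a byte blob. The field set, the kind codes and the count prefix are assumptions for illustration. Operands and predecessor/successor lists are omitted, and emitting the blob into an actual object-file section is compiler-specific and not shown.

```python
import struct
from dataclasses import dataclass

# Hypothetical unified record layout (little-endian, no padding):
#   kind (u8)  0 = compare-and-branch, 1 = function call, 2 = memory access
#   ir_block (u32), position (u32), bin_block (u32)
RECORD = struct.Struct("<BIII")
COUNT = struct.Struct("<I")


@dataclass
class DebugRecord:
    kind: int
    ir_block: int   # IR basic-block number containing the instruction
    position: int   # index of the instruction within that block
    bin_block: int  # binary (Machine IR) basic block mapped to ir_block


def pack_records(records):
    """Serialize records behind a u32 count prefix."""
    parts = [COUNT.pack(len(records))]
    parts += [RECORD.pack(r.kind, r.ir_block, r.position, r.bin_block) for r in records]
    return b"".join(parts)


def unpack_records(blob):
    """Parse a blob produced by pack_records, e.g. read back from a data section."""
    (count,) = COUNT.unpack_from(blob, 0)
    offset = COUNT.size
    records = []
    for _ in range(count):
        records.append(DebugRecord(*RECORD.unpack_from(blob, offset)))
        offset += RECORD.size
    return records
```

A fixed-width layout with a count prefix lets a reader locate every record without a separate index, at the cost of wasting space on small values.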
Modeling and analysis stage. The IR code is modeled and analyzed according to the method of the present invention to obtain the specific security-related instructions.
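Steps 3-2 and 3-3 of claim 1 can be illustrated on a toy IR as follows. A two-way branch counts as a security check when exactly one successor is an error-handling block. Instructions that use the check's key variables are then marked as security operations. Everything here is a hypothetical sketch: the dictionary-based IR, the `report_error` interface and the function names are invented. "Leads to error handling" is simplified to "directly branches to a block that calls an error interface". The IR is assumed to be in SSA form, where each variable has one definition, so a variable-to-uses map serves as its definition-use chain.

```python
# Hypothetical error-handling interfaces collected in step 3-1.
ERROR_CALLS = {"report_error"}


def error_blocks(blocks):
    """Blocks that call a known error-handling interface."""
    return {
        name for name, insts in blocks.items()
        if any(i["op"] == "call" and i.get("callee") in ERROR_CALLS for i in insts)
    }


def security_checks(blocks):
    """Step 3-2: a two-way conditional branch is a security check when exactly
    one successor is an error-handling block. Returns (block, key_vars) pairs."""
    errs = error_blocks(blocks)
    checks = []
    for name, insts in blocks.items():
        term = insts[-1]
        if term["op"] == "br" and len(term["succs"]) == 2:
            if sum(s in errs for s in term["succs"]) == 1:
                checks.append((name, term["cond_vars"]))
    return checks


def security_ops(blocks, key_vars):
    """Step 3-3: build variable -> use-site chains and return every
    instruction that uses a key variable, as (block, index) pairs."""
    uses = {}
    for name, insts in blocks.items():
        for idx, inst in enumerate(insts):
            for v in inst.get("uses", ()):
                uses.setdefault(v, []).append((name, idx))
    return sorted({site for v in key_vars for site in uses.get(v, [])})
```

For a block `entry` that compares `len` and branches to an error block `err` or to `ok`, `security_checks` reports `("entry", ["len"])`. `security_ops` then marks both the compare and the later load of `len` in `ok`.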
Correspondence stage. Taking the analysis of whether a security check has been removed as an example, as shown in Fig. 2, the security check is first matched to the binary code. The state change of the security check before and after compilation is then determined from the matching result: if no counterpart of an IR-level security check can be found in the binary code, the security check is deemed to have been removed by the compiler.
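The first matching pass (step 5-1 of claim 3) pairs representative instructions whose custom debug information is identical. It can be sketched in Python as follows, where `match_by_debug_info` and the tuple encoding of debug info are hypothetical. Matches that are ambiguous on either side are left unmatched, so that only one-to-one correspondences are kept.

```python
from collections import Counter


def match_by_debug_info(ir_insts, bin_insts):
    """Step 5-1 sketch: pair IR and binary representative instructions whose
    custom debug info is identical, keeping only one-to-one matches.

    ir_insts / bin_insts: instruction id -> hashable debug-info tuple."""
    bin_by_info = {}
    for b, info in bin_insts.items():
        bin_by_info.setdefault(info, []).append(b)
    ir_count = Counter(ir_insts.values())
    pairs = {}
    for i, info in ir_insts.items():
        candidates = bin_by_info.get(info, [])
        # Ambiguous info (duplicated on either side) is left unmatched here.
        if len(candidates) == 1 and ir_count[info] == 1:
            pairs[i] = candidates[0]
    return pairs


ir = {"%c1": ("cmp_branch", 3, 1), "%call": ("call", 4, 0), "%chk": ("cmp_branch", 5, 2)}
binary = {"0x10": ("cmp_branch", 3, 1), "0x24": ("call", 4, 0), "0x30": ("call", 4, 0)}
print(match_by_debug_info(ir, binary))  # {'%c1': '0x10'}
```

In this example `%chk` finds no candidate at all, which is the situation the correspondence stage treats as a removed security check. `%call` is ambiguous rather than missing.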
Architecture cross-check stage. Again taking the analysis of whether a security check has been removed as an example, suppose the cross-check shows that the security check was removed in the binary for one architecture but not in the binary for another. The compiler is then considered to have wrongly removed the security check for that architecture, so a security problem exists. The removed security checks and the architectures involved are analyzed and reported.
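The cross-check for removed security checks can be sketched as a comparison of per-architecture removal sets. This is a hypothetical illustration. Following step 6-2, only inconsistencies are reported, so a check removed on every architecture is treated as consistent and is not flagged. The input format and `cross_check` are assumptions made for this example.

```python
def cross_check(removed_by_arch):
    """Report security checks whose removal is inconsistent across architectures.

    removed_by_arch: architecture -> set of security-check ids removed there.
    Returns check id -> sorted architectures that removed it, only for checks
    removed on some but not all architectures."""
    archs = set(removed_by_arch)
    all_removed = set().union(*removed_by_arch.values())
    suspects = {}
    for check in sorted(all_removed):
        removers = {a for a in archs if check in removed_by_arch[a]}
        if removers != archs:
            suspects[check] = sorted(removers)
    return suspects


report = cross_check({
    "x86_64": {"chk_a"},
    "arm64": {"chk_a", "chk_b"},
    "riscv64": {"chk_a"},
})
print(report)  # {'chk_b': ['arm64']}
```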
The present invention provides a compiler security testing method based on architecture cross-checking. There are many ways to implement this technical solution, and the above description is only a preferred embodiment of the present invention. Those skilled in the art can make several improvements and modifications without departing from the principles of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention. Components not explicitly described in this embodiment can be implemented using existing technology.

Claims (4)

1. A compiler security testing method based on architecture cross-checking, characterized by comprising the following steps:
step 1, compiling a test program into architecture-independent IR code using a target compiler;
step 2, instrumenting the target compiler so that it collects new compile-time information;
step 3, defining security-related code by means of error-handling code and a dedicated model, and analyzing and locating the security-related code in the IR code;
step 4, continuing to compile the IR code with the instrumented target compiler to obtain binary code for different architectures;
step 5, mapping the IR-level security-related code to the binary code based on the new compile-time information;
step 6, performing an architecture cross-check on the semantic state changes of the security-related code;
step 2 comprises the following:
the compilation flow of the instrumented compiler collects new compile-time information during compilation. The new compile-time information comprises representative instruction information, IR-level control-flow information, and basic-block-level correspondence between the IR and the binary code. The collected new compile-time information is stored in the debug information of the related instructions, wherein the representative instructions are of the following three kinds: compare-and-branch instructions, function call instructions, and memory access instructions;
in step 2, data structures are created in the target compiler for the intermediate code at each level of compilation, to store custom debug information dynamically during compilation;
the required IR-level compile-time information is acquired at a stage before the architecture-specific compilation process of the instrumented compiler begins, and the acquired information is stored in the data structure;
the architecture-specific compilation process of the instrumented compiler is tracked, and wherever the compiler processes its original debug information, the custom debug information is processed in the same way. That is, the propagation of debug information is tracked when converting between different levels of intermediate code and when compiler optimizations combine or generate instructions, so that the IR-level compile-time information is finally transferred into the corresponding binary-code-level data structure;
when the instrumented compiler generates the binary code, it supplements the binary-code-level compile-time information and stores it, as custom debug information in a unified format, in a data section of the binary file. Reading the designated data section of the binary file according to that format yields the compile-time information of each instruction in the binary code;
step 3 comprises the following steps:
step 3-1, for the source code under test, collecting the error-handling interfaces according to common programming conventions, so as to locate the error-handling code;
step 3-2, identifying security checks based on the error-handling code: a conditional check is determined to be a security check when exactly one of its branches leads to error handling;
step 3-3, taking the variables operated on by the security check as key variables, traversing the IR code, constructing definition-use chains, and marking instructions that use the key variables as security operations, so as to locate the security-related code.
2. The method of claim 1, wherein in step 4, the compile-time information is kept as custom debug information while the instrumented target compiler continues to compile the IR code.
3. The method of claim 2, wherein step 5 comprises:
step 5-1, analyzing the binary code obtained in step 4 and obtaining, from its custom debug information, the correspondence between the three kinds of representative instructions and the IR code: if the custom debug information of an IR instruction is consistent with that of an instruction in the binary code, the two instructions are judged to correspond one-to-one;
step 5-2, analyzing, by a static analysis method, the control-flow and data-flow relations between each instruction to be matched and the already-matched representative instructions. The correspondence between the target binary code and the IR code is obtained from these control-flow and data-flow relations, the correspondence between the three kinds of representative instructions and the IR code, and the type information of the target instructions. If the control-flow and data-flow relations of an IR instruction are consistent with those of a binary instruction and their types match, the two instructions are judged to correspond one-to-one.
4. A method according to claim 3, wherein step 6 comprises:
step 6-1, determining, based on the correspondence between the target binary instructions and the IR code, whether the security-related code located in step 3 undergoes a semantic state change after the compilation in step 4. Semantic state changes comprise the removal of security-related instructions and the reordering of security-related instructions, determined as follows: if an IR instruction has no corresponding binary instruction, it is judged to have been removed, and if the control-flow order between binary instructions differs from the order between the corresponding IR instructions, their order is judged to have been swapped;
step 6-2, comparing the binary code compiled for the different architectures in step 4 and determining whether the semantic state changes of the security-related instructions are consistent; if not, reporting a suspected security problem and submitting a report to the open-source community of the project under test.
CN202111128084.3A 2021-09-26 2021-09-26 Compiler security testing method based on architecture cross check Active CN113836023B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111128084.3A CN113836023B (en) 2021-09-26 2021-09-26 Compiler security testing method based on architecture cross check

Publications (2)

Publication Number Publication Date
CN113836023A CN113836023A (en) 2021-12-24
CN113836023B true CN113836023B (en) 2023-06-27

Family

ID=78970363

Country Status (1)

Country Link
CN (1) CN113836023B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant