WO2024113831A1 - Memory security management method and device - Google Patents


Info

Publication number
WO2024113831A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2023/103819
Other languages
French (fr)
Chinese (zh)
Inventor
Wang Yaxing (王亚星)
Lin Weixin (林炜鑫)
Xu Maoda (徐茂达)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)


Definitions

  • The present application relates to the field of memory management technology, and in particular to a memory security management method and device.
  • In applications written in C or C-like languages, pointers are commonly used to access memory resources flexibly. However, improper use of pointers by developers also introduces security risks into memory management. The hazards of pointer-based memory access in applications are mainly spatial memory safety issues, such as null pointer dereference, out-of-bounds read, and out-of-bounds write.
  • Existing solutions to spatial memory safety mainly modify the source program by manually adding pointer attribute information. This information is used to insert check statements when the program is compiled and to perform error handling when the program runs.
  • During compilation, these inserted check statements are first converted into machine instructions with branch structures. The introduced branches complicate the logical relationships of the source program and greatly weaken redundant code elimination on it. They also bring a large runtime overhead to the application, which is unacceptable in overhead-sensitive fields such as WiFi chips and routers.
  • The present application provides a memory security management method and device. They improve the optimization of the source program during compilation, keep the runtime overhead of the target executable file within a range the terminal can control, and improve the user experience of terminal applications.
  • According to a first aspect, the present application provides a memory safety management method.
  • The method includes: inserting a check statement for checking pointer attributes before a risk statement in a first program, where the risk statement is a statement in the first program that calls a pointer for memory access, and the pointer attribute is an attribute of the pointer in the risk statement; performing redundant code elimination on a first converted program and on the check statement respectively to obtain a first eliminated program and a second eliminated program, where the first converted program is an intermediate representation generated by compiling the first program; converting the second eliminated program into machine instructions during compilation to obtain a second converted program; and generating a target execution file based on at least the second converted program and the first eliminated program, where the target execution file is used to generate fault information at runtime, the fault information includes the pointer attribute of the pointer in a first risk statement, and the first risk statement is one of the risk statements in the first program.
  • In this way, redundant code is first eliminated from the first converted program, and the check statements are converted into machine instructions only later in the compilation process.
  • Therefore, the logical structure of the first converted program is not changed during the redundant code elimination stage of compilation, which improves the optimization of the first converted program.
  • In a possible implementation, performing redundant code elimination on the first converted program and the check statement respectively includes the following. First, the check object and check range of the check statement are determined, and all identical redundant code and all partially-ordered redundant code in the check statement are eliminated to obtain a third eliminated program; the check object and check range are obtained based on at least the risk statement. Second, the check object of the check statement in the third eliminated program is hashed, and part of the identical redundant code and part of the partially-ordered redundant code in the third eliminated program are eliminated to obtain the second eliminated program.
  • In this way, redundant code elimination is also applied to check statements. This further improves the overall optimization of the target executable file and effectively reduces the runtime overhead of the application.
  • In a possible implementation, inserting a check statement for checking pointer attributes before a risk statement in a first program includes: inserting a label statement after a pointer definition statement in the first program, where the label statement is used to obtain the pointer attributes of the pointer in the risk statement; inserting a pointer attribute storage statement after the label statement, where the pointer attribute storage statement saves the pointer attributes obtained by the label statement and contains pointer attribute variables, which are variables representing pointer attributes, and the pointer attribute storage statement is a first custom statement recognizable by the compilation process; determining, based on the risk statement and the pointer attribute storage statement, a check statement for performing a pointer attribute check on the risk statement, where the risk statement determines the check object of the check statement, the pointer attribute variables contained in, and the pointer attributes saved by, the pointer attribute storage statement determine the check scope of the check statement, and the check statement is a second custom statement recognizable by the compilation process; and inserting
  • By storing pointer attributes in real time and linking pointer attribute storage statements to check statements, the method keeps the pointer attributes used in check statements correct. This reduces missed reports and provides more complete security assurance.
  • In a possible implementation, the method further includes: eliminating redundant code in the pointer attribute storage statement, together with the pointer attribute variables contained in that redundant code, to obtain a fourth eliminated program. The pointer attributes stored in, and the pointer attribute variables contained in, the redundant code of the pointer attribute storage statement are used to determine the check scope of the redundant code in the check statement. Correspondingly, generating a target execution file based on at least the second converted program and the first eliminated program includes generating the target execution file based on the second converted program, the first eliminated program, and the fourth eliminated program.
  • In a possible implementation, the first program is a program written in C or a C-like language.
  • The present application further provides a memory safety management device.
  • The device includes a processing module, which is configured to:
    insert a check statement for checking pointer attributes before a risk statement in a first program, where the risk statement is a statement in the first program that calls a pointer for memory access, and the pointer attribute is an attribute of the pointer in the risk statement;
    perform redundant code elimination on a first converted program and on the check statement respectively to obtain a first eliminated program and a second eliminated program, where the first converted program is an intermediate representation generated by compiling the first program;
    convert the second eliminated program into machine instructions during compilation to obtain a second converted program; and
    generate a target execution file based on at least the second converted program and the first eliminated program, where the target execution file is used to generate fault information at runtime, the fault information includes the pointer attribute of the pointer in a first risk statement, and the first risk statement is one of the risk statements in the first program.
  • In a possible implementation, when performing redundant code elimination on the first converted program and the check statement respectively, the processing module is configured to do the following. First, it determines the check object and check range of the check statement and eliminates all identical redundant code and all partially-ordered redundant code in the check statement to obtain a third eliminated program; the check object and check range are obtained based on at least the risk statement. Second, it hashes the check object of the check statement in the third eliminated program and eliminates part of the identical redundant code and part of the partially-ordered redundant code in the third eliminated program to obtain the second eliminated program.
  • In a possible implementation, when inserting a check statement for checking pointer attributes before a risk statement in the first program, the processing module is configured to: insert a label statement after a pointer definition statement of the first program, where the label statement is used to obtain the pointer attributes of the pointer in the risk statement; insert a pointer attribute storage statement after the label statement, where the pointer attribute storage statement saves the pointer attributes obtained by the label statement and contains pointer attribute variables, which are variables representing pointer attributes, and the pointer attribute storage statement is a first custom statement recognizable by the compilation process; determine, based on the risk statement and the pointer attribute storage statement, a check statement for performing a pointer attribute check on the risk statement, where the risk statement determines the check object of the check statement, the pointer attribute variables contained in, and the pointer attributes saved by, the pointer attribute storage statement determine the check scope of the check statement, and the check statement is a second custom statement recognizable by the compilation process.
  • In a possible implementation, the processing module is further configured to eliminate redundant code in the pointer attribute storage statement, together with the pointer attribute variables contained in that redundant code, to obtain a fourth eliminated program. The pointer attributes stored in, and the pointer attribute variables contained in, the redundant code of the pointer attribute storage statement are used to determine the check scope of the redundant code in the check statement. Correspondingly, when generating a target executable file based on at least the second converted program and the first eliminated program, the processing module is configured to generate the target executable file based on the second converted program, the first eliminated program, and the fourth eliminated program.
  • In a possible implementation, the first program is a program written in C or a C-like language.
  • The present application further provides an electronic device, comprising at least one memory for storing programs and at least one processor for executing the programs stored in the memory. When the program stored in the memory is executed, the processor is configured to execute the method described in the first aspect or any possible implementation of the first aspect.
  • The present application further provides a computer-readable storage medium storing a computer program. When the computer program runs on a processor, the processor executes the method described in the first aspect or any possible implementation of the first aspect.
  • The present application further provides a computer program product. When the computer program product runs on a processor, the processor executes the method described in the first aspect or any possible implementation of the first aspect.
  • FIG. 1 is a schematic diagram of a process of code compilation and optimization using check statements in branch-structure form.
  • FIG. 2 is a control flow graph after the check statement structure is converted.
  • FIG. 3 is a system architecture diagram of a memory security management method provided in an embodiment of the present application.
  • FIG. 4 is a flowchart of a memory security management method provided in an embodiment of the present application.
  • FIG. 5a is a flowchart of conventional VRAP algorithm processing provided in an embodiment of the present application.
  • FIG. 5b is a flowchart of customized VRAP algorithm processing provided in an embodiment of the present application.
  • FIG. 6a is a flowchart of conventional PRE algorithm processing provided in an embodiment of the present application.
  • FIG. 6b is a flowchart of customized PRE algorithm processing provided in an embodiment of the present application.
  • FIG. 7 is a flowchart of conventional DCE algorithm processing provided in an embodiment of the present application.
  • FIG. 8 is a flowchart of source program compilation provided in an embodiment of the present application.
  • FIG. 9 is an implementation architecture diagram of a memory security management method provided in an embodiment of the present application.
  • FIG. 10 is a flowchart of a memory security management method provided in an embodiment of the present application.
  • FIG. 11 is an implementation architecture diagram of a memory security management method provided in an embodiment of the present application.
  • FIG. 12 is an implementation architecture diagram of a memory security management method provided in an embodiment of the present application.
  • FIG. 13 is a schematic diagram of the hardware structure of a memory security management device provided in an embodiment of the present application.
  • FIG. 14 is a schematic diagram of the hardware structure of a memory security management device provided in an embodiment of the present application.
  • In the embodiments of the present application, words such as “exemplary” or “for example” are used to indicate examples or illustrations. Any embodiment or design described as “exemplary” or “for example” should not be interpreted as more preferred or more advantageous than other embodiments or designs. Rather, such words are intended to present related concepts in a concrete way.
  • The term “and/or” merely describes an association between associated objects and indicates that three relationships may exist.
  • For example, A and/or B may represent: A exists alone, B exists alone, or A and B exist at the same time.
  • The term “multiple” means two or more.
  • For example, multiple systems means two or more systems, and multiple screen terminals means two or more screen terminals.
  • The terms “first” and “second” are used for descriptive purposes only. They should not be understood as indicating or implying relative importance, or as implicitly indicating the number of the indicated technical features. Therefore, a feature defined as “first” or “second” may explicitly or implicitly include one or more such features.
  • The terms “include”, “comprise”, “have” and their variations all mean “including but not limited to”, unless otherwise specifically emphasized.
  • C-like languages are languages that, like C, perform memory operations through pointers and pointer arithmetic.
  • FIG. 1 is a schematic diagram of the process of code compilation and optimization using check statements in branch-structure form.
  • The developer writes the source program in C or a C-like language to implement the application solution.
  • This source program uses a large number of pointers to access, read, and write terminal memory.
  • The following steps need to be performed:
  • Step S100: Determine the intermediate program, which includes the source program and the check statements inserted before the risk statements that access memory through pointers. Executing these risk statements may introduce spatial memory safety risks. Step S100 may specifically include the following sub-steps S101-S103:
  • Step S101: Write the source program according to the requirements of the application solution, and add pointer attribute annotations to all statements that use pointers in the source program.
  • These annotations are mainly used to obtain the length and boundary information of the memory block pointed to by a pointer.
  • A pointer attribute annotation can resemble a positioning instruction. For example, where p: count(n) indicates that the length of the memory block pointed to by pointer p is n.
  • The compiler provides supporting lexical and semantic analysis to extract the information represented by the annotation. For example, it can extract that the length of the memory block pointed to by p is 5 and that its boundary is [p, p+5).
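The boundary extracted from an annotation such as where p: count(n) can be sketched in C as follows. This is a minimal illustration, not the compiler's actual data structure. The ptr_bounds type and the helpers bounds_from_count and access_in_bounds are hypothetical names. Addresses are compared as uintptr_t values, so an out-of-bounds candidate address never has to be formed as a pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of what "where p: count(n)" tells the compiler:
 * the block pointed to by p spans the half-open range [lower, upper),
 * i.e. [p, p + n). */
typedef struct {
    uintptr_t lower; /* address of p[0]                  */
    uintptr_t upper; /* one past the last element, p + n */
} ptr_bounds;

/* Derive the boundary from the annotated element count n. */
static ptr_bounds bounds_from_count(const int *p, size_t n) {
    assert(p != NULL);
    ptr_bounds b = { (uintptr_t)p, (uintptr_t)p + n * sizeof *p };
    return b;
}

/* True if the element access p[i] lies entirely inside [lower, upper). */
static bool access_in_bounds(ptr_bounds b, const int *p, size_t i) {
    uintptr_t addr = (uintptr_t)p + i * sizeof *p;
    return addr >= b.lower && addr + sizeof *p <= b.upper;
}
```

For count(5) on an int array, upper - lower equals 5 * sizeof(int). The access p[4] is therefore accepted and p[5] is rejected.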
  • Step S102: In the annotated source program, search for risk statements that use pointers to access memory, and insert check statements before these risk statements.
  • Spatial memory safety issues mainly fall into three categories: null pointer dereference, out-of-bounds read, and out-of-bounds write.
  • Step S103: After the compilation process extracts the information indicated by the pointer attribute annotations, the annotations are removed. This yields an intermediate program containing the source program and the inserted check statements.
  • Once the intermediate program is determined, the compilation phase of the intermediate program begins. The intermediate program includes the source program and the check statements inserted at the pointer memory accesses.
  • Step S110: Use compilation techniques to perform intermediate conversion, redundant code optimization, and target code generation on the intermediate program containing the inserted check statements.
  • In common compilation techniques, the intermediate program with inserted check statements must meet compilation requirements. It is therefore converted according to grammatical and semantic rules early in compilation.
  • A common conversion turns each check statement into a combination of ordinary comparison and jump instructions. This expands the check statement from a branchless single-instruction structure into a branching structure, which greatly changes the logical structure of the source program within the intermediate program.
  • FIG. 2 is a control flow graph after the check statement structure is converted.
  • The pseudocode check statements check(i < upper) and check(j < upper) are inserted before the uses of p[i] and p[j] in the source program, respectively.
  • According to syntax and semantics, these check statements are converted into combinations of the ordinary comparison instruction cmp and the jump instruction jump. As a result, the check statements take on a branching form.
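The branch form that a check takes after this conversion can be illustrated in C. In this sketch the cmp/jump pair is written as an if plus goto. The names read_checked, sum_two and fault_count are hypothetical. So that the example can run, the fault path only counts failures instead of exiting with a core dump.

```c
#include <assert.h>
#include <stddef.h>

/* Counts failed checks so the sketch can run; in the flow described
 * above, the fault path would exit the program and cause a core dump. */
static int fault_count = 0;

/* Branch-form expansion of "check(i < upper); return p[i];".
 * The comparison and conditional jump correspond to cmp + jump and
 * add an extra node and edge to the control flow graph. */
static int read_checked(const int *p, size_t i, size_t upper) {
    assert(p != NULL);          /* null checking is out of scope here */
    if (!(i < upper))           /* cmp i, upper; jump to fault if not below */
        goto fault;
    return p[i];                /* the original risk statement */
fault:
    fault_count++;              /* stand-in for abort() / core dump */
    return -1;
}

/* The FIG. 2 fragment: accesses p[i] and p[j], each guarded by its own
 * expanded check, so the CFG gains two extra branches. */
static int sum_two(const int *p, size_t i, size_t j, size_t upper) {
    return read_checked(p, i, upper) + read_checked(p, j, upper);
}
```

Each guarded access now has two successors in the CFG. This extra node and path per check is what interferes with pattern-based optimizations.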
  • An optimization algorithm then removes redundant code, and target code is generated late in compilation. The result is a binary target execution file containing the check statements, which technical personnel can debug, modify, and maintain.
  • If a check condition is triggered while the target execution file is running, the program jumps to the exit path, which produces a core dump, and the program terminates. Developers can use the debug information in the binary file output on termination to recover diagnostic information for debugging.
  • The diagnostic information includes the line and column numbers in the diagnosed program.
  • Value range analysis and propagation (VRAP) is a code optimization algorithm that uses the value range information of variables to delete redundant branches and redundant expressions.
  • When the outcome of a comparison can be determined from the known value ranges, the redundant branch can be deleted based on that result, together with the expressions on it.
  • In addition, the value range of the compared object in the comparison statement is updated.
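Value range information can remove an upper-bound check, as the minimal sketch below shows. It assumes a single integer index with a known closed range [lo, hi] and a constant bound. This is a simplified illustration of the general VRAP idea, not the customized VRAP algorithm of this application; vrange, check_verdict and vrap_upper_check are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Closed value range [lo, hi] of an integer variable, as tracked by
 * value range analysis. */
typedef struct { int64_t lo, hi; } vrange;

typedef enum { CHECK_REDUNDANT, CHECK_ALWAYS_FAILS, CHECK_KEEP } check_verdict;

/* Decide whether check(x < upper) can be removed given x's range.
 * If the check must be kept, narrow *r to the range that holds on the
 * passing path ("update the value range of the compared object"). */
static check_verdict vrap_upper_check(vrange *r, int64_t upper) {
    assert(r->lo <= r->hi);
    if (r->hi < upper)
        return CHECK_REDUNDANT;     /* always passes: the check can be deleted */
    if (r->lo >= upper)
        return CHECK_ALWAYS_FAILS;  /* always faults: keep the check */
    r->hi = upper - 1;              /* after passing, x <= upper - 1 */
    return CHECK_KEEP;
}
```

Suppose i is known to lie in [0, 20] and upper is 16. Then check(i < upper) is kept, and along its passing path the range of i narrows to [0, 15]. A second check(i < 16) later on that path therefore becomes removable.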
  • Partial redundancy elimination (PRE) is a code optimization algorithm that eliminates redundant expressions, including partially redundant ones, by using the hash results of expressions.
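The hashing idea can be sketched for check statements as follows. This is a deliberately simplified stand-in rather than full PRE. It removes only checks that are fully redundant within one straight-line block, keyed by a hypothetical (check object, bound) pair, whereas real PRE also handles checks that are redundant on only some paths. The names check_key, check_table, seen_before and eliminate_repeated_checks are illustrative. The sketch assumes the checked objects are not redefined between checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A check statement reduced to a hashable key: the checked object
 * (e.g. a value number of the index) and the bound it is compared to. */
typedef struct { uint32_t object; uint32_t bound; } check_key;

#define TABLE_SIZE 64u  /* power of two; must exceed the number of checks */

typedef struct {
    check_key keys[TABLE_SIZE];
    bool used[TABLE_SIZE];
} check_table;

static uint32_t hash_key(check_key k) {
    /* Simple multiplicative mix; any reasonable hash works here. */
    uint32_t h = (k.object * 2654435761u) ^ (k.bound * 40503u);
    return h & (TABLE_SIZE - 1);
}

/* Insert k using linear probing; return true if an identical check
 * was already recorded. */
static bool seen_before(check_table *t, check_key k) {
    for (uint32_t i = hash_key(k), n = 0; n < TABLE_SIZE;
         i = (i + 1) & (TABLE_SIZE - 1), n++) {
        if (!t->used[i]) { t->used[i] = true; t->keys[i] = k; return false; }
        if (t->keys[i].object == k.object && t->keys[i].bound == k.bound)
            return true;
    }
    assert(!"check table full");
    return false;  /* table full: conservatively keep the check */
}

/* Remove repeated checks from a straight-line sequence in place and
 * return the number of checks kept. */
static size_t eliminate_repeated_checks(check_key *checks, size_t n) {
    check_table t = {0};
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (!seen_before(&t, checks[i]))
            checks[kept++] = checks[i];
    return kept;
}
```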
  • Dead code elimination (DCE) is a code optimization algorithm that deletes code that has no effect on the program's results.
  • In step S110, early in compilation, the intermediate program with inserted check statements is uniformly converted according to syntactic and semantic requirements. This includes converting the check statements into branching form, which makes the control flow graph (CFG) complex.
  • Each inserted check statement adds an extra node and an extra path to the CFG, while code optimization algorithms such as VRAP, PRE, and DCE all rely on pattern matching over the CFG.
  • As a result, the original optimization patterns no longer match and cannot play their original role, which greatly reduces the optimization effect.
  • The target executable file that is finally generated is extremely bloated and cannot be widely used in terminal systems with limited memory resources.
  • To address this, an embodiment of the present application provides a memory safety management method.
  • In this method, the check statements are separated from the other code in the intermediate program. Converting the check statements from a single-instruction structure into branching machine instructions, which is normally done early in compilation according to grammatical and semantic rules, is postponed to the code generation stage late in compilation.
  • Customized VRAP, PRE, and DCE code optimization algorithms are used to eliminate redundancy in, and optimize, both the source program and the inserted check statements in the CFG. This effectively reduces the impact of the inserted check statements on redundancy optimization of the source program. It also keeps the memory consumption of the finally generated target execution file within a reasonable range and significantly reduces runtime overhead.
  • In addition, existing solutions do not consider cross-statement propagation of pointer attributes. The mapping between check statements and pointer attributes can therefore be wrong, resulting in missed reports.
  • Step S101 annotates the pointer attributes in the source program, which yields the length and boundary of the memory area pointed to by each pointer. However, the boundary is not stored and updated in real time. The boundary attributes obtained when a subsequent check statement is executed may therefore be an old version, and using this outdated boundary information in the check makes the check invalid, leading to missed reports.
  • For example, if the length of pointer p is annotated as 5, the memory block pointed to by p spans [p, p+5).
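The missed-report problem can be illustrated with a small C sketch. It assumes that p is later advanced by pointer arithmetic. The hypothetical ptr_attr struct plays the role of the pointer attribute variables that a pointer attribute storage statement would hold. The names check_stale and check_stored are illustrative and are not functions from this application.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pointer attribute variables, written by a pointer
 * attribute storage statement right after p is defined and annotated. */
typedef struct { uintptr_t lower, upper; } ptr_attr;

/* Stale check: re-applies the annotated length n relative to the
 * current value of p, as if p had never moved. */
static bool check_stale(const int *p, size_t n, size_t i) {
    (void)p;                    /* p's movement is (wrongly) ignored */
    return i < n;
}

/* Check against stored bounds: compares the accessed address with the
 * boundary saved at definition time, so later arithmetic on p is
 * accounted for. */
static bool check_stored(ptr_attr a, const int *p, size_t i) {
    assert(a.lower <= a.upper);
    uintptr_t addr = (uintptr_t)p + i * sizeof *p;
    return addr >= a.lower && addr + sizeof *p <= a.upper;
}
```

Consider int buf[5], p = buf, and then p += 2. The access p[4] reaches buf[6]. Here check_stale(p, 5, 4) still returns true, which is a missed report, while check_stored(a, p, 4) returns false.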
  • The embodiment of the present application provides a pointer attribute storage statement that stores the results of pointer attribute annotation in real time and maps pointer attributes directly to check statements. This keeps the pointer attributes used in check statements correct, reduces missed reports, and provides more complete security capabilities.
  • Redundancy elimination can also be applied to the pointer attribute storage statements, further improving the elimination of redundant code.
  • FIG. 3 is a system architecture diagram of a memory safety management method provided by an embodiment of the present application.
  • The system architecture includes five parts: a language definition module 300, a pointer attribute annotation module 310, a check statement insertion module 320, a code optimization module 330, and a target code generation module 340.
  • The specific functions of each module are described as follows:
  • The language definition module 300 describes the syntax and semantic rules of the source program, pointer attribute annotations, pointer attribute storage statements, check statements, and the various intermediate representations (IR) used during compilation.
  • An intermediate representation is the internal representation generated after the compilation process scans the source program, and it represents the semantic and syntactic structure of the source program. Each stage of compilation analyzes or optimizes the IR. In practice, multiple successive intermediate representations can be generated from the start of compilation until the target execution file is produced.
  • The pointer attribute annotation module 310 completes the pointer attribute annotations of all pointer definition statements in the source program.
  • The pointer attribute information generated by annotation provides the information needed to check for spatial memory safety problems.
  • The check statement insertion module 320 selects potential risk statements involving memory operations during compilation and inserts pointer check statements before all of them. During insertion, pointer attribute storage statements can be used where necessary to store pointer attributes and to map them directly to the inserted pointer check statements. After the check statements are inserted, the pointer attribute annotations are deleted.
  • During compilation, check statements need to be designed for three check requirements: null pointer, crossing the upper bound, and crossing the lower bound.
  • Check statements use a single-instruction format and execute without branches. They must also comply with the semantic and grammatical rules supported by the compilation process.
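The three checks (null, upper bound, lower bound) can be written without short-circuit branches. The C sketch below evaluates all three comparisons and combines them with bitwise OR. The function name and the bit layout of the result are hypothetical. The sketch assumes a null pointer converts to address 0, which is true on mainstream platforms but not guaranteed by the C standard. A C compiler is still free to emit branches; the point is only that the check has no control flow of its own at the source level.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless form of a combined check statement. Returns 0 if the
 * access [addr, addr + size) is valid; otherwise
 * bit 0 = null, bit 1 = above upper bound, bit 2 = below lower bound.
 * Using | instead of || evaluates every comparison unconditionally. */
static unsigned check_fault_bits(uintptr_t addr, uintptr_t size,
                                 uintptr_t lower, uintptr_t upper) {
    assert(lower <= upper);
    unsigned is_null  = (addr == 0);
    unsigned above_ub = (addr + size > upper);
    unsigned below_lb = (addr < lower);
    return is_null | (above_ub << 1) | (below_lb << 2);
}
```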
  • Risk statements that use pointers to access memory are detected, and the designed check statements are inserted before each of them.
  • The check statement insertion module adds a pointer attribute storage submodule 321 and a pointer attribute mapping submodule 322 in the compilation stage.
  • The pointer attribute storage submodule 321 creates pointer attribute storage statements in the compilation IR and stores the pointer attribute information identified by the pointer attribute annotations.
  • The pointer attribute mapping submodule 322 associates each inserted check statement with a pointer attribute storage statement, so that pointer check statements are correctly mapped to pointer attribute information.
  • The code optimization module 330 optimizes the source program, the inserted check statements, and the inserted pointer attribute storage statements. While preserving functional equivalence, it makes the target code run faster and occupy less space.
  • The code optimization module 330 includes three parts: a conventional code optimization submodule 331, a check statement elimination submodule 332, and a pointer attribute elimination submodule 333.
  • The conventional code optimization submodule 331 converts the intermediate representation of the source program written in C or a C-like language during compilation and optimizes redundant code.
  • The code optimization module 330 adds the check statement elimination submodule 332 and the pointer attribute elimination submodule 333 in the compilation phase.
  • The check statement elimination submodule 332 eliminates redundant check statements by using a customized value range analysis and propagation algorithm and a customized partial redundancy elimination algorithm.
  • The pointer attribute elimination submodule 333 uses a customized dead code elimination algorithm to eliminate redundant pointer attribute storage statements and pointer attribute variables that are no longer referenced.
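The effect of the pointer attribute elimination submodule can be approximated by a dead code elimination pass over a toy IR, sketched below. The statement representation, MAX_VARS and dce_attr_stores are hypothetical. The sketch assumes attribute variables are read only by check statements, which is why a single pass suffices; a general DCE iterates until no more code can be removed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VARS 32  /* hypothetical limit on pointer attribute variables */

typedef enum { STMT_ATTR_STORE, STMT_CHECK } stmt_kind;

/* Toy IR statement: an attribute store defines attribute variable `var`,
 * and a check statement reads it. Other statements are omitted. */
typedef struct { stmt_kind kind; int var; } stmt;

/* Dead code elimination for attribute stores: after redundant checks
 * have been removed, delete every store whose variable no remaining
 * check references. Returns the new statement count. */
static size_t dce_attr_stores(stmt *s, size_t n) {
    bool used[MAX_VARS] = { false };
    for (size_t i = 0; i < n; i++) {
        assert(s[i].var >= 0 && s[i].var < MAX_VARS);
        if (s[i].kind == STMT_CHECK)
            used[s[i].var] = true;
    }
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (s[i].kind == STMT_CHECK || used[s[i].var])
            s[kept++] = s[i];
    return kept;
}
```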
  • The pointer attribute storage statement is used to store pointer attribute information.
  • A pointer attribute variable is a variable defined by the pointer attribute storage statement to represent the attributes of a pointer.
  • The pointer attribute storage statement associates the pointer attribute information with the check statement of the pointer.
  • The target code generation module 340 converts the check statements retained in the optimized code into machine instructions. It then combines them with the redundancy-eliminated source program and pointer attribute storage statements to generate target code supported by the terminal.
  • The target code generation module 340 includes a check statement expansion submodule 341 and a conventional code generation submodule 342. The conventional code generation submodule 342 converts the redundancy-eliminated source program and pointer attribute storage statements into machine target code.
  • The target code generation module 340 adds the check statement expansion submodule 341 in the compilation stage to expand the check statements retained after redundancy elimination.
  • The expansion converts check statements from single-instruction format into branch-instruction format. The single-instruction format is a designed statement executed without a branch structure. The branch-instruction format is a set of machine instructions executed with a branch structure, namely a combination of comparison and jump statements.
  • The outputs of the check statement expansion submodule 341 and the conventional code generation submodule 342 are combined to obtain a target execution file that meets the application requirements.
  • FIG. 4 is a flowchart of a memory security management method provided by an embodiment of the present application. As shown in FIG. 4, the method includes the following steps S401-S404, which are described in detail below:
  • Step S401: Insert a check statement for checking pointer attributes before a risk statement in the first program.
  • The risk statement is a statement in the first program that calls a pointer for memory access, and the pointer attribute is an attribute of the pointer in the risk statement.
  • a first program written according to the application scheme needs to be obtained based on C or C-like language editing on a hardware platform, the first program being the source program described in FIG3 , and the hardware platform can be an independent PC, a server connected to the network, or any user input terminal platform that can be edited in C or C-like language.
  • because the edited first program makes heavy use of pointer operations, improper pointer operations during memory access may damage the memory space, seriously affecting the security and reliability of the program.
  • an integrated development environment can be used to perform both the editing and the compiling of the first program on the same platform; alternatively, the first program can be edited on one platform and the edited first program then passed to a compilation environment on another platform.
  • a pointer attribute annotation statement is added after the pointer definition statement, either manually or by the compilation environment.
  • the pointer attributes are extracted by lexical and syntactic analysis matched to the pointer attribute annotation statement, and are then saved by a pointer attribute storage statement.
  • the pointer attributes are the length, boundaries, upper boundary address, lower boundary address, and similar properties of the memory block pointed to by the pointer; they are the information required for spatial memory safety checks, and are obtained by lexical and syntactic analysis of information implicit in the first program.
  • the storage of the pointer attribute information is completed by the pointer attribute storage submodule 321.
  • that is, two kinds of statements need to be inserted: a pointer annotation statement for obtaining pointer attributes, and a pointer attribute storage statement for storing the attribute information of each pointer.
  • the inserted pointer attribute storage statement is a first custom statement that the compilation process can recognize.
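  • As a rough illustration of these two statement kinds, the following Python sketch scans a C-like statement list for a hypothetical annotation of the form `/*@bounds(p, n)*/` placed after a pointer definition, and appends pointer attribute storage statements that define the attribute variables `p__lo` and `p__hi`. The annotation syntax, the variable naming scheme, and the text-based matching are assumptions for illustration only; a real compiler front end would work on the syntax tree.

```python
import re

# Hypothetical annotation: pointer `p` points to a block of `n` elements.
ANNOTATION = re.compile(r"/\*@bounds\((\w+),\s*(\w+)\)\*/")


def insert_storage_statements(stmts):
    """Append pointer attribute storage statements after each annotation."""
    out = []
    for s in stmts:
        out.append(s)
        m = ANNOTATION.fullmatch(s.strip())
        if m:
            ptr, length = m.groups()
            # Storage statements: attribute variables holding the lower and
            # (exclusive) upper boundary of the block that ptr points to.
            out.append(f"{ptr}__lo = {ptr};")
            out.append(f"{ptr}__hi = {ptr} + {length};")
    return out


source = [
    "int *p = malloc(n * sizeof(int));",
    "/*@bounds(p, n)*/",
    "p[i] = 0;",
]
with_storage = insert_storage_statements(source)
```

Because `p + n` in C already scales by the element size, `p__hi` is the address one past the end of the block.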
  • to ensure the security of memory access, after the pointer attribute storage statements are inserted, the compilation environment locates the risk statements in the first program that use pointers to access memory, and a pointer check statement is inserted before each risk statement.
  • the check object of the check statement is the pointer used in the risk statement.
  • the inserted check statement is a second custom statement that the compilation process can recognize.
  • the check statement is usually designed in a single instruction format and executed sequentially.
  • the single instruction format means that one custom check statement is designed to represent one check rule.
  • a check statement in the single instruction format has the significant advantages of not changing the control flow graph (CFG) and not interfering with code optimization techniques, which greatly reduces the performance overhead of the target executable file.
  • the risk statements are the execution statements that may have spatial memory safety problems; to ensure the security of memory access, a check statement must be inserted before each of them is executed. A risk statement may be a memory access statement, a memory access statement involving pointer operations, or a pointer operation statement.
  • a direct mapping between the check statement and the pointer attribute storage statement is adopted.
  • the direct mapping means that the check range used in a created check statement is determined by the variables defined in the pointer attribute storage statement to represent pointer attribute information, together with the pointer attributes stored by that statement. Compared with the table-building and table-lookup approaches commonly used in the prior art, this direct mapping has the significant advantage of extremely low memory overhead. As shown in FIG. 3, the direct mapping between the check statement and the pointer attribute storage statement is established by the pointer attribute mapping submodule 322.
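  • The direct mapping can be sketched as follows, using a hypothetical `ptr__lo`/`ptr__hi` naming scheme for attribute variables: when a risk statement accesses `ptr[idx]`, the generated check names the attribute variables of `ptr` directly, so no metadata table is built or searched at runtime. The textual `check(addr, lo, hi)` form (meaning lo <= addr < hi) and the regex-based detection of risk statements are illustrative assumptions.

```python
import re

# Simple "ptr[index]" accesses; a real implementation would use the AST.
ACCESS = re.compile(r"(\w+)\[(\w+)\]")


def insert_checks(stmts, annotated_ptrs):
    """Insert a check before each statement that accesses an annotated pointer.

    Direct mapping: the check range is taken from the attribute variables
    ptr__lo / ptr__hi defined by the storage statements, referenced by name.
    Pointers without stored attributes get no check in this sketch.
    """
    out = []
    for s in stmts:
        for ptr, idx in ACCESS.findall(s):
            if ptr in annotated_ptrs:
                out.append(f"check({ptr} + {idx}, {ptr}__lo, {ptr}__hi);")
        out.append(s)
    return out


checked = insert_checks(["x = p[i] + q[j];", "y = r[k];"], {"p", "q"})
```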
  • Step S402: perform redundant code elimination on the first converted program and on the check statements respectively, to obtain a first eliminated program and a second eliminated program, wherein the first converted program is an intermediate representation generated by compiling the first program.
  • the program, with the pointer attribute storage statements and the custom single-instruction-format check statements added, enters this stage.
  • the check statement is treated as "pseudocode".
  • a conventional code optimization scheme is adopted to compile the first program according to the grammatical and semantic analysis rules defined in the compilation process, generate an intermediate representation in the compilation process, obtain a first converted program, and perform redundancy optimization on the first converted program to obtain a first eliminated program of the first converted program.
  • the conventional code optimization process is completed by a conventional code optimization submodule 331.
  • the conventional code optimization process is similar to the code compilation and optimization scheme in FIG. 1 in both its processing of the source program and its effect, and is not described in detail here.
  • alternatively, in combination with the check statement insertion module 320 in FIG. 3, the check statement can be treated as an auxiliary statement of the first program and the logical relationships associated with the check statement can be added, so that conventional code optimization techniques are adapted to achieve deeper elimination. This research direction is not the focus of this application and is not elaborated here.
  • the pointer attribute storage statement is a custom statement inserted during the compilation process.
  • the syntax of this custom statement may also differ from C or C-like languages; in that case, it can be treated as "pseudocode", like the check statement, during conventional code optimization of the first program.
  • if the syntax of this custom statement is consistent with C or C-like languages, it does not affect code optimization of the first program in the conventional code optimization stage. In either case, the conventional code optimization result of the first program is unaffected.
  • redundant code among the check statements is eliminated using the customized value range analysis and propagation (VRAP) algorithm to obtain a third eliminated program.
  • redundant code among the check statements in the third eliminated program is further eliminated, using the customized partial redundancy elimination (PRE) algorithm, to obtain a second eliminated program.
  • the pointer attribute storage statements and pointer attribute variables that are directly mapped to the eliminated check statements are then eliminated, using the customized dead code elimination (DCE) algorithm, to obtain a fourth eliminated program, as described below:
  • redundant codes can be divided into full redundancy and partial redundancy according to the control flow relationship, and can be divided into identical redundancy and partial order redundancy according to the inclusion relationship.
  • redundancy elimination algorithms are custom-designed for these kinds of redundant code.
  • a customized VRAP algorithm is designed and implemented to eliminate all full identical redundancy and all full partial-order redundancy.
  • by solving for and propagating the value range of each checked object, the algorithm determines whether the condition of each check statement is satisfied. If it is definitely satisfied, the check statement is full identical redundancy or full partial-order redundancy and is deleted; if it is definitely not satisfied, a security problem is certain, so a static check error is reported with debugging information to help the developer debug and fix the code; if satisfaction cannot be determined, the check statement is retained until the runtime stage for real-time dynamic monitoring, and the value range information is updated according to the check range.
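  • The three-way decision in the customized VRAP algorithm can be sketched for straight-line code as follows. This is a simplified illustration under stated assumptions: value ranges are half-open integer intervals, a `("set", var, lo, hi)` tuple stands for any statement from which var is known to lie in [lo, hi), and control-flow merges, which a real VRAP implementation must handle, are omitted. Keeping a statically failing check for runtime as well is one possible design choice, not something the application specifies.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """Check statement: check object obj must lie in [lo, hi)."""
    obj: str
    lo: int
    hi: int


def vrap_eliminate(stmts):
    """Return (kept statements, static errors) for straight-line code."""
    known = {}            # var -> (lo, hi): current half-open value range
    kept, errors = [], []
    for s in stmts:
        if not isinstance(s, Check):
            _, var, lo, hi = s          # ("set", var, lo, hi)
            known[var] = (lo, hi)
            kept.append(s)
            continue
        lo, hi = known.get(s.obj, (-math.inf, math.inf))
        if s.lo <= lo and hi <= s.hi:
            continue                    # definitely satisfied: delete check
        if hi <= s.lo or lo >= s.hi:
            errors.append(s)            # definitely violated: static error
            kept.append(s)              # also kept so the fault is caught at runtime
            continue
        kept.append(s)                  # undecided: keep for runtime checking
        # Code after a passed check may assume obj lies in the checked range.
        known[s.obj] = (max(lo, s.lo), min(hi, s.hi))
    return kept, errors


program = [
    ("set", "i", 0, 8),
    Check("i", 0, 16),   # redundant: [0, 8) lies inside [0, 16)
    Check("i", 0, 4),    # undecided: kept, narrows i to [0, 4)
    Check("i", 0, 8),    # now implied by the previous check: deleted
    ("set", "j", 20, 30),
    Check("j", 0, 16),   # can never pass: reported statically
]
kept, errors = vrap_eliminate(program)
```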
  • FIG. 5a is a flowchart of conventional VRAP algorithm processing provided in an embodiment of the present application.
  • FIG. 5b is a flowchart of customized VRAP algorithm processing provided in an embodiment of the present application.
  • in the redundancy check process of the conventional VRAP algorithm, the value range of the variable i is obtained, the conditional judgment is evaluated, and redundant branches are deleted.
  • in the redundancy check process of the customized VRAP algorithm, the algorithm confirms whether the check condition is met: if it is definitely met, the redundant check is deleted; if it is definitely not met, a static error is reported; and if it cannot be determined, the value range is updated according to the check statement.
  • the present application provides a customized value range analysis and propagation algorithm to achieve the elimination of check statements.
  • the customization is mainly reflected in two points: "judging the check object and check range of the check statement" replaces "judging the comparison result of the comparison statement", and "deleting the check statement while updating the value range of the check object" replaces "deleting the comparison statement while updating the value range of the comparison object".
  • a customized PRE algorithm is designed and implemented to eliminate partial identical redundancy and partial partial-order redundancy. The checked pointer object is used as the hash key and all check statements are hashed; when two check statements with the same key are encountered, their check ranges are compared to determine whether the two are in an identical relationship, a partial-order relationship, or an unsatisfiable relationship, and the principles of correctness, safety, computational optimality, and lifetime optimality are then applied to determine whether partial redundancy exists.
  • if the relationship is identical or partial-order under partial redundancy, the check statement is redundant and is removed from some paths by hoisting it; if the relationship is unsatisfiable under partial redundancy, a security problem is certain, so a static check error is reported with debugging information to help the developer debug and fix the code; in other cases, the check statement is retained until the runtime stage for real-time monitoring.
  • FIG. 6a is a flowchart of conventional PRE algorithm processing provided in an embodiment of the present application.
  • FIG. 6b is a flowchart of customized PRE algorithm processing provided in an embodiment of the present application.
  • FIG. 6a(a) shows a fully redundant computation: deleting the a+b operation in the left branch has no effect, because the earlier result c is used in place of a+b; FIG. 6a(b) shows a partially redundant computation: the left branch evaluates a+b only once and has no redundancy, while the right branch contains a redundant computation.
  • FIG. 6a(c) shows a common loop redundancy, which can be regarded as a special case of partially redundant computation: because loops iterate different numbers of times, a+b is executed a different number of times on different paths.
  • the customized PRE algorithm targets the redundancy scenarios (b), (c), and (d) in FIG. 6b.
  • with check(i) as the hash key, check(i < len1) and check(i < len2) are regarded as the same hash object.
  • the flag-bit solving method is modified to operate on check statements, that is, to rename, insert, delete, and move check instructions.
  • the following table lists the core flag bits of the customized PRE algorithm:
  • the present application provides a customized partial redundancy elimination algorithm to achieve the elimination of check statements.
  • the customization is mainly reflected in two points: "hashing the check object in the check statement" replaces "hashing the entire expression", and "comparing different check ranges of the same check object to eliminate both identical redundancy and partial-order redundancy" replaces "comparing identical statements, which eliminates only identical redundancy".
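  • The hashing and relation test at the core of the customized PRE algorithm can be sketched in Python as below. The sketch groups checks by their check object, so checks that differ only in their range fall into one bucket, and classifies each pair of check ranges; the flag-bit solving and code motion that decide where checks are hoisted are not shown. Representing a check as an `(obj, lo, hi)` tuple with a half-open range, and the relation names used, are assumptions for illustration.

```python
from collections import defaultdict


def relation(r1, r2):
    """Classify two half-open check ranges [lo, hi) on the same object."""
    (a1, b1), (a2, b2) = r1, r2
    if r1 == r2:
        return "identical"
    if (a1 <= a2 and b2 <= b1) or (a2 <= a1 and b1 <= b2):
        return "partial-order"    # one range contains the other
    if b1 <= a2 or b2 <= a1:
        return "unsatisfiable"    # no value can pass both checks
    return "overlapping"          # other cases: keep both for runtime


def group_by_object(checks):
    """Hash check statements by check object, not by the whole expression."""
    buckets = defaultdict(list)
    for pos, (obj, lo, hi) in enumerate(checks):
        buckets[obj].append((pos, (lo, hi)))
    return buckets


def pairwise_relations(checks):
    """Return (obj, pos1, pos2, relation) for every pair sharing a check object."""
    result = []
    for obj, entries in group_by_object(checks).items():
        for i, (p1, r1) in enumerate(entries):
            for p2, r2 in entries[i + 1:]:
                result.append((obj, p1, p2, relation(r1, r2)))
    return result


# All checks on i share one bucket, whatever their ranges.
checks = [("i", 0, 10), ("j", 0, 5), ("i", 0, 10), ("i", 0, 4), ("i", 12, 20)]
relations = pairwise_relations(checks)
```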
  • pointer attribute variables are defined in pointer attribute storage statements (definition points) and used in check statements (use points). After a large number of redundant check statements are eliminated, many pointer attribute storage statements and pointer attribute variables no longer have any use point; they therefore serve no purpose, become dead code, and require further redundancy elimination.
  • FIG. 7 is a flow chart of conventional DCE algorithm processing provided by an embodiment of the present application.
  • the customized DCE algorithm provided by an embodiment of the present application is described below in conjunction with FIG. 7.
  • backward data-flow analysis is performed, and a variable is determined to be useless if it no longer has any use point.
  • the customized algorithm differs in the objects it targets: it targets redundant pointer attribute storage statements and pointer attribute variables.
  • the check statements and pointer attribute storage statements designed in the present application support this customization of the DCE algorithm.
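  • A straight-line version of the customized DCE step can be sketched as a backward liveness scan. In this illustrative Python sketch, each statement is a `(kind, defs, uses)` tuple, and only statements of the hypothetical kind `"store"` (pointer attribute storage statements) are candidates for deletion; full backward data-flow analysis over branches and loops is omitted.

```python
def eliminate_dead_storage(stmts):
    """Drop pointer attribute storage statements that have no use point.

    stmts: list of (kind, defs, uses) tuples in program order. Only
    statements of kind "store" may be deleted. Walking backward, a store is
    dead when none of the variables it defines is live (used later).
    """
    live = set()
    kept = []
    for kind, defs, uses in reversed(stmts):
        if kind == "store" and not live.intersection(defs):
            continue                     # dead code: no remaining use point
        live.difference_update(defs)     # a definition ends liveness
        live.update(uses)                # uses make variables live
        kept.append((kind, defs, uses))
    kept.reverse()
    return kept


program = [
    ("store", ("p__lo",), ("p",)),
    ("store", ("p__hi",), ("p", "n")),
    ("store", ("q__lo",), ("q",)),       # checks on q were already eliminated,
    ("store", ("q__hi",), ("q", "m")),   # so these stores lost their use points
    ("check", (), ("p", "i", "p__lo", "p__hi")),
    ("stmt", ("x",), ("p", "i")),
]
remaining = eliminate_dead_storage(program)
```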
  • the elimination of redundant check statements is completed by the check statement elimination submodule 332, and then the elimination of redundant pointer attribute storage statements is completed by the pointer attribute elimination submodule 333.
  • the check statement elimination submodule 332 can run after the conventional code optimization submodule 331 or concurrently with it; this is not limited here.
  • the static debugging file generated during the execution of the code optimization module 330 can be fed back to the program developer, and the program developer's modifications to the source program, check statements, pointer attribute assignment statements, pointer attribute variables, etc. can be received.
  • the static debugging file can contain compilation errors generated by the conventional code optimization submodule 331 while performing syntactic and semantic conversion and redundant code optimization on the first program, including syntax errors, memory access errors, and command line errors; it can also contain optimization logic errors, syntax errors, and the like generated while the check statement elimination submodule 332 eliminates redundant check statements and the pointer attribute elimination submodule 333 eliminates redundant pointer attribute storage statements.
  • Step S403: perform the machine-instruction conversion of the compilation process on the second eliminated program to obtain a second converted program.
  • Step S404: generate a target executable file based at least on the second converted program and the first eliminated program, wherein the target executable file is used to generate fault information at runtime, the fault information including the pointer attribute of the pointer in a first risk statement, and the first risk statement is one of the risk statements in the first program.
  • each check statement is expanded into a combination of a comparison and a jump by the check statement expansion submodule 341 to obtain a second converted program.
  • because the fourth eliminated program, generated by redundancy elimination of pointer attribute storage statements and pointer attribute variables, is also part of the target executable file, it must be added before the target executable file is generated; binary code containing a pointer checking function is then generated from the second converted program, the first eliminated program, and the fourth eliminated program.
  • the dynamic debugging file generated by the target execution file during runtime can be fed back to the program developer, and the program developer's modifications to the source program, check statements, pointer attribute assignment statements, pointer attribute variables, etc. can be received.
  • the dynamic debugging file contains fault information caused by using pointers to access memory, including the location of the risk statement, the pointer code whose execution caused the fault, the pointer attributes, and the out-of-bounds type.
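  • The kind of fault record described above can be sketched as follows. This hypothetical Python model of an expanded check at runtime returns `None` when an access of `size` bytes at `address` stays within the block [lower, upper) given by the pointer attributes, and otherwise returns a record carrying the risk statement location, the pointer attributes, and an out-of-bounds type; the field names and type labels are assumptions, not the format of the application's dynamic debugging file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    location: str   # location of the risk statement, e.g. "file.c:line"
    pointer: str    # name of the pointer used by the risk statement
    lower: int      # pointer attribute: lower boundary address
    upper: int      # pointer attribute: exclusive upper boundary address
    address: int    # address the statement tried to access
    kind: str       # out-of-bounds type


def runtime_check(location, pointer, address, lower, upper, size=1):
    """Return None if [address, address + size) fits in [lower, upper)."""
    if address == 0:
        kind = "null-dereference"
    elif address < lower:
        kind = "underflow"
    elif address + size > upper:
        kind = "overflow"
    else:
        return None
    return Fault(location, pointer, lower, upper, address, kind)


ok = runtime_check("main.c:12", "p", 0x100C, 0x1000, 0x1010, size=4)
bad = runtime_check("main.c:12", "p", 0x1010, 0x1000, 0x1010, size=4)
```

Here `ok` is `None` because the 4-byte access ends exactly at the upper boundary, while `bad` is an overflow record that carries the block's boundaries.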
  • FIG. 8 is a flowchart of source program compilation provided by an embodiment of the present application; the code in the source program is described according to the execution process of steps S401-S404, as follows:
  • An intermediate program is obtained based on a source program; a source program is a program written according to an application solution; the intermediate program includes a source program, a pointer attribute storage statement for storing pointer attributes of a pointer in the source program, pointer attribute variables defined in the pointer attribute storage statement, and a check statement for checking pointer attributes of a pointer in a risk statement; wherein the pointer attributes include the length and boundary information of a memory block pointed to by the pointer, and the risk statement is a statement in the source program that uses a pointer to access memory.
  • the source program is converted into an intermediate representation generated by a compilation process to obtain a first converted program.
  • Redundant codes are eliminated from the first converted program to obtain a first eliminated program.
  • the redundant codes in the check statement are eliminated to obtain a third eliminated program.
  • the redundant codes existing in the third eliminated program are further eliminated to obtain a second eliminated program.
  • redundant codes that have lost reference relationships in pointer attribute storage statements and pointer attribute variables are eliminated to obtain a fourth eliminated program.
  • the redundant codes that have lost reference relationships are pointer attribute storage statements and pointer attribute variables that are directly mapped to the redundant codes in the check statement, and the direct mapping maps the pointer attribute variables and the stored pointer attributes contained in the redundant codes that have lost reference relationships to the check range of the redundant codes in the check statement.
  • the second eliminated program is subjected to the conversion of the machine instructions in the compilation process to obtain a second converted program.
  • the compilation, conversion, and conventional code optimization of the source program, and the redundancy elimination and execution order conversion of the check statements, can be performed simultaneously or one after the other; changing their execution order does not affect the final compilation result.
  • a target execution file is generated.
  • FIG9 is an implementation architecture diagram of a memory safety management method provided by an embodiment of the present application.
  • a modified integrated development toolchain is designed, in which the compiler 900 and the graphical user interface 910 are used to optimize the source program.
  • the two submodules of the pointer attribute storage submodule 901 and the pointer attribute mapping submodule 902 are used to insert the check statement before the risk statement;
  • the three submodules, namely the conventional code optimization submodule 903, the check statement elimination submodule 904, and the pointer attribute elimination submodule 905, perform code optimization and redundancy elimination; the binary code containing the check function is then obtained through the check statement expansion submodule 906 and the conventional code generation submodule 907. At the same time, static check error information is obtained and passed to the developer through the interactive error reporting and debugging submodule 911 in the graphical user interface 910 to assist in debugging and modification; the binary code is input into the real
  • FIG. 10 is a flow chart of a memory security management method provided by an embodiment of the present application. As shown in FIG. 10, steps S1000-S1008 are performed, each described as follows:
  • Step S1000: extract pointer attribute information from the source program through lexical and syntactic analysis.
  • Step S1001: create pointer attribute variables and pointer attribute storage statements to store the pointer attribute information.
  • Step S1002: design a check statement in the single instruction format using the pointer attribute variables, and insert it before risk statements such as memory accesses.
  • Step S1003: call conventional code optimization techniques for performance tuning.
  • Step S1004: use the customized VRAP algorithm to eliminate check statements with full identical redundancy and full partial-order redundancy, while reporting static check errors.
  • Step S1005: use the customized PRE algorithm to eliminate check statements with partial identical redundancy and partial partial-order redundancy, while reporting static check errors.
  • Step S1006: use the customized DCE algorithm to eliminate redundant pointer attribute storage statements that have no use points.
  • Step S1007: expand each check statement into a combination of a comparison instruction and a jump instruction.
  • Step S1008: generate the target executable code by combining the result with the code obtained from conventional code optimization of the source program.
  • FIG. 11 is an implementation architecture diagram of a memory safety management method provided by an embodiment of the present application. As shown in FIG. 11, the architecture is a modified compiler whose front end, middle end, and back end are each optimized and redesigned.
  • the front-end module 1100 includes a pointer attribute storage submodule 1101 and a pointer attribute mapping submodule 1102, which store pointer attributes and establish the correct mapping between pointer attributes and check statements;
  • the middle-end module 1110 includes a conventional code optimization submodule 1111, a check statement elimination submodule 1112, and a pointer attribute elimination submodule 1113.
  • the static error results are output to the developer as debugging information to assist in debugging and modifying the code; the back-end module 1120 includes a check statement expansion submodule 1121 and a conventional code generation submodule 1122, which expand the designed single-instruction-format check statements into combinations of comparison instructions and jump instructions.
  • a binary code containing a check function can be obtained.
  • debugging information can be provided to operation and maintenance personnel to assist in debugging and modification. Apart from the module division and connection method, which differ from the implementation architecture diagram shown in FIG. 9, the specific operation of each step is consistent with the description of the flow chart shown in FIG. 10 and is not repeated here.
  • FIG. 12 is an implementation architecture diagram of a memory safety management method provided by an embodiment of the present application. As shown in FIG. 12, the architecture is a modified program analysis tool consisting of two parts: static analysis and dynamic analysis.
  • the static analysis module 1200 includes a pointer attribute storage submodule 1201, a pointer attribute mapping submodule 1202, a conventional code optimization submodule 1203, a check statement elimination submodule 1204, and a pointer attribute elimination submodule 1205.
  • these five submodules establish the correct mapping between pointer attributes and check statements.
  • the static error results can also be output to the developer as debugging information to assist in debugging and modifying the code; the dynamic analysis module 1210 includes a check statement expansion submodule 1211 and a conventional code generation submodule 1212, which generate binary code containing a check function; this binary code is then input into a simulation execution environment together with simulation test cases, and the resulting error information is output to the developer to assist in debugging and modifying the code. Apart from the module division and connection method, which differ from the implementation architecture diagram shown in FIG. 9, the specific operation of each step is consistent with the description of the flow chart shown in FIG. 10 and is not repeated here.
  • an embodiment of the present application also provides a memory security management device.
  • FIG. 13 is a schematic diagram of the hardware structure of a memory security management device provided in an embodiment of the present application.
  • the memory security management device 1300 includes: a processing module 1301, and the specific module functions are described as follows:
  • the processing module 1301 inserts a check statement for checking pointer attributes before the risk statement in the first program.
  • the risk statement is a statement in the first program that calls a pointer to access memory, and the pointer attributes are the attributes of the pointer in the risk statement.
  • the processing module 1301 also performs redundant code elimination on the first converted program and the check statement to obtain a first eliminated program and a second eliminated program; wherein the first converted program is an intermediate representation generated by compiling the first program.
  • the processing module 1301 further performs conversion of machine instructions in the compilation process on the second eliminated program to obtain a second converted program.
  • the processing module 1301 also generates a target execution file based on at least the second converted program and the first eliminated program; wherein the target execution file is used to generate fault information at runtime, and the fault information includes the pointer attribute of the pointer in the first risk statement, and the first risk statement is one of the risk statements in the first program.
  • when the processing module 1301 performs redundant code elimination on the first converted program and the check statements respectively, it judges the check object and check range of each check statement and eliminates the fully identical redundant code and the fully partial-order redundant code among the check statements to obtain the third eliminated program; the check object and check range of each check statement are obtained based at least on the risk statement.
  • by hashing the check object of each check statement in the third eliminated program, the partially identical redundant code and the partially partial-order redundant code in the third eliminated program are eliminated to obtain the second eliminated program.
  • a label statement is inserted after the pointer definition statement of the first program, the label statement being used to obtain the pointer attributes of the pointer in the risk statement; a pointer attribute storage statement is inserted after the label statement, the pointer attribute storage statement being used to save the pointer attributes obtained by the label statement and containing pointer attribute variables, which are variables representing the pointer attributes; the pointer attribute storage statement is a first custom statement recognizable by the compilation process; based on the risk statement and the pointer attribute storage statement, a check statement for performing a pointer attribute check on the risk statement is determined, wherein the risk statement determines the check object of the check statement, and the pointer attribute variables contained in the pointer attribute storage statement and the pointer attributes saved in it determine the check range of the check statement; the check statement is a second custom statement recognizable by the compilation process; a check
  • after the processing module 1301 eliminates redundant code from the first converted program and the check statements respectively, it eliminates the redundant pointer attribute storage statements and the pointer attribute variables they contain to obtain a fourth eliminated program, wherein the pointer attributes stored in those redundant pointer attribute storage statements and the pointer attribute variables they contain are used to determine the check range of the redundant check statements; when generating a target executable file based at least on the second converted program and the first eliminated program, the processing module 1301 generates the target executable file based on the second converted program, the first eliminated program, and the fourth eliminated program.
  • the first program is a program written in C or a C-like language.
  • FIG. 14 is a schematic diagram of the hardware structure of a memory security management device provided in an embodiment of the present application.
  • the network device 1400 may be the above-mentioned memory security management device.
  • the network device 1400 includes a processor 1410, a memory 1420, a communication interface 1430, and a bus 1440, and the processor 1410, the memory 1420, and the communication interface 1430 are connected to each other via the bus 1440.
  • the processor 1410, the memory 1420, and the communication interface 1430 may also be connected in other connection modes besides the bus 1440.
  • the memory 1420 can be various types of storage media, such as random access memory (RAM), read-only memory (ROM), non-volatile RAM (NVRAM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), flash memory, optical storage, hard disk, etc.
  • the processor 1410 may be a general-purpose processor, which may be a processor that performs specific steps and/or operations by reading and executing the contents stored in a memory (e.g., memory 1420).
  • the general-purpose processor may be a central processing unit (CPU).
  • the processor 1410 may include at least one circuit to perform all or part of the steps of the memory security management method provided in the embodiment shown in FIG. 4 or FIG. 9.
  • the communication interface 1430 includes input/output (I/O) interfaces, physical interfaces, logical interfaces, and the like, which are used to interconnect components within the network device 1400 and to interconnect the network device 1400 with other devices (such as other network devices or user equipment).
  • the physical interface can be an Ethernet interface, a fiber optic interface, an ATM interface, etc.
  • the bus 1440 may be any type of communication bus for interconnecting the processor 1410, the memory 1420, and the communication interface 1430, such as a system bus.
  • the above devices may be arranged on independent chips, or at least partially or completely on the same chip. Whether to arrange each device independently on different chips or to integrate them on one or more chips often depends on the needs of product design.
  • the embodiments of the present application do not limit the specific implementation form of the above devices.
  • the network device 1400 shown in FIG. 14 is merely exemplary. In implementation, the network device 1400 may further include other components, which are not listed one by one in this document.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer instructions can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, or microwave).
  • the computer-readable storage medium can be any available medium accessible to the computer, or a data storage device, such as a server or data center, that integrates one or more available media.
  • the available medium can be a magnetic medium (e.g., a floppy disk, a hard disk, a tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state drive (SSD)), etc.

Landscapes

  • Debugging And Monitoring (AREA)
  • Stored Programmes (AREA)

Abstract

Disclosed is a memory security management method, comprising: inserting, before a risk statement in a first program, a check statement for checking a pointer attribute, wherein the risk statement is a statement in the first program that uses a pointer to access memory, and the pointer attribute is an attribute of the pointer in the risk statement; separately performing redundant code elimination on a first converted program and on the check statement to obtain a first eliminated program and a second eliminated program, wherein the first converted program is an intermediate representation generated by compiling the first program; converting the second eliminated program into machine instructions during compilation to obtain a second converted program; and generating a target executable file based at least on the second converted program and the first eliminated program. Because redundancy optimization of the first program is performed during compilation before the check statement is converted into machine instructions, the method improves code optimization during compilation.

Description

A memory security management method and device
This application claims priority to Chinese patent application No. 202211535703.5, entitled “A memory security management method and device” and filed with the China National Intellectual Property Administration on November 30, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of memory management technology, and in particular to a memory security management method and device.
Background
In applications written in C or C-like languages, pointers are commonly used to access memory resources flexibly. However, improper use of pointers introduces memory safety risks. These risks mainly take the form of spatial memory safety problems, such as null pointer dereferences, out-of-bounds reads, and out-of-bounds writes.
At present, spatial memory safety problems are mainly addressed by modifying the source program: pointer attribute information is added manually, the compiler uses this information to insert check statements at compile time, and errors are reported at runtime. However, during compilation these inserted check statements are first converted into machine instructions with a branch structure. The added branches complicate the logical structure of the source program, significantly weaken redundant code elimination, and cause a large runtime overhead, which is unacceptable in overhead-sensitive domains such as WiFi chips and routers.
Summary of the Invention
The present application provides a memory security management method and device that improve the optimization of the source program during compilation, keep the runtime overhead of the target executable file within a range the terminal can tolerate, and thereby improve the user's experience of terminal applications.
In a first aspect, the present application provides a memory security management method. The method includes: inserting, before each risk statement in a first program, a check statement that checks a pointer attribute, where a risk statement is a statement in the first program that uses a pointer to access memory, and the pointer attribute is an attribute of the pointer in the risk statement; performing redundant code elimination separately on a first converted program and on the check statements, to obtain a first eliminated program and a second eliminated program, where the first converted program is an intermediate representation generated by compiling the first program; converting the second eliminated program into machine instructions as part of compilation, to obtain a second converted program; and generating a target executable file based on at least the second converted program and the first eliminated program. At runtime, the target executable file generates fault information that includes the pointer attribute of the pointer in a first risk statement, the first risk statement being one of the risk statements in the first program.
In this way, the point at which check statements are lowered during compilation is changed: redundant code is first eliminated from the first converted program, and only then are the check statements converted into machine instructions. As a result, the redundant code elimination stage operates on a first converted program whose logical structure has not been altered by check branches, which improves the optimization of the first converted program.
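The effect of this reordering can be illustrated with a toy model. The Python sketch below is not the patent's implementation: the string IR, the `jump_if_false exit` lowering, and the block-local common-subexpression elimination are all simplifying assumptions (production compilers use global passes, which extra branches also complicate). With the checks still single, branch-free statements, both loads of `p[i]` sit in one basic block and the second is reused; lowering the checks first splits the loads into different blocks and the duplicate survives.

```python
import re

# Hypothetical string IR: "check(cond)" is a branch-free check statement,
# "dst = expr" is a pure assignment.
ASSIGN_RE = re.compile(r"^(\w+) = (.+)$")


def lower(stmts):
    """Expand each check(cond) into a compare plus a conditional jump to exit."""
    out = []
    for s in stmts:
        if s.startswith("check(") and s.endswith(")"):
            out += [f"cmp {s[6:-1]}", "jump_if_false exit"]
        else:
            out.append(s)
    return out


def split_blocks(stmts):
    """Split straight-line code into basic blocks; a conditional jump ends a block."""
    blocks, cur = [], []
    for s in stmts:
        cur.append(s)
        if s.startswith("jump_if"):
            blocks.append(cur)
            cur = []
    if cur:
        blocks.append(cur)
    return blocks


def local_cse(block):
    """Block-local common-subexpression elimination over 'dst = expr' lines.
    Assumes expressions are pure and their operands are not redefined."""
    seen, out = {}, []
    for s in block:
        m = ASSIGN_RE.match(s)
        if m and m.group(2) in seen:
            out.append(f"{m.group(1)} = {seen[m.group(2)]}")  # reuse earlier value
        else:
            if m:
                seen[m.group(2)] = m.group(1)
            out.append(s)
    return out


def optimize(stmts):
    return [s for block in split_blocks(stmts) for s in local_cse(block)]


program = ["check(i<n)", "a = p[i]", "check(i<n)", "b = p[i]"]
early = optimize(lower(program))  # conventional: lower checks, then optimize
late = lower(optimize(program))   # reordered: optimize, then lower checks
```

Here `late` ends with `b = a`, while `early` still recomputes `b = p[i]`.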
In a possible implementation, performing redundant code elimination separately on the first converted program and the check statements includes: based on the check object and check range of each check statement, eliminating fully redundant identical checks and fully redundant partial-order checks, to obtain a third eliminated program, where the check object and check range of a check statement are derived at least from the risk statement; and, by hashing the check objects of the check statements in the third eliminated program, eliminating partially redundant identical checks and partially redundant partial-order checks, to obtain the second eliminated program.
Thus, redundant code elimination is also applied to the check statements during compilation, which further improves the overall optimization of the target executable file and effectively reduces the runtime overhead of the application.
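A minimal Python sketch of the first stage on straight-line code follows. It assumes (an interpretation, not something the text states) that a partial-order redundancy is a check implied by a stronger check on the same object, for example `check(i<10)` after `check(i<5)`. The tuple IR and the function name are hypothetical, and the hash-based second stage for partially redundant checks across branches is not shown.

```python
# Hypothetical toy IR: ("check_lt", obj, bound) stands for check(obj < bound)
# with an integer bound; ("def", obj) is any statement that redefines obj;
# any other tuple is an ordinary statement that does not modify check objects.
def eliminate_full_redundancy(stmts):
    strongest = {}  # check object -> tightest bound already checked on this path
    out = []
    for s in stmts:
        if s[0] == "def":
            strongest.pop(s[1], None)  # obj changed: earlier checks no longer hold
            out.append(s)
        elif s[0] == "check_lt":
            _, obj, bound = s
            known = strongest.get(obj)
            if known is not None and known <= bound:
                continue               # identical check, or implied by a stronger one
            strongest[obj] = bound     # this check is now the tightest for obj
            out.append(s)
        else:
            out.append(s)
    return out


stmts = [("check_lt", "i", 5), ("load", "p", "i"), ("check_lt", "i", 5),
         ("check_lt", "i", 10), ("def", "i"), ("check_lt", "i", 10)]
result = eliminate_full_redundancy(stmts)
```

The second `check(i<5)` is removed as identical, the following `check(i<10)` as implied, and the last `check(i<10)` is kept because `i` was redefined in between.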
In a possible implementation, inserting a check statement that checks a pointer attribute before a risk statement in the first program includes: inserting a label statement after a pointer definition statement in the first program, the label statement being used to obtain the pointer attribute of the pointer in the risk statement; inserting a pointer attribute storage statement after the label statement, the pointer attribute storage statement being used to save the pointer attribute obtained by the label statement and containing a pointer attribute variable (a variable that represents the pointer attribute), where the pointer attribute storage statement is a first custom statement recognizable by the compilation process; determining, based on the risk statement and the pointer attribute storage statement, a check statement that performs a pointer attribute check on the risk statement, where the risk statement determines the check object of the check statement, the pointer attribute variable contained in, and the pointer attribute saved by, the pointer attribute storage statement determine the check range of the check statement, and the check statement is a second custom statement recognizable by the compilation process; and inserting the check statement before the risk statement.
Thus, by storing pointer attributes in real time and linking pointer attribute storage statements to check statements, the pointer attributes used in check statements are kept correct, missed reports (false negatives) are reduced, and more complete security assurance is provided.
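The insertion steps can be sketched as a source-to-source pass. In this Python sketch every name is hypothetical: the label is written as a trailing `// where p: count(n)` comment on the pointer definition, `__attr_store` and `__check` stand in for the first and second custom statements, `p_len` is the pointer attribute variable, and only the upper bound of `p[i]` accesses is checked (non-negative indices are assumed).

```python
import re

# Hypothetical surface syntax: a pointer definition carries its label as a
# trailing comment, e.g.  int *p = malloc(...); // where p: count(n)
DEF_RE = re.compile(r"^\s*int \*(\w+) = .*;\s*// where \1: count\((\w+)\)\s*$")
ACCESS_RE = re.compile(r"(\w+)\[(\w+)\]")  # p[i]-style memory accesses


def instrument(lines):
    attr_var = {}  # pointer name -> its pointer attribute variable
    out = []
    for line in lines:
        m = DEF_RE.match(line)
        if m:
            ptr, count = m.groups()
            out.append(line.split("//")[0].rstrip())  # definition, label consumed
            attr_var[ptr] = f"{ptr}_len"
            # first custom statement: save the attribute obtained from the label
            out.append(f"__attr_store({attr_var[ptr]}, {count});")
            continue
        for ptr, idx in ACCESS_RE.findall(line):
            if ptr in attr_var:  # this line is a risk statement for ptr
                # second custom statement: check object from the risk statement,
                # check range from the pointer attribute variable
                out.append(f"__check({idx} < {attr_var[ptr]});")
        out.append(line)
    return out


result = instrument(["int *p = malloc(n * sizeof(int)); // where p: count(n)",
                     "x = p[i];"])
```

The result is the definition, `__attr_store(p_len, n);`, `__check(i < p_len);`, and then the original risk statement `x = p[i];`.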
In a possible implementation, after redundant code elimination has been performed separately on the first converted program and the check statements, the method further includes: eliminating redundant pointer attribute storage statements, together with the pointer attribute variables they contain, to obtain a fourth eliminated program, where the pointer attributes saved by, and the pointer attribute variables contained in, the redundant pointer attribute storage statements are those used to determine the check range of the redundant check statements. Correspondingly, generating the target executable file based on at least the second converted program and the first eliminated program includes: generating the target executable file based on the second converted program, the first eliminated program, and the fourth eliminated program.
Thus, redundant code elimination is also applied during compilation to pointer attribute storage statements and the pointer attribute variables they contain, which further improves the overall optimization of the target executable file and effectively reduces the runtime overhead of the application.
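Once the redundant checks are removed, a storage statement whose attribute variable no longer feeds any remaining check is dead. The following minimal Python sketch assumes hypothetical custom statements spelled `__attr_store(var, value);` and `__check(cond);`, and it uses a simple use scan in place of a real dead code elimination pass.

```python
import re

STORE_RE = re.compile(r"^__attr_store\((\w+), .*\);$")
CHECK_RE = re.compile(r"^__check\(.*\);$")


def eliminate_dead_attr_stores(stmts):
    """Drop attribute storage statements whose attribute variable is not read
    by any remaining check (e.g. because that check was removed as redundant)."""
    used = set()
    for s in stmts:
        if CHECK_RE.match(s):
            used.update(re.findall(r"\w+", s))  # identifiers the check reads
    out = []
    for s in stmts:
        m = STORE_RE.match(s)
        if m and m.group(1) not in used:
            continue  # dead store: its attribute variable disappears with it
        out.append(s)
    return out


stmts = ["__attr_store(p_len, n);", "__attr_store(q_len, m);",
         "__check(i < p_len);", "x = p[i];", "y = q[0];"]
result = eliminate_dead_attr_stores(stmts)
```

In this example, the check that guarded `q[0]` is assumed to have already been removed as redundant, so the store of `q_len` is removed as well.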
In a possible implementation, the first program is written in C or a C-like language.
In a second aspect, the present application provides a memory security management device. The device includes a processing module configured to: insert, before each risk statement in a first program, a check statement that checks a pointer attribute, where a risk statement is a statement in the first program that uses a pointer to access memory, and the pointer attribute is an attribute of the pointer in the risk statement; perform redundant code elimination separately on a first converted program and on the check statements, to obtain a first eliminated program and a second eliminated program, where the first converted program is an intermediate representation generated by compiling the first program; convert the second eliminated program into machine instructions as part of compilation, to obtain a second converted program; and generate a target executable file based on at least the second converted program and the first eliminated program. At runtime, the target executable file generates fault information that includes the pointer attribute of the pointer in a first risk statement, the first risk statement being one of the risk statements in the first program.
In a possible implementation, when performing redundant code elimination separately on the first converted program and the check statements, the processing module is configured to: based on the check object and check range of each check statement, eliminate fully redundant identical checks and fully redundant partial-order checks, to obtain a third eliminated program, where the check object and check range of a check statement are derived at least from the risk statement; and, by hashing the check objects of the check statements in the third eliminated program, eliminate partially redundant identical checks and partially redundant partial-order checks, to obtain the second eliminated program.
In a possible implementation, when inserting a check statement that checks a pointer attribute before a risk statement in the first program, the processing module is configured to: insert a label statement after a pointer definition statement in the first program, the label statement being used to obtain the pointer attribute of the pointer in the risk statement; insert a pointer attribute storage statement after the label statement, the pointer attribute storage statement being used to save the pointer attribute obtained by the label statement and containing a pointer attribute variable (a variable that represents the pointer attribute), where the pointer attribute storage statement is a first custom statement recognizable by the compilation process; determine, based on the risk statement and the pointer attribute storage statement, a check statement that performs a pointer attribute check on the risk statement, where the risk statement determines the check object of the check statement, the pointer attribute variable contained in, and the pointer attribute saved by, the pointer attribute storage statement determine the check range of the check statement, and the check statement is a second custom statement recognizable by the compilation process; and insert the check statement before the risk statement.
In a possible implementation, after performing redundant code elimination separately on the first converted program and the check statements, the processing module is further configured to eliminate redundant pointer attribute storage statements, together with the pointer attribute variables they contain, to obtain a fourth eliminated program, where the pointer attributes saved by, and the pointer attribute variables contained in, the redundant pointer attribute storage statements are those used to determine the check range of the redundant check statements. Correspondingly, when generating the target executable file based on at least the second converted program and the first eliminated program, the processing module is configured to generate the target executable file based on the second converted program, the first eliminated program, and the fourth eliminated program.
In a possible implementation, the first program is written in C or a C-like language.
In a third aspect, the present application provides an electronic device, including: at least one memory configured to store a program; and at least one processor configured to execute the program stored in the memory, where, when the program stored in the memory is executed, the processor performs the method described in the first aspect or any possible implementation of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program. When the computer program runs on a processor, the processor performs the method described in the first aspect or any possible implementation of the first aspect.
In a fifth aspect, the present application provides a computer program product. When the computer program product runs on a processor, the processor performs the method described in the first aspect or any possible implementation of the first aspect.
For the beneficial effects of the second to fifth aspects, refer to the related description of the first aspect; details are not repeated here.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of a code compilation and optimization process that uses check statements in branch-structure form;
FIG. 2 is a control flow graph after check statements have been converted into branch structures;
FIG. 3 is a system architecture diagram of a memory security management method according to an embodiment of the present application;
FIG. 4 is a flowchart of a memory security management method according to an embodiment of the present application;
FIG. 5a is a flowchart of conventional VRAP algorithm processing according to an embodiment of the present application;
FIG. 5b is a flowchart of customized VRAP algorithm processing according to an embodiment of the present application;
FIG. 6a is a flowchart of conventional PRE algorithm processing according to an embodiment of the present application;
FIG. 6b is a flowchart of customized PRE algorithm processing according to an embodiment of the present application;
FIG. 7 is a flowchart of conventional DCE algorithm processing according to an embodiment of the present application;
FIG. 8 is a flowchart of source program compilation according to an embodiment of the present application;
FIG. 9 is an implementation architecture diagram of a memory security management method according to an embodiment of the present application;
FIG. 10 is a flowchart of a memory security management method according to an embodiment of the present application;
FIG. 11 is an implementation architecture diagram of a memory security management method according to an embodiment of the present application;
FIG. 12 is an implementation architecture diagram of a memory security management method according to an embodiment of the present application;
FIG. 13 is a schematic diagram of the hardware structure of a memory security management device according to an embodiment of the present application;
FIG. 14 is a schematic diagram of the hardware structure of a memory security management device according to an embodiment of the present application.
Detailed Description of Embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below with reference to the accompanying drawings.
In the description of the embodiments of the present application, words such as "exemplary" or "for example" are used to present examples or illustrations. Any embodiment or design described as "exemplary" or "for example" should not be construed as preferred over, or more advantageous than, other embodiments or designs. Rather, such words are intended to present related concepts in a concrete manner.
In the description of the embodiments of the present application, the term "and/or" merely describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may mean: A alone, B alone, or both A and B. In addition, unless otherwise specified, "multiple" means two or more; for example, multiple systems means two or more systems, and multiple screen terminals means two or more screen terminals.
In addition, the terms "first" and "second" are used for description only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Therefore, a feature qualified by "first" or "second" may explicitly or implicitly include one or more such features. The terms "include", "comprise", "have", and their variants all mean "including but not limited to", unless otherwise specifically emphasized.
Memory operations performed through pointers and pointer arithmetic in applications written in C or C-like languages can give rise to spatial memory safety risks such as null pointer dereferences, out-of-bounds reads, and out-of-bounds writes. To address these risks, suspicious pointers should be located and optimized as far as possible during program compilation and debugging, and information about faulting pointers should be output at runtime, including the location of the faulting pointer in the application, its error type, and the form of the error, so that developers and operations staff can debug, modify, and maintain the application. Here, C-like languages are languages that, like C, perform memory operations through pointers and pointer arithmetic.
FIG. 1 is a schematic diagram of a code compilation and optimization process that uses check statements in branch-structure form. As shown in FIG. 1, based on the design and implementation principles of an application solution, a developer writes the source program implementing the solution in C or a C-like language, and the source program makes extensive use of pointers to access, read, and write terminal memory. To eliminate as far as possible the spatial memory safety risks that may arise from the interaction between pointers and memory, one optional technical solution performs the following steps:
Step S100: Determine an intermediate program, which includes the source program and the check statements inserted before its pointer memory access risk statements, that is, statements whose execution may introduce spatial memory safety risks. Step S100 may specifically include the following sub-steps S101 to S103:
Step S101: Write the source program according to the requirements of the application solution, and add pointer attribute annotations to all statements in the source program that use pointers. These annotations are mainly used to obtain the length and boundary information of the memory block that each pointer points to. An annotation can resemble a directive; for example, where p: count(n) indicates that the memory block pointed to by pointer p has length n. In the subsequent compilation stage, the compiler provides matching lexical and semantic analysis to extract the information represented by the annotations; for example, if the memory block pointed to by p has length 5, its boundary is [p, p+5).
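What the compiler extracts from such an annotation can be sketched in Python as below. The sketch assumes the count is an integer literal and expresses bounds as element positions relative to a numeric base; the regular expression and function names are illustrative only.

```python
import re

# Accepts only the simple form "where <ptr>: count(<integer literal>)".
ANNOT_RE = re.compile(r"^where\s+(\w+)\s*:\s*count\((\d+)\)$")


def parse_annotation(text):
    """Return (pointer name, block length) from a pointer attribute annotation."""
    m = ANNOT_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized annotation: {text!r}")
    return m.group(1), int(m.group(2))


def bounds(base, count):
    """Half-open interval [base, base + count) of valid element positions."""
    return base, base + count


name, n = parse_annotation("where p: count(5)")
print(name, bounds(0, n))  # p (0, 5)
```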
Step S102: In the annotated source program, find the risk statements that use pointers to access memory, and insert check statements before them. As noted above, spatial memory safety problems fall mainly into three categories: null pointer dereferences, out-of-bounds reads, and out-of-bounds writes. To eliminate the risks that pointers may cause, a null check statement is inserted before any memory access risk statement where a null pointer dereference may occur; a common null check statement is check(p!=null), which checks the pointer before it accesses memory. For out-of-bounds reads and writes, bounds check statements are inserted before the pointer's memory access risk statement; common examples are check(p>=p), which checks the lower bound of the pointer access, and check(p<p+5), which checks the upper bound. If a pointer carries both a dereference risk and an out-of-bounds risk, the null check statement is inserted first, followed by the bounds check statements. By inserting check statements that describe whether the required conditions hold, the spatial memory problems that pointers in the program may cause are checked.
These check statements appear in the source program as "pseudocode", a custom statement format that the compilation process can recognize.
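The insertion order can be sketched as follows. The text writes the lower-bound example as `check(p>=p)`; this Python sketch assumes instead that the value being checked is the accessed address `p+i`, so its lower-bound check reads `check(p+i>=p)`. The function names and the `may_be_null` flag are hypothetical.

```python
def checks_for_access(ptr, index, count, may_be_null=True):
    """Check statements to insert before a risk statement that accesses
    ptr[index] in a block of `count` elements: the null check comes first,
    then the lower- and upper-bound checks on the accessed address ptr+index."""
    checks = []
    if may_be_null:
        checks.append(f"check({ptr}!=null)")
    addr = f"{ptr}+{index}"
    checks.append(f"check({addr}>={ptr})")          # lower bound
    checks.append(f"check({addr}<{ptr}+{count})")  # upper bound
    return checks


def insert_checks(risk_stmt, ptr, index, count):
    """Return the checks followed by the original risk statement."""
    return checks_for_access(ptr, index, count) + [risk_stmt]


print(insert_checks("x = p[i];", "p", "i", 5))
```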
Step S103: After extracting the information represented by the pointer attribute annotations, the compilation process removes the annotations. What remains is the source program together with the check statements on its pointers, and together these form the intermediate program.
The intermediate program, consisting of the source program and the check statements inserted at its pointer memory access points, is thus determined, and compilation of the intermediate program begins.
Step S110: Use compilation techniques to perform intermediate conversion, redundant code optimization, and target code generation on the intermediate program containing the inserted check statements. In common compilation techniques, to make the intermediate program meet compilation requirements, it is converted according to syntactic and semantic rules in the early stage of compilation. A common way to convert a check statement is into a combination of ordinary compare and jump machine instructions. This conversion expands the check statement from a branch-free single-instruction structure into a branching execution structure, which substantially changes the logical structure of the source program within the intermediate program.
FIG. 2 is a control flow graph after the structure of check statements has been converted. As shown in FIG. 2, in the control flow graph (CFG), the check statements check(i<upper) and check(j<upper), written in "pseudocode" form, are inserted for the accesses p[i] and p[j] in the source program. In the early stage of compilation, these check statements are converted according to syntax and semantics into combinations of an ordinary compare instruction cmp and a jump instruction jump, so that the check statements take a branching form.
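The conversion shown in FIG. 2 can be sketched in Python. Each `check(cond)` becomes a `cmp` followed by a conditional `jump` to a shared exit block. This ends the current basic block and gives it two successors. The string instructions and the block names `bb0`, `bb1`, ... are illustrative stand-ins, not the figure's exact contents.

```python
def lower_to_cfg(stmts):
    """Lower check(cond) statements into cmp + jump and split the result into
    basic blocks. Returns (blocks, successors); 'exit' is the shared
    error-handling block that the jump targets when cond does not hold."""
    blocks, cur = [], []
    for s in stmts:
        if s.startswith("check(") and s.endswith(")"):
            cur += [f"cmp {s[6:-1]}", "jump exit"]  # conditional jump to exit
            blocks.append(cur)
            cur = []
        else:
            cur.append(s)
    blocks.append(cur)  # fall-through block after the last check
    names = [f"bb{k}" for k in range(len(blocks))]
    succ = {}
    for k, name in enumerate(names):
        if blocks[k] and blocks[k][-1] == "jump exit":
            succ[name] = [names[k + 1], "exit"]  # two successors: a branch
        else:
            succ[name] = []                      # falls off the end
    return dict(zip(names, blocks)), succ


blocks, succ = lower_to_cfg(["check(i<upper)", "x = p[i]",
                             "check(j<upper)", "y = p[j]"])
```

Two check statements turn one straight-line sequence into three basic blocks with two conditional branches. These are the extra edges that later optimization passes have to reason about.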
In the middle stage of compilation, optimization algorithms then remove redundant code. In the late stage, the target code is generated, producing a binary executable that carries the check statements and that engineers use to debug, modify, and maintain the code. For example, if a check instruction fires at run time, the program takes the jump exit path and terminates with a core dump. Developers can then recover diagnostic information from the debug information in the binary and use it for debugging. This information includes the line number and column number of the failing check.
Several code optimization algorithms that may be used in the middle stage of compilation are introduced below:
Value range analysis and propagation (VRAP) is a code optimization algorithm that uses the value-range information of variables to delete redundant branches and redundant expressions. Sometimes the result of a comparison statement can be determined directly from the analyzed value ranges of its operands. In that case, the branch that can never be taken is deleted together with the expressions on it, and the value range of the compared object is updated at the same time.
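The following is a minimal sketch of the conventional VRAP step described above. It assumes that value ranges are closed integer intervals and that the comparison has the form x < c. The type and function names are hypothetical. A production VRAP also handles other comparison operators, symbolic bounds, and merging of ranges at join points.

```c
#include <assert.h>

/* Closed integer interval [lo, hi] describing what a variable may hold. */
typedef struct { long lo, hi; } Range;

typedef enum { ALWAYS_TRUE, ALWAYS_FALSE, UNKNOWN } Fold;

/* Decide "x < c" from x's range, as VRAP does before pruning a branch. */
Fold fold_less_than(Range x, long c) {
    assert(x.lo <= x.hi);
    if (x.hi < c)  return ALWAYS_TRUE;   /* every possible value is below c */
    if (x.lo >= c) return ALWAYS_FALSE;  /* no possible value is below c    */
    return UNKNOWN;                      /* both branches must be kept      */
}

/* Range of x on the "true" edge of "x < c": propagate the new upper bound.
 * Only meaningful when the comparison is not ALWAYS_FALSE. */
Range refine_on_true(Range x, long c) {
    Range r = x;
    if (r.hi > c - 1) r.hi = c - 1;
    return r;
}
```

When fold_less_than returns ALWAYS_TRUE, the branch taken on a false comparison is unreachable and can be deleted together with its expressions. refine_on_true shows the range update on the surviving edge.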
Partial redundancy elimination (PRE): suppose code contains multiple branches, and an expression is redundant on one branch but not on another. Such an expression is said to be partially redundant. PRE is a code optimization algorithm that targets these partially redundant cases and eliminates redundant expressions using hash values of the expressions.
Dead code elimination (DCE) is a code optimization algorithm that deletes code that has no effect on the program's results.
However, the method shown in FIG. 1 compiles and optimizes code with check statements in branch form, and it has serious shortcomings in practice, mainly for the following reasons:
First, complete memory safety requires a check statement before every risky statement that accesses memory through a pointer. The number of insertions is therefore very large and grows with the proportion of memory-access statements in the code, causing substantial runtime overhead.

Moreover, these checks are in branch form during compilation, so redundancy checking and code optimization operate on branches. This severely degrades the compilation and optimization of the source program and adds still more runtime overhead. Branch-form check statements make the control flow graph extremely complex, and existing code optimizations stop working.

In step S110, the early stage of compilation uniformly converts the intermediate program containing the checks according to syntax and semantics, including converting each check statement into branch form. This complicates the CFG. As shown in FIG. 2, each inserted check statement adds an extra node and an extra path. Optimization algorithms such as VRAP, PRE, and DCE all rely on pattern matching over the CFG. In a complex CFG the original optimization patterns no longer match, so these optimizations lose much of their effect.

As a result, the executable generated in the late stage of compilation is extremely bloated. It cannot be widely deployed on terminal systems with limited memory resources.
In view of this, an embodiment of the present application provides a memory safety management method. In this method, the check statements are separated from the other code in the intermediate program. Their conversion from a single-instruction structure into branching machine instructions would normally happen in the early stage of compilation, according to grammatical and semantic rules. This method postpones that conversion to the late, code-generation stage.

During redundant-code optimization in the middle stage, the check statements therefore remain in single-instruction form. Customized VRAP, PRE, and DCE algorithms then eliminate redundancy in, and optimize, both the source program and the inserted check statements in the CFG. This reduces the impact of the inserted checks on optimization of the source program. It also keeps the memory consumption of the final executable within a reasonable range and significantly reduces runtime overhead.
Second, propagation of pointer attributes across statements is not considered, so check statements are mapped to the wrong pointer attributes, resulting in missed reports. Step S101 annotates the pointer attributes in the source program, which yields the length and bounds of the memory region a pointer points to. However, those bounds are not stored and kept up to date. Subsequent check statements therefore use stale bound attributes, and a stale bound carried into a check makes the check ineffective, so errors go unreported.

For example, suppose pointer p is annotated with length 5 when it is declared, so the block it points to is [p, p+5). An access through p must be checked for out-of-bounds behavior: the lower-bound check is check(p>=p) and the upper-bound check is check(p<p+5). When the 0th element of the block is accessed, these checks can detect accesses past either bound. Now suppose p is later incremented. On the next access through p, p already points to element 1 of the block, yet the checks are still check(p>=p) and check(p<p+5).

Because these bounds are expressed in terms of the current value of p, they move with it. The checked range has shifted one element past the original block, so an access such as p[4] lies beyond the end of the block yet still passes the check. The check is clearly ineffective: the program has already exceeded the upper bound, but the check statement cannot detect it.
In view of this, an embodiment of the present application provides a pointer attribute storage statement. It stores the result of pointer attribute annotation in real time and establishes a direct mapping between pointer attributes and check statements. This keeps the pointer attributes used in check statements correct, reduces missed reports, and provides more complete security. In addition, redundancy can also be eliminated from the pointer attribute storage statements, based on the results of redundancy elimination for the check statements, which further improves redundant-code elimination.
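The following sketch illustrates the difference between bounds rebuilt from the current p and bounds saved by pointer attribute storage statements. It assumes a hypothetical block of length 5 at the start of a larger buffer, so that all pointer arithmetic stays within one array. stale_check and stored_check are illustrative names, not the application's statements.

```c
#include <assert.h>
#include <stdbool.h>

enum { ANNOTATED_LEN = 5 };  /* length from the (hypothetical) annotation */

/* Bounds rebuilt from the *current* p: they move whenever p moves. */
bool stale_check(const char *p, const char *addr) {
    return addr >= p && addr < p + ANNOTATED_LEN;
}

/* Bounds saved once by storage statements (p_lower = p; p_upper = p + 5)
 * when p was defined, and passed directly to the check. */
bool stored_check(const char *addr, const char *p_lower, const char *p_upper) {
    assert(p_lower <= p_upper);
    return addr >= p_lower && addr < p_upper;
}
```

Take a buffer buf whose first five bytes form the block, set p_lower = buf and p_upper = buf + 5, and then increment p. stale_check(p, p + 4) accepts buf + 5, one past the block. stored_check(p + 4, p_lower, p_upper) rejects it.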
FIG. 3 is a system architecture diagram of a memory safety management method provided by an embodiment of the present application. As shown in FIG. 3, the architecture has five parts: a language definition module 300, a pointer attribute annotation module 310, a check statement insertion module 320, a code optimization module 330, and a target code generation module 340. The function of each module is as follows:
The language definition module 300 describes the syntax and semantic rules of the following:
- the source program,
- pointer attribute annotations,
- pointer attribute storage statements,
- check statements, and
- the various intermediate representations (IR) used during compilation.

An intermediate representation is the internal form the compiler produces after scanning the source program. It represents the semantics and syntactic structure of the source program, and every stage of compilation performs its analyses and optimizing transformations on the IR. In practice, several successively lowered intermediate representations may be generated as compilation proceeds, from the start of code compilation until the target executable is produced.
The pointer attribute annotation module 310 completes the pointer attribute annotations for all pointer definition statements in the source program.
In one example, the method is used to address problems other than spatial memory safety, such as temporal memory-safety problems caused by memory accesses through pointers (typically use-after-free, memory leaks, and double free). In that case, the pointer attribute information produced by annotation can also supply the information needed to address those problems.
The check statement insertion module 320 identifies potentially risky statements involving memory operations during compilation and inserts pointer check statements before all of them. Where needed during insertion, pointer attribute storage statements may be used to store pointer attributes and to map those storage statements directly to the inserted check statements. After the check statements have been inserted, the pointer attribute annotations are deleted.
In one example, three check statements are designed for the three checks required during compilation: the null check, the upper-bound check, and the lower-bound check. These check statements have a single-instruction format, execute without branching, and conform to the syntax and semantic rules supported by the compiler. During compilation, risky statements that access memory through pointers are detected, and the designed check statements are inserted before each of them.
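As a minimal sketch, the three single-instruction checks can be modeled as one IR statement kind with a tag. The following are assumptions made for illustration:
- the CheckStmt layout,
- the integer variable-id scheme,
- the function check_holds,
- the half-open bound convention (lower bound inclusive, upper bound exclusive).

check_holds only spells out what each check asserts. In the method, the check stays a single opaque statement until it is expanded at code generation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical IR node: each check is one statement with no compare,
 * no jump, and no new basic block while the optimizer runs. */
typedef enum { CHECK_NONNULL, CHECK_LOWER, CHECK_UPPER } CheckKind;

typedef struct {
    CheckKind kind;
    int ptr;    /* id of the checked pointer variable               */
    int bound;  /* id of the bound variable (ignored for NONNULL)   */
} CheckStmt;

/* Reference semantics only; vals[] maps variable ids to address values. */
bool check_holds(const CheckStmt *c, const uintptr_t *vals) {
    uintptr_t p = vals[c->ptr];
    switch (c->kind) {
    case CHECK_NONNULL: return p != 0;               /* p != null    */
    case CHECK_LOWER:   return p >= vals[c->bound];  /* p >= p_lower */
    case CHECK_UPPER:   return p <  vals[c->bound];  /* p <  p_upper */
    }
    assert(!"unknown check kind");
    return false;
}
```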
Compared with the compilation and optimization process of FIG. 1, the check statement insertion module adds a pointer attribute storage submodule 321 and a pointer attribute mapping submodule 322 to the compilation stage.
In one example, the pointer attribute storage submodule 321 creates pointer attribute storage statements, which are a form of intermediate representation in the compilation process. These statements store the pointer attribute information identified from the pointer attribute annotations.
In one example, the pointer attribute mapping submodule 322 associates the inserted check statements with the pointer attribute storage statements, so that each pointer check statement is mapped to the correct pointer attribute information.
The code optimization module 330 optimizes the source program, the inserted check statements, and the inserted pointer attribute storage statements while preserving functional equivalence. The goals are shorter running time, a smaller footprint, and similar improvements to the target code. The code optimization module 330 has three parts: a conventional code optimization submodule 331, a check statement elimination submodule 332, and a pointer attribute elimination submodule 333. The conventional code optimization submodule 331 converts the source program, written in C or a C-like language, into intermediate representations during compilation and optimizes redundant code.
Compared with the compilation and optimization process of FIG. 1, the code optimization module 330 adds a check statement elimination submodule 332 and a pointer attribute elimination submodule 333 to the compilation stage.
In one example, the check statement elimination submodule 332 eliminates redundant check statements using a customized value range analysis and propagation algorithm and a customized partial redundancy elimination algorithm.
In one example, the pointer attribute elimination submodule 333 uses a customized dead code elimination algorithm to remove redundant pointer attribute storage statements and pointer attribute variables that are no longer referenced.
- A pointer attribute storage statement stores pointer attribute information.
- A pointer attribute variable is a variable defined by a pointer attribute storage statement to represent an attribute of the pointer.

The storage statement associates the pointer attribute information with the pointer's check statements. Suppose a check statement of the pointer is eliminated as redundant. Some pointer attribute variables defined by the storage statement then have no remaining uses. They are dead code, which is another form of redundancy, and the present solution also eliminates it during compilation.
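The pointer attribute elimination described above can be sketched as a mark-and-sweep dead code elimination over a tiny SSA-style IR. The Stmt layout, the fixed table sizes, and eliminate_dead are hypothetical. Check statements and memory accesses are treated as roots. A pointer attribute storage statement survives only if some surviving statement uses the variable it defines.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_STMTS 32
#define MAX_VARS  32

/* Each statement defines at most one variable and uses up to two. */
typedef struct {
    int def;       /* variable defined here, or -1                */
    int use[2];    /* variables used here, -1 for an empty slot    */
    bool is_root;  /* check statement or memory access: always kept */
    bool live;     /* output: true if the statement survives       */
} Stmt;

/* Marks live statements and returns how many are kept. */
int eliminate_dead(Stmt *s, int n) {
    int def_site[MAX_VARS];
    int work[MAX_STMTS], top = 0;
    assert(n <= MAX_STMTS);
    for (int v = 0; v < MAX_VARS; v++) def_site[v] = -1;
    for (int k = 0; k < n; k++) {
        s[k].live = s[k].is_root;
        if (s[k].def >= 0) {
            assert(s[k].def < MAX_VARS);
            def_site[s[k].def] = k;
        }
        if (s[k].live) work[top++] = k;
    }
    while (top > 0) {  /* a live use makes its defining statement live */
        const Stmt *cur = &s[work[--top]];
        for (int u = 0; u < 2; u++) {
            int v = cur->use[u];
            if (v < 0 || def_site[v] < 0) continue;
            Stmt *d = &s[def_site[v]];
            if (!d->live) { d->live = true; work[top++] = def_site[v]; }
        }
    }
    int kept = 0;
    for (int k = 0; k < n; k++) kept += s[k].live;
    return kept;
}
```

In the test scenario, check elimination has already removed the lower-bound check on p. The storage statement p_lower = p has therefore lost its only use and is dropped. p_upper = p + 5 is kept because the remaining upper-bound check still reads it.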
The target code generation module 340 converts the check statements retained in the optimized code into machine instructions. It combines them with the source program and pointer attribute storage statements that remain after redundancy elimination to generate target code supported by the terminal. The module has two parts: a check statement expansion submodule 341 and a conventional code generation submodule 342. The conventional code generation submodule 342 converts the remaining source program and pointer attribute storage statements into machine target code.
Compared with the compilation and optimization process of FIG. 1, the target code generation module 340 adds a check statement expansion submodule 341 to the compilation stage. This submodule expands the check statements retained after redundancy elimination, converting each one from single-instruction format into branch-instruction format.
- The single-instruction format is a designed statement that executes without branching.
- The branch-instruction format consists of machine instructions that execute as a branch, namely a compare instruction combined with a jump instruction.
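The following is a minimal sketch of the expansion step. It assumes the back end emits assembly text and uses x86-style (Intel syntax) mnemonics, with unsigned jumps for address comparisons. expand_check, the check_fail label, and the operand names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { CHECK_NONNULL, CHECK_LOWER, CHECK_UPPER } CheckKind;

/* Expand one retained single-instruction check into a compare plus a
 * conditional jump to the failure label. Returns snprintf's result. */
int expand_check(CheckKind kind, const char *ptr, const char *bound,
                 char *out, size_t cap) {
    assert(out != NULL && cap > 0);
    switch (kind) {
    case CHECK_NONNULL:  /* fail when ptr == 0 */
        return snprintf(out, cap, "cmp %s, 0\nje check_fail\n", ptr);
    case CHECK_LOWER:    /* fail when ptr < lower (unsigned) */
        return snprintf(out, cap, "cmp %s, %s\njb check_fail\n", ptr, bound);
    case CHECK_UPPER:    /* fail when ptr >= upper (unsigned) */
        return snprintf(out, cap, "cmp %s, %s\njae check_fail\n", ptr, bound);
    }
    return -1;
}
```

Because this expansion runs only after the optimizer has finished, the compare-and-jump pairs never appear in the CFG that VRAP, PRE, and DCE operate on.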
The outputs of the check statement expansion submodule 341 and the conventional code generation submodule 342 are combined to obtain a target executable that meets the application requirements.
Next, the memory safety management solution provided by an embodiment of the present application is described based on FIG. 3.
FIG. 4 is a flowchart of a memory safety management method provided by an embodiment of the present application. As shown in FIG. 4, the method includes steps S401 to S404, described in detail below:
Step S401: insert a check statement that checks pointer attributes before a risky statement in the first program. A risky statement is a statement in the first program that accesses memory through a pointer. The pointer attributes are the attributes of the pointer used in the risky statement.
In this embodiment, a first program is first obtained according to user requirements. It is written for the application in C or a C-like language on a hardware platform, and it is the source program described in FIG. 3. The hardware platform may be a standalone PC, a networked server, or any terminal platform on which a user can edit C or C-like code. The first program makes heavy use of pointer operations. Improper pointer operations during memory access can corrupt memory, which seriously affects the program's security and reliability.
The first program then enters the compilation phase. It can be edited and compiled on the same platform using an integrated development environment. Alternatively, it can be edited on one platform and fed into a compilation environment on another.
In one example, pointer attribute annotation statements are added after the pointer definition statements in the first program, either manually or by the compilation environment. The pointer attributes are first extracted using syntax and lexical analysis matched to the annotation statements, and are then stored by pointer attribute storage statements. Pointer attributes are the information needed for spatial memory-safety checks: the length and bounds of the memory block a pointer points to, the upper-bound address and lower-bound address of that memory space, and so on. They are obtained by lexical and syntactic analysis of information implicit in the first program. As shown in FIG. 3, the pointer attribute information is stored by the pointer attribute storage submodule 321.
In general, two statements are inserted after every pointer definition statement of every pointer in the first program: a pointer annotation statement that obtains the pointer attributes, and a pointer attribute storage statement that stores them. The inserted pointer attribute storage statement is a first custom statement recognized by the compiler.
In one example, memory-access safety is ensured as follows. After the pointer attribute storage statements have been inserted, the compilation environment searches the first program for risky statements that access memory through pointers. It then inserts a pointer check statement before each risky statement. The object checked by the check statement is the pointer used in the risky statement, and the inserted check statement is a second custom statement recognized by the compiler.

To facilitate redundant-code optimization, check statements are usually designed in single-instruction format and execute sequentially. Single-instruction format means that one custom check statement represents one check rule. In the FIG. 1 scheme, by contrast, a check rule is represented by a compare instruction plus a jump instruction. A single-instruction check statement does not change the CFG and does not interfere with code optimization, so it can greatly reduce the performance overhead of the target executable.

A risky statement is any statement whose execution may cause a spatial memory-safety problem, and a check statement must be inserted before it to ensure memory-access safety. It can be understood that a risky statement may be a memory access statement, a memory access statement involving pointer arithmetic, or a pointer arithmetic statement.
Check statements are inserted using a direct mapping between check statements and pointer attribute storage statements. Under this mapping, the check range used by each created check statement is determined by two things: the variables that the pointer attribute storage statement defines to represent the pointer attribute information, and the pointer attributes saved in that storage statement. Compared with the table-building and table-lookup approaches commonly used in the prior art, this direct mapping has very low memory overhead.

As shown in FIG. 3, the direct mapping is performed by the pointer attribute mapping submodule 322. For a particular pointer p, the submodule does the following:
1. It finds a risky statement that accesses memory through p.
2. It inserts a null check statement check(p!=null) before that statement.
3. It finds the pointer attributes of p in effect before the risky statement executes.
4. It maps the spatial information about p, held in the storage statement that stores those attributes, to the call parameters of the check statement.
During compilation, once the check statements have been inserted, the pointer attribute annotation statements added to the code are deleted.
Step S402: perform redundant-code elimination separately on the first converted program and on the check statements, obtaining a first eliminated program and a second eliminated program. The first converted program is an intermediate representation produced by compiling the first program.
In one example, the program to which pointer attribute storage statements and custom single-instruction check statements have been added enters this stage. In a conventional code optimization pass, the check statements are treated as pseudocode, and the original logic of the first program is left unchanged. A conventional optimization scheme compiles the first program according to the grammatical and semantic analysis rules defined by the compiler. This produces an intermediate representation, the first converted program, which is then optimized for redundancy to obtain the first eliminated program. As shown in FIG. 3, this conventional optimization is performed by the conventional code optimization submodule 331. It processes the source program in the same way, and with similar effect, as the FIG. 1 compilation and optimization scheme, so it is not described in detail here.
Furthermore, in combination with the check statement insertion module 320 in FIG. 3, the check statements may be treated as auxiliary statements of the first program. The logical relationships associated with them can then be added, so that conventional optimization techniques can be adapted for deeper elimination. This direction is not the focus of this application and is not described further.
In one example, a pointer attribute storage statement is a custom statement inserted during compilation, and its syntax may be consistent with C or C-like languages. Such a statement is just a kind of variable definition. For example, pointer attribute variables p_lower and p_upper are defined by adding the storage statements p_lower=p and p_upper=p+100 after the annotation statement. It therefore does not affect the conventional optimization of the first program.

Alternatively, the syntax of the storage statement may differ from C or C-like languages. In that case it is treated as pseudocode, like the check statements, during conventional optimization of the first program. For convenience of description, this embodiment assumes the syntax is consistent with C or C-like languages. In either case, the result of conventional optimization of the first program is unaffected.
In this embodiment, redundant check statements are eliminated in three passes:
1. The customized value range analysis and propagation algorithm eliminates redundant code in the check statements, producing a third eliminated program.
2. The customized partial redundancy elimination algorithm further eliminates redundant check statements remaining in the third eliminated program, producing the second eliminated program.
3. The customized dead code elimination algorithm removes the pointer attribute storage statements and pointer attribute variables directly mapped to the eliminated check statements, producing a fourth eliminated program.

The details are as follows:
In general, redundant code can be classified in two ways:
- by control-flow relationship, as fully redundant or partially redundant;
- by inclusion relationship, as identically redundant or partial-order redundant.

For example, if check(i<10) is preceded on every path by check(i<5) on the same i, the later check is fully redundant in the partial-order sense, because i<5 implies i<10. During check statement elimination, redundancy optimization algorithms are custom-designed for these kinds of redundant code.
Customized optimization algorithms are designed to deeply eliminate the redundant check statements that may exist in the checked program, covering full, partial, identical, and partial-order redundancy.
First, a customized VRAP algorithm is designed and implemented to eliminate fully identical redundancy and fully partial-order redundancy. The value range of the object checked by each check statement is computed and propagated to determine whether the check condition holds.
- If the condition is certain to hold, the check statement is fully identically or fully partial-order redundant and is deleted.
- If the condition is certain not to hold, a safety problem will definitely occur. A static check error is reported with debugging information that helps developers debug and fix the code.
- If it cannot be determined, the check statement is kept for real-time dynamic monitoring at run time, and the value-range information can be updated according to the check range.
The customized VRAP algorithm provided in the embodiments of the present application is described below with reference to FIG. 5a and FIG. 5b, where:
FIG. 5a is a flowchart of conventional VRAP algorithm processing provided in an embodiment of the present application, and FIG. 5b is a flowchart of customized VRAP algorithm processing provided in an embodiment of the present application.
In one example, consider program code into which check statements have been inserted. As shown in FIG. 5a, the conventional VRAP algorithm obtains the value range of variable i, evaluates the condition, and deletes the redundant branch. As shown in FIG. 5b, the customized VRAP algorithm determines whether each check's requirement is met. If it is certainly met, the redundant check is deleted. If it is certainly not met, a static error is reported. If it is not necessarily met, the value range is updated according to the check statement.
In summary, the present application provides a customized value range analysis and propagation algorithm for eliminating check statements. The customization consists mainly of two substitutions. First, "judging the checked object and check range of a check statement" replaces "judging the result of a comparison statement". Second, "deleting the check statement while updating the value range of the checked object" replaces "deleting the comparison statement while updating the value range of the compared object".
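The following Python sketch gives a rough illustration of the three outcomes described above. It classifies checks of the hypothetical form `check(0 <= var < bound)` against known integer ranges. It assumes straight-line code and constant bounds. A real VRAP pass would instead propagate ranges over the control-flow graph. The names `Check` and `vrap_eliminate` are illustrative only.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Inclusive [lo, hi] range of an integer index; +/-math.inf means "unknown".
Interval = Tuple[float, float]


@dataclass
class Check:
    """Hypothetical single-instruction check: requires 0 <= var < bound."""
    var: str
    bound: int


def vrap_eliminate(checks: List[Check], ranges: Dict[str, Interval]):
    """Classify straight-line checks against the known range of the checked object.

    Returns (kept, deleted, static_errors). After a kept check, the range of its
    object is narrowed, because execution only continues past it if it held.
    """
    kept, deleted, errors = [], [], []
    for c in checks:
        lo, hi = ranges.get(c.var, (-math.inf, math.inf))
        if lo >= 0 and hi < c.bound:
            deleted.append(c)      # certainly satisfied: redundant, delete
        elif hi < 0 or lo >= c.bound:
            errors.append(c)       # certainly violated: report statically
        else:
            kept.append(c)         # undecided: keep for runtime checking
            ranges[c.var] = (max(lo, 0), min(hi, c.bound - 1))
    return kept, deleted, errors


# i is known to lie in [0, 9]. check(i < 10) is deleted. check(i < 5) is kept
# and narrows i to [0, 4], which then makes check(i < 8) redundant.
ranges = {"i": (0, 9)}
kept, deleted, errors = vrap_eliminate([Check("i", 10), Check("i", 5), Check("i", 8)], ranges)
print(len(kept), len(deleted), len(errors), ranges["i"])  # 1 2 0 (0, 4)
```

Narrowing the range after a kept check is the step that lets later, looser checks on the same object be recognized as partial-order redundant.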
Second, a customized PRE algorithm is designed and implemented to eliminate partially identical redundancy and partially partial-order redundancy. The checked pointer object is used as the hash key, and all check statements are hashed. When two check statements share the same key, their check ranges are compared to determine whether they have an identical relationship, a partial-order relationship, or an unsatisfied relationship. The principles of correctness, safety, computational optimality, and lifetime optimality are then applied to determine whether partial redundancy exists. If the relationship is identical or partial-order under partial redundancy, the check statement is redundant and is removed from some paths by hoisting it. If the relationship is unsatisfied under partial redundancy, a security problem will definitely occur, so a static check error is reported with debugging information to help the developer debug and fix the code. In all other cases, the check statement is kept for real-time monitoring at run time.
The customized PRE algorithm provided in the embodiments of the present application is described below with reference to FIG. 6a and FIG. 6b, where:
FIG. 6a is a flowchart of conventional PRE algorithm processing provided in an embodiment of the present application, and FIG. 6b is a flowchart of customized PRE algorithm processing provided in an embodiment of the present application.
In one example, FIG. 6a shows the redundancy scenarios targeted by the conventional PRE algorithm.
- Case (a) in FIG. 6a is a fully redundant computation. Deleting the a+b operation on the left branch has no effect, because the earlier result c can replace a+b.
- Case (b) is a partially redundant computation. The left branch executes a+b only once and has no redundancy, whereas the right branch computes it redundantly. Each branch must still execute a+b once, so the core idea of the elimination is to make a+b appear exactly once on each branch.
- Case (c) is the common loop redundancy, a special case of partial redundancy. The number of loop iterations varies, so a+b executes a different number of times on different paths. If the loop exits after executing a+b once, there is no redundant computation, but every further iteration computes it redundantly.
As FIG. 6a shows, eliminating the partial redundancy of a+b involves renaming, insertion, deletion, and movement. The computation becomes very complicated when variables c and d are related, when branches assign different values to c or d, or when branches and loops are nested.
To guarantee correctness and ensure that the optimization actually improves execution, partial redundancy elimination must satisfy all four of the following basic principles:
(1) Correctness: a+b is deleted only on paths where it is redundant, so that nothing is deleted incorrectly;
(2) Safety: a+b is inserted only on paths that originally executed a+b, so that no extra computation is introduced;
(3) Computational optimality: whatever execution path the actual input data takes, no other placement performs fewer a+b computations;
(4) Lifetime optimality (minimum register pressure): building on cases (b) and (c) in FIG. 6a, the lifetime of the register holding a+b is minimized; the later the value is computed and stored, the better.
Following these four principles, the renaming, insertion, deletion, and movement of a+b are implemented with three flags: DownSafe (ds), CanBeAvail (cba), and Later (later). The three flags are computed along the code order, and the corresponding operations are performed according to the results. The following table lists the core flags of the conventional PRE algorithm:
As shown in FIG. 6b, the customized PRE algorithm targets the redundancy scenarios (b), (c), and (d) in FIG. 6b. With check(i) as the hash key, check(i<len1) and check(i<len2) are treated as the same hash object. Once two checks hash to the same key, the inclusion relationship between their check ranges is determined:
- If len1 == len2, the relationship is identical.
- If len1 < len2, check(i<len2) must hold whenever check(i<len1) holds, which is a partial-order relationship.
- If len1 > len2, check(i<len2) need not hold when check(i<len1) holds, so the relationship is non-redundant.
The principles of correctness, safety, computational optimality, and lifetime optimality are then applied to determine whether the redundancy is partial.
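The hashing-and-comparison step can be sketched as follows. Each check is modeled as a hypothetical `(object, bound)` pair, and the bounds are assumed to be integers that can be compared at compile time. Symbolic bounds such as len1 and len2 would need their own range or symbolic analysis. Checks are bucketed by the checked object only. The relation of a later check to an earlier one follows the three cases above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def bucket_by_object(checks: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Hash check(obj < bound) statements by the checked object alone."""
    buckets: Dict[str, List[int]] = defaultdict(list)
    for obj, bound in checks:
        buckets[obj].append(bound)
    return dict(buckets)


def relation(len1: int, len2: int) -> str:
    """Relation of check(i < len2) to an earlier check(i < len1) on the same i."""
    if len1 == len2:
        return "identical"        # the second check repeats the first
    if len1 < len2:
        return "partial_order"    # i < len1 already implies i < len2
    return "non_redundant"        # i < len1 does not imply i < len2
```

Hashing only the checked object, rather than the whole expression, puts check(i<8) and check(i<16) in the same bucket. This is what lets partial-order redundancy be detected at all.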
In one example, the four basic principles described above for FIG. 6a are redefined as follows:
(1) Correctness: check(i<len) is deleted only on paths where it is redundant, so that nothing is deleted incorrectly;
(2) Safety: check(i<len) is inserted only on paths that originally executed check(i<len), so that no extra checks are introduced;
(3) Computational optimality: whatever execution path the actual input data takes, no other placement performs fewer checks;
(4) Lifetime optimality (minimum register pressure): this principle is dropped, because check statements do not hold values in registers.
Correspondingly, the way the flags are solved is modified to implement renaming, insertion, deletion, and movement of check statements (that is, check instructions). The following table lists the core flags of the customized PRE algorithm:
In summary, the present application provides a customized partial redundancy elimination algorithm for eliminating check statements. The customization consists mainly of two substitutions. First, "hashing the checked object of the check statement" replaces "hashing the entire expression". Second, "comparing different check ranges of the same checked object, which eliminates both identical and partial-order redundancy" replaces "comparing identical statements, which eliminates only identical redundancy".
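The following is a much-simplified sketch of the partial-redundancy decision for a single join block. It does not reproduce the DownSafe/CanBeAvail/Later flag computation. It assumes that every predecessor's only successor is the join (critical edges already split). Under that assumption, inserting a check at the end of a predecessor never adds a check to a path that did not already execute one, which respects the safety principle. Coverage uses the partial-order rule: check(i < b) on a predecessor covers check(i < L) at the join when b <= L. The function name and its inputs are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def pre_at_join(pred_bounds: Dict[str, Optional[int]],
                join_bound: int) -> Tuple[str, List[str]]:
    """Decide what to do with check(i < join_bound) at a join block.

    pred_bounds maps each predecessor block to the tightest bound b of a
    check(i < b) available at its exit, or None if no such check exists.
    Returns the action and the predecessors that need an inserted check.
    """
    # A predecessor is covered when its check already implies i < join_bound.
    uncovered = [p for p, b in pred_bounds.items()
                 if b is None or b > join_bound]
    if not uncovered:
        return "delete", []            # fully redundant on every path
    if len(uncovered) < len(pred_bounds):
        # Partially redundant: insert on uncovered predecessors, after which
        # the join check is fully redundant and can be deleted.
        return "hoist", uncovered
    return "keep", []                  # not redundant on any path


# Diamond: only the left branch already checks i < 8.
print(pre_at_join({"left": 8, "right": None}, 8))   # ('hoist', ['right'])
```

In the diamond example, the left path drops from two checks to one and the right path still executes exactly one. This matches the computational-optimality principle.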
Finally, a customized DCE algorithm is designed and implemented to eliminate redundant pointer attributes, that is, pointer attribute storage statements and pointer attribute variables that no longer serve any purpose. A pointer attribute variable is defined in a pointer attribute storage statement (its definition point) and used in check statements (its use points). After a large number of redundant check statements have been eliminated, many pointer attribute storage statements and pointer attribute variables no longer have any use point. They become dead code and must also be eliminated.
FIG. 7 is a flowchart of conventional DCE algorithm processing provided in an embodiment of the present application, and the customized DCE algorithm is described below with reference to it. As shown in FIG. 7, conventional DCE performs backward data-flow analysis and treats a variable as useless once it has no remaining use point. The customized DCE algorithm works on the same principle but targets different objects, namely redundant pointer attribute storage statements and pointer attribute variables. The check statements and pointer attribute storage statements designed in the present application are what make this customization of DCE possible.
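The following Python sketch shows the idea of removing pointer attribute storage statements that no longer have a use point. It iterates to a fixed point, so chains of attribute variables are removed as well. It works on a hypothetical flat statement list with explicit def/use sets and ignores control flow, so it simplifies the backward data-flow analysis described above.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Stmt:
    kind: str                                   # "attr_store", "check" or "other"
    text: str
    defs: Set[str] = field(default_factory=set)
    uses: Set[str] = field(default_factory=set)


def eliminate_dead_attrs(stmts: List[Stmt]) -> List[Stmt]:
    """Drop pointer attribute stores whose variables have no remaining use point."""
    changed = True
    while changed:
        used = set().union(*(s.uses for s in stmts))
        # Only attribute stores are candidates; ordinary code is left alone.
        live = [s for s in stmts if s.kind != "attr_store" or s.defs & used]
        changed = len(live) != len(stmts)
        stmts = live
    return stmts


prog = [
    Stmt("attr_store", "p_len = 16", defs={"p_len"}),
    Stmt("attr_store", "q_len = p_len - 4", defs={"q_len"}, uses={"p_len"}),
    Stmt("other", "q[i] = 0", uses={"q", "i"}),
]  # the check(i < q_len) that used q_len has already been eliminated
print([s.text for s in eliminate_dead_attrs(prog)])  # ['q[i] = 0']
```

In this example, `q_len` has lost its only use point. Once its store is removed, `p_len` loses its only use as well, so the second iteration removes that store too.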
As shown in FIG. 3, the check statement elimination submodule 332 eliminates redundant check statements. The pointer attribute elimination submodule 333 then eliminates redundant pointer attribute storage statements. Submodule 332 may run either after the conventional code optimization submodule 331 has finished or concurrently with it; no restriction is imposed here.
In one example, the static debugging files generated while the code optimization module 330 runs can be fed back to the program developer. The developer's modifications to the source program, check statements, pointer attribute assignment statements, pointer attribute variables, and so on can then be received. A static debugging file may contain compilation errors produced when the conventional code optimization submodule 331 performs syntactic and semantic conversion and redundant code optimization on the first program, such as syntax errors, memory access errors, and command-line errors. It may also contain optimization logic errors, syntax errors, and similar errors produced while the check statement elimination submodule 332 eliminates redundant check statements and the pointer attribute elimination submodule 333 eliminates redundant pointer attribute storage statements.
Step S403: convert the second eliminated program into machine instructions as part of compilation to obtain a second converted program.
Step S404: generate a target executable file based at least on the second converted program and the first eliminated program. At run time, the target executable file generates fault information that includes the pointer attribute of the pointer in a first risk statement. The first risk statement is one of the risk statements in the first program.
After the redundant code in the first converted program and in the check statements has been eliminated, the check statement expansion submodule 341 shown in FIG. 3 expands each check statement into a combination of one comparison and one jump, yielding the second converted program.
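The expansion step can be sketched as a lowering pass over a textual pseudo-assembly listing. The `check a, b` syntax, the x86-style `cmp`/`jae` mnemonics, and the handler label `__ptr_check_fail` are assumptions for illustration. Because the jump is unsigned (`jae`), a negative index is reinterpreted as a large unsigned value and fails the same single comparison as an index at or beyond the bound.

```python
from typing import List


def lower_checks(ir: List[str], handler: str = "__ptr_check_fail") -> List[str]:
    """Expand each 'check idx, bound' pseudo-instruction into cmp + jae."""
    out: List[str] = []
    for ins in ir:
        if ins.startswith("check "):
            idx, bound = (x.strip() for x in ins[len("check "):].split(","))
            out.append(f"cmp {idx}, {bound}")
            out.append(f"jae {handler}")   # unsigned idx >= bound: fault path
        else:
            out.append(ins)                # ordinary instructions pass through
    return out


print(lower_checks(["mov r1, 16", "check r0, r1", "store [r2+r0], 0"]))
```

The single-instruction check stays visible to the optimizer up to this point. It becomes a branch only in this late lowering step.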
The fourth eliminated program is produced by eliminating redundant pointer attribute storage statements and pointer attribute variables. It also contributes to the target executable file, so it must be included before that file is generated. Binary code with a pointer checking function is then generated based on the second converted program, the first eliminated program, and the fourth eliminated program.
In one example, the dynamic debugging files generated by the target executable file at run time can be fed back to the program developer. The developer's modifications to the source program, check statements, pointer attribute assignment statements, pointer attribute variables, and so on can then be received. A dynamic debugging file contains fault information for faults caused by memory accesses through pointers. This includes the location of the risk statement, the pointer code that triggered the fault, the pointer attributes, and the type of out-of-bounds access.
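To illustrate the kind of record described above, here is a hypothetical Python model of a runtime check that returns fault information. The field names and fault categories are assumptions, not the format used by the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FaultInfo:
    location: str     # source position of the risk statement
    statement: str    # pointer access that triggered the fault
    length: int       # pointer attribute: size of the pointed-to block
    kind: str         # out-of-bounds type: "null", "underflow" or "overflow"


def run_check(location: str, statement: str, is_null: bool,
              index: int, length: int) -> Optional[FaultInfo]:
    """Model of an expanded check at run time: None if the access is safe."""
    if is_null:
        kind = "null"
    elif index < 0:
        kind = "underflow"
    elif index >= length:
        kind = "overflow"
    else:
        return None
    return FaultInfo(location, statement, length, kind)
```

Only accesses that reach a retained check can produce such a record. Checks proven safe statically were removed earlier, and checks proven unsafe were reported at compile time.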
FIG. 8 is a flowchart of source program compilation provided in an embodiment of the present application. It describes how the code in the source program is processed through steps S401 to S404, as follows:
An intermediate program is obtained from the source program, which is a program written for a given application. The intermediate program includes:
- the source program;
- pointer attribute storage statements that store the attributes of pointers in the source program;
- the pointer attribute variables defined in those storage statements;
- check statements that check the attributes of the pointers in risk statements.
Pointer attributes include the length and boundary information of the memory block a pointer points to. A risk statement is a statement in the source program that accesses memory through a pointer.
The source program is converted into the intermediate representation produced during compilation to obtain a first converted program.
Redundant code is eliminated from the first converted program to obtain a first eliminated program.
The customized value range analysis and propagation algorithm eliminates redundant code in the check statements to obtain a third eliminated program. The customized partial redundancy elimination algorithm then eliminates the remaining redundant code in the third eliminated program to obtain a second eliminated program.
The customized dead code elimination algorithm eliminates redundant pointer attribute storage statements and pointer attribute variables that have lost their reference relationships, yielding a fourth eliminated program. This redundant code consists of the pointer attribute storage statements and pointer attribute variables that map directly to the redundant check statements. The direct mapping maps the pointer attribute variables they contain, and the pointer attributes they store, to the check ranges of those redundant check statements.
The second eliminated program is converted into machine instructions as part of compilation to obtain a second converted program.
In one example, compilation of the intermediate program has only one ordering requirement: redundancy elimination of pointer attribute storage statements and pointer attribute variables must follow redundancy elimination of check statements. No other actions have ordering requirements. For example, compiling and converting the source program with conventional code optimization can run either at the same time as, or before or after, eliminating redundant check statements and converting their execution order. Changing this order does not affect the final compilation result.
A target executable file is generated based on the second converted program, the first eliminated program, and the fourth eliminated program.
FIG. 9 is an implementation architecture diagram of a memory security management method provided in an embodiment of the present application. As shown in FIG. 9, a modified integrated development environment is designed in which a compiler 900 and a graphical user interface 910 optimize the source program. The flow is as follows:
1. The developer finishes writing and modifying the source program in the source code writing submodule 921 of the editor 920.
2. The pointer attribute storage submodule 901 and the pointer attribute mapping submodule 902 insert check statements before the risk statements.
3. The program passes through the conventional code optimization submodule 903, the check statement elimination submodule 904, and the pointer attribute elimination submodule 905.
4. The check statement expansion submodule 906 and the conventional code generation submodule 907 produce binary code with the checking function.
5. Static check errors are passed to the developer through the interactive error reporting and debugging submodule 911 of the graphical user interface 910 to assist with debugging and modification.
6. When the binary code runs in the real execution environment, security problems are checked in real time against the actual input. The resulting debugging information is passed to operations and maintenance personnel to assist with code debugging.
To detect spatial memory safety problems with low overhead and high security, and with reference to the memory security management method of FIG. 4, FIG. 10 is a flowchart of a memory security management method provided in an embodiment of the present application. As shown in FIG. 10, steps S1000 to S1008 are performed as follows:
Step S1000: extract pointer attribute information from the source program through lexical and syntactic analysis.
Step S1001: create pointer attribute variables and pointer attribute storage statements to store the pointer attribute information.
Step S1002: use the pointer attribute variables to design single-instruction check statements, and insert them before risk statements such as memory accesses.
Step S1003: apply conventional code optimization techniques for performance tuning.
Step S1004: use the customized VRAP algorithm to eliminate fully identical and fully partial-order redundant check statements, reporting static check errors at the same time.
Step S1005: use the customized PRE algorithm to eliminate partially identical and partially partial-order redundant check statements, reporting static check errors at the same time.
Step S1006: use the customized DCE algorithm to eliminate redundant pointer attribute storage statements that have no use point.
Step S1007: expand each check statement into a combination of one comparison instruction and one jump instruction.
Step S1008: combine the result with the code obtained by conventional code optimization of the source program to generate the target executable code.
FIG. 11 is an implementation architecture diagram of a memory security management method provided in an embodiment of the present application. As shown in FIG. 11, the method is implemented as a modified compiler whose front end, middle end, and back end are each redesigned:
- The front-end module 1100 contains a pointer attribute storage submodule 1101 and a pointer attribute mapping submodule 1102. They store pointer attributes and establish the correct mapping between pointer attributes and check statements.
- The middle-end module 1110 contains a conventional code optimization submodule 1111, a check statement elimination submodule 1112, and a pointer attribute elimination submodule 1113. It outputs static error results to the developer as debugging information to assist in debugging and modifying the code.
- The back-end module 1120 contains a check statement expansion submodule 1121 and a conventional code generation submodule 1122. They expand the single-instruction check statements into combinations of comparison and jump instructions.
The result is binary code with a checking function. When this code runs in a real execution environment, it can provide debugging information to operations and maintenance personnel to assist with debugging and modification. Only the module division and connections differ from the implementation architecture diagram in FIG. 9. Each step operates as described for the flowchart in FIG. 10 and is not repeated here.
FIG. 12 is an implementation architecture diagram of a memory security management method provided in an embodiment of the present application. As shown in FIG. 12, the method is implemented as a modified program analysis tool with a static analysis part and a dynamic analysis part:
- The static analysis module 1200 contains five submodules: a pointer attribute storage submodule 1201, a pointer attribute mapping submodule 1202, a conventional code optimization submodule 1203, a check statement elimination submodule 1204, and a pointer attribute elimination submodule 1205. Together they establish the correct mapping between pointer attributes and check statements. While eliminating redundant check statements, they also output static error results to the developer as debugging information to assist in debugging and modifying the code.
- The dynamic analysis module 1210 contains a check statement expansion submodule 1211 and a conventional code generation submodule 1212. They generate binary code with a checking function, which is then run with simulated test cases in a simulated execution environment. The resulting error information is output to the developer to assist in debugging and modifying the code.
Only the module division and connections differ from the implementation architecture diagram in FIG. 9. Each step operates as described for the flowchart in FIG. 10 and is not repeated here.
Based on the method in the above embodiments, an embodiment of the present application further provides a memory security management device.
FIG. 13 is a schematic diagram of the hardware structure of a memory security management device provided in an embodiment of the present application. As shown in FIG. 13, the memory security management device 1300 includes a processing module 1301, whose functions are as follows:
The processing module 1301 inserts a check statement that checks pointer attributes before each risk statement in the first program. A risk statement is a statement in the first program that accesses memory through a pointer, and the pointer attribute is an attribute of the pointer in the risk statement.
The processing module 1301 also eliminates redundant code from the first converted program and from the check statements, respectively, to obtain a first eliminated program and a second eliminated program. The first converted program is the intermediate representation produced by compiling the first program.
The processing module 1301 also converts the second eliminated program into machine instructions as part of compilation to obtain a second converted program.
The processing module 1301 also generates a target executable file based at least on the second converted program and the first eliminated program. At run time, the target executable file generates fault information that includes the pointer attribute of the pointer in a first risk statement, which is one of the risk statements in the first program.
In some embodiments, the processing module 1301 eliminates redundant code from the first converted program and the check statements in two stages. First, it judges the checked object and check range of each check statement and eliminates fully identical and fully partial-order redundant code from the check statements, obtaining a third eliminated program. The checked object and check range of a check statement are derived at least from the risk statement. Second, it hashes the checked objects of the check statements in the third eliminated program and eliminates partially identical and partially partial-order redundant code, obtaining the second eliminated program.
In some embodiments, the processing module 1301 inserts check statements before the risk statements in the first program as follows:
1. It inserts an annotation statement after a pointer definition statement in the first program. The annotation statement obtains the pointer attribute of the pointer in the risk statement.
2. It inserts a pointer attribute storage statement after the annotation statement. This statement saves the pointer attribute obtained by the annotation statement and contains a pointer attribute variable, that is, a variable representing the pointer attribute. The pointer attribute storage statement is a first custom statement recognizable by the compilation process.
3. It determines the check statement for the risk statement based on the risk statement and the pointer attribute storage statement. The risk statement determines the checked object. The pointer attribute variable contained in the storage statement, together with the pointer attribute it saves, determines the check range. The check statement is a second custom statement recognizable by the compilation process.
4. It inserts that check statement before the risk statement.
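The insertion sequence can be sketched on a toy source representation. The sketch assumes that pointers are defined only by `T *p = malloc(N);` and accessed only as `p[i]`. It uses hypothetical `/* annotate */`, `__attr_store`, and `__check` spellings for the annotation statement, the first custom statement, and the second custom statement. A real implementation would work on the compiler's syntax tree or IR rather than on regular expressions.

```python
import re
from typing import Dict, List

# Hypothetical patterns: "char *p = malloc(16);" defines p; "p[i] ..." is a risk statement.
DEF_RE = re.compile(r"^\s*\w+\s*\*\s*(\w+)\s*=\s*malloc\((\d+)\);")
ACCESS_RE = re.compile(r"^\s*(\w+)\[(\w+)\]")


def instrument(lines: List[str]) -> List[str]:
    attr_var: Dict[str, str] = {}      # pointer name -> pointer attribute variable
    out: List[str] = []
    for line in lines:
        access = ACCESS_RE.match(line)
        if access and access.group(1) in attr_var:
            ptr, idx = access.groups()
            # Checked object comes from the risk statement; range from the attribute variable.
            out.append(f"__check({idx} < {attr_var[ptr]});")
        out.append(line)
        definition = DEF_RE.match(line)
        if definition:
            ptr, size = definition.groups()
            var = f"__{ptr}_len"
            attr_var[ptr] = var
            out.append(f"/* annotate: len({ptr}) = {size} */")   # annotation statement
            out.append(f"__attr_store({var}, {size});")          # attribute storage statement
    return out


for line in instrument(["char *p = malloc(16);", "p[i] = 0;"]):
    print(line)
```

Accesses through pointers whose attributes are unknown are left unchecked in this sketch. Deciding how to handle them is outside what the text above specifies.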
In some embodiments, after performing redundant code elimination on the first converted program and on the check statement, respectively, the processing module 1301 eliminates two further kinds of code:

- the redundant code in the pointer attribute storage statements;
- the pointer attribute variables contained in that redundant code.

This yields a fourth eliminated program. The pointer attributes saved by this redundant storage code, and the pointer attribute variables it contains, are the ones used to determine the check range of the redundant code in the check statements. Accordingly, when the processing module 1301 generates the target executable file based at least on the second converted program and the first eliminated program, it generates the file based on the second converted program, the first eliminated program, and the fourth eliminated program.
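One way to read this step: once the redundant checks are gone, the attribute storage statements that fed only those checks become dead. The C sketch below assumes a hypothetical IR in which:

- every storage statement defines exactly one attribute variable;
- attribute variables are read only by check statements.

Under those assumptions, it removes every storage statement (and therefore its variable) that no surviving check reads. `StoreStmt`, `CheckStmt`, and `eliminate_dead_attr_stores` are illustrative names. The technique is ordinary dead-definition elimination, not a reconstruction of the patent's exact procedure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ATTR_VARS 64

/* Hypothetical IR: each pointer attribute storage statement defines one
 * attribute variable; each check statement reads one attribute variable. */
typedef struct {
    int attr_var;  /* attribute variable defined by this storage statement */
    bool removed;
} StoreStmt;

typedef struct {
    int attr_var;  /* attribute variable that supplies the check range */
    bool removed;  /* set earlier by redundant-check elimination */
} CheckStmt;

/* Removes storage statements (and thereby their attribute variables) that no
 * surviving check reads. Returns the number of storage statements removed. */
static size_t eliminate_dead_attr_stores(StoreStmt *stores, size_t ns,
                                         const CheckStmt *checks, size_t nc) {
    bool live[MAX_ATTR_VARS] = {false};
    size_t removed = 0;

    for (size_t i = 0; i < nc; i++) {
        if (!checks[i].removed) {  /* only surviving checks keep a variable live */
            assert(checks[i].attr_var >= 0 && checks[i].attr_var < MAX_ATTR_VARS);
            live[checks[i].attr_var] = true;
        }
    }
    for (size_t i = 0; i < ns; i++) {
        assert(stores[i].attr_var >= 0 && stores[i].attr_var < MAX_ATTR_VARS);
        stores[i].removed = !live[stores[i].attr_var];
        if (stores[i].removed)
            removed++;
    }
    return removed;
}
```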
In some embodiments, the first program is a program written in C or a C-like language.
FIG. 14 is a schematic diagram of the hardware structure of a memory security management device according to an embodiment of the present application. The network device 1400 may be the memory security management device described above. As shown in FIG. 14, the network device 1400 includes a processor 1410, a memory 1420, a communication interface 1430, and a bus 1440. The processor 1410, the memory 1420, and the communication interface 1430 are connected to one another through the bus 1440. They may also be connected by means other than the bus 1440.
The memory 1420 may be any of various types of storage media, such as:

- random access memory (RAM);
- read-only memory (ROM);
- non-volatile RAM (NVRAM);
- programmable ROM (PROM);
- erasable PROM (EPROM);
- electrically erasable PROM (EEPROM);
- flash memory;
- optical storage;
- a hard disk.
The processor 1410 may be a general-purpose processor, that is, a processor that performs specific steps and/or operations by reading and executing content stored in a memory (for example, the memory 1420). A central processing unit (CPU) is one example. The processor 1410 may include at least one circuit that performs all or some of the steps of the memory security management method provided in the embodiment shown in FIG. 4 or FIG. 9.
The communication interface 1430 includes two kinds of interface:

- interfaces that interconnect components within the network device 1400, such as input/output (I/O) interfaces, physical interfaces, and logical interfaces;
- interfaces that interconnect the network device 1400 with other devices (for example, other network devices or user equipment).

A physical interface may be an Ethernet interface, an optical fiber interface, an ATM interface, or the like.
The bus 1440 may be any type of communication bus, such as a system bus, that interconnects the processor 1410, the memory 1420, and the communication interface 1430.
The above components may be placed on separate chips, or some or all of them may be placed on the same chip. Which arrangement is used usually depends on product design requirements. The embodiments of the present application do not limit the specific implementation form of these components.
The network device 1400 shown in FIG. 14 is merely an example. In an actual implementation, the network device 1400 may include other components, which are not listed here.
The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly as a computer program product containing one or more computer instructions. When these computer program instructions are loaded and executed on a computer, they wholly or partly produce the processes or functions according to the embodiments of the present invention.

The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in either of two ways:

- over a wired connection, such as coaxial cable, optical fiber, or digital subscriber line (DSL);
- over a wireless connection, such as infrared, radio, or microwave.

The computer-readable storage medium may be any available medium that a computer can access. It may also be a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be:

- a magnetic medium, such as a floppy disk, hard disk, or magnetic tape;
- an optical medium, such as a DVD;
- a semiconductor medium, such as a solid-state drive (SSD).
The numerical designations used in the embodiments of the present application serve only to distinguish items for ease of description and do not limit the scope of the embodiments. Likewise, the sequence numbers of the processes described above do not imply an order of execution. The execution order of each process is determined by its function and internal logic, and the numbering does not limit how the embodiments are implemented.
The specific implementations above describe the objectives, technical solutions, and beneficial effects of the present application in further detail. They are merely specific implementations of the present invention and are not intended to limit the protection scope of the present application. Any modification, equivalent replacement, improvement, or the like made on the basis of the technical solutions of the present application shall fall within the protection scope of the present application.

Claims (13)

  1. A memory security management method, characterized in that the method comprises:
    inserting a check statement for checking pointer attributes before a risk statement in a first program, wherein the risk statement is a statement in the first program that uses a pointer to access memory, and the pointer attribute is an attribute of the pointer in the risk statement;
    performing redundant code elimination on a first converted program and on the check statement, respectively, to obtain a first eliminated program and a second eliminated program, wherein the first converted program is an intermediate representation generated by compiling the first program;
    performing, on the second eliminated program, the conversion to machine instructions of the compilation process to obtain a second converted program; and
    generating a target executable file based at least on the second converted program and the first eliminated program, wherein the target executable file is used to generate fault information at runtime, the fault information comprises the pointer attribute of the pointer in a first risk statement, and the first risk statement is one of the risk statements in the first program.
  2. The method according to claim 1, wherein performing redundant code elimination on the first converted program and on the check statement, respectively, comprises:
    judging the check object and check range of each check statement and eliminating fully identical redundant code and fully partial-order redundant code among the check statements to obtain a third eliminated program, wherein the check object and check range of a check statement are obtained based at least on the risk statement; and
    hashing the check objects of the check statements in the third eliminated program to eliminate partially identical redundant code and partially partial-order redundant code from the third eliminated program, thereby obtaining the second eliminated program.
  3. The method according to claim 1, wherein inserting the check statement for checking pointer attributes before the risk statement in the first program comprises:
    inserting a label statement after a pointer definition statement of the first program, wherein the label statement is used to obtain the pointer attribute of the pointer in the risk statement;
    inserting a pointer attribute storage statement after the label statement, wherein the pointer attribute storage statement is used to save the pointer attribute obtained by the label statement and comprises a pointer attribute variable, the pointer attribute variable being a variable used to represent the pointer attribute, and the pointer attribute storage statement is a first custom statement recognizable by the compilation process;
    determining, based on the risk statement and the pointer attribute storage statement, a check statement for performing a pointer attribute check on the risk statement, wherein the risk statement is used to determine the check object of the check statement, the pointer attribute variable contained in the pointer attribute storage statement and the pointer attribute saved by the pointer attribute storage statement are used to determine the check range of the check statement, and the check statement is a second custom statement recognizable by the compilation process; and
    inserting the check statement for checking pointer attributes before the risk statement.
  4. The method according to claim 3, further comprising, after performing redundant code elimination on the first converted program and on the check statement, respectively:
    eliminating redundant code in the pointer attribute storage statements, together with the pointer attribute variables contained in that redundant code, to obtain a fourth eliminated program, wherein the pointer attributes saved by, and the pointer attribute variables contained in, the redundant code in the pointer attribute storage statements are used to determine the check range of the redundant code in the check statements;
    wherein generating the target executable file based at least on the second converted program and the first eliminated program comprises:
    generating the target executable file based on the second converted program, the first eliminated program, and the fourth eliminated program.
  5. The method according to any one of claims 1 to 4, wherein the first program is a program written in C or a C-like language.
  6. A memory security management device, characterized in that the device comprises:
    a processing module, configured to insert a check statement for checking pointer attributes before a risk statement in a first program, wherein the risk statement is a statement in the first program that uses a pointer to access memory, and the pointer attribute is an attribute of the pointer in the risk statement;
    the processing module is further configured to perform redundant code elimination on a first converted program and on the check statement, respectively, to obtain a first eliminated program and a second eliminated program, wherein the first converted program is an intermediate representation generated by compiling the first program;
    the processing module is further configured to perform, on the second eliminated program, the conversion to machine instructions of the compilation process to obtain a second converted program; and
    the processing module is further configured to generate a target executable file based at least on the second converted program and the first eliminated program, wherein the target executable file is used to generate fault information at runtime, the fault information comprises the pointer attribute of the pointer in a first risk statement, and the first risk statement is one of the risk statements in the first program.
  7. The device according to claim 6, wherein, when performing redundant code elimination on the first converted program and on the check statement, respectively, the processing module is configured to:
    judge the check object and check range of each check statement and eliminate fully identical redundant code and fully partial-order redundant code among the check statements to obtain a third eliminated program, wherein the check object and check range of a check statement are obtained based at least on the risk statement; and
    hash the check objects of the check statements in the third eliminated program to eliminate partially identical redundant code and partially partial-order redundant code from the third eliminated program, thereby obtaining the second eliminated program.
  8. The device according to claim 6, wherein, when inserting the check statement for checking pointer attributes before the risk statement in the first program, the processing module is configured to:
    insert a label statement after a pointer definition statement of the first program, wherein the label statement is used to obtain the pointer attribute of the pointer in the risk statement;
    insert a pointer attribute storage statement after the label statement, wherein the pointer attribute storage statement is used to save the pointer attribute obtained by the label statement and comprises a pointer attribute variable, the pointer attribute variable being a variable used to represent the pointer attribute, and the pointer attribute storage statement is a first custom statement recognizable by the compilation process;
    determine, based on the risk statement and the pointer attribute storage statement, a check statement for performing a pointer attribute check on the risk statement, wherein the risk statement is used to determine the check object of the check statement, the pointer attribute variable contained in the pointer attribute storage statement and the pointer attribute saved by the pointer attribute storage statement are used to determine the check range of the check statement, and the check statement is a second custom statement recognizable by the compilation process; and
    insert the check statement for checking pointer attributes before the risk statement.
  9. The device according to claim 8, wherein, after performing redundant code elimination on the first converted program and on the check statement, respectively, the processing module is configured to:
    eliminate redundant code in the pointer attribute storage statements, together with the pointer attribute variables contained in that redundant code, to obtain a fourth eliminated program, wherein the pointer attributes saved by, and the pointer attribute variables contained in, the redundant code in the pointer attribute storage statements are used to determine the check range of the redundant code in the check statements;
    wherein, when generating the target executable file based at least on the second converted program and the first eliminated program, the processing module is configured to generate the target executable file based on the second converted program, the first eliminated program, and the fourth eliminated program.
  10. The device according to any one of claims 6 to 9, wherein the first program is a program written in C or a C-like language.
  11. An electronic device, characterized by comprising: at least one memory, configured to store a program; and at least one processor, configured to execute the program stored in the memory, wherein, when the program stored in the memory is executed, the processor is configured to perform the method according to any one of claims 1 to 5.
  12. A computer-readable storage medium, characterized by comprising instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 5.
  13. A computer program product, characterized by comprising program code that, when the computer program product is run on a computer, causes the computer to perform the method according to any one of claims 1 to 5.
PCT/CN2023/103819 2022-11-30 2023-06-29 Memory security management method and device WO2024113831A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211535703.5 2022-11-30
CN202211535703.5A CN118113291A (en) 2022-11-30 2022-11-30 Memory security management method and equipment

Publications (1)

Publication Number Publication Date
WO2024113831A1 true WO2024113831A1 (en) 2024-06-06

Family

ID=91207521

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/103819 WO2024113831A1 (en) 2022-11-30 2023-06-29 Memory security management method and device

Country Status (2)

Country Link
CN (1) CN118113291A (en)
WO (1) WO2024113831A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1188933A (en) * 1998-02-06 1998-07-29 深圳市华为技术有限公司 Recognition method for internal stored operation error in programming
US20080134159A1 (en) * 2006-12-05 2008-06-05 Intel Corporation Disambiguation in dynamic binary translation
US8060869B1 (en) * 2007-06-08 2011-11-15 Oracle America, Inc. Method and system for detecting memory problems in user programs
CN102243609A (en) * 2011-06-15 2011-11-16 惠州运通信息技术有限公司 Embedded software-based test analysis method and system
CN106940654A (en) * 2017-02-15 2017-07-11 南京航空航天大学 The automatic detection and localization method of EMS memory error in source code

Also Published As

Publication number Publication date
CN118113291A (en) 2024-05-31

Similar Documents

Publication Publication Date Title
US7890941B1 (en) Binary profile instrumentation framework
US7571427B2 (en) Methods for comparing versions of a program
US7346486B2 (en) System and method for modeling, abstraction, and analysis of software
JP6524021B2 (en) Parsed header for compilation
US8762949B2 (en) Method and apparatus for incremental analysis of one or more properties of a program
CN107704382B (en) Python-oriented function call path generation method and system
US7698692B1 (en) Preparing a binary file for future instrumentation
Prähofer et al. Opportunities and challenges of static code analysis of IEC 61131-3 programs
Kirby Reflection and hyper-programming in persistent programming systems
US20040205720A1 (en) Augmenting debuggers
CN100405294C (en) System, method and program product to optimize code during run time
US20040117771A1 (en) Preprocessor-based source code instrumentation
US11579856B2 (en) Multi-chip compatible compiling method and device
CN104573503B (en) The detection method and device that a kind of internal storage access overflows
JPS62164136A (en) Data base access control system
JP2001166949A (en) Method and device for compiling source code by using symbolic execution
US20230113783A1 (en) Cross-platform code conversion method and device
CN111736846B (en) Dynamic analysis-oriented source code instrumentation improvement method
US10839124B1 (en) Interactive compilation of software to a hardware language to satisfy formal verification constraints
CN115809063A (en) Storage process compiling method, system, electronic equipment and storage medium
Gao et al. APIfix: output-oriented program synthesis for combating breaking changes in libraries
CN114356964A (en) Data blood margin construction method and device, storage medium and electronic equipment
WO2024113831A1 (en) Memory security management method and device
CN112445706A (en) Program abnormal code acquisition method and device, electronic equipment and storage medium
CN111966578A (en) Automatic evaluation method for android compatibility defect repair effect