CN114610320A - LLVM-based variable type information repairing and comparing method and system

Info

Publication number
CN114610320A
Authority
CN
China
Prior art keywords
type
variable
llvm
information
intermediate representation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210279549.3A
Other languages
Chinese (zh)
Inventor
纪守领
刘丁豪
何钦铭
陈建海
刘二腾
许端清
王文海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202210279549.3A priority Critical patent/CN114610320A/en
Publication of CN114610320A publication Critical patent/CN114610320A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 - Arrangements for software engineering
    • G06F8/40 - Transformation of program code
    • G06F8/41 - Compilation
    • G06F8/44 - Encoding
    • G06F8/443 - Optimisation

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses an LLVM-based method and system for repairing and comparing variable type information, comprising a variable type information repair analysis and a variable type comparison analysis. The variable type information repair analysis comprises compiling the target program source code into LLVM IR, extracting target variables, matching LLVM IR variable types against source code information, and storing the type analysis results. The repair analysis and the comparison analysis are implemented as two LLVM analysis flows, and the comparison analysis uses the results of the repair analysis. The method and system address the problem that missing type information and/or union-related types in current LLVM IR make type comparison analysis impossible or inaccurate.

Description

LLVM-based variable type information repairing and comparing method and system
Technical Field
The invention belongs to the technical field of software program analysis, and particularly relates to a variable type information repairing and comparing method and system based on LLVM.
Background
With the rapid development of computer software, code size and functional complexity keep growing, and the demand for software analysis, such as program vulnerability detection and compilation optimization, increases day by day. LLVM is one of the most popular program analysis frameworks at present. It can convert source code written in multiple programming languages into the LLVM Intermediate Representation (IR), which carries rich semantic information in a uniform format, and it lets developers design and implement custom program analysis flows on the IR. It is widely applied in fields such as compilation optimization, automatic vulnerability mining, automatic vulnerability repair, patch analysis, and clone detection.
LLVM reconstructed the type system of LLVM IR in version 3.0, and the main framework of that type system is still in use today. In the current LLVM type system, all variable types are divided into the void type, function types, and first-class types; first-class types comprise single-value types, the label type, the token type, the metadata type, and aggregate types; aggregate types include array types, structure types, and opaque structure types. In this type system, variable types within the same context (LLVMContext) can be compared by pointer comparison, which greatly improves the efficiency of program analysis. Variable type comparison underlies many upper-layer program analyses, such as global call graph construction, control flow integrity protection, and pointer alias analysis, so building a complete type comparison method is of great significance.
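The role of the context in pointer comparison can be illustrated with a toy model. The sketch below is not the LLVM API: `IRType` and `TypeContext` are hypothetical Python stand-ins, assuming only that each context creates a given type once and hands back the same object afterwards.

```python
class IRType:
    """Hypothetical stand-in for a uniqued IR type object."""
    def __init__(self, kind, elements):
        self.kind = kind
        self.elements = elements


class TypeContext:
    """Toy stand-in for a context that creates each distinct type only once."""
    def __init__(self):
        self._interned = {}

    def get(self, kind, *elements):
        # The key uses the identity of the element types, which are
        # themselves unique objects inside this context.
        key = (kind, tuple(id(e) for e in elements))
        if key not in self._interned:
            self._interned[key] = IRType(kind, elements)
        return self._interned[key]


ctx_a = TypeContext()
i32_a = ctx_a.get("i32")
arr_1 = ctx_a.get("array4", i32_a)
arr_2 = ctx_a.get("array4", i32_a)

ctx_b = TypeContext()
arr_b = ctx_b.get("array4", ctx_b.get("i32"))

same_context_equal = arr_1 is arr_2    # identity check is enough here
cross_context_equal = arr_1 is arr_b   # identity fails across contexts
```

In this model the identity check succeeds inside one context and fails across contexts even though both arrays have the same shape, which is why comparisons between contexts need another criterion such as type names.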
However, type information can be lost when source code is compiled into LLVM IR. For example, after some structure types and function types are compiled into LLVM IR, a function pointer field in the structure, or some parameters of the function, are compiled into a null pointer type, and the structure names of some structure types are lost. In addition, the LLVM IR type system has no dedicated type for the union type of the C/C++ languages: a union-type variable is compiled as a structure-type variable and is converted to the required type by a type cast at each point of use. When a structure type contains a field of a union type, variables of the same structure type used in different contexts may therefore have field members of different types. These problems significantly affect the type comparison task: types that should be recognized as equivalent are recognized as non-equivalent, which in turn causes false positives, false negatives, or analysis errors in upper-layer tasks built on type analysis (such as type-based pointer alias analysis and type-based indirect call target analysis). If the upper-layer task is security related (for example, control flow integrity), such problems can also seriously threaten program security and stability.
Existing solutions to these problems are incomplete. The type comparison method implemented inside LLVM applies different comparison strategies to different types, but performs no additional checks or handling for missing type information or for comparisons involving union types. TypeDive, an indirect-call target identification tool based on multi-layer type analysis, compares types across different contexts by comparing the string representations of the types. This is efficient for simple types such as single-value types and the label type, but it likewise cannot handle type comparisons involving missing type information or union types.
Disclosure of Invention
In view of the above, an object of the present invention is to provide an LLVM-based method and system for repairing and comparing variable type information, so as to solve the problem that missing type information and/or union-related types in current LLVM IR make type comparison analysis impossible or inaccurate.
To achieve the above object, an embodiment provides a method for repairing and comparing variable type information based on LLVM, including the following steps:
step 1, acquiring and compiling a source code of a target program into an LLVM intermediate representation with debugging information;
step 2, extracting target variables from the LLVM intermediate representation, wherein the target variables are variables that are relevant to the analysis task and that contain a structure whose type information is missing or a structure whose type is a union type;
step 3, according to the debugging information, obtaining from the target program source code the source code structure, together with its source code definition type, that corresponds to each structure and intermediate representation type contained in the target variable; comparing the intermediate representation type of the structure with the source code definition type of the corresponding source code structure; and outputting each structure pair, consisting of a structure and its corresponding source code structure, for which the comparison result is inconsistent;
step 4, for each structure pair, repairing the variable type information by using the source code definition type of the source code structure, and storing the repaired variable type information in a repair database;
step 5, when an intermediate representation type comparison analysis is performed on two variables to be compared, retrieving the structure information stored in the repair database to repair the missing type information of the structures, and then performing the intermediate representation type comparison analysis of the variables.
In one embodiment, step 1 comprises:
configuring a compiling environment, and preparing a compiler and a target program source code according to actual requirements;
configuring compiling options of the source codes of the target programs, wherein the compiling options comprise options for starting and retaining debugging information;
and executing a compiling flow, checking the correctness and the integrity of the LLVM intermediate representation after the compiling is finished, and outputting and storing the LLVM intermediate representation with the debugging information after the check is correct.
In one embodiment, step 2, comprises:
step 2-1, extracting LLVM variables needing to be analyzed from the LLVM intermediate representation according to the analysis task;
step 2-2, extracting an intermediate representation type of the LLVM variable in the LLVM intermediate representation, and screening a pointer type containing a structure body, an array type containing the structure body and the LLVM variable corresponding to the structure body type from the intermediate representation type to be used as a candidate LLVM variable;
step 2-3, screening, from the candidate LLVM variables, the variables whose structure type information is missing or which contain a union type, and outputting them as target variables.
In one embodiment, before extracting the target variables from the LLVM intermediate representation, the method further comprises: checking the version information and the debugging information of the LLVM intermediate representation that has been read in; extracting the target variables when the version information matches the current analysis framework and the debugging information is present; and otherwise terminating the extraction of target variables and raising an alarm to request manual handling.
In one embodiment, step 3 comprises:
step 3-1, acquiring a target variable, the debugging information corresponding to the target variable, and its intermediate representation type;
step 3-2, when the intermediate representation type is determined to be a pointer type, an array type, or a structure type, executing steps 3-3 to 3-6; otherwise, the type is considered to be outside the comparison range and the type comparison result is taken to be consistent;
step 3-3, when the intermediate representation type is a pointer type, acquiring the type of the variable pointed to by the pointer, extracting from the target program source code, according to the debugging information, the variable corresponding to the pointed-to variable and its source code definition type, taking the type of the pointed-to variable as the intermediate representation type, and returning to step 3-2;
step 3-4, when the intermediate representation type is an array type, acquiring the type of the array member variables, extracting from the target program source code, according to the debugging information, the variable corresponding to the array member variable and its source code definition type, taking the type of the array member variable as the intermediate representation type, and returning to step 3-2;
step 3-5, when the intermediate representation type is a structure type, acquiring the structure contained in the target variable and its type, extracting from the target program source code, according to the debugging information, the source code structure corresponding to the structure and its source code definition type, and proceeding to step 3-6; then acquiring the types of the child member variables of the structure, extracting from the target program source code, according to the debugging information, the variable corresponding to each child member variable and its source code definition type, taking the type of the child member variable as the intermediate representation type, and returning to step 3-2;
step 3-6, comparing the type of the structure with the source code definition type of the source code structure, and, if they are inconsistent, outputting a structure pair consisting of the structure and the source code structure, wherein type inconsistency includes inconsistency caused by the structure type name containing union and inconsistency caused by the structure type being missing.
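The recursion in steps 3-2 to 3-6 can be sketched on a toy model. The sketch assumes that IR types and source types are available as nested dictionaries of the same shape; the dictionary layout, `is_inconsistent`, `match_types`, and the sample names are all hypothetical, and a real implementation would read them from LLVM types and debugging metadata. Recursive (self-referential) structures are not handled.

```python
def is_inconsistent(ir_struct):
    """Step 3-6: the name is missing, or the name marks a union."""
    name = ir_struct["name"]
    return name == "" or "union" in name


def match_types(ir_type, src_type, pairs):
    """Walk an IR type and its source counterpart in lockstep."""
    kind = ir_type["kind"]
    if kind in ("pointer", "array"):
        # Steps 3-3 and 3-4: descend to the pointee or element type.
        match_types(ir_type["elem"], src_type["elem"], pairs)
    elif kind == "struct":
        # Step 3-5: compare this structure first, then visit its members.
        if is_inconsistent(ir_type):
            pairs.append((ir_type["name"], src_type["name"]))
        for ir_field, src_field in zip(ir_type["fields"], src_type["fields"]):
            match_types(ir_field, src_field, pairs)
    # Any other kind is outside the comparison range (step 3-2).
    return pairs


ir_type = {"kind": "pointer", "elem": {
    "kind": "struct", "name": "", "fields": [
        {"kind": "i32"},
        {"kind": "struct", "name": "union.payload",
         "fields": [{"kind": "i64"}]},
    ]}}
src_type = {"kind": "pointer", "elem": {
    "kind": "struct", "name": "device_ops", "fields": [
        {"kind": "int", "name": "int"},
        {"kind": "union", "name": "payload",
         "fields": [{"kind": "long", "name": "long"}]},
    ]}}

structure_pairs = match_types(ir_type, src_type, [])
```

Here the unnamed outer structure and the union-derived inner structure are both reported as pairs, in the order in which step 3-5 visits them.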
In one embodiment, step 4, comprises:
when the type information of the structure in the structure pair is missing, the source code definition type of the source code structure is used as the missing type of the structure to repair the type information; the intermediate representation type of the structure is used as the Key and the source code definition type of the corresponding source code structure is used as the Value, and they are stored in the repair database as a key-value (K-V) pair;
when the type of the structure in the structure pair is a union type, the union type is represented by a custom string; the intermediate representation type of the structure is used as the Key and the custom string is used as the Value, and they are stored in the repair database as a key-value (K-V) pair.
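A minimal sketch of such a key-value store follows, assuming a plain dictionary serialized as JSON. The marker string `<union>`, the function `build_repair_db`, and the sample type strings are hypothetical; the text only requires some custom string for unions and some external storage.

```python
import json

UNION_MARKER = "<union>"  # hypothetical custom string standing for a union


def build_repair_db(structure_pairs):
    """Map each IR type string to its repaired name or to the union marker."""
    db = {}
    for ir_type, source_type, is_union in structure_pairs:
        db[ir_type] = UNION_MARKER if is_union else source_type
    return db


structure_pairs = [
    ("{ i32, {}* }", "device_ops", False),   # structure name was lost
    ("%union.payload", "payload", True),     # structure derived from a union
]
repair_db = build_repair_db(structure_pairs)

# The database is stored externally so later comparisons can reuse it.
serialized = json.dumps(repair_db, sort_keys=True)
reloaded = json.loads(serialized)
```

Keeping the union marker distinct from any real type name lets the comparison stage tell a repaired name apart from a type whose comparison result must stay undecided.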
In one embodiment, step 5 comprises:
step 5-1, the upper-layer program analysis task extracts, from the LLVM intermediate representation corresponding to the target program source code, the intermediate representation types of the two variables to be compared;
step 5-2, calling the type comparison method built into the LLVM analysis framework to compare the intermediate representation types of the two variables; outputting the comparison result if the types are consistent, and executing step 5-3 if they are inconsistent;
step 5-3, when the intermediate representation types include structure types: if the type name information of both structure types is non-empty, comparing the structure type names after removing struct from them and outputting the comparison result; if the type name information of a structure type is empty, executing step 5-4;
step 5-4, for the structure type whose type name information is empty, querying the repair database for structure information; if no structure information corresponding to that structure type can be found, the comparison result is deemed inconsistent and is output; if the corresponding structure information can be found, executing step 5-5;
step 5-5, if the type information contained in the retrieved structure information is the custom string, the comparison result is deemed unknown and is output; if it is not the custom string, that type information is used as the type name of the structure type whose type name is empty, the structure type names are then compared, and the comparison result is output.
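Steps 5-2 to 5-5 can be sketched as a single function. In this sketch a type is a dictionary with a `name` and a `body` string, plain equality stands in for the framework's built-in comparison, and `UNION_MARKER`, `strip_struct_prefix`, and `compare_struct_types` are hypothetical names; none of this is LLVM API.

```python
UNION_MARKER = "<union>"  # hypothetical custom string stored for unions


def strip_struct_prefix(name):
    """Drop a leading 'struct.' so names can be compared directly."""
    prefix = "struct."
    return name[len(prefix):] if name.startswith(prefix) else name


def compare_struct_types(type_a, type_b, repair_db):
    """Return 'consistent', 'inconsistent', or 'unknown'."""
    if type_a == type_b:                 # step 5-2 (stand-in comparison)
        return "consistent"
    names = []
    for ir_type in (type_a, type_b):
        name = ir_type["name"]
        if not name:                     # step 5-4: look up the repair data
            repaired = repair_db.get(ir_type["body"])
            if repaired is None:
                return "inconsistent"
            if repaired == UNION_MARKER:  # step 5-5: union, result undecided
                return "unknown"
            name = repaired
        names.append(strip_struct_prefix(name))
    # Step 5-3 and the end of step 5-5: compare the (repaired) names.
    return "consistent" if names[0] == names[1] else "inconsistent"


named = {"name": "struct.device_ops", "body": "{ i32, void (i32)* }"}
unnamed = {"name": "", "body": "{ i32, {}* }"}

repaired_result = compare_struct_types(named, unnamed, {"{ i32, {}* }": "device_ops"})
missing_result = compare_struct_types(named, unnamed, {})
union_result = compare_struct_types(named, unnamed, {"{ i32, {}* }": UNION_MARKER})
```

The three calls exercise the three outcomes: a repaired name that matches, a structure with no repair entry, and a structure recorded as a union.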
To achieve the above object, an embodiment of the present invention provides a variable type information repairing and comparing system based on LLVM, including:
the compiling module is used for acquiring and compiling a source code of a target program into LLVM intermediate representation with debugging information;
an extraction module, configured to extract target variables from the LLVM intermediate representation, wherein the target variables are variables that are relevant to the analysis task and that contain a structure whose type information is missing or a structure whose type is a union type;
the type matching module is used for acquiring a source code structural body and a source code definition type thereof corresponding to the structural body and the intermediate representation type contained in the target variable in the source code of the target program according to the debugging information, comparing and analyzing the intermediate representation type of the structural body and the source code definition type of the corresponding source code structural body, and outputting a structural body pair consisting of the structural body and the corresponding source code structural body with inconsistent comparison results;
the type repairing module is used for repairing variable type information by using the source code definition type of the source code structure body aiming at each structure body pair and storing the variable type information in a repairing database;
and an analysis comparison module, configured to, when an intermediate representation type comparison analysis is performed on two variables to be compared, retrieve the structure information stored in the repair database to repair the missing type information of the structures, and then perform the intermediate representation type comparison analysis of the variables.
Compared with the prior art, the invention has the beneficial effects that at least:
(1) Variable type information missing from the LLVM IR is repaired by jointly analyzing the target program source code and the LLVM intermediate representation, and the repaired type names are used to identify and compare composite type information, which gives the method higher accuracy and analysis robustness.
(2) Before the intermediate representation types of variables are repaired, the range of variables to be analyzed is narrowed through target variable extraction and intermediate representation type checking, so that only variables the user cares about and that may have lost information are analyzed. The user can also add custom variable screening rules at this stage, which makes the method efficient.
(3) The LLVM-based variable type information repairing and comparing system is highly portable. Type information repair and comparison are implemented as LLVM analysis flows (LLVM Passes) and can be embedded into upper-layer analysis tasks built on many different LLVM analysis frameworks. Developers of such programs can use the system without modifying the LLVM analysis framework, and it provides a flexible type comparison service.
(4) The provided method and system are efficient. The type information repair analysis flow needs to be executed only once for a given set of LLVM intermediate representation, and because the analysis results are stored externally they can be reused many times in subsequent type comparisons. The analysis does not modify the existing source code or the LLVM intermediate representation, so other LLVM-based program analysis tools are not affected.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow diagram of an LLVM-based variable type information repair and comparison method according to an embodiment;
FIG. 2 is a flow chart of target variable extraction provided by an embodiment;
FIG. 3 is a flowchart of an LLVM-based variable type information repair and comparison system provided by an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described here are intended only to explain the invention and do not limit its scope.
The embodiment provides an LLVM-based variable type information repairing and comparing method and system, which mainly comprise a variable type information repair part and a variable type comparison analysis part, each implemented as an LLVM analysis flow. Specifically, the variable type information repair analysis flow identifies the variables in the target program that need type repair, completes the type repair task, and stores the type repair results; the variable type comparison analysis flow reads in variable type comparison requests sent by the upper-layer program analysis task, completes the variable type comparison task, and returns the comparison analysis results.
Fig. 1 is a flowchart of an LLVM-based variable type information repair and comparison method according to an embodiment. As shown in fig. 1, the method for repairing and comparing variable type information based on LLVM according to the embodiment includes the following steps:
step 1, acquiring and compiling a target program source code into an LLVM intermediate representation with debugging information.
In an embodiment, compiling the target program source code into LLVM IR comprises: the user provides the target software source code to be analyzed, configures compilation settings that meet the relevant requirements, enables the option that retains debugging information during compilation, compiles the target source code into LLVM IR, and stores the resulting LLVM IR file.
When configuring the compilation environment, the user needs to prepare a suitable compiler version according to actual needs; usable compilers include, but are not limited to, the Clang compiler.
When configuring the compilation options, the user needs to enable the option that retains debugging information and configure the other compilation options according to actual requirements. Ways of enabling the retention of debugging information include, but are not limited to, adding the -g option; ways of configuring other compilation options include, but are not limited to, configuring a Makefile.
The user then starts the compilation process and, after compilation finishes, checks whether the output LLVM IR file is correct and complete. Once the check passes, the output LLVM IR file is stored locally. Storage options include, but are not limited to, writing to a MySQL database or writing to a local hard disk or memory.
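As a minimal sketch of the compilation step, the helper below only assembles a Clang command line that emits textual LLVM IR with debugging information; it does not run the compiler. The function name `clang_ir_command` and the file names `demo.c` and `demo.ll` are hypothetical, and real projects would normally pass such flags through their build system.

```python
import shlex


def clang_ir_command(source_file, output_file, extra_flags=()):
    """Build a Clang invocation that emits textual LLVM IR with debug info."""
    return [
        "clang",
        "-g",          # retain debugging information
        "-S",          # emit textual output instead of an object file
        "-emit-llvm",  # produce LLVM IR rather than native assembly
        *extra_flags,  # any project-specific options
        source_file,
        "-o",
        output_file,
    ]


command = clang_ir_command("demo.c", "demo.ll")
command_line = shlex.join(command)
```

The resulting list can be handed to `subprocess.run` once a compiler is installed, and `command_line` gives the equivalent shell form for a Makefile rule.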
Step 2, extracting target variables from the LLVM intermediate representation.
In an embodiment, the target variable extraction comprises: the analysis program reads the LLVM IR with debugging information, scans all of the LLVM IR, and extracts, as required, all variables that are relevant to the analysis task and have missing type information, as well as all variables related to union types. The variables with missing type information are of structure types, and the variables related to union types are structure-type variables whose type name contains union.
Fig. 2 is a flowchart of target variable extraction provided by the embodiment. As shown in fig. 2, specifically, the target variable extraction includes:
step 2-1, acquisition and inspection of LLVM IR files.
In the embodiment, the analysis program reads in the LLVM IR file to be analyzed, checks the LLVM IR version information, and checks whether debugging information is included. If the LLVM IR version is not one that the currently implemented type repair analysis flow can process, or the input LLVM IR file does not contain debugging information, the program terminates the subsequent analysis flow, raises an alarm, and requests manual handling.
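One way to sketch this check on textual IR is shown below. It assumes the file was produced by Clang, so that debugging information appears as the `!llvm.dbg.cu` named metadata and the producer string contains `clang version N`; reading the version from that string is an assumption of this sketch, and `check_ir_text`, `IRCheckError`, and the supported version set are hypothetical.

```python
import re


class IRCheckError(Exception):
    """Raised when the IR cannot be analyzed and manual handling is needed."""


def check_ir_text(ir_text, supported_majors):
    """Return the producer's major version if the IR passes both checks."""
    if "!llvm.dbg.cu" not in ir_text:
        raise IRCheckError("no debugging information found in the IR")
    match = re.search(r"clang version (\d+)", ir_text)
    if match is None:
        raise IRCheckError("producer version not found in the IR")
    major = int(match.group(1))
    if major not in supported_majors:
        raise IRCheckError(f"unsupported producer version {major}")
    return major


sample_ir = """
!llvm.dbg.cu = !{!0}
!llvm.ident = !{!1}
!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !2, producer: "clang version 14.0.0")
!1 = !{!"clang version 14.0.0"}
!2 = !DIFile(filename: "demo.c", directory: "/tmp")
"""

checked_major = check_ir_text(sample_ir, supported_majors={13, 14})
```

Raising an exception models the alarm: the caller stops the analysis flow and leaves the file for manual handling.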
Step 2-2, extracting the LLVM variables to be analyzed.
In the embodiment, after the LLVM IR file passes the detection, LLVM variables needing to be analyzed are extracted from the LLVM intermediate representation according to the analysis task; the LLVM variables are extracted according to a custom rule designed by an analysis task, wherein the custom rule includes, but is not limited to, extracting all global variables, function definitions and all instructions in the functions.
Step 2-3, screening candidate LLVM variables from the LLVM variables.
In the embodiment, the intermediate representation type of the LLVM variable in the LLVM intermediate representation is extracted, the pointer type containing the structure, the array type containing the structure and the LLVM variable corresponding to the structure type are screened from the intermediate representation type to be used as candidate LLVM variables, and the candidate LLVM variables are used as data bases for determining target variables. It should be noted that the LLVM intermediate representation includes not only the LLVM variable but also the intermediate representation type corresponding to the LLVM variable.
In one embodiment, the step of screening candidate LLVM variables from the LLVM variables comprises:
step 2-3-1, when the intermediate representation type is judged to be the pointer type, the array type or the structure body type, executing the step 2-3-2 to the step 2-3-4; otherwise, terminating the judgment;
step 2-3-2, when the intermediate representation type is the structure type, taking an input LLVM variable corresponding to the structure type as a candidate LLVM variable;
step 2-3-3, when the intermediate representation type is a pointer type, acquiring the type of a pointer pointing variable, and when the type of the pointing variable is judged to be a structure type, considering the original pointer type to be the pointer type containing the structure, and taking an input LLVM variable corresponding to the pointer type containing the structure as a candidate LLVM variable; otherwise, taking the type of the pointing variable as an intermediate representation type, and skipping to execute the step 2-3-1;
step 2-3-4, when the intermediate representation type is an array type, obtaining the type of the array member variables; when the type of the array member variables is determined to be a structure type, regarding the original array type as an array type containing a structure, and taking the input LLVM variable corresponding to that array type as a candidate LLVM variable; otherwise, taking the type of the array member variables as the intermediate representation type and returning to step 2-3-1.
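The screening in steps 2-3-1 to 2-3-4 amounts to peeling pointer and array layers until a structure or some other type is reached. The sketch below uses hypothetical nested dictionaries for types and a hypothetical function name `is_candidate`; it is not the LLVM API.

```python
def is_candidate(ir_type):
    """True if the type is a structure, or a pointer/array chain ending in one."""
    current = ir_type
    # Steps 2-3-3 and 2-3-4: unwrap pointer and array layers one at a time.
    while current["kind"] in ("pointer", "array"):
        current = current["elem"]
    # Step 2-3-2: a structure makes the variable a candidate;
    # step 2-3-1: any other kind ends the check.
    return current["kind"] == "struct"


struct_type = {"kind": "struct", "name": "struct.device_ops", "fields": []}
pointer_to_array = {
    "kind": "pointer",
    "elem": {"kind": "array", "elem": struct_type},
}
pointer_to_int = {"kind": "pointer", "elem": {"kind": "i32"}}

candidate_flags = [
    is_candidate(struct_type),
    is_candidate(pointer_to_array),
    is_candidate(pointer_to_int),
]
```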
Step 2-4, screening target variables from the candidate LLVM variables and outputting them.
In the embodiment, variables whose structure type information is missing or which contain a union type are screened from the candidate LLVM variables and output as target variables.
In one embodiment, the step of screening the candidate LLVM variables for target variables comprises:
step 2-4-1, when the intermediate representation type is a structure type, checking whether the structure type name is empty; if it is empty, the structure type information is considered missing and the input candidate LLVM variable is taken as a target variable; if it is not empty and the structure type name contains union, the type is considered a union type and the input candidate LLVM variable is likewise taken as a target variable; otherwise, acquiring the types of all child member variables of the structure, taking the types of the child member variables as intermediate representation types, and returning to step 2-4;
step 2-4-2, when the intermediate representation type is a pointer type, checking whether the type name of the structure pointed to by the pointer is empty; if it is empty, the structure type information is missing and the input candidate LLVM variable is taken as a target variable; if it is not empty and the type name of the structure pointed to by the pointer contains union, the type is considered a union type and the input candidate LLVM variable is likewise taken as a target variable; otherwise, acquiring the type of the variable pointed to by the pointer, taking it as the intermediate representation type, and returning to step 2-4;
step 2-4-3, when the intermediate representation type is an array type, checking whether the type name of the structure contained in the array is empty; if it is empty, the structure type information is missing and the input candidate LLVM variable is taken as a target variable; if it is not empty and the type name of the structure contained in the array contains union, the type is considered a union type and the input candidate LLVM variable is likewise taken as a target variable; otherwise, acquiring the type of the array member variables, taking it as the intermediate representation type, and returning to step 2-4;
The candidate LLVM variables comprise structure-type variables, array-type variables and pointer-type variables.
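The target screening can be sketched as a recursive check over the same kind of toy type model. The dictionary layout and the function name `is_target` are hypothetical, and the sketch assumes tree-shaped types, so self-referential structures would need an extra visited set.

```python
def is_target(ir_type):
    """True if a reachable structure has an empty name or a union name."""
    kind = ir_type["kind"]
    if kind in ("pointer", "array"):
        # Steps 2-4-2 and 2-4-3: inspect the pointee or element type.
        return is_target(ir_type["elem"])
    if kind == "struct":
        name = ir_type["name"]
        if name == "" or "union" in name:
            return True  # missing type information, or a union type
        # Step 2-4-1: otherwise continue with the child member types.
        return any(is_target(field) for field in ir_type["fields"])
    return False


plain_struct = {"kind": "struct", "name": "struct.point",
                "fields": [{"kind": "i32"}, {"kind": "i32"}]}
unnamed_struct = {"kind": "struct", "name": "", "fields": [{"kind": "i32"}]}
nested_union = {"kind": "struct", "name": "struct.packet", "fields": [
    {"kind": "array", "elem": {"kind": "struct", "name": "union.payload",
                               "fields": [{"kind": "i64"}]}},
]}

target_flags = [
    is_target(plain_struct),
    is_target({"kind": "pointer", "elem": unnamed_struct}),
    is_target(nested_union),
]
```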
Step 3, matching the structures of the target variables with the corresponding structures in the source code information to form structure pairs.
In the embodiment, the structure in the target program source code that corresponds to each structure contained in the target variable, together with the definition type of that corresponding structure, is obtained according to the debugging information; the type of the structure is compared with the definition type of the corresponding structure; and each structure pair, consisting of a structure and its corresponding structure, for which the comparison result is inconsistent is output.
In one embodiment, the process of matching the structure of the target variable with the corresponding structure in the source code information includes:
Step 3-1, acquiring the target variable, the debugging information corresponding to the target variable, and the intermediate representation type. Specifically, the debugging information of a variable can be extracted through the LLVM MDNode.
Step 3-2, when the intermediate representation type is judged to be a pointer type, an array type or a structure type, executing steps 3-3 to 3-6; otherwise, the type is considered to be outside the comparison range, and the type comparison result is considered to be consistent;
Step 3-3, when the intermediate representation type is a pointer type, acquiring the type of the variable that the pointer points to, extracting from the target program source code, according to the debugging information, the variable corresponding to the pointed-to variable and its source-code-defined type, taking the type of the pointed-to variable as the intermediate representation type, and returning to step 3-2;
Step 3-4, when the intermediate representation type is an array type, acquiring the type of the array member variable, extracting from the target program source code, according to the debugging information, the variable corresponding to the array member variable and its source-code-defined type, taking the type of the array member variable as the intermediate representation type, and returning to step 3-2;
Step 3-5, when the intermediate representation type is a structure type, acquiring the structure contained in the target variable and its type, extracting from the target program source code, according to the debugging information, the source code structure corresponding to the structure and its source-code-defined type, and entering step 3-6; then acquiring the types of the child member variables of the structure, extracting from the target program source code, according to the debugging information, the variables corresponding to the child member variables and their source-code-defined types, taking the type of each child member variable as the intermediate representation type, and returning to step 3-2;
Step 3-6, comparing the type of the structure with the defined type of the corresponding structure, and outputting a structure pair consisting of the structure and the corresponding structure if the types are inconsistent, wherein type inconsistency includes inconsistency caused by the type name of the structure containing "union" and inconsistency caused by the type of the structure being missing.
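Steps 3-2 to 3-6 amount to walking the intermediate representation type and the source-side type description in parallel. The following sketch shows one way this could look under simplifying assumptions: `IRType`, `SrcType` and `collect_pairs` are hypothetical names, the source-side tree stands in for what would be read from the debugging information, and the types are assumed to form finite trees.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class IRType:
    kind: str                       # "struct", "pointer", "array" or "scalar"
    name: Optional[str] = None      # struct type name in the IR; None if missing
    children: List["IRType"] = field(default_factory=list)


@dataclass
class SrcType:
    kind: str
    name: Optional[str] = None      # type name as defined in the source code
    children: List["SrcType"] = field(default_factory=list)


def collect_pairs(ir: IRType, src: SrcType,
                  pairs: Optional[List[Tuple[IRType, SrcType]]] = None):
    """Collect (IR struct, source struct) pairs whose types are inconsistent."""
    if pairs is None:
        pairs = []
    if ir.kind not in ("struct", "pointer", "array"):
        return pairs                # outside the comparison range: consistent
    if ir.kind == "struct":
        missing = not ir.name                       # type information missing
        is_union = bool(ir.name) and "union" in ir.name
        if missing or is_union:                     # the two cases named in step 3-6
            pairs.append((ir, src))
    # Descend into the pointee, the array element, or the struct members.
    for ir_child, src_child in zip(ir.children, src.children):
        collect_pairs(ir_child, src_child, pairs)
    return pairs
```

Only the two inconsistency causes named in step 3-6 are checked here; a fuller comparison of names or layouts would be an extension beyond what the text states.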
Step 4, repairing the type of the structure by using the defined type of the corresponding structure, and storing the repaired structure information in a repair database.
In an embodiment, the structure pairs constructed in step 3 are obtained, and different repair methods are adopted according to the structure type in each structure pair, including:
when the type information of the structure in the structure pair is missing, the source-code-defined type of the source code structure is used as the missing type of the structure to repair the type information; the intermediate representation type of the structure is taken as the Key, the source-code-defined type of the corresponding structure is taken as the Value, and the two are stored in the repair database as a key-value (K-V) pair;
when the type of the structure in the structure pair is a union type, that is, the type name contains the character string "union", the union type is represented by a custom character string; the intermediate representation type of the structure is taken as the Key, the custom character string is taken as the Value, and the two are stored in the repair database as a key-value (K-V) pair.
It should be noted that the structure information stored as key-value pairs may be stored in a MySQL database, although implementing the storage with other methods or techniques is not excluded. The custom character string may be, for example, escape or the like; it is not limited as long as it does not conflict with an existing structure type name.
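The two repair cases can be sketched as follows. This is an illustration only: an in-memory dictionary stands in for the repair database, the marker `UNION_MARKER` is a hypothetical custom character string, and using a textual rendering of the intermediate representation type as the Key is an assumption made for the example.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical custom string; any value works as long as it cannot
# collide with an existing structure type name.
UNION_MARKER = "__union__"


def build_repair_db(
    pairs: Iterable[Tuple[str, Optional[str], str]]
) -> Dict[str, str]:
    """Build the Key-Value repair store from structure pairs.

    Each pair is (ir_key, ir_name, src_name):
      ir_key   - textual form of the IR type, used as the Key (assumption)
      ir_name  - struct type name in the IR, empty or None if missing
      src_name - type name defined in the source code
    """
    db: Dict[str, str] = {}
    for ir_key, ir_name, src_name in pairs:
        if not ir_name:
            db[ir_key] = src_name          # missing type: store the source-defined type
        elif "union" in ir_name:
            db[ir_key] = UNION_MARKER      # union type: store the custom string
    return db
```

For example, `build_repair_db([("{ i32, i8* }", "", "node")])` yields a store that maps the unnamed layout to the source-defined name `node`.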
Step 5, retrieving the structure information in the repair database to repair the intermediate representation type, and then performing comparative analysis on the intermediate representation type.
In this embodiment, when intermediate representation type comparison analysis is performed on two variables to be compared, the structure information stored in the repair database is retrieved to repair the missing type information of the structures contained in the intermediate representation types, and the intermediate representation types of the variables are then compared.
In one embodiment, the comparative analysis process comprises:
Step 5-1, the upper-layer program analysis task extracts the intermediate representation types of the two variables to be compared from the LLVM intermediate representation corresponding to the target program source code;
In this embodiment, the analysis program reads the variable type comparison request sent by the upper-layer program analysis task and extracts from the request the LLVM intermediate representation types of the two variables to be compared; if the type extraction fails, the subsequent process is terminated, an alarm is sent, and manual processing is requested; otherwise, step 5-2 is executed. The upper-layer program analysis tasks include other program analysis tasks that need variable type comparison, such as indirect call analysis based on type analysis and alias analysis based on type information; these analysis tasks send type comparison requests to the variable type comparison analysis flow.
Step 5-2, calling the type comparison method provided by the LLVM analysis framework to compare the intermediate representation types of the two variables; outputting the comparison result if the types are consistent, and executing step 5-3 if they are inconsistent;
In an embodiment, the type comparison method provided by LLVM is the FunctionComparator::cmpType() method.
Step 5-3, when the intermediate representation types include structure types, if the type names of both structure types are not empty, removing "struct" from the structure type names and then comparing the structure type names, and outputting the comparison result if the names are consistent; if the type name of a structure type is empty, executing step 5-4;
Step 5-4, querying the repair database for structure information for the structure type whose type information is empty; if no corresponding structure information can be found, that is, no Key corresponding to the structure type with empty type information can be found, the comparison result is considered inconsistent and is output; if the corresponding structure information can be found, that is, the Key is found, executing step 5-5;
Step 5-5, if the type information contained in the retrieved structure information is the custom character string, that is, the Value corresponding to the Key is the custom character string, the comparison result is considered unknown and an unknown result is output; if the type information is not the custom character string, it is used as the type name of the structure type whose type name is empty, the structure type names are then compared, and the comparison result is output.
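The decision flow of steps 5-2 to 5-5 can be sketched as a three-valued comparison. The names below (`StructTy`, `Result`, `compare`, `UNION_MARKER`) are hypothetical, plain equality stands in for the comparison method provided by LLVM, and treating two different non-empty names as inconsistent is an assumption, since the text only states the consistent case explicitly.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional

UNION_MARKER = "__union__"         # hypothetical custom string stored for unions


class Result(Enum):
    CONSISTENT = "consistent"
    INCONSISTENT = "inconsistent"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class StructTy:
    key: str                       # textual IR type, used as the database Key
    name: Optional[str]            # IR type name, None if missing


def strip_struct(name: str) -> str:
    """Drop a leading "struct." so IR names line up with source names."""
    return name[len("struct."):] if name.startswith("struct.") else name


def resolve_name(ty: StructTy, repair_db: Dict[str, str]) -> Optional[str]:
    """Use the IR name if present, otherwise look the type up in the repair store."""
    if ty.name:
        return strip_struct(ty.name)
    return repair_db.get(ty.key)   # None when no Key is found (step 5-4)


def compare(a: StructTy, b: StructTy, repair_db: Dict[str, str]) -> Result:
    if a == b:                     # stand-in for the built-in comparison (step 5-2)
        return Result.CONSISTENT
    resolved = [resolve_name(t, repair_db) for t in (a, b)]
    if any(n is None for n in resolved):
        return Result.INCONSISTENT # repair information not found
    if any(n == UNION_MARKER for n in resolved):
        return Result.UNKNOWN      # custom string: result is unknown (step 5-5)
    return Result.CONSISTENT if resolved[0] == resolved[1] else Result.INCONSISTENT
```

Returning a distinct unknown value, rather than forcing a yes or no answer, lets the calling analysis decide how conservatively to treat union types.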
Fig. 3 is a flowchart of the LLVM-based variable type information repairing and comparing system provided by an embodiment. As shown in Fig. 3, an embodiment provides a variable type information repairing and comparing system, including:
the compiling module, used for acquiring a source code of a target program and compiling it into an LLVM intermediate representation with debugging information;
the extraction module, used for extracting a target variable from the LLVM intermediate representation, wherein the target variable contains a structure that is related to the analysis task and whose type information is missing, or a structure whose type is a union type;
the type matching module, used for acquiring, according to the debugging information, the source code structure in the target program source code that corresponds to the structure contained in the target variable and to its intermediate representation type, together with the source-code-defined type of that source code structure, comparing the intermediate representation type of the structure with the source-code-defined type of the corresponding source code structure, and outputting a structure pair consisting of the structure and the corresponding source code structure when the comparison result is inconsistent;
the type repairing module, used for repairing, for each structure pair, the variable type information by using the source-code-defined type of the source code structure, and storing the repaired information in a repair database;
and the analysis comparison module, used for, when intermediate representation type comparison analysis is performed on two variables to be compared, retrieving the structure information stored in the repair database to repair the missing type information of the structures, and then performing the intermediate representation type comparison analysis of the variables.
It should be noted that, when the LLVM-based variable type information repairing and comparing system provided in the foregoing embodiment performs variable type information repair and comparison, the above division into functional modules is only an example; the functions may be allocated to different functional modules as needed, that is, the internal structure of the terminal or the server may be divided into different functional modules to complete all or part of the functions described above. In addition, the system and the LLVM-based variable type information repairing and comparing method provided by the above embodiments belong to the same concept; the specific implementation process of the system is described in detail in the method embodiments and is not repeated here.
The LLVM-based variable type information repairing and comparing system provided by the embodiment can run as an independent LLVM analysis flow, provides a type comparison query interface in a pluggable manner to other upper-layer tasks that need type comparison analysis, and is therefore highly portable.
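To illustrate what a pluggable type comparison query interface could look like from the side of an upper-layer task, the sketch below filters indirect call candidates with an injected comparator. All names (`TypeComparator`, `filter_call_targets`, `toy_compare`) are hypothetical, the string results are a simplification, and keeping candidates whose result is unknown is a conservative design choice of this example rather than something the embodiment prescribes.

```python
from typing import Callable, Iterable, List, Tuple

# A comparator takes two type descriptions and returns
# "consistent", "inconsistent" or "unknown".
TypeComparator = Callable[[str, str], str]


def filter_call_targets(
    callsite_type: str,
    candidates: Iterable[Tuple[str, str]],
    compare: TypeComparator,
) -> List[str]:
    """Keep candidate functions whose type is not provably inconsistent."""
    kept = []
    for func_name, func_type in candidates:
        if compare(callsite_type, func_type) != "inconsistent":
            kept.append(func_name)   # consistent or unknown: keep conservatively
    return kept


def toy_compare(a: str, b: str) -> str:
    """Toy comparator: "?" models a type whose comparison result is unknown."""
    if "?" in (a, b):
        return "unknown"
    return "consistent" if a == b else "inconsistent"


targets = filter_call_targets(
    "node", [("f", "node"), ("g", "list"), ("h", "?")], toy_compare
)
```

Because the comparator is passed in as a plain callable, the same upper-layer task can run with or without the repair-aware comparison.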
The above embodiments illustrate the technical solutions and advantages of the present invention. It should be understood that they are only the most preferred embodiments of the present invention and are not intended to limit it; any modifications, additions, equivalents and the like made within the scope of the principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. An LLVM-based variable type information repairing and comparing method, characterized by comprising the following steps:
step 1, acquiring a source code of a target program and compiling it into an LLVM intermediate representation with debugging information;
step 2, extracting target variables from the LLVM intermediate representation, wherein a target variable contains a structure that is related to the analysis task and whose type information is missing, or a structure whose type is a union type;
step 3, acquiring, according to the debugging information, the source code structure in the target program source code that corresponds to the structure contained in the target variable and to its intermediate representation type, together with the source-code-defined type of that source code structure, comparing the intermediate representation type of the structure with the source-code-defined type of the corresponding source code structure, and outputting a structure pair consisting of the structure and the corresponding source code structure when the comparison result is inconsistent;
step 4, for each structure pair, repairing the variable type information by using the source-code-defined type of the source code structure and storing the repaired information in a repair database;
step 5, when intermediate representation type comparison analysis is performed on two variables to be compared, retrieving the structure information stored in the repair database to repair the missing type information of the structures, and then performing the intermediate representation type comparison analysis of the variables.
2. The LLVM-based variable type information repairing and comparing method according to claim 1, wherein the step 1 comprises:
configuring a compiling environment, and preparing a compiler and a target program source code according to actual requirements;
configuring compiling options of the source codes of the target programs, wherein the compiling options comprise options for starting and retaining debugging information;
and executing the compiling flow, checking the correctness and integrity of the LLVM intermediate representation after compilation is finished, and outputting and storing the LLVM intermediate representation with the debugging information after the check passes.
3. The LLVM-based variable type information repairing and comparing method according to claim 1, wherein the step 2 comprises:
step 2-1, extracting LLVM variables needing to be analyzed from the LLVM intermediate representation according to the analysis task;
step 2-2, extracting the intermediate representation type of each LLVM variable in the LLVM intermediate representation, and screening out, as candidate LLVM variables, the LLVM variables whose intermediate representation type is a pointer type containing a structure, an array type containing a structure, or a structure type;
step 2-3, screening, from the candidate LLVM variables, the variables whose structure type information is missing or which contain a union type, and outputting them as target variables.
4. The LLVM-based variable type information repairing and comparing method according to claim 3, wherein the step 2-2 comprises:
step 2-2-1, when the intermediate representation type is judged to be a pointer type, an array type or a structure type, executing steps 2-2-2 to 2-2-4; otherwise, terminating the judgment;
step 2-2-2, when the intermediate representation type is a structure type, taking the input LLVM variable corresponding to the structure type as a candidate LLVM variable;
step 2-2-3, when the intermediate representation type is a pointer type, acquiring the type of the variable that the pointer points to; when the type of the pointed-to variable is judged to be a structure type, the original pointer type is considered to be a pointer type containing a structure, and the input LLVM variable corresponding to the pointer type containing the structure is taken as a candidate LLVM variable; otherwise, taking the type of the pointed-to variable as the intermediate representation type, and returning to step 2-2-1;
step 2-2-4, when the intermediate representation type is an array type, acquiring the type of the array member variable; when the type of the array member variable is judged to be a structure type, the original array type is considered to be an array type containing a structure, and the input LLVM variable corresponding to the array type containing the structure is taken as a candidate LLVM variable; otherwise, taking the type of the array member variable as the intermediate representation type, and returning to step 2-2-1.
5. The LLVM-based variable type information repairing and comparing method according to claim 3, wherein the step 2-3 comprises:
step 2-3-1, when the intermediate representation type is a structure type, checking whether the structure type name is empty; if it is empty, the structure type information is considered missing, and the input candidate LLVM variable is taken as a target variable; if it is not empty and the structure type name contains "union", the structure is regarded as a union type, and the input candidate LLVM variable is likewise taken as a target variable; otherwise, acquiring the types of all child member variables of the structure, taking the type of each child member variable as the intermediate representation type, and returning to step 2-3;
step 2-3-2, when the intermediate representation type is a pointer type, checking whether the type name of the structure that the pointer points to is empty; if it is empty, the structure type information is missing, and the input candidate LLVM variable is taken as a target variable; if it is not empty and the type name of the structure contains "union", the structure is regarded as a union type, and the input candidate LLVM variable is likewise taken as a target variable; otherwise, acquiring the type of the variable that the pointer points to, taking that type as the intermediate representation type, and returning to step 2-3;
step 2-3-3, when the intermediate representation type is an array type, checking whether the type name of the structure contained in the array is empty; if it is empty, the structure type information is missing, and the input candidate LLVM variable is taken as a target variable; if it is not empty and the type name of the structure contains "union", the structure is regarded as a union type, and the input candidate LLVM variable is likewise taken as a target variable; otherwise, acquiring the type of the array member variable, taking that type as the intermediate representation type, and returning to step 2-3;
the candidate LLVM variables comprise structure type variables, array type variables and pointer type variables.
6. The LLVM-based variable type information repairing and comparing method according to claim 1, further comprising, before extracting the target variable from the LLVM intermediate representation: checking the version information and debugging information of the LLVM intermediate representation that has been read; extracting the target variable when the version information matches the current analysis framework and the debugging information exists; otherwise, terminating the extraction of the target variable, sending an alarm, and requesting manual processing.
7. The LLVM-based variable type information repairing and comparing method according to claim 1, wherein step 3 comprises:
step 3-1, acquiring the target variable, the debugging information corresponding to the target variable, and the intermediate representation type;
step 3-2, when the intermediate representation type is judged to be a pointer type, an array type or a structure type, executing steps 3-3 to 3-6; otherwise, the type is considered to be outside the comparison range, and the type comparison result is considered to be consistent;
step 3-3, when the intermediate representation type is a pointer type, acquiring the type of the variable that the pointer points to, extracting from the target program source code, according to the debugging information, the variable corresponding to the pointed-to variable and its source-code-defined type, taking the type of the pointed-to variable as the intermediate representation type, and returning to step 3-2;
step 3-4, when the intermediate representation type is an array type, acquiring the type of the array member variable, extracting from the target program source code, according to the debugging information, the variable corresponding to the array member variable and its source-code-defined type, taking the type of the array member variable as the intermediate representation type, and returning to step 3-2;
step 3-5, when the intermediate representation type is a structure type, acquiring the structure contained in the target variable and its type, extracting from the target program source code, according to the debugging information, the source code structure corresponding to the structure and its source-code-defined type, and entering step 3-6; then acquiring the types of the child member variables of the structure, extracting from the target program source code, according to the debugging information, the variables corresponding to the child member variables and their source-code-defined types, taking the type of each child member variable as the intermediate representation type, and returning to step 3-2;
step 3-6, comparing the type of the structure with the source-code-defined type of the source code structure, and outputting a structure pair consisting of the structure and the source code structure if the types are inconsistent, wherein type inconsistency includes inconsistency caused by the type name of the structure containing "union" and inconsistency caused by the type of the structure being missing.
8. The LLVM-based variable type information repair and comparison method according to claim 1, wherein the step 4 comprises:
when the type information of the structure in the structure pair is missing, the source-code-defined type of the source code structure is used as the missing type of the structure to repair the type information; the intermediate representation type of the structure is taken as the Key, the source-code-defined type of the corresponding structure is taken as the Value, and the two are stored in the repair database as a key-value (K-V) pair;
when the type of the structure in the structure pair is a union type, the union type is represented by a custom character string; the intermediate representation type of the structure is taken as the Key, the custom character string is taken as the Value, and the two are stored in the repair database as a key-value (K-V) pair.
9. The LLVM-based variable type information repair and comparison method according to claim 1, wherein the step 5 comprises:
step 5-1, the upper-layer program analysis task extracts the intermediate representation types of the two variables to be compared from the LLVM intermediate representation corresponding to the target program source code;
step 5-2, calling the type comparison method provided by the LLVM analysis framework to compare the intermediate representation types of the two variables; outputting the comparison result if the types are consistent, and executing step 5-3 if they are inconsistent;
step 5-3, when the intermediate representation types include structure types, if the type names of both structure types are not empty, removing "struct" from the structure type names and then comparing the structure type names, and outputting the comparison result if the names are consistent; if the type name of a structure type is empty, executing step 5-4;
step 5-4, querying the repair database for structure information for the structure type whose type information is empty; if no corresponding structure information can be found, the comparison result is considered inconsistent and is output; if the corresponding structure information can be found, executing step 5-5;
step 5-5, if the type information contained in the retrieved structure information is the custom character string, the comparison result is considered unknown and is output; if the type information is not the custom character string, it is used as the type name of the structure type whose type name is empty, the structure type names are then compared, and the comparison result is output.
10. A LLVM-based variable type information repair and comparison system, comprising:
the compiling module, used for acquiring a source code of a target program and compiling it into an LLVM intermediate representation with debugging information;
the extraction module, used for extracting a target variable from the LLVM intermediate representation, wherein the target variable contains a structure that is related to the analysis task and whose type information is missing, or a structure whose type is a union type;
the type matching module, used for acquiring, according to the debugging information, the source code structure in the target program source code that corresponds to the structure contained in the target variable and to its intermediate representation type, together with the source-code-defined type of that source code structure, comparing the intermediate representation type of the structure with the source-code-defined type of the corresponding source code structure, and outputting a structure pair consisting of the structure and the corresponding source code structure when the comparison result is inconsistent;
the type repairing module, used for repairing, for each structure pair, the variable type information by using the source-code-defined type of the source code structure, and storing the repaired information in a repair database;
and the analysis comparison module, used for, when intermediate representation type comparison analysis is performed on two variables to be compared, retrieving the structure information stored in the repair database to repair the missing type information of the structures, and then performing the intermediate representation type comparison analysis of the variables.
CN202210279549.3A 2022-03-21 2022-03-21 LLVM-based variable type information repairing and comparing method and system Pending CN114610320A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210279549.3A CN114610320A (en) 2022-03-21 2022-03-21 LLVM-based variable type information repairing and comparing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210279549.3A CN114610320A (en) 2022-03-21 2022-03-21 LLVM-based variable type information repairing and comparing method and system

Publications (1)

Publication Number Publication Date
CN114610320A true CN114610320A (en) 2022-06-10

Family

ID=81865702

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210279549.3A Pending CN114610320A (en) 2022-03-21 2022-03-21 LLVM-based variable type information repairing and comparing method and system

Country Status (1)

Country Link
CN (1) CN114610320A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination