CN117520199B

CN117520199B - Numerical software defect detection method and device based on static symbolic execution

Info

Publication number: CN117520199B
Application number: CN202311666744.2A
Authority: CN
Inventors: 梁洪亮; 马冬雨
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2023-12-06
Filing date: 2023-12-06
Publication date: 2024-05-14
Anticipated expiration: 2043-12-06
Also published as: CN117520199A

Abstract

The invention provides a numerical software defect detection method and device based on static symbolic execution. The method comprises the following steps: inputting a program to be tested, parsing the abstract syntax tree of the program to be tested, and constructing a control flow graph; traversing the control flow graph with a symbolic execution engine in a path-sensitive manner, symbolically executing the program to be tested path by path, and generating an expansion graph that records parameter states during the path-sensitive execution, wherein the symbolic execution engine contains a portion that supports floating point types; during symbolic execution of the program to be tested, when the parameters of a mathematical function are floating point parameters, classifying the mathematical function, calculating the output value range for the current input parameters based on the classification result, binding the value range to the symbolic value of the mathematical function, generating a new state of the parameters, and adding the new state to the expansion graph; and obtaining a floating point defect detection result by applying a hybrid constraint solver to the value range bound to the symbolic value.

Description

Numerical software defect detection method and device based on static symbolic execution

Technical Field

The invention relates to the technical field of numerical software defect detection, and in particular to a numerical software defect detection method and device based on static symbolic execution.

Background

Numerical software is the collective body of standard algorithm programs and is an important link in converting computational methods into social productivity. Numerical software contains a large number of numerical values and computations on them, such as integer and floating point variables, conditional statements, basic operations, and operations and computations involving mathematical function calls. Numerical software is increasingly used in modern technical fields, so defect detection for it is essential. Static analysis is a program analysis technique that analyzes a program without running it, and it offers high code coverage, high speed, and low resource overhead. Symbolic execution is a popular static analysis technique: it executes a program symbolically, replacing concrete values with symbolic values, simulates the running of the program, collects semantic information during the simulated run, explores the reachable paths in the program, and analyzes the errors hidden in it. However, existing static symbolic execution tools for numerical software are few and have limited detection capability. First, numerical software contains a large number of floating point operations, and because constraint solvers offer limited support for floating point constraints, most symbolic execution engines ignore floating point types and operations; they therefore cannot analyze program behavior accurately and cannot detect floating point defects (floating point exceptions). 
Second, many floating point parameters, return values, and floating point operations are passed through or implemented by mathematical function calls, yet existing static symbolic execution engines do not handle mathematical function calls. As a result, they cannot accurately reason about paths that contain mathematical function calls (without knowledge of the mathematical function, the paths on which floating point values lie cannot be explored), and they cannot detect floating point defects caused by mathematical function calls (without knowledge of the mathematical function, there are no constraints with which to detect erroneous mathematical function operations or the floating point exceptions they raise). These problems have serious consequences, ranging from false positives and false negatives in mild cases to severe disasters in serious ones.

The Clang static analyzer is a static analysis tool based on Clang in the LLVM (Low Level Virtual Machine) compilation framework; it supports only integer defect analysis and not floating point defect analysis. In recent years, code analysis tools that support floating point reasoning and defect detection, such as Frama-C and Fpse-student, have been proposed in academia and industry at home and abroad to find defects in numerical software. Frama-C is a static analysis tool for the C language that statically analyzes a program to find potential problems such as null pointer dereferences, integer overflows, and buffer overflows. Frama-C also provides several numerical analysis plug-ins that perform abstract interpretation of floating point expressions, helping to find potential numerical overflows, division-by-zero errors, loss of precision, and the like. Fpse-student is a symbolic-execution-based analysis tool for floating point programs that can analyze a floating point program, collect branch conditions, and detect floating point exceptions. However, these existing tools detect floating point defect types incompletely and produce many false positives and false negatives, and Fpse-student additionally suffers from a high analysis time cost.

Therefore, how to improve the completeness of numerical software defect detection, reduce false positives and false negatives, and reduce the analysis time cost is a problem to be solved.

Disclosure of Invention

In view of the above, the present invention provides a method and apparatus for detecting numerical software defects based on static symbolic execution, so as to solve at least one technical problem in the prior art.

One aspect of the present invention provides a numerical software defect detection method based on static symbolic execution, the method comprising the following steps:

analyzing an abstract syntax tree of a program to be tested, and constructing a control flow graph;

inputting the program to be tested, loading a defect detector into a symbolic execution engine, and placing the symbolic execution engine at the control flow entry of the program to be tested, wherein the defect detector contains defect generation conditions set based on defect definitions, the defect generation conditions include floating point defect generation conditions set based on floating point defect definitions, the symbolic execution engine contains a portion supporting floating point types, the portion supporting floating point types comprises a symbolic value representation supporting floating point types, an expression representation supporting floating point operations, a memory representation supporting floating point types, a constraint solver supporting floating point types, and a mathematical function modeling module, and the mathematical function modeling module is used for modeling a mathematical function so that the mathematical function can be identified by the symbolic execution engine;

traversing the control flow graph with the symbolic execution engine in a path-sensitive manner, symbolically executing the program to be tested path by path to obtain constraint ranges of variables or expressions and to collect semantic information in the program, and generating an expansion graph that records the state of each node of the program to be tested during the path-sensitive execution; during symbolic execution of the program to be tested, when the input of the current branch path contains a mathematical function, binding the mathematical function to a symbolic value; when an input parameter of the mathematical function is a floating point parameter, classifying the mathematical function based on function output attributes, calculating the output value range corresponding to the input parameter based on the classification result, binding the value range, as a constraint range, to the symbolic value of the mathematical function, generating a new state of the branch path based on the bound value range, and adding the new state to the expansion graph;

using a range-based first constraint solver to explore reachable paths in the program to be tested based on the constraint ranges of symbolic values or expressions, using a second constraint solver to verify the constraint solving result of the range-based first constraint solver, and exploring, by the defect detector, the reachable paths in the program to be tested based on the verified constraint solving result to obtain a floating point defect detection result.

In some embodiments of the present invention, the symbolic value representations supporting the floating point type include a symbolic value representation of a floating point constant, a floating point variable, and a floating point pointer; the expression representations supporting floating point type operations include those of basic operations, assignment operations, logical operations, mathematical operations, comparison operations, and rounding operations; the memory representation supporting the floating point type includes: memory representations of floating point constants, floating point variables, and floating point pointers.

In some embodiments of the present invention, before symbolically executing the program to be tested, the method further comprises: determining whether the input comes from an external taint source, and, if so, modeling the external taint source to construct a symbolic value corresponding to the taint source for taint identification.

In some embodiments of the invention, the method further comprises: tracking taint propagation, generating a taint propagation path from a taint source to a taint sink, and generating intra-procedural and inter-procedural data flows by means of data flow analysis.

In some embodiments of the invention, the method further comprises: when the symbolic execution engine performs an operation on variables, using interval arithmetic built on an interval arithmetic strategy to generate the output value range interval of the operation from the intervals of the variables, and taking it as the output value range of the current operation.

In some embodiments of the present invention, classifying the input mathematical function based on the function output attribute, and calculating the output value range corresponding to the input parameter based on the classification result includes: the mathematical functions are classified based on monotonicity, periodicity, parity, asymptotics, extremum and/or identity of the mathematical functions, and the output value range of the input parameters is determined as a constraint range according to the classification result.

In some embodiments of the invention, the second constraint solver is an SMT constraint solver.

In some embodiments of the present invention, exploring the reachable paths in the program to be tested based on the verified constraint solving result to obtain the floating point defect detection result includes: updating the expansion graph based on the constraint solving result, determining the location at which a defect generation condition is triggered, and generating a defect report.

In some embodiments of the invention, verifying the constraint solving result of the range-based first constraint solver using the second constraint solver comprises: when the constraint solving result of the first constraint solver triggers a defect generation condition, verifying that result with the second constraint solver and updating the constraint solving result.

Another aspect of the present invention provides a numerical software defect detection system based on static symbolic execution, the system comprising a processor and a memory, the memory storing computer instructions and the processor being configured to execute the computer instructions stored in the memory, wherein the system implements the steps of the method described above when the computer instructions are executed by the processor.

Another aspect of the invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the steps of the method as described above.

The numerical software defect detection method and device based on static symbolic execution enable floating point paths to be explored and floating point exceptions to be detected quickly, improving detection capability and greatly improving detection accuracy.

Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.

It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present invention are not limited to the above-described specific ones, and that the above and other objects that can be achieved with the present invention will be more clearly understood from the following detailed description.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and, together with the description, serve to explain the application.

FIG. 1 is a schematic diagram of an overall static analysis tool according to an embodiment of the present invention.

FIG. 2 is a flow chart of a numerical software defect detection method based on static symbolic execution according to an embodiment of the invention.

FIG. 3 is a block diagram of a numerical software defect detection tool according to another embodiment of the present invention.

FIG. 4 is a detailed flow chart of another embodiment of the present invention for implementing numerical software defect detection.

Detailed Description

The present invention will be described in further detail with reference to the following embodiments and the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent. The exemplary embodiments of the present invention and the descriptions thereof are used herein to explain the present invention, but are not intended to limit the invention.

It should be noted here that, in order to avoid obscuring the present invention due to unnecessary details, only structures and/or processing steps closely related to the solution according to the present invention are shown in the drawings, while other details not greatly related to the present invention are omitted.

It should be emphasized that the term "comprises/comprising" when used herein is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components.

The existing Clang static analyzer supports neither floating point defect detection nor the handling of mathematical function calls. Because it has no knowledge of floating point types or mathematical functions, it cannot explore the paths on which floating point values lie and cannot detect floating point defects caused by mathematical function calls and other program behavior. Because it ignores floating point types and operations, it treats an infeasible path as feasible when it encounters a floating point condition, which leads to a large number of false positives and false negatives. The following examples illustrate this:

Segment 1 code:

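#include<stdio.h>
void func(float x, float y)
{
    int *p = NULL;
    if (x + y > 10)          /* line 5: outer floating point condition */
    {
        if (x + y < 5)       /* line 7: contradicts line 5, so this branch is infeasible */
        {
            int t1 = *p;     /* null pointer dereference on the infeasible path */
        }
    }
    //…
    return;
}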

When the floating point condition x+y>10.0 in line 5 holds, the condition x+y<5.0 in line 7 can never hold, so the path through that branch is infeasible and the null pointer dereference on that path should not be reported. However, when the existing Clang static analyzer encounters a condition it cannot reason about, it assumes that the condition in line 7 may be true and reports the null pointer dereference inside it, which is a false positive.
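The infeasibility of the inner branch can be illustrated with a small range check. The following is a minimal sketch under stated assumptions, not the engine described in this document: the `Range` type and the helpers `range_top`, `assume_gt`, `assume_lt`, and `is_feasible` are hypothetical names introduced here, and the sum x+y is treated as a single symbolic value whose range is refined by each branch condition.

```c
#include <assert.h>
#include <math.h>     /* INFINITY macro only; no libm calls */
#include <stdbool.h>

/* Hypothetical range of one symbolic value, here the sum s = x + y. */
typedef struct {
    double lo, hi;
    bool lo_open, hi_open;   /* true if the bound itself is excluded */
} Range;

/* Unconstrained range: nothing is known about s yet. */
static Range range_top(void)
{
    Range r = { -INFINITY, INFINITY, true, true };
    return r;
}

/* Refine r with the branch condition s > c. */
static Range assume_gt(Range r, double c)
{
    assert(!isnan(c));
    if (c >= r.lo) { r.lo = c; r.lo_open = true; }
    return r;
}

/* Refine r with the branch condition s < c. */
static Range assume_lt(Range r, double c)
{
    assert(!isnan(c));
    if (c <= r.hi) { r.hi = c; r.hi_open = true; }
    return r;
}

/* A range is feasible if it can still contain a value. This is an
   over-approximation: it ignores the gaps between adjacent doubles. */
static bool is_feasible(Range r)
{
    if (r.lo < r.hi) return true;
    return r.lo == r.hi && !r.lo_open && !r.hi_open;
}
```

Starting from `range_top()`, applying `assume_gt(r, 10.0)` and then `assume_lt(r, 5.0)` leaves the lower bound 10 above the upper bound 5, so `is_feasible` returns false and the dereference inside the inner branch would not be reported.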

Segment 2 code:

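int
gsl_sf_bessel_Knu_scaled_asympx_e(const double nu, const double x, gsl_sf_result *
result)
{
    double mu = 4.0 / nu;                 /* line 5: division by zero when nu is 0.0 */
    double mum1 = mu - 1.0;
    double mum9 = mu - 9.0;
    double pre = sqrt(M_PI / (2.0 * x));  /* line 8: invalid operation when x is negative */
    double r = log(x) * tan(nu);          /* line 9: product of two mathematical functions may overflow */
    result->val = pre * (1.0 + mum1 / (8.0 * x) + mum1 * mum9 / (128.0 * x * x));
    result->err = 2.0 * GSL_DBL_EPSILON * fabs(result->val) + pre * fabs(0.1 * r * r * r);
    return GSL_SUCCESS;
}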

Line 5 may cause a division-by-zero defect when the incoming parameter nu is 0.0. When the incoming nu and x are both very large, the product of the two mathematical functions in line 9 can become very large and exceed the maximum value that a floating point number can represent, causing overflow. When the incoming parameter x is negative, the argument of the sqrt call in line 8 is negative, resulting in an invalid operation. These floating point exceptions may lead to system crashes, data corruption, or other serious problems. The existing Clang static analyzer cannot detect these defects, so they go unreported (false negatives).

The existing Frama-C and Fpse-student code analysis tools detect floating point defect types incompletely and produce many false positives and false negatives, and Fpse-student is additionally too slow at solving, so its analysis time cost is too high. For example, the existing Frama-C cannot update existing symbolic values backward: once the range of an expression has been collected, the ranges of the related variables cannot be updated backward through assignment expressions and function calls, which leads to false positives. For some mathematical functions its range collection is inaccurate (the range constraints are too tight or too loose), which also leads to false positives. The existing Fpse-student likewise collects ranges inaccurately, causing false defect reports. Furthermore, Frama-C and Fpse-student do not model the mathematical functions they support completely, which results in false defect reports. In addition, Fpse-student has a high time cost because it relies on an SMT constraint solver, whose constraint solving is slow.

To address the problems that existing code analysis tools for numerical software either do not support floating point reasoning and defect detection, or detect floating point defect types incompletely, produce many false positives and false negatives, and have a high analysis time cost, the invention provides a numerical software defect detection method based on static symbolic execution.

More specifically, the invention is a further improvement on the Clang static analyzer. Floating point types are supported by adding a floating point infrastructure to both the symbolic execution engine and the detector, and constraints are made more accurate by modeling and classifying mathematical functions, determining the output value range of their parameters more precisely, and binding that range to the symbolic value, which in turn improves the accuracy of defect detection. FIG. 1 is a schematic diagram of an overall static analysis tool (a numerical software defect detection device based on static symbolic execution) for implementing numerical software defect detection according to an embodiment of the present invention. As shown in FIG. 1, the device mainly comprises two parts: a symbolic execution engine (hereinafter also referred to as the "engine" or "core engine") and a defect detector (hereinafter also referred to as the "detector"). The invention adds a portion supporting floating point types (a floating point infrastructure) to both the engine and the detector, that is, it introduces full support for floating point types. In the symbolic execution engine, the added portion may include a symbolic value representation supporting floating point types, an expression representation supporting floating point operations, a memory representation supporting floating point types, a constraint solver supporting floating point types, and a mathematical function modeling module. For example, for symbolic values, representations of floating point constants, floating point variables, and floating point pointers are added. 
For symbolic expressions, expression representations of floating point operations are added; examples of such operations include basic operations, assignment operations, logical operations, mathematical operations, comparison operations, and rounding operations. For the memory management module, memory representations of floating point constants, floating point variables, and floating point pointers are added. For the constraint solver, support for solving floating point constraints is added. In addition, a mathematical function modeling module is added to the symbolic execution engine; when a mathematical function call is encountered, this module constructs a symbolic value corresponding to the mathematical function so that the symbolic execution engine can identify it. In the embodiment of the invention, the mathematical function library supports dozens of mathematical functions, such as trigonometric functions, exponential functions, logarithmic functions, power functions, rounding functions, inverse trigonometric functions, remainder functions, gamma functions, hyperbolic functions, trigonometric identity functions, and modulo functions, so as to cover floating point mathematical functions comprehensively. When modeling a mathematical function, the corresponding function can be selected from the mathematical function library, and a symbolic value is allocated and bound to it. 
Furthermore, the invention can classify a mathematical function based on its output attributes (such as monotonicity, periodicity, parity, asymptotic behavior, extrema, and/or identities), calculate the output value range of the variable or expression based on the classification result, bind the value range to the symbolic value of the mathematical function, and generate a new state of the parameters based on the bound value range. Different classifications thus yield different constraint ranges, and the more precisely a variable is bounded, the fewer false positives occur and the more accurate the device becomes. The invention adds the new state to the expansion graph, keeps updating the program states in the expansion graph, and performs path condition reasoning and constraint solving, so that paths containing mathematical functions can be explored and floating point exceptions caused by mathematical functions can be detected, improving the accuracy and detection capability of the device.
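The classification step described above can be sketched as follows. This is an illustrative assumption about how output attributes might map to output ranges, not the classification actually used by the invention: the `FnClass` categories, the `FnModel` descriptor, and `output_range` are hypothetical, and `cube` and `square` are stand-ins chosen so that the sketch needs no math library. A real implementation would also have to widen the computed bounds outward to account for rounding in library functions.

```c
#include <assert.h>
#include <stddef.h>   /* NULL */

typedef struct { double lo, hi; } Interval;

/* Hypothetical classification by output attribute. */
typedef enum {
    FN_INCREASING,  /* monotonically increasing, e.g. exp, atan */
    FN_DECREASING,  /* monotonically decreasing */
    FN_EVEN,        /* even and non-decreasing on [0, +inf), e.g. fabs, cosh */
    FN_BOUNDED      /* fixed output bounds for any input, e.g. sin, cos */
} FnClass;

typedef double (*MathFn)(double);

/* Hypothetical model of one library function. */
typedef struct {
    FnClass  cls;
    MathFn   eval;   /* endpoint evaluation; unused for FN_BOUNDED */
    Interval bound;  /* output bounds; used only for FN_BOUNDED */
} FnModel;

/* Stand-in functions for the sketch. */
static double cube(double x)   { return x * x * x; }
static double square(double x) { return x * x; }

/* Compute the output range of the modeled function over the input range. */
static Interval output_range(const FnModel *m, Interval in)
{
    assert(in.lo <= in.hi);
    Interval out = m->bound;
    switch (m->cls) {
    case FN_INCREASING:
        out.lo = m->eval(in.lo);
        out.hi = m->eval(in.hi);
        break;
    case FN_DECREASING:
        out.lo = m->eval(in.hi);
        out.hi = m->eval(in.lo);
        break;
    case FN_EVEN: {
        double a = m->eval(in.lo), b = m->eval(in.hi);
        out.hi = a > b ? a : b;
        if (in.lo <= 0.0 && in.hi >= 0.0)
            out.lo = m->eval(0.0);      /* minimum is attained at 0 */
        else
            out.lo = a < b ? a : b;
        break;
    }
    case FN_BOUNDED:
        break;                          /* out already equals m->bound */
    }
    return out;
}
```

For example, a model `{ FN_INCREASING, cube, {0.0, 0.0} }` applied to the input range [-2, 3] yields [-8, 27], while a `FN_BOUNDED` model with bound [-1, 1] returns [-1, 1] for any input, which is sound but loose for narrow inputs. The resulting interval is what would be bound to the symbolic value of the call.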

For the defect detector, defect generation conditions are set based on defect definitions, and they include not only integer defect generation conditions but also floating point defect generation conditions. More specifically, following the definition of floating point exceptions in the IEEE 754 standard, the present invention adds defect generation conditions that trigger the identification of floating point exceptions, including those caused by mathematical functions. A floating point exception is an error or exceptional condition that occurs during a floating point operation, generally caused by an illegal operation on floating point numbers; the main kinds are overflow, underflow, division by zero, invalid operation, and inexact result. By adding to the defect detector defect generation conditions that trigger the identification of floating point exceptions caused by mathematical functions or other operations, floating point paths and paths containing mathematical functions can be explored, and floating point exceptions caused by mathematical functions as well as other floating point exceptions can be detected, which improves the accuracy and detection capability of the numerical software defect detection device.
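Defect generation conditions of this kind can be pictured as predicates over the value ranges collected for an operation's operands or result. The sketch below is an assumption made for illustration: the predicate names `may_divide_by_zero`, `may_sqrt_invalid`, and `may_overflow` are hypothetical, only three of the five IEEE 754 exception kinds are covered, and NaN inputs are ignored.

```c
#include <assert.h>
#include <float.h>    /* DBL_MAX */
#include <stdbool.h>

typedef struct { double lo, hi; } Interval;

/* Division by zero: the divisor's range contains 0. */
static bool may_divide_by_zero(Interval divisor)
{
    return divisor.lo <= 0.0 && divisor.hi >= 0.0;
}

/* Invalid operation for sqrt: the argument's range reaches below 0. */
static bool may_sqrt_invalid(Interval arg)
{
    return arg.lo < 0.0;
}

/* Overflow: a bound of the result range lies beyond the largest finite double. */
static bool may_overflow(Interval result)
{
    return result.hi > DBL_MAX || result.lo < -DBL_MAX;
}
```

Applied to Segment 2, a range for nu such as [-1, 1] would satisfy `may_divide_by_zero` for the expression 4.0/nu, and a range for x that reaches below zero would make the argument of sqrt in line 8 satisfy `may_sqrt_invalid`.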

Furthermore, the constraint solving module in the symbolic execution engine not only supports floating point types but also uses a hybrid constraint solver to improve the time efficiency of defect detection, thereby reducing the time cost.

Still further, the present invention enables more accurate collection of constraint ranges by introducing interval arithmetic policies in non-relational numerical abstract domains.

Furthermore, the application introduces taint analysis, tracks taint propagation, and uses data flow analysis to generate intra-procedural and inter-procedural data flows, thereby avoiding repeated analysis of the same callee.

Because floating point variables can now be represented and stored, the invention can perform intra-procedural analysis, that is, continuously collect the ranges of floating point variables according to the branch conditions, loop conditions, and the like encountered in the program to be tested. When an operation is performed on variables, the new ranges of the variables under the different operations are obtained using the interval arithmetic strategy and the interval constraint propagation rules of the non-relational numerical abstract domain, and are finally sent to the constraint solver module for solving in order to explore the reachable paths in the program.
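The interval arithmetic strategy mentioned above can be illustrated with the textbook rules for addition, subtraction, and multiplication. This is a minimal sketch, assuming closed intervals with finite bounds; the names `iv_add`, `iv_sub`, and `iv_mul` are hypothetical, and a sound implementation would additionally round lower bounds down and upper bounds up.

```c
#include <assert.h>

typedef struct { double lo, hi; } Interval;

static double min2(double a, double b) { return a < b ? a : b; }
static double max2(double a, double b) { return a > b ? a : b; }

/* [a.lo, a.hi] + [b.lo, b.hi] */
static Interval iv_add(Interval a, Interval b)
{
    assert(a.lo <= a.hi && b.lo <= b.hi);
    return (Interval){ a.lo + b.lo, a.hi + b.hi };
}

/* [a.lo, a.hi] - [b.lo, b.hi]: subtract the opposite bounds. */
static Interval iv_sub(Interval a, Interval b)
{
    assert(a.lo <= a.hi && b.lo <= b.hi);
    return (Interval){ a.lo - b.hi, a.hi - b.lo };
}

/* [a.lo, a.hi] * [b.lo, b.hi]: extremes are among the four endpoint products. */
static Interval iv_mul(Interval a, Interval b)
{
    assert(a.lo <= a.hi && b.lo <= b.hi);
    double p1 = a.lo * b.lo, p2 = a.lo * b.hi;
    double p3 = a.hi * b.lo, p4 = a.hi * b.hi;
    return (Interval){ min2(min2(p1, p2), min2(p3, p4)),
                       max2(max2(p1, p2), max2(p3, p4)) };
}
```

With a = [1, 2] and b = [-3, 4], these rules give a + b = [-2, 6], a - b = [-3, 5], and a * b = [-6, 8]. In Segment 2, for instance, the range of `mum1` would follow from the range of `mu` by `iv_sub` with the constant interval [1, 1].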

FIG. 2 is a flow chart of a numerical software defect detection method based on static symbolic execution according to an embodiment of the invention. As shown in FIG. 2, the numerical software defect detection method includes the following steps:

Step S110, inputting a program to be tested, analyzing an abstract syntax tree of the program to be tested, and constructing a control flow graph.

First, the source code of the input program to be tested is parsed by the compiler's syntax analysis to obtain the abstract syntax tree of the program, and a control flow graph is constructed from the parsed abstract syntax tree. The nodes of the abstract syntax tree correspond to source code constructs and are the basis for creating the control flow graph. The control flow graph is an abstract representation of a program that captures the paths that may be traversed during execution; it can be represented as a directed graph in which each node represents a basic statement block or line of code and each directed edge represents a possible transfer of control. The control flow graph graphically represents the possible flow of execution among all basic blocks within a procedure and also reflects the run-time execution of a procedure. Because constructing a control flow graph from an abstract syntax tree belongs to the prior art, it is not described further here.

Step S120, loading the defect detector into the symbolic execution engine and placing the symbolic execution engine at the control flow entry of the program to be tested.

The overall static analysis tool (numerical software defect detection device) mainly comprises two parts: a core symbolic execution engine and a pluggable defect detector. In the embodiment of the invention, the symbolic execution engine contains a portion supporting floating point types, which comprises a symbolic value representation supporting floating point types, an expression representation supporting floating point operations, a memory representation supporting floating point types, a constraint solver supporting floating point types, and a mathematical function modeling module; the mathematical function modeling module is used for modeling a mathematical function so that the mathematical function can be identified by the symbolic execution engine. The defect detector contains the definitions of the defect patterns to be detected, which include not only defect generation conditions for integer defect detection but also floating point defect generation conditions that are set based on floating point defect definitions and can trigger the identification of floating point exceptions.

The defect detector is loaded into the symbolic execution engine, which is placed at the control flow entry of the program to be tested, so that after the program is input, all reachable paths of the program branches can be explored by traversing the control flow graph, each branch of the program control flow is tracked, and the errors hidden in each program branch are analyzed by the defect detector.

Step S130, traversing the control flow graph with the symbolic execution engine in a path-sensitive manner, symbolically executing the program to be tested path by path to obtain constraint ranges of variables or expressions and to collect semantic information in the program, and generating an expansion graph that records the state of each node (statement block or code line) of the program during the path-sensitive execution.

The core analysis process of the symbolic execution engine first passes the program through an integer data identification module (existing), a data structure identification module, a mathematical function identification module, and the like, and then symbolizes the integer data, data structures, and mathematical function operations to obtain symbolic values. For mathematical functions, so that the mathematical function identification module can identify them, the symbolic execution engine may use the mathematical function modeling module in advance to select mathematical functions from the mathematical function library, model them, and construct the corresponding symbolic values; the mathematical function identification module can then identify the modeled functions, and the symbolic execution engine can symbolize mathematical function operations.

The symbolic execution engine traverses the control flow graph in a path-sensitive manner and symbolically executes the program under test path by path. Path sensitivity means that different analysis information is computed for the different predicates (terms expressing the properties of, or relationships between, objects) of conditional branch statements. In other words, path sensitivity explicitly distinguishes between program branches: each branch (also called a path branch) of the program control flow is tracked, and the program state of each branch is recorded separately. When traversing the control flow graph in a path-sensitive manner, the symbolic execution engine symbolically executes the program under test along each path branch; when it encounters integer, floating point or other variables, it obtains the constraint ranges of the variables, expressions and the like, and collects semantic information in the program.
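As an illustration of path-sensitive traversal, the following minimal Python sketch keeps a separate variable range for each branch instead of merging them. The `Branch` class, the `explore` function and the toy control flow graph are hypothetical names introduced only for this example; the embodiment does not prescribe a concrete data layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    """A conditional node: `var < bound` selects then_node, otherwise else_node."""
    var: str
    bound: float
    then_node: str
    else_node: str


def explore(cfg, node, ranges, path=()):
    """Yield (path, ranges) for every feasible path from `node` to an exit.

    `ranges` maps a variable name to a closed interval (lo, hi). Each branch
    receives its own copy of the state, so the two sides are never merged.
    """
    path = path + (node,)
    step = cfg.get(node)
    if step is None:  # exit node: report the state reached on this path
        yield path, ranges
        return
    lo, hi = ranges[step.var]
    # True branch (var < bound): clip the upper end. The closed interval
    # slightly over-approximates the strict inequality.
    if lo < step.bound:
        taken = {**ranges, step.var: (lo, min(hi, step.bound))}
        yield from explore(cfg, step.then_node, taken, path)
    # False branch (var >= bound): clip the lower end.
    if hi >= step.bound:
        not_taken = {**ranges, step.var: (max(lo, step.bound), hi)}
        yield from explore(cfg, step.else_node, not_taken, path)


# Hypothetical CFG for: if (x < 0.0) { A } else if (x < 10.0) { B } else { C }
cfg = {
    "entry": Branch("x", 0.0, "A", "check10"),
    "check10": Branch("x", 10.0, "B", "C"),
}
paths = list(explore(cfg, "entry", {"x": (-5.0, 5.0)}))
for p, state in paths:
    print(" -> ".join(p), state)
```

With the initial range of x set to [-5, 5], only the paths ending in A and B are explored; C is infeasible because x can never reach 10 on that path.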

In the embodiment of the invention, the state of each recorded position (such as a statement block or a code line) produced while executing the program under test is recorded in an expansion graph. The expansion graph is a graph of program states that further subdivides the control flow graph according to the concrete path-sensitive execution; it records the states (value ranges) of variables, branch conditions and the like. If the state of a statement block or code line changes (for example, the value range of a variable changes) or a new state is generated, the expansion graph is updated accordingly.

During symbolic execution of the program under test, when the input of the currently executed branch path contains a mathematical function, the symbolic execution engine first symbolizes the mathematical function to obtain its symbolic value (the symbolic value of the variable or expression corresponding to the mathematical function) and binds the mathematical function to that symbolic value. It then determines whether the input parameter of the mathematical function is of a floating point type. If it is, the mathematical function is classified based on its output attributes (equivalence class partitioning, that is, all possible input data are divided into several equivalence classes), the output value range corresponding to the current input parameter is calculated based on the classification result, and the value range is bound to the symbolic value of the mathematical function. A new state of the parameter is generated based on the bound value range and added to the expansion graph.

That is, in the embodiment of the invention, when a mathematical function call is encountered and the mathematical function identification module has recognized the mathematical function in the program under test and bound the corresponding symbolic value, the engine further determines whether the parameter passed to the mathematical function is of a floating point type. If so, the engine uses the floating point support in the symbolic execution engine and the range of the input value to obtain the output range of the function, and binds that output range to the symbolic value of the mathematical function. This realizes the value range binding of the mathematical function, that is, the binding between the symbolic value of the mathematical function and the output range of the function. If the parameter is not of a floating point type, for example if it is of an integer type, symbolization and anomaly detection may follow the existing integer-type process.

More specifically, when binding the symbolic value of the mathematical function to the output range of the function, the mathematical function is classified based on its output attributes (such as monotonicity, periodicity, parity, asymptotic behavior, extrema and/or identity), and the output value range corresponding to the current input parameter is calculated according to the classification result; that is, different constraint ranges are determined for different classes. For example, the output of a monotonic function either increases or decreases monotonically, so the extreme values of the output are attained at the endpoints of the input range: for an increasing function the maximum input gives the maximum output and the minimum input gives the minimum output, and for a decreasing function the correspondence is reversed. A value range (also called a value range interval) can therefore be obtained from the maximum and minimum values of the input variable of the mathematical function. For periodic functions (for example, trigonometric functions with upper and lower bounds such as sin x and cos x), the output has a maximum and a minimum and is periodically distributed, so the value range (or value range interval) of the mathematical function can be determined from that maximum and minimum. Once the value range of the mathematical function has been determined, it can be bound to the symbolic value of the mathematical function as a constraint range.
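The classification described above can be sketched in Python as follows. The three tables, the function `output_range` and the helper `bind_call` are hypothetical and only illustrate the idea; the embodiment names the attributes used for classification but not a concrete implementation. The sketch also ignores floating point rounding and overflow, which a sound implementation would have to handle (for example by rounding the bounds outward).

```python
import math

# Hypothetical classification tables (assumed for illustration only).
INCREASING = {"exp": math.exp, "atan": math.atan, "tanh": math.tanh}
DECREASING = {"erfc": math.erfc}
BOUNDED_PERIODIC = {"sin": (-1.0, 1.0), "cos": (-1.0, 1.0)}


def output_range(name, lo, hi):
    """Return a closed interval intended to contain f(x) for all x in [lo, hi]."""
    if name in INCREASING:
        f = INCREASING[name]
        return (f(lo), f(hi))      # min input -> min output
    if name in DECREASING:
        f = DECREASING[name]
        return (f(hi), f(lo))      # max input -> min output
    if name in BOUNDED_PERIODIC:
        return BOUNDED_PERIODIC[name]  # global bounds, independent of input
    return (-math.inf, math.inf)   # unclassified: no information


bindings = {}


def bind_call(symbol, name, arg_range):
    """Bind the computed output range to the symbolic value of the call."""
    bindings[symbol] = output_range(name, *arg_range)
    return bindings[symbol]


bind_call("sym_exp", "exp", (0.0, 1.0))
bind_call("sym_sin", "sin", (-100.0, 100.0))
print(bindings)
```

For the bounded periodic class the sketch always returns the global bounds, which is coarse for narrow input ranges but never excludes a reachable value.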

After the value range has been bound to the symbolic value of the mathematical function, a new state of the node is generated based on the bound value range and added to the expansion graph, so that the detector can subsequently perform anomaly detection on the updated expansion graph.

During path traversal, the variable ranges at each control flow node are further calculated from the branch conditions, loop conditions and the like encountered in the program (wherever these constrain the variables). In some embodiments of the invention, when a subsequent operation on a variable (a basic operation, comparison operation, logical operation, etc.) yields a new range, the interval operations constructed from the interval operation strategy of the non-relational numerical abstract domain may further be used: the output value interval of the operation is generated from the intervals of the operands, giving the new range of the variable under the different operations (addition, subtraction, multiplication and division (basic operations); comparison (comparison operations); AND, OR and NOT (logical operations); etc.). This new range is taken as the output value range of the current operation, and the expansion graph is updated accordingly. Because the new range of a variable under different operations can be obtained from the interval operation strategy and the expansion graph can be updated, when the range of an expression is collected, the ranges of the related variables can also be updated backward through assignment expressions and function calls. This makes range collection more accurate and greatly reduces false positives.
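The backward update of variable ranges can be illustrated with a small Python sketch. It assumes a single assignment `y = x + 1.0` followed by a branch on `y > 5.0`; the helper names are hypothetical, and closed intervals are used, so the strict comparison is slightly over-approximated.

```python
def intersect(a, b):
    """Intersection of two closed intervals, or None if they do not overlap."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def forward_add_const(x_rng, c):
    """Range of y after the assignment y = x + c."""
    return (x_rng[0] + c, x_rng[1] + c)


def backward_add_const(y_rng, c):
    """Range of x implied by y = x + c once the range of y is known."""
    return (y_rng[0] - c, y_rng[1] - c)


x = (0.0, 10.0)
y = forward_add_const(x, 1.0)                      # y = x + 1.0

# Branch `y > 5.0` taken: narrow y, then push the result back to x.
y_taken = intersect(y, (5.0, float("inf")))
x_taken = intersect(x, backward_add_const(y_taken, 1.0))

# Branch not taken (y <= 5.0): the complementary refinement.
y_not = intersect(y, (float("-inf"), 5.0))
x_not = intersect(x, backward_add_const(y_not, 1.0))

print(y_taken, x_taken, y_not, x_not)
```

On the taken branch the range of x shrinks from [0, 10] to [4, 10], even though the branch condition mentions only y.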

Specifically, a non-relational numerical abstract domain is an abstract representation of the numerical information of program variables. It abstracts each variable into a single value range without considering the relationships between variables. The most typical non-relational numerical abstract domain is the interval domain. The invention introduces the interval arithmetic theory and the interval constraint propagation theory of the non-relational numerical abstract domain and constructs interval operations (such as basic, comparison and logical operations), so that the engine can apply interval operation rules when collecting the ranges of variables in a program state and return a relatively precise result range. In the simplest example, if the range of variable x is [x1, x2] and the range of variable z is [z1, z2], then by the interval operation rules the output value range of the operation x+z is [x1+z1, x2+z2]. With the interval operation strategy of the non-relational numerical abstract domain added to the core engine, ranges can be propagated, combined and calculated as intervals, constraints can be collected more accurately, and the accuracy of the numerical software defect detection device is improved.
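The interval operation rules can be sketched as a small Python class. The class name `Interval` and its methods are illustrative assumptions; only the addition rule [x1+z1, x2+z2] is stated in the text, while the subtraction and multiplication rules shown are the standard interval arithmetic rules. Rounding of the bounds is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        # [x1, x2] + [z1, z2] = [x1 + z1, x2 + z2]
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # [x1, x2] - [z1, z2] = [x1 - z2, x2 - z1]
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # The extremes of a product lie among the four endpoint products.
        products = [
            self.lo * other.lo, self.lo * other.hi,
            self.hi * other.lo, self.hi * other.hi,
        ]
        return Interval(min(products), max(products))

    def contains(self, value):
        return self.lo <= value <= self.hi


x = Interval(1.0, 2.0)
z = Interval(10.0, 20.0)
print(x + z)                                    # Interval(lo=11.0, hi=22.0)
print(x - z)                                    # Interval(lo=-19.0, hi=-8.0)
print(Interval(-1.0, 2.0) * Interval(3.0, 4.0)) # Interval(lo=-4.0, hi=8.0)
```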

Step S140: based on the constraint ranges of the symbolic values or expressions, a range-based first constraint solver performs constraint solving to explore the reachable paths in the program under test; a second constraint solver verifies the constraint solving results of the range-based first constraint solver; and the defect detector explores the reachable paths in the program under test based on the verified constraint solving results to obtain the floating point defect detection result.

The symbolic execution engine symbolically executes the program under test, obtains the constraint ranges of variables, expressions and the like, and collects semantic information in the program. It explores each reachable path in the program using the constraint solvers. Based on the expansion graph, which is generated and updated in real time, and on the defect rules defined in the detector, the detector mounted on the engine checks the program state, determines the locations at which a defect generation condition is triggered, and generates a defect report.
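A detector of this kind can be sketched in Python as a set of rules evaluated against the operand ranges recorded for each program location. The three rules, the `Defect` class and the locations below are hypothetical examples chosen for illustration; the defect definitions actually used by the embodiment are not listed in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Defect:
    kind: str
    location: str


def check_operation(op, operand_range, location):
    """Return the defects whose generation condition the operand range can satisfy."""
    lo, hi = operand_range
    defects = []
    if op == "div" and lo <= 0.0 <= hi:
        defects.append(Defect("division by zero", location))
    if op == "sqrt" and lo < 0.0:
        defects.append(Defect("sqrt domain error", location))
    if op == "log" and lo <= 0.0:
        defects.append(Defect("log domain or pole error", location))
    return defects


# Hypothetical states read from an expansion graph:
# (operation, range of the divisor or argument, source location).
states = [
    ("div", (-1.0, 1.0), "demo.c:12"),
    ("sqrt", (0.0, 4.0), "demo.c:15"),
    ("log", (0.0, 2.0), "demo.c:18"),
]
report = [d for op, rng, loc in states for d in check_operation(op, rng, loc)]
for defect in report:
    print(f"{defect.location}: possible {defect.kind}")
```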

In this step, the first constraint solver is, by way of example, a faster constraint solver, such as a range-based constraint solver. The second constraint solver is slower but more precise, for example an SMT solver. The existing Clang static analyzer uses only a range-based constraint solver, which is fast and produces few false negatives but a large number of false positives. Frama-C uses the Eva abstract interpretation engine for abstract execution analysis; it is about 5 times slower than the range constraint solver and produces more false positives. Fpse-student uses only an SMT solver, which is more precise but about 100 times slower than the range constraint solver. For this reason, the invention changes the solving strategy to a hybrid one: the final range obtained on a path is first solved by the faster solver (such as the range solver), and the result is then sent to the SMT solver for verification. That is, when the constraint solving result of the first constraint solver triggers a defect generation condition, that result is verified and updated by the second constraint solver. In this way the respective advantages of the two solvers are exploited: precision is ensured, constraint solving performance is improved, and both time cost and false positives are greatly reduced. Based on the final solving result, the engine records the newly collected state information and symbol ranges, generates the expansion graph and keeps updating it.

Based on the expansion graph and the defect definitions, when a potential location that may trigger a defect condition is encountered, the detector mounted on the engine checks and reports the program state, and a defect report is finally output. If no defect is encountered, exploration of the program continues.
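The two-stage solving strategy can be sketched in Python as follows. Note the substitution: the second stage here is not an SMT solver but a simple interval subdivision check, used as a self-contained stand-in for the more precise verification step. All function names are hypothetical, and floating point rounding of the bounds is ignored.

```python
def add(a, b):
    return (a[0] + b[0], a[1] + b[1])


def mul(a, b):
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(p), max(p))


def denom(x):
    """Interval evaluation of x*x - 4*x + 5, which equals (x - 2)^2 + 1 >= 1."""
    return add(add(mul(x, x), mul((-4.0, -4.0), x)), (5.0, 5.0))


def shifted(x):
    """Interval evaluation of x - 2, which is zero at x = 2."""
    return add(x, (-2.0, -2.0))


def fast_range_check(expr, x):
    """First stage: one interval evaluation. True means zero may be reachable."""
    lo, hi = expr(x)
    return lo <= 0.0 <= hi


def precise_check(expr, x, pieces=64):
    """Second stage (stand-in for SMT verification): re-evaluate on sub-intervals.

    Returns False only if zero is excluded on every piece; True means the
    alarm could not be refuted, not that a zero has been proven to exist.
    """
    lo, hi = x
    width = (hi - lo) / pieces
    for i in range(pieces):
        sub = (lo + i * width, lo + (i + 1) * width)
        if fast_range_check(expr, sub):
            return True
    return False


def hybrid_division_by_zero(expr, x):
    """Run the expensive check only when the cheap one raises an alarm."""
    if not fast_range_check(expr, x):
        return False
    return precise_check(expr, x)


x_range = (1.0, 3.0)
print(fast_range_check(denom, x_range))           # True: alarm from stage one
print(hybrid_division_by_zero(denom, x_range))    # False: alarm refuted
print(hybrid_division_by_zero(shifted, x_range))  # True: alarm kept
```

The first expression shows why verification matters: a single interval evaluation loses the correlation between the two occurrences of x and yields [-6, 10], although the true minimum of the expression is 1.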

As the above steps show, the numerical software defect detection method overcomes the limitation of existing static analysis tools with respect to floating point conditions and constraints: floating point paths can be explored and floating point exceptions detected, and the paths on which mathematical functions are called can be explored and the floating point exceptions caused by those functions detected, which greatly reduces false negatives. In addition, because mathematical functions are classified and a different constraint range is determined for each class, the constraints and the collected variable ranges are more precise, so the floating point paths and the paths containing mathematical functions are explored accurately; this improves detection accuracy and detection capability and reduces false positives. Furthermore, the solving strategy can be changed to a hybrid one, which reduces time cost while improving constraint solving performance.

In some embodiments of the invention, before the program under test is symbolically executed, the method further comprises: determining whether the input comes from an external taint source (pollution source), and, if so, modeling the external taint source to construct a symbolic value.

That is, the invention introduces taint analysis to track the flow of externally tainted data. When the program under test contains input from an external taint source, such as standard input/output or a file, the taint source is modeled (for example by the pollution value modeling module) and a corresponding symbolic value is constructed, so that the symbolic execution engine can recognize and process it. Taint analysis is a program analysis technique that tracks the propagation of data. One of its goals is to identify data flows that reach security-sensitive sinks from attacker-controlled data sources without sanitization.

Introducing taint analysis makes it possible to track the flow of externally tainted data and improves the accuracy of the device. Specifically, taint sources are first identified by the taint identification module. Taint propagation is then tracked: intra-procedural and inter-procedural data flows are generated by data flow analysis and analyzed in a way that avoids analyzing the same callee multiple times. A taint propagation path from the taint source to the taint sink is generated, which tracks the flow of externally tainted data and helps the engine find potential locations where errors may be triggered. Here, intra-procedural data flow analysis is understood as the analysis of the program execution flow within a function, while inter-procedural data flow analysis generally refers to the analysis of the call relationships between functions.
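Taint propagation within a single function can be sketched in Python over a list of simple statements. The statement format `(target, operation, operands)`, the function `propagate_taint`, and the choice of `scanf` as source and `sqrt` as sink are assumptions made for this example only; inter-procedural propagation and callee summaries are not modeled.

```python
def propagate_taint(statements, sources, sinks):
    """Return the taint propagation paths that reach a sink.

    `statements` is a list of (target, operation, operands) tuples in
    execution order; `target` is None for calls whose result is unused.
    """
    tainted = {}    # variable -> propagation path from the source so far
    findings = []
    for target, op, operands in statements:
        if op in sources:
            tainted[target] = [f"{op}()", target]
            continue
        hit = next((v for v in operands if v in tainted), None)
        if op in sinks and hit is not None:
            findings.append(tainted[hit] + [f"{op}()"])
        elif hit is not None and target is not None:
            tainted[target] = tainted[hit] + [target]   # taint flows on
        elif target is not None:
            tainted.pop(target, None)   # overwritten with untainted data
    return findings


# Hypothetical program: x = scanf(); y = x * 2.0; z = 1.0; sqrt(y); sqrt(z);
program = [
    ("x", "scanf", []),
    ("y", "mul", ["x"]),
    ("z", "const", []),
    (None, "sqrt", ["y"]),
    (None, "sqrt", ["z"]),
]
paths = propagate_taint(program, sources={"scanf"}, sinks={"sqrt"})
for path in paths:
    print(" -> ".join(path))
```

Only the call `sqrt(y)` is reported, because `z` is assigned a constant and never receives data from the source.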

As shown in fig. 3, the numerical software defect detection tool of the invention may include a control flow construction module, a detector module, a pollution value modeling module, a mathematical function modeling module and a symbolic execution engine. The control flow construction module parses the syntax tree of the program under test and constructs the control flow. The detector module sets the defect generation conditions based on the defect definitions. The pollution value modeling module detects taint sources and constructs taint symbolic values, and the mathematical function modeling module models mathematical functions so that the symbolic execution engine can recognize them. Within the symbolic execution engine, the symbolic value and symbolic expression module constructs the symbolic values of variables and expressions; the memory management module manages the memory representations supporting integer and floating point types; and the constraint solving module performs constraint solving in the hybrid constraint solving mode and provides the solving results to the detector module, which finds defects through pattern matching. According to the embodiment of the invention, the output value range of a mathematical function can be determined more precisely based on the classification of the mathematical function, and the output value interval of subsequent operations can be determined more precisely based on the interval operation strategy, thereby reducing both false positives and false negatives.

In alternative embodiments of the invention, the pollution value modeling module and the mathematical function modeling module may be provided inside the symbolic execution engine.

A more detailed flow of the numerical software defect detection method of the invention is shown in fig. 4. As shown in fig. 4, after the program under test is input, its abstract syntax tree and control flow graph are constructed. A defect detector constructed according to the defect definitions is then mounted on the symbolic execution engine and placed at the control flow entry of the program under test. Next, the branch paths of the control flow graph are traversed in a path-sensitive manner while the engine symbolically executes the program under test. During traversal of the control flow graph, taint tracking is performed first: it is determined whether the input comes from an external taint source (such as content entered through standard input/output or read from a file), and, if so, the taint source is modeled to construct its symbolic value. The taint source can thereby be identified and taint propagation tracked, a taint propagation path from the taint source to the taint sink is generated, and intra-procedural and inter-procedural data flows are generated by data flow analysis.

It is then determined whether the input (from a taint source or not) is a mathematical function; if so, the symbolic value of the mathematical function is constructed through mathematical function identification. If the input is not a mathematical function, or once the symbolic value of the mathematical function has been constructed, the next step determines whether the current input is of a floating point type; if it is, floating point variables and expressions are constructed. For a mathematical function, a precise value range is determined according to the classification of the function by its output attributes, and the mathematical function is bound to that value range. Whether or not the current input is of a floating point type, the variable ranges at each control flow node are calculated during traversal of the control flow graph from branch conditions, loop conditions and the like. When an operation is performed on a variable, a new range can be obtained based on the interval operation strategy, and a precise constraint solving result is obtained from the new range by hybrid constraint solving: the range-based constraint solver first explores the reachable paths in the program to obtain a defect triggering result, and the SMT constraint solver then verifies that result to obtain a more precise solution. By updating in real time the expansion graph, which records the new state information and symbol ranges, the exact location of a defect can be found and a defect report output.

The numerical software defect detection method based on static symbolic execution overcomes the limitation of existing static analysis tools with respect to floating point conditions and constraints, so that floating point paths can be explored and floating point exceptions detected. The paths on which mathematical functions are called can be explored, and the floating point exceptions caused by mathematical functions can be detected. The new strategies proposed also allow floating point paths and the paths containing mathematical functions to be explored accurately, because constraints and variable ranges are collected more precisely. This improves the accuracy and detection capability of the device. The method of the invention also improves constraint solving performance and reduces time cost.

Correspondingly, the invention also provides a numerical software defect detection device based on static symbolic execution, comprising a processor and a memory, wherein the memory stores computer instructions and the processor is configured to execute the computer instructions stored in the memory; when the computer instructions are executed by the processor, the device implements the steps of the method described above.

Furthermore, the present invention provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the steps of the method as described above.

Those of ordinary skill in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein can be implemented as hardware, software, or a combination of both. Whether a particular function is implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave.

It should be understood that the invention is not limited to the particular arrangements and instrumentality described above and shown in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. The method processes of the invention are not limited to the specific steps described and shown; those skilled in the art may make various changes, modifications and additions, or change the order of the steps, after appreciating the spirit of the invention.

In this disclosure, features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments.

The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, and various modifications and variations can be made to the embodiments of the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A numerical software defect detection method based on static symbol execution, the method comprising:

inputting a program to be tested, analyzing an abstract syntax tree of the program to be tested, and constructing a control flow graph;

loading a defect detector into a symbolic execution engine and placing the defect detector at a control flow entry of the program to be tested, wherein the defect detector contains defect generation conditions set based on defect definitions, the defect generation conditions comprise floating point defect generation conditions set based on floating point defect definitions, the symbolic execution engine contains a part supporting a floating point type, the part supporting the floating point type comprises a symbolic value representation supporting the floating point type, an expression representation supporting floating point type operations, a memory representation supporting the floating point type, a constraint solver supporting the floating point type and a mathematical function modeling module, and the mathematical function modeling module is used for modeling a mathematical function so that the mathematical function can be identified by the symbolic execution engine;

using a range-based first constraint solver to perform constraint solving based on the constraint range of a symbol value or an expression so as to explore the reachable paths in the program to be tested, using a second constraint solver to verify the constraint solving result of the range-based first constraint solver, and exploring, by the defect detector, the reachable paths in the program to be tested based on the verified constraint solving result to obtain a floating point defect detection result;

Classifying the input mathematical function based on the function output attribute, and calculating the output value range corresponding to the input parameter based on the classification result comprises the following steps: classifying the mathematical function based on monotonicity, periodicity, parity, asymptotics, extremum and/or identity of the mathematical function, and determining an output value range of the input parameter as a constraint range according to the classification result;

the second constraint solver is an SMT constraint solver.

2. The method of claim 1, wherein the symbolic value representation supporting a floating point type includes symbolic value representations of a floating point constant, a floating point variable, and a floating point pointer;

the expression representations supporting floating point type operations include those of basic operations, assignment operations, logical operations, mathematical operations, comparison operations, and rounding operations;

the memory representation supporting the floating point type includes: memory representations of floating point constants, floating point variables, and floating point pointers.

3. The method of claim 1, wherein prior to symbolizing execution of the program under test, the method further comprises:

determining whether the input is from an external pollution source, and modeling the external pollution source to construct a symbol value corresponding to the pollution source if the input is determined to be from the external pollution source.

4. A method according to claim 3, characterized in that the method further comprises:

tracking taint propagation, generating a taint propagation path from a taint source to a taint sink, and generating intra-procedural and inter-procedural data flows by data flow analysis.

5. The method according to claim 1, wherein the method further comprises:

when the symbolic execution engine performs an operation on a variable, an interval operation constructed based on an interval operation strategy is used to generate the output value interval of the operation from the interval of the variable, and this interval is taken as the output value range of the current operation.

6. The method of claim 1, wherein exploring reachable paths in the program under test based on the validated constraint solving result to obtain floating point defect detection results comprises:

updating the expansion graph based on the constraint solving result, determining the location at which the defect generation condition is triggered, and generating a defect report;

the verifying of the constraint solving result of the range-based first constraint solver using the second constraint solver comprises: when the constraint solving result of the first constraint solver triggers the defect generation condition, verifying and updating the constraint solving result of the first constraint solver using the second constraint solver.

7. A numerical software defect detection device based on static symbolic execution, comprising a processor and a memory, characterized in that the memory stores computer instructions and the processor is configured to execute the computer instructions stored in the memory, and the device implements the steps of the method according to any of claims 1 to 6 when the computer instructions are executed by the processor.

8. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 6.