WO2016085272A1

WO2016085272A1 - Method for reducing false alarms in detecting source code error, computer program therefor, recording medium thereof

Info

Publication number: WO2016085272A1
Application number: PCT/KR2015/012791
Authority: WO
Inventors: 윤종원; 진민식
Original assignee: 주식회사 파수닷컴
Priority date: 2014-11-28
Filing date: 2015-11-26
Publication date: 2016-06-02

Abstract

The present invention relates to a method for reducing false alarms in detecting source code errors, a computer program therefor, and a recording medium thereof. According to one aspect of the present invention, the method is executed on a false alarm reduction device connected to a static analyzer and reduces false alarms among the error detection alarms generated by the static analyzer. The method comprises the steps of: 1) receiving, for an error detection alarm that has occurred, alarm type information, alarm path information, and information on the source code targeted by the alarm, wherein the alarm type information indicates which of the preset alarm types the alarm corresponds to, and the alarm path information identifies the execution path, among the execution paths of the source code, that is associated with the alarm; 2) converting the source code into an abstract syntax tree (AST); 3) removing from the AST the unnecessary subtrees that are not related to the error detection alarm; 4) obtaining a feature vector for the pruned AST on the basis of a feature pattern set preset for the alarm type of the error detection alarm; and 5) classifying false alarms by inputting the obtained feature vector into a classifier that has been trained in advance on false alarm classification information.

Description

Method for reducing false alarms in source code error detection, computer program therefor, and recording medium thereof

The present invention relates to a method for reducing false alarms in source code error detection, a computer program therefor, and a recording medium thereof. More specifically, when a static analyzer is used to detect errors in source code, the invention reduces false alarms, in which the analyzer wrongly determines that an error exists even though the analyzed source code contains no such error, and thereby prevents resources from being wasted on reviewing and handling those false alarms.

Static analyzers are widely used to detect potential bugs or vulnerabilities in source code. A static analyzer runs checkers for each function, each checker detecting a predefined kind of error, and generates an alarm message when it determines that an error has been detected.

However, the static analyzer does not always determine errors correctly during analysis. As a result, false alarms occur: the analyzer reports an error even though no error exists in the target source code.

In general, when a source code error detection alarm is generated, various resources must be spent on reviewing and handling the reported error. False alarms have therefore wasted resources during program development and inspection.

The present invention has been made in view of the conventional problems described above. Its object is to provide a method for reducing false alarms in source code error detection, a computer program therefor, and a recording medium thereof, which, in error detection using a static analyzer, reduce false alarms (alarms that wrongly determine that an error exists even though the analyzed source code contains no error) and thereby prevent resources from being wasted on reviewing and handling them.

According to an aspect of the present invention for achieving the above object, there is provided a method, executed in a false alarm reduction device interworking with a static analyzer, for reducing false alarms among the error detection alarms generated by the static analyzer, the method comprising: 1) receiving, for an error detection alarm that has occurred, alarm type information, alarm path information, and information on the source code targeted by the alarm, wherein the alarm type information indicates which of the preset alarm types the error detection alarm corresponds to, and the alarm path information is information on the execution path, among the execution paths of the source code, associated with the error detection alarm; 2) converting the source code into an abstract syntax tree (AST); 3) removing from the abstract syntax tree an unnecessary subtree not associated with the error detection alarm; 4) obtaining a feature vector for the abstract syntax tree from which the unnecessary subtree has been removed, based on a feature pattern set preset for the alarm type of the error detection alarm; and 5) classifying false alarms by inputting the obtained feature vector into a classifier in which false alarm classification information has been learned in advance.

Preferably, the method further comprises the step of 6) deleting the error detection alarm classified as a false alarm in step 5) from the error detection alarm targets of the static analyzer for the corresponding alarm type.

Preferably, in step 3), the removal of the unnecessary subtree is based on at least one of: a first policy that removes general statements other than those executed on the execution path associated with the error detection alarm; a second policy that removes branch statements other than those executed on the execution path associated with the error detection alarm, provided that the execution path associated with the error detection alarm includes the result of evaluating the condition of the branch statement; a third policy that removes loops other than those executed on the execution path associated with the error detection alarm; a fourth policy that includes a function called on the execution path associated with the error detection alarm, together with the execution path of that function, as a subtree of the node invoking the function; and a fifth policy that removes declarations unrelated to the execution path associated with the error detection alarm.

Preferably, in step 4), the feature pattern set consists of n feature patterns preset for a specific alarm type of the error detection alarm, and the feature patterns include: occurrence of a conditional statement, occurrence of a loop statement, occurrence of a return statement, occurrence of a break or continue statement, occurrence of an exit or assert method invocation, occurrence of a null expression, occurrence of a comparison with a null value, occurrence of a null assignment, or occurrence of a statement that returns a null value.

Preferably, in step 4), the process of obtaining the feature vector V_j(R) for the abstract syntax tree from which the unnecessary subtree has been removed comprises: 401) for the alarm type j of the error detection alarm, defining a feature pattern set P consisting of n feature patterns p, as shown in Equation 1 below;

[Equation 1]

$$P = \{p_1, p_2, \ldots, p_n\}$$

402) defining an n-dimensional pattern satisfaction vector v (P, d) for any node d on the abstract syntax tree as shown in Equation 2 below;

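[Equation 2]

$$v(P, d) = \left(S(d, p_1), S(d, p_2), \ldots, S(d, p_n)\right)$$

(where S(d, p_i) indicates whether node d, or the subtree rooted at node d, matches the i-th feature pattern p_i and is defined as in Equation 3 below; the i-th feature pattern p_i may be a single node or a subtree)

[Equation 3]

$$S(d, p_i) = \begin{cases} 1, & \text{if } d \text{ or the subtree rooted at } d \text{ matches } p_i \\ 0, & \text{otherwise} \end{cases}$$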

403) defining a feature vector V (P, D) for any node D on the abstract syntax tree, as shown in Equation 4 below; And

[Equation 4]

$$V(P, D) = v(P, D) + \sum_{k=1}^{m} V(P, d_k)$$

(where d_1, ..., d_m are the children of node D,

V(P, d_1), ..., V(P, d_m) are the feature vectors obtained through Equation 4 for the child nodes d_1, ..., d_m, and

v(P, D) is the n-dimensional pattern satisfaction vector for node D)

404) using Equation 5 below, obtaining a feature vector V _j (R) for the abstract syntax tree from which the unnecessary subtree is removed.

[Equation 5]

$$V_j(R) = V(P, R)$$

(where R is the root node of the abstract syntax tree from which the unnecessary subtree has been removed and corresponds to node D of Equation 4, P is the feature pattern set defined in Equation 1 for alarm type j, and

j indicates the alarm type of the error detection alarm that has occurred)

Preferably, the classifier of step 5) is a support vector machine (SVM), and a separate classifier is generated for each alarm type of the error detection alarms.

According to another aspect of the present invention, there is provided a computer program, stored in a medium and combined with hardware, for executing the method for reducing false alarms in source code error detection.

According to yet another aspect of the present invention, there is provided a computer-readable recording medium on which is recorded a computer program for causing a computer to execute the method for reducing false alarms in source code error detection.

According to the present invention, when source code errors are detected using a static analyzer, false alarms, which wrongly determine that an error exists even though the analyzed source code contains no error, can be reduced, so the resources required for reviewing and handling false alarms are not wasted.

In particular, the present invention has the advantage of classifying false alarms using structural features of a source code in which an error alarm is generated, thereby improving accuracy of false alarm classification.

FIG. 1 is a conceptual diagram illustrating a method for reducing false alarms in source code error detection according to an exemplary embodiment of the present invention.

The present invention can be embodied in many other forms without departing from the spirit or main features thereof. Therefore, the embodiments of the present invention are merely examples in all respects and should not be interpreted limitedly.

Terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component. The term "and/or" includes any combination of a plurality of related listed items or any one of the plurality of related listed items.

When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may also be present in between. On the other hand, when a component is referred to as being "directly connected" or "directly coupled" to another component, it should be understood that no other component is present in between.

The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "comprise", "include", and "have" are intended to indicate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should not be understood to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art. Terms such as those defined in commonly used dictionaries should be construed as having meanings consistent with their meanings in the context of the related art and are not to be construed in an idealized or overly formal sense unless expressly so defined in this application.

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. Identical or corresponding components are denoted by the same reference numerals regardless of the figure number, and redundant descriptions thereof are omitted. In the following description, detailed descriptions of related known technology are omitted when they might obscure the gist of the present invention.

The invention of the present embodiment is a method, executed in a false alarm reduction apparatus interworking with a static analyzer, for reducing false alarms among the error detection alarms generated by the static analyzer.

For example, the false alarm reduction apparatus may be understood as a computing means or functional module that executes the false alarm reduction method, and it may be implemented as a module linked to the static analyzer or as a module internal to the static analyzer.

The false alarm reduction apparatus of this embodiment can interwork with various known static analyzers. Many commercial static analyzers based on syntactic or semantic analysis are known, so a detailed description of them is omitted.

In step S1, the false alarm reduction apparatus receives, for an error detection alarm that has occurred, alarm type information, alarm path information, and information on the source code targeted by the alarm. The input may be triggered, for example, by a request from the false alarm reduction apparatus or by a setting of the static analyzer: the static analyzer linked to the apparatus may provide each piece of information to the apparatus or, in another example, a user may supply the apparatus with a file in which this information is recorded in a readable form.

The alarm type information indicates which of the preset alarm types the generated error detection alarm corresponds to, and the alarm path information is information on the execution path, among the execution paths of the source code, associated with the generated error detection alarm.

The static analyzer runs checkers for each function, each checking the analyzed source code for errors against its own predetermined criterion, and generates a specific alarm when it determines that an error has been detected.

At this time, the alarm is issued as a specific alarm message preset in the checker for the detected error. Each alarm message essentially depends on a specific checker and generally does not reflect the characteristics or form of the source code, or the features along the execution path.

All alarms of one 'alarm type' share the same alarm message. For example, one checker generates one alarm message and can therefore be understood to define one 'alarm type'.

For example, the 'alarm type' may include 'general null dereference', 'dereferencing of an unchecked null value', and 'dereferencing of a returned null value', and the like, and various alarm types may be set.

For example, a feature pattern set to be described below is defined for each alarm type.
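As an illustration of the step S1 input, the three pieces of alarm information and the per-type feature pattern sets could be held in structures like the following. This is a minimal Python sketch, not the implementation of this invention: the class `Alarm`, the function `receive_alarm`, the short pattern names, and the representation of the alarm path as a set of AST node IDs are all assumptions; only the three alarm type names are taken from the examples above.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Tuple

# Example alarm types named in the text (one alarm type per checker message).
ALARM_TYPES = (
    "general null dereference",
    "dereferencing of an unchecked null value",
    "dereferencing of a returned null value",
)

# Hypothetical short names for the nine feature patterns listed under step S4.
FEATURE_PATTERN_NAMES = (
    "conditional", "loop", "return", "break_or_continue", "exit_or_assert",
    "null_expression", "null_comparison", "null_assignment", "returns_null",
)

# Feature pattern set per alarm type; in this sketch every type reuses all nine.
PATTERN_SETS: Dict[str, Tuple[str, ...]] = {t: FEATURE_PATTERN_NAMES for t in ALARM_TYPES}


@dataclass(frozen=True)
class Alarm:
    """One error detection alarm as received in step S1 (hypothetical layout)."""
    alarm_type: str         # which preset alarm type the alarm corresponds to
    path: FrozenSet[int]    # assumed: IDs of AST nodes executed on the alarm path
    source: str             # source code targeted by the alarm


def receive_alarm(alarm_type: str, path: Iterable[int], source: str) -> Alarm:
    """Package the alarm information, rejecting alarm types with no pattern set."""
    if alarm_type not in PATTERN_SETS:
        raise ValueError(f"unknown alarm type: {alarm_type!r}")
    return Alarm(alarm_type, frozenset(path), source)
```

Keying the pattern sets by alarm type mirrors the statement that a feature pattern set is defined for each alarm type; giving each type a different set would only change the dictionary values.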

In operation S2, the false alarm reduction apparatus converts the source code into an abstract syntax tree (AST).

An abstract syntax tree (AST) is a tree of abstract syntax structures in source code written in a programming language, where each node represents a structure generated from the source code. Detailed concepts of the abstract syntax tree may be understood through a number of known materials, and thus detailed descriptions thereof will be omitted.

In step S3, the false alarm reduction apparatus removes an unnecessary subtree not associated with the error detection alert from the abstract syntax tree.

Elimination of unnecessary subtrees can be accomplished by conventional rule-based techniques.

For example, the removal of the unnecessary subtree may be based on at least one of: a first policy that removes general statements other than those executed on the execution path associated with the error detection alarm; a second policy that removes branch statements other than those executed on the execution path associated with the error detection alarm, provided that the execution path associated with the error detection alarm includes the result of evaluating the condition of the branch statement; a third policy that removes loops other than those executed on the execution path associated with the error detection alarm; a fourth policy that includes a function called on the execution path associated with the error detection alarm, together with the execution path of that function, as a subtree of the node invoking the function; and a fifth policy that removes statements unrelated to the execution path associated with the error detection alarm.
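The sketch below approximates the first, second, third, and fifth policies with a single rule: a statement subtree is kept only if the statement itself, or some statement beneath it, lies on the alarm path, while the expressions of a kept statement (such as a branch condition) are kept whole. It is a simplified Python illustration under stated assumptions, not the rule set of this invention: the `Node` class, the node kind strings, and the representation of the alarm path as a set of node IDs are hypothetical, and the fourth policy (inlining called functions) is not modeled.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass
class Node:
    node_id: int
    kind: str                                   # hypothetical kinds: "method", "block", "if", ...
    children: List["Node"] = field(default_factory=list)


# Kinds treated as statements; only these are candidates for removal.
STATEMENT_KINDS = frozenset({
    "block", "if", "while", "for", "return", "break", "continue", "expr_stmt", "decl",
})


def prune(node: Node, path: FrozenSet[int]) -> Optional[Node]:
    """Return a pruned copy of `node`, or None if neither it nor any
    statement below it is on the alarm path."""
    kept: List[Node] = []
    for child in node.children:
        if child.kind in STATEMENT_KINDS:
            pruned = prune(child, path)          # recurse into nested statements
            if pruned is not None:
                kept.append(pruned)
        else:
            kept.append(child)                   # expression of this node: keep whole
    on_path = node.node_id in path
    has_path_statement = any(c.kind in STATEMENT_KINDS for c in kept)
    if on_path or has_path_statement:
        return Node(node.node_id, node.kind, kept)
    return None
```

For a method body whose `if` statement (ID 3) and then-branch `return` (ID 7) are on the path, this rule drops the declaration before the `if`, the else-branch, and the statements after the `if`, while the `if` keeps its condition expression.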

In step S4, the false alarm reducing apparatus obtains a feature vector for the abstract syntax tree from which an unnecessary subtree is removed, based on a set of feature patterns preset for the alarm type of the error detection alert.

The abstract syntax tree (AST) obtained from the source code is too large and complex to be used directly as input data for classifying false alarms with a classifier. Obtaining a feature vector from the abstract syntax tree and training the classifier on that vector, as in the present embodiment, reduces training time and improves classification performance.

Preferably, the feature pattern set consists of n feature patterns preset for a specific alarm type of the error detection alarm, and the feature patterns include occurrence of a conditional statement, a loop statement, a return statement, a break or continue statement, an exit or assert method invocation, a null expression, a comparison with a null value, a null assignment, or a statement that returns a null value. These patterns are summarized in the following table.

[Table 1]

No.	Feature pattern
1	Occurrence of a conditional statement
2	Occurrence of a loop statement
3	Occurrence of a return statement
4	Occurrence of a break or continue statement
5	Occurrence of an exit or assert method invocation
6	Occurrence of a null expression
7	Occurrence of a comparison with a null value
8	Occurrence of a null assignment
9	Occurrence of a statement that returns a null value
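One way to make the nine feature patterns executable is to express each as a predicate over an AST node. The Python sketch below assumes a hypothetical `Node` class with a `kind` string and an optional `name` (the invoked method name for calls); the kind strings are illustrative and would depend on the Java AST actually used. Patterns 7 to 9 inspect only the node's immediate children, a simplification of the statement that a pattern may be a single node or a subtree.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    kind: str                                  # hypothetical kinds: "if", "while", "call", "null", ...
    name: str = ""                             # e.g. the invoked method name for "call" nodes
    children: List["Node"] = field(default_factory=list)


def _has_null_child(d: Node) -> bool:
    return any(c.kind == "null" for c in d.children)


# (description, predicate) pairs in the order of the feature pattern table.
FEATURE_PATTERNS: List[Tuple[str, Callable[[Node], bool]]] = [
    ("conditional statement", lambda d: d.kind == "if"),
    ("loop statement", lambda d: d.kind in ("for", "while", "do")),
    ("return statement", lambda d: d.kind == "return"),
    ("break or continue", lambda d: d.kind in ("break", "continue")),
    ("exit or assert invocation", lambda d: d.kind == "call" and d.name in ("exit", "assert")),
    ("null expression", lambda d: d.kind == "null"),
    ("comparison with null", lambda d: d.kind == "compare" and _has_null_child(d)),
    # assignment nodes are assumed to have exactly two children: target, value
    ("null assignment", lambda d: d.kind == "assign" and len(d.children) == 2
        and d.children[1].kind == "null"),
    ("statement returning null", lambda d: d.kind == "return" and _has_null_child(d)),
]


def matching_patterns(d: Node) -> List[str]:
    """Descriptions of all feature patterns that node d satisfies."""
    return [desc for desc, pred in FEATURE_PATTERNS if pred(d)]
```

A `return null;` node satisfies both the return-statement pattern and the returns-null pattern, which is why the patterns are evaluated independently rather than as mutually exclusive categories.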

Preferably, in step S4, the feature vector V_j(R) for the abstract syntax tree from which the unnecessary subtree has been removed is obtained as follows.

In operation S401, the false alarm reduction apparatus defines, for the alarm type j of the error detection alarm, a feature pattern set P consisting of n feature patterns p, as shown in Equation 1 below.

[Equation 1]

$$P = \{p_1, p_2, \ldots, p_n\}$$

In operation S402, the false alarm reducing apparatus defines an n-dimensional pattern satisfaction vector v (P, d) for any node d on the abstract syntax tree, as shown in Equation 2 below.

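[Equation 2]

$$v(P, d) = \left(S(d, p_1), S(d, p_2), \ldots, S(d, p_n)\right)$$

(where S(d, p_i) indicates whether node d, or the subtree rooted at node d, matches the i-th feature pattern p_i and is defined as in Equation 3 below; the i-th feature pattern p_i may be a single node or a subtree)

[Equation 3]

$$S(d, p_i) = \begin{cases} 1, & \text{if } d \text{ or the subtree rooted at } d \text{ matches } p_i \\ 0, & \text{otherwise} \end{cases}$$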

In operation S403, the false alarm reduction apparatus defines a feature vector V (P, D) for any node D on the abstract syntax tree, as shown in Equation 4 below.

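[Equation 4]

$$V(P, D) = v(P, D) + \sum_{k=1}^{m} V(P, d_k)$$

(where d_1, ..., d_m are the children of node D,

V(P, d_1), ..., V(P, d_m) are the feature vectors obtained through Equation 4 for the child nodes d_1, ..., d_m, and

v(P, D) is the n-dimensional pattern satisfaction vector for node D)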

In operation S404, the false alarm reduction apparatus obtains the feature vector V_j(R) for the abstract syntax tree from which the unnecessary subtree has been removed, using Equation 5 below.

[Equation 5]

$$V_j(R) = V(P, R)$$

Equation 5 can be understood as Equation 4 evaluated with the root node R of the abstract syntax tree, from which the unnecessary subtree has been removed, substituted for the arbitrary node D, using the feature pattern set P defined for alarm type j.
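The recursion of operations S402 to S404 can be sketched in a few lines of Python. This is a minimal illustration under two assumptions: S(d, p_i) is taken as a 0/1 indicator, and V(P, D) as the element-wise sum of v(P, D) and the feature vectors of D's children, so each component of V_j(R) counts how many nodes of the pruned tree match the corresponding pattern. The `Node` class, the three-pattern set, and the use of the alarm type string as a dictionary key are hypothetical; functions are named after the symbols in the equations.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class Node:
    kind: str
    children: List["Node"] = field(default_factory=list)


Pattern = Callable[[Node], bool]   # p_i: True when node d matches the pattern


def S(d: Node, p: Pattern) -> int:
    """Indicator: 1 if d matches p, else 0."""
    return 1 if p(d) else 0


def v(P: Sequence[Pattern], d: Node) -> List[int]:
    """n-dimensional pattern satisfaction vector of node d."""
    return [S(d, p) for p in P]


def V(P: Sequence[Pattern], D: Node) -> List[int]:
    """v(P, D) plus the feature vectors of D's children, element-wise."""
    total = v(P, D)
    for child in D.children:
        total = [a + b for a, b in zip(total, V(P, child))]
    return total


# Hypothetical pattern set P_j for one alarm type (three patterns for brevity).
PATTERN_SETS: Dict[str, List[Pattern]] = {
    "general null dereference": [
        lambda d: d.kind == "if",
        lambda d: d.kind == "return",
        lambda d: d.kind == "null",
    ],
}


def feature_vector(alarm_type: str, R: Node) -> List[int]:
    """V_j(R): V evaluated at the root R of the pruned AST with alarm type j's patterns."""
    return V(PATTERN_SETS[alarm_type], R)
```

For a pruned method body containing one `if` whose condition compares against `null` and whose branch returns `null`, followed by a plain `return`, the resulting vector is `[1, 2, 2]`: one conditional, two returns, two null literals.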

In step S5, the false alarm reduction apparatus classifies false alarms by inputting the feature vector V_j(R) obtained in operation S404 into a classifier in which false alarm classification information has been learned in advance.

The classifier may be trained, for example, as follows. For the specific alarm type for which a classifier is to be generated, the feature vectors of a number of training error detection alarms, each already known to be a false alarm or a normal alarm, are input to the classifier in advance, and the classifier is trained with a known machine learning technique using the false alarm/normal alarm label of each training alarm and its feature pattern information. In step S101 of FIG. 1, classifier training is performed for each alarm type. Inputting the feature vectors to the classifier in advance may be performed, for example, by inputting source code for which it is known whether each alarm is a false alarm or a normal alarm, together with the alarm type for that source code.

Preferably, the classifier is a support vector machine (SVM) classifier, and an SVM classifier is generated for each alarm type of the error detection alarms. A mapping transformation using a kernel function may be included so that feature vectors that are not linearly separable can be classified.

For example, the SVM classifier according to the present embodiment may be configured as follows. For k alarm types, k classifier models M_1, M_2, ..., M_k can be generated.
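Training one model M_j per alarm type could look like the Python sketch below. This document trains an SVM (for example with LIBSVM, possibly with a kernel); to stay dependency-free, the sketch instead swaps in a plain linear SVM trained with Pegasos-style stochastic sub-gradient descent on the L2-regularized hinge loss. The function names, the hyperparameters `lam` and `epochs`, and the convention of appending a constant feature so that the last weight acts as the bias are all assumptions. Labels follow the text: 1 for a normal alarm, -1 for a false alarm.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def train_linear_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     epochs: int = 1000, seed: int = 0) -> List[float]:
    """Pegasos-style training; returns weights whose last entry is the bias."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]   # append bias feature
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)                                   # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]                # regularization shrink
            if margin < 1.0:                                        # hinge loss is active
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w: List[float], x: Vector) -> int:
    """1 = normal alarm, -1 = false alarm."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1


def train_per_type(samples: List[Tuple[str, Vector, int]]) -> Dict[str, List[float]]:
    """Group (alarm_type, feature_vector, label) samples and train one model per type."""
    grouped: Dict[str, Tuple[List[Vector], List[int]]] = {}
    for alarm_type, vec, label in samples:
        xs, ys = grouped.setdefault(alarm_type, ([], []))
        xs.append(vec)
        ys.append(label)
    return {j: train_linear_svm(xs, ys) for j, (xs, ys) in grouped.items()}


def classify(models: Dict[str, List[float]], alarm_type: str, vec: Vector) -> int:
    """Use model M_j for a new alarm of type j."""
    return predict(models[alarm_type], vec)
```

Because the bias is folded into the weight vector, it is regularized along with the other weights; a production setup would more likely use LIBSVM as cited in this document, which also supports kernels and probability estimates.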

The SVM classifier maps the input data of the i-th type, V_i(R_i1), ..., V_i(R_in), from the input space X into a high-dimensional feature space F and returns the hyperplane with the maximum margin.

[Equation 6]

[Equation 7]

(where b and a_i are parameters that determine the hyperplane,

y_j is the label of the j-th training data V_i(R_ij) and has a value of either '1' (true) or '-1' (false),

K is a kernel function used to simplify the nonlinear mapping, and

φ is the mapping function)

Equation 7 represents the optimal hyperplane for the model M_i of the i-th type.
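The Python sketch below shows the standard kernel SVM decision function, f(x) = Σ_j a_j y_j K(x_j, x) + b, as an assumed stand-in for how a trained model M_i evaluates a new feature vector against its hyperplane, with an RBF kernel as one common choice of K. The support vectors, coefficients a_j, bias b, and kernel width `gamma` would come from training (for example with LIBSVM); the values in the usage below are made up for illustration. The sign of f gives the label described in the text: 1 for a normal alarm, -1 for a false alarm.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, z: Vector, gamma: float = 0.5) -> float:
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def decision_value(x: Vector, support_vectors: List[Vector], labels: List[int],
                   alphas: List[float], b: float, gamma: float = 0.5) -> float:
    """f(x) = sum_j a_j * y_j * K(x_j, x) + b."""
    return sum(a * y * rbf_kernel(sv, x, gamma)
               for sv, y, a in zip(support_vectors, labels, alphas)) + b


def label(x: Vector, support_vectors: List[Vector], labels: List[int],
          alphas: List[float], b: float, gamma: float = 0.5) -> int:
    """1 = normal alarm, -1 = false alarm (sign of the decision value)."""
    return 1 if decision_value(x, support_vectors, labels, alphas, b, gamma) >= 0.0 else -1
```

With two made-up support vectors, `[0, 0]` labeled 1 and `[2, 0]` labeled -1, both with coefficient 1 and bias 0, the point `[0, 0]` gets the decision value 1 - e^-2 and is labeled a normal alarm, while `[2, 0]` is labeled a false alarm.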

With this classifier configuration, the model M_j is used when a new alarm of type j is input.

In one example, known techniques including LIBSVM may be used to train the models that classify alarms. LIBSVM is described in C. Chang and C. Lin, "LIBSVM: A Library for Support Vector Machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, pp. 1-39, 2011, among other sources.

The classification result of the classifier, for example the SVM classifier, may be output as a label: '1' when the alarm is classified as a normal alarm rather than a false alarm, and '-1' when it is classified as a false alarm.

Meanwhile, as another example, a known machine learning classifier implemented in a manner other than the SVM classifier may be used.

In step S6, the false alarm reduction apparatus deletes the error detection alarm classified as a false alarm in step S5 from the error detection alarm targets of the static analyzer for the corresponding alarm type. Preferably, this deletion is performed automatically based on a preset criterion in the false alarm reduction apparatus. In another example, a user may review the error detection alarm classified as a false alarm, and the deletion may be performed upon receiving the user's deletion command.

In this way, an error detection alarm classified as a false alarm is prevented from being raised again in subsequent source code analyses by the static analyzer, and false alarms are reduced as a result.

In the final deletion process described above, the classifier may be configured to provide, together with the classification result, a probability value for that result, and a threshold θ may be set for the probability value. This makes the final deletion decision more reliable.

For example, when a specific error detection alarm is classified as a false alarm, it may be treated as an actual false alarm only when the probability value provided with the classification result is higher than the preset threshold.

Raising the threshold increases the accuracy of the false alarm classifications but may leave more false alarms unclassified as false alarms, whereas lowering the threshold increases the probability that a normal alarm is classified as a false alarm. The threshold should therefore be set appropriately, considering factors such as the number of alarms and the accuracy trade-off.
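The final deletion decision with a threshold θ can be sketched as a simple filter. In this Python illustration the record type `Classified`, its field names, and the alarm IDs are hypothetical; the probability is assumed to be the value that accompanies the classification result (for example, an estimate obtained with the pairwise-coupling technique cited in this document), and an alarm is deleted only when it is labeled -1 (false alarm) and that probability is strictly higher than θ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Classified:
    alarm_id: str        # hypothetical identifier of an error detection alarm
    label: int           # 1 = normal alarm, -1 = false alarm
    probability: float   # probability value attached to the classification result


def alarms_to_delete(results: List[Classified], theta: float) -> List[str]:
    """Alarms confirmed as false alarms: label -1 and probability above theta."""
    return [r.alarm_id for r in results if r.label == -1 and r.probability > theta]


def remaining_alarms(results: List[Classified], theta: float) -> List[str]:
    """Alarms that stay in the static analyzer's report after deletion (step S6)."""
    deleted = set(alarms_to_delete(results, theta))
    return [r.alarm_id for r in results if r.alarm_id not in deleted]
```

Raising `theta` shrinks the deleted set and lowering it grows the set, which is the trade-off between missed false alarms and wrongly deleted normal alarms described above.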

The probability value can be obtained using known techniques, including the one described in T. Wu et al., "Probability Estimates for Multi-class Classification by Pairwise Coupling," The Journal of Machine Learning Research, vol. 5, pp. 975-1005, 2004.

The false alarm reduction method according to the present embodiment was tested on alarms generated by SPARROW, a commercial static analyzer proposed by the applicant. The test alarms were obtained from 10 open-source Java programs ranging from 213 to 3,398 files and from 15,037 to 292,967 lines of code. A total of 265 alarms were obtained from this source code.

In the experiment, three 'alarm types' were used: 'general null dereference', 'dereferencing of an unchecked null value' and 'dereferencing of a returned null value'.

As the feature pattern, the feature patterns illustrated in Table 1 above were used.

As a result of the experiment, the false alarm reduction method of this embodiment found and removed 37.33% of the false alarms with the threshold set to 50%.

Embodiments of the present invention include a program for performing various computer-implemented operations and a computer-readable medium on which the program is recorded. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The medium may be specially designed and constructed for the purposes of the present invention, or it may be of a kind well known and available to those skilled in the computer software arts. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs, DVDs, and USB drives; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory. The medium may also be a transmission medium, such as an optical or metal line or a waveguide, including a carrier wave that transmits signals specifying program instructions, data structures, and the like. Examples of program instructions include not only machine code generated by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like.

Claims

1. A method for reducing false alarms among error detection alarms generated by a static analyzer, the method comprising:

1) receiving, for an error detection alarm that has occurred, alarm type information, alarm path information, and information on the source code targeted by the alarm, wherein the alarm type information indicates which of preset alarm types the error detection alarm corresponds to, and the alarm path information is information on the execution path, among the execution paths of the source code, associated with the error detection alarm;

2) converting the source code into an abstract syntax tree (AST);

3) removing from the abstract syntax tree an unnecessary subtree not associated with the error detection alarm;

4) obtaining a feature vector for the abstract syntax tree from which the unnecessary subtree has been removed, based on a feature pattern set preset for the alarm type of the error detection alarm; and

5) classifying false alarms by inputting the obtained feature vector into a classifier in which false alarm classification information has been learned in advance.
2. The method of claim 1, further comprising:

6) deleting the error detection alarm classified as a false alarm in step 5) from the error detection alarm targets of the static analyzer for the corresponding alarm type.
3. The method of claim 1, wherein, in step 3), the removal of the unnecessary subtree is based on at least one of:

a first policy for removing general statements other than those executed on the execution path associated with the error detection alarm;

a second policy for removing branch statements other than those executed on the execution path associated with the error detection alarm, provided that the execution path associated with the error detection alarm includes the result of evaluating the condition of the branch statement;

a third policy for removing loops other than those executed on the execution path associated with the error detection alarm;

a fourth policy for including a function called on the execution path associated with the error detection alarm, together with the execution path of that function, as a subtree of the node invoking the function; and

a fifth policy for removing statements irrelevant to the execution path associated with the error detection alarm.
4. The method of claim 1, wherein, in step 4), the feature pattern set

consists of n feature patterns preset for a specific alarm type of the error detection alarm, and

the feature patterns include occurrence of a conditional statement, a loop statement, a return statement, a break or continue statement, an exit or assert method invocation, a null expression, a comparison with a null value, a null assignment, or a statement that returns a null value.
5. The method of claim 1, wherein, in step 4), the process of obtaining the feature vector V_j(R) for the abstract syntax tree from which the unnecessary subtree has been removed comprises:

401) for the alarm type (j) of the error detection alert, defining a feature pattern set (P) consisting of a set of n feature patterns (p), as shown in Equation 1 below;

[Equation 1]

$$P = \{p_1, p_2, \ldots, p_n\}$$

402) defining an n-dimensional pattern satisfaction vector v (P, d) for any node d on the abstract syntax tree as shown in Equation 2 below;

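[Equation 2]

$$v(P, d) = \left(S(d, p_1), S(d, p_2), \ldots, S(d, p_n)\right)$$

(where S(d, p_i) indicates whether node d, or the subtree rooted at node d, matches the i-th feature pattern p_i and is defined as in Equation 3 below; the i-th feature pattern p_i may be a single node or a subtree)

[Equation 3]

$$S(d, p_i) = \begin{cases} 1, & \text{if } d \text{ or the subtree rooted at } d \text{ matches } p_i \\ 0, & \text{otherwise} \end{cases}$$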

403) defining a feature vector V (P, D) for any node D on the abstract syntax tree, as shown in Equation 4 below; And

[Equation 4]

$$V(P, D) = v(P, D) + \sum_{k=1}^{m} V(P, d_k)$$

(where d_1, ..., d_m are the children of node D,

V(P, d_1), ..., V(P, d_m) are the feature vectors obtained through Equation 4 for the child nodes d_1, ..., d_m, and

v(P, D) is the n-dimensional pattern satisfaction vector for node D)

404) obtaining, using Equation 5 below, the feature vector V_j(R) for the abstract syntax tree from which the unnecessary subtree has been removed.

[Equation 5]

$$V_j(R) = V(P, R)$$

(where R is the root node of the abstract syntax tree from which the unnecessary subtree has been removed and corresponds to node D of Equation 4, P is the feature pattern set defined in Equation 1 for alarm type j, and

j indicates the alarm type of the error detection alarm that has occurred)
6. The method of claim 1, wherein the classifier of step 5) is a support vector machine (SVM) generated for each alarm type of the error detection alarms.
7. A computer program, stored in a medium and combined with hardware, for executing the method for reducing false alarms in source code error detection according to any one of claims 1 to 6.
8. A computer-readable recording medium on which is recorded a computer program for causing a computer to execute the method for reducing false alarms in source code error detection according to any one of claims 1 to 6.