WO2016085273A1

WO2016085273A1 - Method for classifying alarm types in detecting source code error, computer program therefor, recording medium thereof

Info

Publication number: WO2016085273A1
Application number: PCT/KR2015/012792
Authority: WO
Inventors: 윤종원; 진민식
Original assignee: 주식회사 파수닷컴
Priority date: 2014-11-28
Filing date: 2015-11-26
Publication date: 2016-06-02

Abstract

The present invention relates to a method for classifying alarm types in detecting a source code error, a computer program therefor, and a recording medium thereof. The method according to one aspect of the present invention is executed on an alarm type classifying device connected to a static analyzer and classifies, by type, the error detection alarms occurring on the static analyzer. The method comprises the steps of: 1) receiving information on an error detection alarm that occurred, namely alarm path information and information on the source code that is the subject of the alarm, wherein the alarm path information is information on the execution path, among the execution paths of the source code, related to the error detection alarm that occurred; 2) converting the source code into an abstract syntax tree (AST); 3) removing, from the AST, unnecessary subtrees that are not related to the error detection alarm; 4) obtaining, on the basis of a preset feature pattern set, a feature vector for the AST from which the unnecessary subtrees have been removed; and 5) classifying, by type, the error detection alarms corresponding to the feature vectors by clustering the obtained feature vectors by a preset method.

Description

Method of classifying alarm type in error detection of source code, computer program for it, recording medium thereof

The present invention relates to a method for classifying alarm types in detecting source code errors, a computer program therefor, and a recording medium thereof. More particularly, the invention automatically classifies and analyzes the various types of alarms that a static analyzer generates for source code, thereby preventing the waste of resources otherwise required for alarm type classification.

Static analyzers are widely used to detect potential bugs or vulnerabilities in source code. A static analyzer executes function-specific checkers, each of which detects its own predefined kind of error, and generates an alarm message when it determines that an error has been detected.

When an alarm of a static analyzer is generated, a process of classifying an alarm type may be performed for the purpose of analyzing the accuracy of the generated alarm.

For example, the analysis performed by a static analyzer cannot always determine accurately whether an error exists. As a result, a false alarm may occur, in which an error is reported even though no error exists in the target source code.

In this case, alarms can be classified by type so that alarm types with a high probability of being false alarms can be analyzed further or prepared for. Conventionally, this classification was performed manually by developers, which wastes resources.

The present invention has been made in view of the above problems. Its object is to provide a method for classifying alarm types in detecting source code errors, a computer program therefor, and a recording medium thereof, which automatically classify and analyze the various types of alarms that a static analyzer generates for source code and thereby prevent the waste of resources required for alarm type classification.

According to one aspect of the present invention for achieving the above object, there is disclosed a method for classifying alarm types in detecting source code errors, the method being executed in an alarm type classification device linked with a static analyzer and classifying, by type, the error detection alarms generated by the static analyzer, the method comprising the steps of: 1) receiving alarm path information on an error detection alarm that has occurred and information on the source code that is the subject of the alarm, wherein the alarm path information is information on the execution path, among the execution paths of the source code, that is related to the error detection alarm; 2) converting the source code into an abstract syntax tree (AST); 3) removing, from the abstract syntax tree, unnecessary subtrees not related to the error detection alarm; 4) obtaining, on the basis of a preset feature pattern set, a feature vector for the abstract syntax tree from which the unnecessary subtrees have been removed; and 5) clustering the obtained feature vectors in a preset manner to classify, by type, the error detection alarms corresponding to the feature vectors.

Preferably, in step 1), alarm type information on the error detection alarm that occurred is further received, the alarm type information indicating which of the preset alarm types the error detection alarm corresponds to, and in step 4) the feature pattern set is preset for each alarm type of the error detection alarm.

Preferably, in step 3), the removal of unnecessary subtrees is based on at least one of: a first policy of removing general statements other than those executed on the execution path related to the error detection alarm; a second policy of removing branch statements other than those executed on that execution path, provided that the execution path includes the result of the branch statement's condition evaluation; a third policy of removing loops other than those executed on that execution path; a fourth policy of including a function called on that execution path, together with the function's execution path, as a subtree of the node that invokes the function; and a fifth policy of removing declarations irrelevant to that execution path.

Preferably, in step 4), the feature pattern set is configured as a set of n feature patterns, and each feature pattern is any one of: occurrence of a conditional statement, occurrence of a loop statement, occurrence of a return statement, occurrence of a break or continue statement, occurrence of an exit or assert method invocation, occurrence of a null expression, occurrence of a comparison with a null value, occurrence of a null assignment, or occurrence of a statement that returns a null value.

Preferably, in step 4), the process of obtaining the feature vector V(R) for the abstract syntax tree from which the unnecessary subtrees have been removed comprises: 401) defining a feature pattern set P configured as a set of n feature patterns p, as shown in Equation 1 below;

[Equation 1]

$$P = \{p_1, p_2, \ldots, p_n\}$$

402) defining an n-dimensional pattern satisfaction vector v(P, d) for any node d on the abstract syntax tree, as shown in Equation 2 below;

[Equation 2]

$$v(P, d) = \bigl(S(d, p_1), S(d, p_2), \ldots, S(d, p_n)\bigr)$$

(where S(d, p_i) is a factor indicating whether an arbitrary node d, or the subtree rooted at node d, matches the i-th feature pattern p_i, and is defined as in Equation 3 below; the i-th feature pattern p_i can be a single node or a subtree)

[Equation 3]

$$S(d, p_i) = \begin{cases} 1 & \text{if } d \text{ or the subtree rooted at } d \text{ matches } p_i \\ 0 & \text{otherwise} \end{cases}$$

403) defining a feature vector V(P, D) for any node D on the abstract syntax tree, as shown in Equation 4 below; and

[Equation 4]

$$V(P, D) = v(P, D) + \sum_{j=1}^{m} V(P, d_j)$$

(where d_1, ..., d_m are the child nodes of node D,

V(P, d_1), ..., V(P, d_m) are the feature vectors obtained through Equation 4 for the child nodes d_1, ..., d_m, and

v(P, D) is the n-dimensional pattern satisfaction vector for node D)

404) obtaining, using Equation 5 below, the feature vector V(R) for the abstract syntax tree from which the unnecessary subtrees have been removed.

[Equation 5]

$$V(R) = V(P, R)$$

(where R is the root node of the abstract syntax tree from which the unnecessary subtrees have been removed and corresponds to node D of Equation 4)

Preferably, in step 5), the clustering is performed by the K-means algorithm.

According to yet another aspect of the present invention, a computer program stored in a medium to execute, in combination with hardware, the method for classifying alarm types in detecting source code errors is disclosed.

According to yet another aspect of the present invention, a computer-readable recording medium having recorded thereon a computer program for executing, on a computer, the method for classifying alarm types in detecting source code errors is disclosed.

According to the present invention, in detecting source code errors using a static analyzer, the types of alarms that occur can be classified and analyzed, so that alarm types with a high probability of being false alarms can be analyzed further or prepared for.

In particular, according to the present invention, since the type of alarm classification can be executed through an automatic process without performing by a developer's manual work, there is an advantage in that a waste of resources required for alarm type classification can be prevented.

FIG. 1 is a conceptual diagram illustrating an alarm type classification method in error detection of source code according to an embodiment of the present invention.

The present invention can be embodied in many other forms without departing from the spirit or main features thereof. Therefore, the embodiments of the present invention are merely examples in all respects and should not be interpreted limitedly.

Terms such as first and second may be used to describe various components, but the components should not be limited by the terms. The terms are used only for the purpose of distinguishing one component from another. For example, without departing from the scope of the present invention, the first component may be referred to as the second component, and similarly, the second component may also be referred to as the first component. The term and / or includes a combination of a plurality of related items or any item of a plurality of related items.

When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may be present in between. On the other hand, when a component is referred to as being "directly connected" or "directly coupled" to another component, it should be understood that there is no other component in between.

The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "comprise", "include", and "have" are intended to indicate the presence of a feature, number, step, operation, component, part, or combination thereof described in the specification, and should not be understood to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art. Terms such as those defined in commonly used dictionaries should be construed as having meanings consistent with their meanings in the context of the related art, and are not to be construed in an idealized or excessively formal sense unless expressly so defined in this application.

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. The same or corresponding components are denoted by the same reference numerals regardless of the figure in which they appear, and redundant descriptions thereof are omitted. In the following description of the present invention, if it is determined that a detailed description of related known technology may obscure the gist of the present invention, that detailed description is omitted.

The present invention is executed in an alarm type classification apparatus that interworks with a static analyzer, and is a method for classifying error detection alarms generated by a static analyzer by type.

For example, the alarm type classification device may be understood as a computing means for executing the alarm type classification method or a functional module thereof. The alarm type classification device may be implemented in the static analyzer in the form of an interlocking module or in the form of an internal module.

The alarm type classification apparatus of this embodiment can interwork with various known static analyzers. A variety of commercial static analyzers based on syntactic analysis or semantic analysis are known, so a detailed description thereof is omitted.

In step S1, the alarm type classification apparatus receives alarm path information on the error detection alarm that occurred and information on the source code that is the subject of the alarm. For example, the input may be made in response to an input request from the alarm type classification apparatus or according to a setting of the static analyzer, with the static analyzer linked to the alarm type classification apparatus providing each piece of information to it. In another example, a user may supply files in which each piece of information is recorded so that the alarm type classification apparatus can read them.

The alarm path information is information about an execution path related to the generated error detection alarm among execution paths of source code.
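The step S1 input can be pictured as a small record. The following is a minimal sketch, assuming the alarm path is encoded as an ordered list of executed source line numbers; the names `AlarmInput`, `path_lines`, `alarm_type`, and `validate` are illustrative and do not come from the source, which does not specify a data format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AlarmInput:
    """Hypothetical container for the information received in step S1."""
    source_code: str                  # source code that is the subject of the alarm
    path_lines: List[int]             # assumed encoding of the alarm path: executed line numbers, in order
    alarm_type: Optional[str] = None  # optional alarm type information


def validate(alarm: AlarmInput) -> None:
    """Reject inputs whose alarm path refers to lines outside the source code."""
    line_count = len(alarm.source_code.splitlines())
    for line in alarm.path_lines:
        if not 1 <= line <= line_count:
            raise ValueError(f"path line {line} is outside the source (1..{line_count})")


alarm = AlarmInput(
    source_code="x = None\nif x is None:\n    y = 0\nprint(x.value)\n",
    path_lines=[1, 2, 4],
    alarm_type="general null dereference",
)
validate(alarm)  # passes: every path line exists in the source
```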

The static analyzer executes function-specific checkers, each of which checks the source code under analysis for errors against its own predetermined criterion, and generates a specific alarm when it determines that an error has been detected.

At this time, the alarm is made by outputting a specific alarm message preset to the checker with respect to the detected error. Each alert message is basically dependent on a specific checker, and generally does not reflect the nature or form of the source code, or features on the execution path.

Preferably, in step S1, the alarm type classification apparatus further receives alarm type information on the error detection alarm that occurred.

Alarms of one 'alarm type' share the same alarm message. For example, one checker generates one alarm message and can be understood to define one 'alarm type'.

For example, the 'alarm type' may include 'general null dereference', 'dereferencing of an unchecked null value', and 'dereferencing of a returned null value', and the like, and various alarm types may be set.

For example, a feature pattern set to be described below is defined for each alarm type.
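One way to picture the relationship between alarm types and feature pattern sets is a lookup table keyed by alarm type, with alarms grouped by type before further processing. The sketch below is an assumption-laden illustration: the registry `PATTERN_SETS`, the pattern names in it, and the function `group_by_type` are hypothetical; only the alarm type names are taken from the text above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical registry: one feature pattern set (here just pattern names) per alarm type.
PATTERN_SETS: Dict[str, List[str]] = {
    "general null dereference": ["conditional", "loop", "return", "null expression"],
    "dereferencing of a returned null value": ["return", "null comparison", "return null"],
}


def group_by_type(alarms: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (alarm_id, alarm_type) pairs so that each alarm type is handled separately."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for alarm_id, alarm_type in alarms:
        if alarm_type not in PATTERN_SETS:
            raise KeyError(f"no feature pattern set defined for alarm type {alarm_type!r}")
        groups[alarm_type].append(alarm_id)
    return dict(groups)


groups = group_by_type([
    ("a1", "general null dereference"),
    ("a2", "dereferencing of a returned null value"),
    ("a3", "general null dereference"),
])
```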

In step S2, the alarm type classification apparatus converts the source code into an abstract syntax tree (AST).

An abstract syntax tree (AST) is a tree of abstract syntax structures in source code written in a programming language, where each node represents a structure generated from the source code. Detailed concepts of the abstract syntax tree may be understood through a number of known materials, and thus detailed descriptions thereof will be omitted.
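As an illustration of step S2, the sketch below parses a small piece of source code into an AST. It uses Python's standard `ast` module on Python source purely as a stand-in; the source does not name a target language or parser, and the sample function is invented for the example.

```python
import ast

# Invented sample code; stands in for the source code that is the subject of the alarm.
source = """
def first_name(user):
    if user is None:
        return None
    return user.name
"""

tree = ast.parse(source)  # root node of the abstract syntax tree (an ast.Module)

# Each node represents one syntactic construct; collect the node type names in the tree.
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(sorted(set(node_types)))
```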

In step S3, the alert type classification apparatus removes an unnecessary subtree not associated with the error detection alert from the abstract syntax tree.

Elimination of unnecessary subtrees can be accomplished by conventional rule-based techniques.

For example, the removal of unnecessary subtrees is based on at least one of: a first policy of removing general statements other than those executed on the execution path related to the error detection alarm; a second policy of removing branch statements other than those executed on that execution path, provided that the execution path includes the result of the branch statement's condition evaluation; a third policy of removing loops other than those executed on that execution path; a fourth policy of including a function called on that execution path, together with the function's execution path, as a subtree of the node that invokes the function; and a fifth policy of removing statements irrelevant to that execution path.
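A rule-based pruning pass in the spirit of the first three policies can be sketched as follows. This is a simplified illustration, not the patented procedure: it assumes the alarm path is a set of executed line numbers, works on Python's `ast` as a stand-in, and drops any statement whose line span does not touch the path. The fourth policy (attaching called functions as subtrees) and the fifth policy are not shown, and `prune_block` and `prune_tree` are hypothetical names.

```python
import ast
from typing import List, Set


def prune_block(stmts: List[ast.stmt], path_lines: Set[int]) -> List[ast.stmt]:
    """Keep only statements that touch the alarm path, pruning nested blocks recursively."""
    kept = []
    for stmt in stmts:
        # A statement is kept if at least one alarm path line falls within its line span.
        span = range(stmt.lineno, stmt.end_lineno + 1)
        if not any(line in path_lines for line in span):
            continue
        # Prune the nested statement lists of compound statements (if/else, loops, functions).
        for field in ("body", "orelse", "finalbody"):
            block = getattr(stmt, field, None)
            if isinstance(block, list):
                setattr(stmt, field, prune_block(block, path_lines))
        kept.append(stmt)
    return kept


def prune_tree(source: str, path_lines: Set[int]) -> ast.Module:
    """Parse the source and remove the subtrees that are not on the alarm path."""
    tree = ast.parse(source)
    tree.body = prune_block(tree.body, path_lines)
    return tree


source = (
    "x = lookup()\n"        # line 1
    "if x is None:\n"       # line 2
    "    log('missing')\n"  # line 3
    "else:\n"               # line 4
    "    x.refresh()\n"     # line 5
    "for i in range(3):\n"  # line 6
    "    tick(i)\n"         # line 7
    "print(x.value)\n"      # line 8
)
pruned = prune_tree(source, {1, 2, 3, 8})
print([type(s).__name__ for s in pruned.body])
```

In this example the `else` branch and the whole `for` loop are removed because none of their lines are on the assumed alarm path. The pruned tree is meant only for feature extraction; it may no longer be compilable.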

In step S4, the alert type classification apparatus obtains the feature vector for the abstract syntax tree from which the unnecessary subtree has been removed based on the preset feature pattern set.

The abstract syntax tree (AST) obtained from the source code is too large and complex to use directly as input data for classifying alarm types through clustering. When, as in this embodiment, a feature vector is obtained from the abstract syntax tree and the feature vectors are clustered, the clustering processing time and the resources required for it are reduced.

Preferably, the feature pattern set is configured as a set of n preset feature patterns, and each feature pattern may be any one of: occurrence of a conditional statement, occurrence of a loop statement, occurrence of a return statement, occurrence of a break or continue statement, occurrence of an exit or assert method invocation, occurrence of a null expression, occurrence of a comparison with a null value, occurrence of a null assignment, or occurrence of a statement that returns a null value. This is summarized as follows.

No.	Feature pattern
1	Occurrence of a conditional statement
2	Occurrence of a loop statement
3	Occurrence of a return statement
4	Occurrence of a break or continue statement
5	Occurrence of an exit or assert method invocation
6	Occurrence of a null expression
7	Occurrence of a comparison with a null value
8	Occurrence of a null assignment
9	Occurrence of a statement that returns a null value

In a preferred embodiment, the feature pattern set, i.e., the set of n feature patterns, is preset for each alert type of the error detection alert. In the case of such a setting relationship, the alarm type classification is made within the scope of one specific alarm type.
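The nine feature patterns listed above can be expressed as predicates over AST nodes. The sketch below is one possible encoding over Python's `ast` module, used as a stand-in with `None` playing the role of the null value; the list `FEATURE_PATTERNS` and the helper names are hypothetical, and how the source actually matches patterns against nodes is not specified.

```python
import ast
from typing import Callable, List


def is_none(node: object) -> bool:
    """True for a literal None expression (the Python analogue of a null expression)."""
    return isinstance(node, ast.Constant) and node.value is None


def is_exit_or_assert(node: ast.AST) -> bool:
    """True for an assert statement or a call to something named 'exit'."""
    if isinstance(node, ast.Assert):
        return True
    if isinstance(node, ast.Call):
        func = node.func
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        return name == "exit"
    return False


def is_null_comparison(node: ast.AST) -> bool:
    """True for a comparison in which one operand is a literal None."""
    return isinstance(node, ast.Compare) and (
        is_none(node.left) or any(is_none(c) for c in node.comparators)
    )


# One predicate per row of the feature pattern table, in the same order.
FEATURE_PATTERNS: List[Callable[[ast.AST], bool]] = [
    lambda n: isinstance(n, ast.If),                            # 1 conditional statement
    lambda n: isinstance(n, (ast.For, ast.While)),              # 2 loop statement
    lambda n: isinstance(n, ast.Return),                        # 3 return statement
    lambda n: isinstance(n, (ast.Break, ast.Continue)),         # 4 break or continue
    is_exit_or_assert,                                          # 5 exit or assert invocation
    is_none,                                                    # 6 null expression
    is_null_comparison,                                         # 7 comparison with null
    lambda n: isinstance(n, ast.Assign) and is_none(n.value),   # 8 null assignment
    lambda n: isinstance(n, ast.Return) and is_none(n.value),   # 9 statement returning null
]

assign_node = ast.parse("x = None").body[0]
return_node = ast.parse("def f():\n    return None").body[0].body[0]
print([int(p(assign_node)) for p in FEATURE_PATTERNS])
print([int(p(return_node)) for p in FEATURE_PATTERNS])
```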

Preferably, in step S4, a process of obtaining a feature vector V (R) for the abstract syntax tree from which the unnecessary subtree is removed is performed as follows.

In operation S401, the alarm type classification apparatus defines a feature pattern set P configured in the form of a set of n feature patterns p, as shown in Equation 1 below.

[Equation 1]

$$P = \{p_1, p_2, \ldots, p_n\}$$

In operation S402, the alarm type classification apparatus defines an n-dimensional pattern satisfaction vector v (P, d) for any node d on the abstract syntax tree, as shown in Equation 2 below.

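[Equation 2]

$$v(P, d) = \bigl(S(d, p_1), S(d, p_2), \ldots, S(d, p_n)\bigr)$$

(where S(d, p_i) is a factor indicating whether an arbitrary node d, or the subtree rooted at node d, matches the i-th feature pattern p_i, and is defined as in Equation 3 below; the i-th feature pattern p_i can be a single node or a subtree)

[Equation 3]

$$S(d, p_i) = \begin{cases} 1 & \text{if } d \text{ or the subtree rooted at } d \text{ matches } p_i \\ 0 & \text{otherwise} \end{cases}$$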

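In operation S403, the alarm type classification apparatus defines a feature vector V(P, D) for any node D on the abstract syntax tree, as shown in Equation 4 below.

[Equation 4]

$$V(P, D) = v(P, D) + \sum_{j=1}^{m} V(P, d_j)$$

(where d_1, ..., d_m are the child nodes of node D,

V(P, d_1), ..., V(P, d_m) are the feature vectors obtained through Equation 4 for the child nodes d_1, ..., d_m, and

v(P, D) is the n-dimensional pattern satisfaction vector for node D)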

In operation S404, the alarm type classification apparatus obtains the feature vector V(R) for the abstract syntax tree from which the unnecessary subtrees have been removed, using Equation 5 below.

[Equation 5]

$$V(R) = V(P, R)$$

Equation 5 may be understood as Equation 4 evaluated with the root node R of the abstract syntax tree from which the unnecessary subtrees have been removed as the node D.
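Operations S401 to S404 can be sketched as a short recursion. The sketch assumes that Equation 4 combines a node's own pattern satisfaction vector with its children's feature vectors by element-wise addition, so that V(R) counts how often each pattern occurs in the tree. It uses Python's `ast` as a stand-in, a deliberately small set of three patterns, and the hypothetical function names `satisfaction_vector` and `feature_vector`.

```python
import ast
from typing import Callable, List, Sequence

Pattern = Callable[[ast.AST], bool]


def satisfaction_vector(patterns: Sequence[Pattern], node: ast.AST) -> List[int]:
    """v(P, d): one 0/1 entry per feature pattern for a single node."""
    return [1 if pattern(node) else 0 for pattern in patterns]


def feature_vector(patterns: Sequence[Pattern], node: ast.AST) -> List[int]:
    """V(P, D): the node's own vector plus the feature vectors of all its children."""
    total = satisfaction_vector(patterns, node)
    for child in ast.iter_child_nodes(node):
        child_vector = feature_vector(patterns, child)
        total = [a + b for a, b in zip(total, child_vector)]
    return total


# Illustrative pattern set P with n = 3 (conditional, return, null expression).
PATTERNS: List[Pattern] = [
    lambda n: isinstance(n, ast.If),
    lambda n: isinstance(n, ast.Return),
    lambda n: isinstance(n, ast.Constant) and n.value is None,
]

source = """
def first_name(user):
    if user is None:
        return None
    if user.banned:
        return None
    return user.name
"""
root = ast.parse(source)              # root node R
print(feature_vector(PATTERNS, root))  # V(R) = V(P, R); prints [2, 3, 3]
```

The sample function contains two `if` statements, three `return` statements, and three literal `None` expressions, which gives the vector shown.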

In operation S5, the alarm type classification apparatus may cluster the obtained feature vector V (R) in a preset manner to classify the error detection alert corresponding to the feature vector by type.

As the clustering, a known vector or data clustering technique may be used. For example, a known hierarchical clustering technique or a non-hierarchical clustering technique may be used.

In the present embodiment, as a preferred example, the K-means algorithm, a non-hierarchical clustering technique, may be used. K-means finds the centroid of each cluster by minimizing the Euclidean distance between each data point (or vector) and the center of the cluster to which it belongs.

Since the K-means algorithm has a simple structure and generally converges quickly, it can be applied as a preferable example in this embodiment. To obtain more accurate clustering results, clustering may be run several times with different initial values and the best result used, or clustering may be run with an appropriate number of clusters determined in advance.
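The following is a minimal pure-Python sketch of K-means with several restarts from different initial centroids, keeping the run with the lowest within-cluster sum of squared distances. It illustrates the general technique only; the function names, the restart count, the seed, and the sample feature vectors are assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans_once(data: List[Vector], k: int, rng: random.Random,
                max_iter: int = 100) -> Tuple[List[int], List[List[float]], float]:
    """One K-means run from random initial centroids; returns (labels, centroids, cost)."""
    centroids = [list(v) for v in rng.sample(data, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in data]
        if new_labels == labels:
            break  # assignments are stable, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(data, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    cost = sum(sq_dist(v, centroids[label]) for v, label in zip(data, labels))
    return labels, centroids, cost


def kmeans(data: List[Vector], k: int, restarts: int = 10, seed: int = 0):
    """Run K-means several times with different initial values and keep the best run."""
    rng = random.Random(seed)
    return min((kmeans_once(data, k, rng) for _ in range(restarts)), key=lambda run: run[2])


# Illustrative feature vectors, one per error detection alarm.
vectors = [[2, 3, 3], [2, 3, 2], [0, 1, 0], [0, 1, 1], [2, 2, 3]]
labels, centroids, cost = kmeans(vectors, k=2)
print(labels)
```

Alarms whose vectors receive the same label are treated as the same type. The cluster indices themselves are arbitrary and can differ between runs.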

By clustering, feature vectors with high similarity may be classified into the same type, and as a result, respective error detection alerts corresponding to feature vectors classified into the same type may also be classified into the same type of alerts. For example, detailed conditions of similarity that may be classified into the same type may be preset in the alarm type classification device.

This classification of alert types provides developers with several advantages in the analysis of error detection alerts.

For example, when a new error detection alarm occurs and it is classified, on the basis of its similarity to error detection alarms already classified as normal alarms, as not belonging to the same type, the developer may analyze that error detection alarm first to determine whether it is a false alarm.
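A triage rule of this kind can be sketched as a nearest-centroid check. The sketch assumes that centroids of already reviewed clusters are available and that a preset distance threshold decides whether a new alarm is similar enough to belong to one of them; `nearest_cluster`, `needs_priority_review`, the threshold value, and the sample centroids are all hypothetical.

```python
from typing import List, Optional, Sequence


def nearest_cluster(vector: Sequence[float], centroids: List[Sequence[float]],
                    max_distance: float) -> Optional[int]:
    """Return the index of the closest centroid, or None if none is within max_distance."""
    best_index: Optional[int] = None
    best_dist: Optional[float] = None
    for index, centroid in enumerate(centroids):
        dist = sum((x - y) ** 2 for x, y in zip(vector, centroid)) ** 0.5
        if best_dist is None or dist < best_dist:
            best_index, best_dist = index, dist
    if best_dist is None or best_dist > max_distance:
        return None
    return best_index


def needs_priority_review(vector: Sequence[float], reviewed_centroids: List[Sequence[float]],
                          max_distance: float = 1.5) -> bool:
    """Flag a new alarm that is not similar to any already reviewed cluster."""
    return nearest_cluster(vector, reviewed_centroids, max_distance) is None


# Assumed centroids of clusters that developers have already reviewed.
reviewed_centroids = [[2.0, 3.0, 3.0], [0.0, 1.0, 0.5]]
print(needs_priority_review([2, 3, 2], reviewed_centroids))  # similar to the first cluster
print(needs_priority_review([6, 0, 4], reviewed_centroids))  # far from every reviewed cluster
```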

Embodiments of the present invention include a program for performing various computer-implemented operations and a computer-readable medium on which the program is recorded. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The medium may be specially designed and constructed for the purposes of the present invention, or it may be of a kind well known and available to those skilled in the computer software arts. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, flash memory, and USB drives. The medium may also be a transmission medium, such as an optical or metal wire or a waveguide, including a carrier wave that transmits a signal specifying program instructions, data structures, and the like. Examples of program instructions include not only machine code generated by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like.

Claims

A method for classifying alarm types in detecting source code errors, the method being executed in an alarm type classification device linked with a static analyzer and classifying, by type, the error detection alarms generated by the static analyzer, the method comprising:

1) receiving alarm path information on an error detection alarm that has occurred and information on the source code that is the subject of the alarm, wherein the alarm path information is information on the execution path, among the execution paths of the source code, that is related to the error detection alarm that has occurred;

2) converting the source code into an abstract syntax tree (AST);

3) removing an unnecessary subtree not associated with the error detection alert from the abstract syntax tree;

4) obtaining a feature vector for the abstract syntax tree from which unnecessary subtrees have been removed based on a set of preset feature patterns; And

5) clustering the obtained feature vectors in a preset manner to classify, by type, the error detection alarms corresponding to the feature vectors.
The method of claim 1,

wherein, in step 1),

alarm type information on the error detection alarm that has occurred is further received, the alarm type information indicating which of the preset alarm types the error detection alarm corresponds to; and

wherein, in step 4),

the feature pattern set is preset for each alarm type of the error detection alarm.
The method of claim 1,

wherein, in step 3), the removal of unnecessary subtrees is based on at least one of:

a first policy of removing general statements other than those executed on the execution path related to the error detection alarm;

a second policy of removing branch statements other than those executed on the execution path related to the error detection alarm, provided that the execution path related to the error detection alarm includes the result of the branch statement's condition evaluation;

a third policy of removing loops other than those executed on the execution path related to the error detection alarm;

a fourth policy of including a function called on the execution path related to the error detection alarm, together with the execution path of the function, as a subtree of the node that invokes the function; and

a fifth policy of removing statements irrelevant to the execution path related to the error detection alarm.
The method of claim 1,

wherein, in step 4), the feature pattern set

consists of a set of n preset feature patterns, and

each feature pattern is any one of: occurrence of a conditional statement, occurrence of a loop statement, occurrence of a return statement, occurrence of a break or continue statement, occurrence of an exit or assert method invocation, occurrence of a null expression, occurrence of a comparison with a null value, occurrence of a null assignment, or occurrence of a statement that returns a null value.
The method of claim 1,

wherein, in step 4), the process of obtaining the feature vector V(R) for the abstract syntax tree from which the unnecessary subtrees have been removed comprises:

401) defining a feature pattern set P configured as a set of n feature patterns p, as shown in Equation 1 below;

[Equation 1]

$$P = \{p_1, p_2, \ldots, p_n\}$$

402) defining an n-dimensional pattern satisfaction vector v(P, d) for any node d on the abstract syntax tree, as shown in Equation 2 below;

[Equation 2]

$$v(P, d) = \bigl(S(d, p_1), S(d, p_2), \ldots, S(d, p_n)\bigr)$$

(where S(d, p_i) is a factor indicating whether an arbitrary node d, or the subtree rooted at node d, matches the i-th feature pattern p_i, and is defined as in Equation 3 below; the i-th feature pattern p_i can be a single node or a subtree)

[Equation 3]

$$S(d, p_i) = \begin{cases} 1 & \text{if } d \text{ or the subtree rooted at } d \text{ matches } p_i \\ 0 & \text{otherwise} \end{cases}$$

403) defining a feature vector V(P, D) for any node D on the abstract syntax tree, as shown in Equation 4 below; and

[Equation 4]

$$V(P, D) = v(P, D) + \sum_{j=1}^{m} V(P, d_j)$$

(where d_1, ..., d_m are the child nodes of node D,

V(P, d_1), ..., V(P, d_m) are the feature vectors obtained through Equation 4 for the child nodes d_1, ..., d_m, and

v(P, D) is the n-dimensional pattern satisfaction vector for node D)

404) obtaining, using Equation 5 below, the feature vector V(R) for the abstract syntax tree from which the unnecessary subtrees have been removed.

[Equation 5]

$$V(R) = V(P, R)$$

(where R is the root node of the abstract syntax tree from which the unnecessary subtrees have been removed and corresponds to node D of Equation 4)
The method of claim 1,

wherein, in step 5), the clustering is performed by a K-means algorithm.
A computer program stored on a medium to execute, in combination with hardware, the method for classifying alarm types in detecting source code errors according to any one of the preceding claims.
A computer-readable recording medium having recorded thereon a computer program for executing, on a computer, the method for classifying alarm types in detecting source code errors according to any one of claims 1 to 6.