CN112214399A - API misuse defect detection system based on sequence pattern matching - Google Patents


Publication number
CN112214399A
Authority
CN
China
Prior art keywords
api
misuse
sequence
defect
mode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010974385.7A
Other languages
Chinese (zh)
Other versions
CN112214399B (en)
Inventor
孙文靖
李晓伟
曾杰
贲可荣
苏建敏
洪楠
张清
杨洋
李春静
王赢超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jinghang Computing Communication Research Institute
Original Assignee
Beijing Jinghang Computing Communication Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jinghang Computing Communication Research Institute
Priority to CN202010974385.7A
Publication of CN112214399A
Application granted
Publication of CN112214399B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/362 Software debugging
    • G06F11/3628 Software debugging of optimised code
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis

Abstract

The invention belongs to the field of software defect detection, and in particular relates to an API misuse defect detection system based on sequence pattern matching. Rather than analyzing API usage specifications, the system characterizes API misuse patterns from known API misuse defect instances, using the code before and after the fix as recorded in the patch files, and then uses an improved Aho-Corasick (AC) algorithm to search the target software under test for API call sequences that conform to those misuse patterns, thereby detecting similar defects. Compared with specification-based detection methods, this avoids the problems that detection accuracy depends on how accurately the specification is described and that some API specifications are themselves defective, and so improves the accuracy of API misuse defect detection.

Description

API misuse defect detection system based on sequence pattern matching
Technical Field
The invention belongs to the field of software defect detection, and particularly relates to an API misuse defect detection system based on sequence pattern matching.
Background
With the arrival of the big data era, code resources in open source communities have grown explosively, and sharing and copying code among developers has become the norm. An Application Programming Interface (API) serves as the access interface to an existing code library or application framework, so that high-quality code modules can be reused. The provider of an API explains how it should be used through the specifications in the API documentation, such as the Javadoc documentation commonly used in Java programming. However, research shows that developers often program without being familiar with the API specification. When the usage of an API violates the usage rules in its specification and the program consequently cannot execute correctly, an API misuse defect arises.
To detect API misuse defects, specification-based detection methods describe the API specification and check whether the target system satisfies the properties the specification describes, thereby identifying violations. Specification-based detection methods are divided into explicit-specification and implicit-specification methods. Specifications written with strict formal methods are "explicit" specifications; constructing a formal specification requires considering every usage condition of the API, which is a complex process, and some API specifications are themselves defective. Specifications learned from big data are "implicit" specifications, and these methods have two problems: 1) the accuracy of defect detection depends on the accuracy of the specifications; 2) when a correct calling pattern does not appear in the mined samples, the corresponding rule cannot be learned, and specification-based violation detection then produces false alarms.
Disclosure of Invention
(I) Technical problem to be solved
The technical problem to be solved by the invention is how to improve the accuracy and the search efficiency of API misuse defect detection.
(II) technical scheme
In order to solve the above technical problem, the present invention provides an API misuse defect detection system based on sequence pattern matching, the system comprising: an API misuse pattern analysis module, an API call sequence extraction module, and an association judgment module;
the API misuse pattern analysis module is used for extracting, for each historically known API misuse defect in the API misuse defect database, the defect code and the patch code of that defect, and analyzing the API misuse pattern of the defect;
the API call sequence extraction module is used for analyzing the multiple program paths of the target code under test in a program code library and extracting API call sequences;
the association judgment module is used for judging, by a sequence pattern matching method, whether an API call sequence extracted from the target code under test is associated with an API misuse pattern determined from the historically known API misuse defects, and reporting similar defects.
The API misuse pattern analysis module analyzes the API misuse pattern of a redundant call type defect as follows:
if the historically known API misuse defect is a redundant call type defect, the API misuse pattern of that defect is analyzed and denoted p, and it is judged whether the target code under test T conforms to p;
specifically, it is judged whether there exists an API call sequence s in the target code under test T that satisfies both of the following conditions:
1) s calls the key method of p;
2) p is a non-contiguous subsequence of s;
when both conditions are satisfied, an API misuse defect similar to p is considered to exist in the target code under test T.
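The two conditions for a redundant call type pattern can be illustrated with a minimal Python sketch. It assumes that a pattern and a call sequence are both plain lists of fully qualified method names; the function names `is_subsequence` and `matches_redundant_call` and the method name `res.Resource.close` are hypothetical and do not come from the patent.

```python
from typing import Sequence


def is_subsequence(pattern: Sequence[str], calls: Sequence[str]) -> bool:
    """Return True if the elements of `pattern` occur in `calls` in the same
    order, with arbitrary gaps allowed (a non-contiguous subsequence)."""
    remaining = iter(calls)
    # `elem in remaining` consumes the iterator up to the first match, so each
    # pattern element must be found after the previous one.
    return all(elem in remaining for elem in pattern)


def matches_redundant_call(key_method: str,
                           pattern: Sequence[str],
                           calls: Sequence[str]) -> bool:
    """Condition 1: the sequence calls the key method of the pattern.
    Condition 2: the pattern is a non-contiguous subsequence of the sequence."""
    return key_method in calls and is_subsequence(pattern, calls)


# Hypothetical redundant-call pattern: the same method invoked twice.
pattern = ["res.Resource.close", "res.Resource.close"]
calls = ["res.Resource.open", "res.Resource.close", "log.Logger.info", "res.Resource.close"]
print(matches_redundant_call("res.Resource.close", pattern, calls))
```

The iterator-based check walks the call sequence once per pattern, which is the cost that the multi-pattern matching described later is meant to reduce.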
The API misuse pattern analysis module analyzes the API misuse pattern of the other defect types as follows:
if the historically known API misuse defect is one of missing call, wrong call, missing precondition, missing exception handling, and unchecked method return result, the API misuse pattern of these five defect types is analyzed and denoted p, and it is judged whether the target code under test T conforms to the pattern p;
specifically, it is judged whether there exists an API call sequence s in the target code under test T that satisfies all three of the following conditions:
1) the API call sequence s calls the key method of p;
2) p is a non-contiguous subsequence of s;
3) the correct call sequence for p is not a non-contiguous subsequence of s;
when all three conditions hold, an API misuse defect similar to p is considered to exist in T.
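As a minimal Python sketch of the three-condition check, the example below assumes that the misuse pattern is given as a wrong call sequence plus its corrected counterpart. The function names are hypothetical; the method names `java.io.file.open` and `java.io.file.close` are taken from the example pattern in this document, while `java.io.file.read` is made up for illustration.

```python
from typing import Sequence


def is_subsequence(pattern: Sequence[str], calls: Sequence[str]) -> bool:
    """True if `pattern` occurs in `calls` in order, gaps allowed."""
    remaining = iter(calls)
    return all(elem in remaining for elem in pattern)


def matches_non_redundant(key_method: str,
                          wrong: Sequence[str],
                          correct: Sequence[str],
                          calls: Sequence[str]) -> bool:
    """Conditions 1 to 3: the key method is called, the wrong sequence is
    present, and the correct sequence is absent."""
    return (key_method in calls
            and is_subsequence(wrong, calls)
            and not is_subsequence(correct, calls))


# Missing-call illustration: the file is opened but never closed.
wrong = ["java.io.file.open"]
correct = ["java.io.file.open", "java.io.file.close"]

leaky = ["java.io.file.open", "java.io.file.read"]
fixed = ["java.io.file.open", "java.io.file.read", "java.io.file.close"]

print(matches_non_redundant("java.io.file.open", wrong, correct, leaky))
print(matches_non_redundant("java.io.file.open", wrong, correct, fixed))
```

Condition 3 is what separates these defect types from the redundant call type: the wrong sequence alone is often a prefix of legitimate usage, so a match is only reported when the corrected sequence is also absent.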
The API call sequence extraction module extracts API call sequences as follows:
the target code under test, written in Java, is parsed with the JavaParser tool to form an abstract syntax tree;
the statement block of each function in the abstract syntax tree, and each statement in that block, is analyzed to judge whether the statement contains a control node:
1) if the statement contains no control node and no API call, nothing is extracted; if it contains no control node but does contain an API call, the call is extracted into the sequence;
2) if there is no API call on the branch paths under the control node, nothing is extracted; if there is an API call on a branch path under the control node, new extraction sequences are added by copying the currently extracted sequence once per path;
when all statements of the function have been analyzed, the extracted API call sequences are output as the API call sequence set S.
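The path-splitting rule can be sketched in Python on a toy statement tree. This is only an illustration under stated assumptions: the patent parses Java with JavaParser, whereas here a function body is modeled as a list whose items are either an API call name (a string) or a hypothetical `Branch` node holding one statement list per path. An `if` without `else` is assumed to be modeled with an explicit empty path.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Branch:
    """Toy control node: each entry of `paths` is the statement list of one path."""
    paths: List[list]


Stmt = Union[str, Branch]


def extract_sequences(block: List[Stmt]) -> List[Tuple[str, ...]]:
    """Return one API call sequence per execution path through `block`."""
    sequences: List[Tuple[str, ...]] = [()]
    for stmt in block:
        if isinstance(stmt, str):
            # Rule 1: a plain API call is appended to every current sequence.
            sequences = [seq + (stmt,) for seq in sequences]
            continue
        # Rule 2: collect the distinct call sequences of each branch path.
        tails: List[Tuple[str, ...]] = []
        for path in stmt.paths:
            for tail in extract_sequences(path):
                if tail not in tails:
                    tails.append(tail)
        if not any(tails):
            continue  # no API call under any path: nothing to extract
        # Copy every current sequence once per distinct path continuation.
        sequences = [seq + tail for seq in sequences for tail in tails]
    return sequences


body = ["java.io.file.open",
        Branch(paths=[["java.io.file.close"], []]),  # if without else
        "log.Logger.info"]                           # hypothetical call
for seq in extract_sequences(body):
    print(" ".join(seq))
```

The number of sequences grows multiplicatively with the number of branching statements, so a practical extractor would likely need a bound on path count. That bound is not part of this sketch.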
The input of the association judgment module is the API misuse pattern set P and the API call sequence set S of the target code under test;
the association judgment module judges, by a sequence pattern matching method, whether an API call sequence extracted from the target code under test is associated with an API misuse pattern determined from the historically known API misuse defects, through the following steps:
step 3.1: each pattern p in the API misuse pattern set P is a sequence whose elements are character strings; a prefix tree is built from all patterns in P using the conventional prefix tree construction method, where each node g of the prefix tree represents a sequence of strings and each edge e corresponds to one string element;
step 3.2: let the node set of the prefix tree be G and its edge set be E; let A be the set of covered nodes, which contains all nodes in the pattern-matching-success state and is initialized to the root node; let B be the set of edges to be analyzed, initialized to the set of edges leaving the root node;
step 3.3: analyze the string elements of the API call sequence set S one at a time; if B is empty, the analysis is finished and the process goes to step 3.7; if B is not empty, go to step 3.4;
step 3.4: for each edge e in B, judge whether the comparison unit corresponding to e matches the current string element of S; if not, continue with the next edge e; if yes, add the node g that e points to into set A, remove e from set B, add the edges leaving g into set B, and go to step 3.5;
step 3.5: judge whether all edges in B have been analyzed; if yes, go to step 3.6; if not, go to step 3.3;
step 3.6: judge whether the scan of the API call sequence set S is finished; if yes, go to step 3.7; if not, go to step 3.3;
step 3.7: output, as the matched patterns, the patterns corresponding to all nodes in A that are in the pattern-matching-success state.
In step 3.1, each pattern p in the API misuse pattern set P is a sequence whose elements are character strings;
for example, "java.io.file.open java.io.file.close" is a sequence pattern.
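Steps 3.1 to 3.7 can be sketched in Python as a single pass over one call sequence. This is one possible reading of the steps rather than the patented implementation: the class and function names are hypothetical, sets A and B are kept as lists named `covered` and `pending`, and edges added while an element is being processed are assumed to be compared only against later elements, so that a repeated pattern element needs two separate calls.

```python
from typing import Dict, List, Sequence, Tuple


class Node:
    """Prefix-tree node; `patterns` lists the patterns that end at this node."""
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.patterns: List[Tuple[str, ...]] = []


def build_prefix_tree(patterns: Sequence[Sequence[str]]) -> Node:
    """Step 3.1: one edge per string element, shared prefixes share nodes."""
    root = Node()
    for pattern in patterns:
        node = root
        for elem in pattern:
            node = node.children.setdefault(elem, Node())
        node.patterns.append(tuple(pattern))
    return root


def match_patterns(patterns: Sequence[Sequence[str]],
                   calls: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return every pattern that is a non-contiguous subsequence of `calls`."""
    root = build_prefix_tree(patterns)
    covered = [root]                        # set A (step 3.2)
    pending = list(root.children.items())   # set B: (label, target node)
    for call in calls:                      # step 3.3: one scan of the sequence
        if not pending:
            break
        still_pending, newly_added = [], []
        for label, target in pending:       # step 3.4
            if label == call:
                covered.append(target)
                newly_added.extend(target.children.items())
            else:
                still_pending.append((label, target))
        pending = still_pending + newly_added
    # Step 3.7: covered nodes at which a pattern ends are successful matches.
    return [p for node in covered for p in node.patterns]


patterns = [("java.io.file.open", "java.io.file.close"),
            ("java.io.file.close", "java.io.file.open")]
calls = ["java.io.file.open", "log.Logger.info", "java.io.file.close"]
print(match_patterns(patterns, calls))
```

Unlike the classic Aho-Corasick automaton, a matched edge is never abandoned: once a node is covered it stays covered, which is what lets unrelated calls sit between the elements of a pattern.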
(III) advantageous effects
Compared with the prior art, the invention has the following beneficial effects:
Unlike specification-based detection schemes, the invention does not analyze API usage specifications. It characterizes API misuse patterns from known API misuse defect instances, using the code before and after the fix as recorded in the patch files, and then uses an improved AC algorithm to search the target software under test for API call sequences that conform to those misuse patterns, thereby detecting similar defects. Compared with specification-based detection methods, this avoids the problems that detection accuracy depends on how accurately the specification is described and that some API specifications are themselves defective, and so improves the accuracy of API misuse defect detection.
To reduce the computational complexity, the invention improves the AC algorithm for multi-pattern matching. The AC algorithm preprocesses multiple patterns into a deterministic finite automaton and then scans the target sequence once to determine which of the patterns to be checked occur in it. However, the AC algorithm can only detect contiguous subsequences; the invention improves it to support the detection of non-contiguous subsequences.
Drawings
Fig. 1 and 2 are schematic diagrams illustrating the technical solution of the present invention.
Detailed Description
In order to make the objects, contents, and advantages of the present invention clearer, the following detailed description of the embodiments of the present invention will be made in conjunction with the accompanying drawings and examples.
To address the problems in the prior art, the invention does not analyze API usage specifications. Instead, it characterizes API misuse patterns from known API misuse defect instances and from the code before and after the fix as recorded in the patch files (an improper call to a given API produces incorrect program behavior with an identified cause and repair measure, and the same pattern can recur in different software products). It then uses an improved AC algorithm to search the target software under test for API call sequences that conform to those misuse patterns, thereby detecting similar defects. Compared with specification-based detection methods, this avoids the problems that detection accuracy depends on how accurately the specification is described and that some API specifications are themselves defective, and so improves the accuracy of API misuse defect detection.
Specifically, the invention uses an improved AC (Aho-Corasick) algorithm to search for API call sequences that conform to a misuse pattern, which reduces the computational complexity and improves the search efficiency compared with the conventional approach. The conventional search proceeds as follows. Given an API misuse defect database, the misuse pattern set P of all its defects is analyzed, and each API misuse pattern is described by three elements: the set of key methods M, the set of wrong API call sequences W, and the set of correct API call sequences R. The three elements are related as follows: first, a program can only contain a defect of a given pattern if it calls that pattern's key method; second, the wrong call sequence contains the erroneous use of the key method and is the cause of the API defect; finally, the correct call sequence provides the way to fix the defect.
The process of searching the target program for similar API misuse defects then comprises four steps:
step 1: extract the API call sequence set S of the target program;
step 2: for each API call sequence s in S, determine which key methods in M it calls, and extract the API call sequences with the corresponding subscripts from W and R to form new sets W and R;
step 3: for each wrong API call sequence w in W, judge whether w is a non-contiguous subsequence of s; if yes, go to step 4; otherwise, repeat step 3 until all API call sequences in W have been analyzed;
step 4: if the API misuse pattern p corresponding to w is of the redundant call type, the target program has a defect similar to p, which is reported; if p is of a non-redundant call type (that is, one of the five types missing call, wrong call, missing precondition, missing exception handling, and unchecked method return result), take the correct API call sequence r with the same subscript as w from R and judge whether r is a non-contiguous subsequence of s (a sequence formed by splicing elements across gaps); if yes, go to step 3; if not, the target program has a defect similar to p, which is reported together with the sequence r as the repair measure for the defect, and the analysis continues from step 3.
In the worst case the above steps require scanning the target sequence (1 + 2|W|) times. To reduce this computational complexity, the invention improves the Aho-Corasick (AC) algorithm for multi-pattern matching. The AC algorithm preprocesses multiple patterns into a deterministic finite automaton and then scans the target sequence once to determine which of the patterns to be checked occur in it. However, the AC algorithm can only detect contiguous subsequences; the invention improves it to support the detection of non-contiguous subsequences.
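The four-step baseline search can be sketched in Python for a single call sequence s. The sketch assumes that M, W and R are index-aligned lists and adds a hypothetical list `redundant` of booleans marking which patterns are of the redundant call type; none of the function or parameter names come from the patent.

```python
from typing import List, Optional, Sequence, Tuple


def is_subsequence(pattern: Sequence[str], calls: Sequence[str]) -> bool:
    """True if `pattern` occurs in `calls` in order, gaps allowed."""
    remaining = iter(calls)
    return all(elem in remaining for elem in pattern)


def find_similar_defects(s: Sequence[str],
                         M: Sequence[str],
                         W: Sequence[Sequence[str]],
                         R: Sequence[Sequence[str]],
                         redundant: Sequence[bool]
                         ) -> List[Tuple[int, Optional[Tuple[str, ...]]]]:
    """Return (pattern index, suggested fix) pairs for one call sequence s."""
    reports = []
    for i, key_method in enumerate(M):
        if key_method not in s:            # step 2: key method must be called
            continue
        if not is_subsequence(W[i], s):    # step 3: one scan of s per wrong sequence
            continue
        if redundant[i]:                   # step 4, redundant call type
            reports.append((i, None))
        elif not is_subsequence(R[i], s):  # step 4, second scan of s
            reports.append((i, tuple(R[i])))
    return reports


M = ["java.io.file.open"]
W = [["java.io.file.open"]]
R = [["java.io.file.open", "java.io.file.close"]]
redundant = [False]
print(find_similar_defects(["java.io.file.open", "java.io.file.read"], M, W, R, redundant))
```

Each pattern can cost up to two full passes over s on top of the key-method lookup, which matches the (1 + 2|W|) worst-case scan count stated above and motivates the single-scan matching.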
Specifically, to solve the above technical problem, the present invention provides an API misuse defect detection system based on sequence pattern matching, the system comprising: an API misuse pattern analysis module, an API call sequence extraction module, and an association judgment module;
the API misuse pattern analysis module is used for extracting, for each historically known API misuse defect in the API misuse defect database, the defect code and the patch code of that defect, and analyzing the API misuse pattern of the defect;
the API call sequence extraction module is used for analyzing the multiple program paths of the target code under test in a program code library and extracting API call sequences;
the association judgment module is used for judging, by a sequence pattern matching method, whether an API call sequence extracted from the target code under test is associated with an API misuse pattern determined from the historically known API misuse defects, and reporting similar defects.
The API misuse pattern analysis module analyzes the API misuse pattern of a redundant call type defect as follows:
if the historically known API misuse defect is a redundant call type defect, the API misuse pattern of that defect is analyzed and denoted p, and it is judged whether the target code under test T conforms to p;
specifically, it is judged whether there exists an API call sequence s in the target code under test T that satisfies both of the following conditions:
1) s calls the key method of p;
2) p is a non-contiguous subsequence of s;
when both conditions are satisfied, an API misuse defect similar to p is considered to exist in the target code under test T.
The API misuse pattern analysis module analyzes the API misuse pattern of the other defect types as follows:
if the historically known API misuse defect is one of missing call, wrong call, missing precondition, missing exception handling, and unchecked method return result, the API misuse pattern of these five defect types is analyzed and denoted p, and it is judged whether the target code under test T conforms to the pattern p;
specifically, it is judged whether there exists an API call sequence s in the target code under test T that satisfies all three of the following conditions:
1) the API call sequence s calls the key method of p;
2) p is a non-contiguous subsequence of s;
3) the correct call sequence for p is not a non-contiguous subsequence of s;
when all three conditions hold, an API misuse defect similar to p is considered to exist in T.
The API call sequence extraction module extracts API call sequences as follows:
the target code under test, written in Java, is parsed with the JavaParser tool to form an abstract syntax tree;
the statement block of each function in the abstract syntax tree, and each statement in that block, is analyzed to judge whether the statement contains a control node:
1) if the statement contains no control node and no API call, nothing is extracted; if it contains no control node but does contain an API call, the call is extracted into the sequence;
2) if there is no API call on the branch paths under the control node, nothing is extracted; if there is an API call on a branch path under the control node, new extraction sequences are added by copying the currently extracted sequence once per path;
when all statements of the function have been analyzed, the extracted API call sequences are output as the API call sequence set S.
The input of the association judgment module is the API misuse pattern set P and the API call sequence set S of the target code under test;
the association judgment module judges, by a sequence pattern matching method, whether an API call sequence extracted from the target code under test is associated with an API misuse pattern determined from the historically known API misuse defects, through the following steps:
step 3.1: each pattern p in the API misuse pattern set P is a sequence whose elements are character strings; a prefix tree is built from all patterns in P using the conventional prefix tree construction method, where each node g of the prefix tree represents a sequence of strings and each edge e corresponds to one string element;
step 3.2: let the node set of the prefix tree be G and its edge set be E; let A be the set of covered nodes, which contains all nodes in the pattern-matching-success state and is initialized to the root node; let B be the set of edges to be analyzed, initialized to the set of edges leaving the root node;
step 3.3: analyze the string elements of the API call sequence set S one at a time; if B is empty, the analysis is finished and the process goes to step 3.7; if B is not empty, go to step 3.4;
step 3.4: for each edge e in B, judge whether the comparison unit corresponding to e matches the current string element of S; if not, continue with the next edge e; if yes, add the node g that e points to into set A, remove e from set B, add the edges leaving g into set B, and go to step 3.5;
step 3.5: judge whether all edges in B have been analyzed; if yes, go to step 3.6; if not, go to step 3.3;
step 3.6: judge whether the scan of the API call sequence set S is finished; if yes, go to step 3.7; if not, go to step 3.3;
step 3.7: output, as the matched patterns, the patterns corresponding to all nodes in A that are in the pattern-matching-success state.
In step 3.1, each pattern p in the API misuse pattern set P is a sequence whose elements are character strings;
for example, "java.io.file.open java.io.file.close" is a sequence pattern.
In addition, an improved technical solution of the present invention provides an API misuse defect detection method based on sequence pattern matching. As shown in fig. 1 and 2, the method includes the following steps:
step 1: for each historically known API misuse defect in the API misuse defect database, extract the defect code and the patch code of that defect, and analyze the API misuse pattern of the defect;
step 2: analyze the multiple program paths of the target code under test in a program code library, and extract API call sequences;
step 3: judge, by a sequence pattern matching method, whether an API call sequence extracted from the target code under test in step 2 is associated with an API misuse pattern determined in step 1 from the historically known API misuse defects, and report similar defects.
In step 1, the API misuse pattern of a redundant call type defect is analyzed as follows:
if the historically known API misuse defect is a redundant call type defect, the API misuse pattern of that defect is analyzed and denoted p, and it is judged whether the target code under test T conforms to p;
specifically, it is judged whether there exists an API call sequence s in the target code under test T that satisfies both of the following conditions:
1) s calls the key method of p;
2) p is a non-contiguous subsequence of s;
when both conditions are satisfied, an API misuse defect similar to p is considered to exist in the target code under test T.
In step 1, the API misuse pattern of the other defect types is analyzed as follows:
if the historically known API misuse defect is one of missing call, wrong call, missing precondition, missing exception handling, and unchecked method return result, the API misuse pattern of these five defect types is analyzed and denoted p, and it is judged whether the target code under test T conforms to the pattern p;
specifically, it is judged whether there exists an API call sequence s in the target code under test T that satisfies all three of the following conditions:
1) the API call sequence s calls the key method of p;
2) p is a non-contiguous subsequence of s;
3) the correct call sequence for p is not a non-contiguous subsequence of s;
when all three conditions hold, an API misuse defect similar to p is considered to exist in T.
In step 2, the process of extracting the API call sequences specifically includes:
step 2.1: parse the target code under test, written in Java, with the JavaParser tool to form an abstract syntax tree;
step 2.2: analyze the statement block of each function in the abstract syntax tree, and each statement in that block, to judge whether the statement contains a control node:
1) if the statement contains no control node and no API call, nothing is extracted; if it contains no control node but does contain an API call, the call is extracted into the sequence;
2) if there is no API call on the branch paths under the control node, nothing is extracted; if there is an API call on a branch path under the control node, new extraction sequences are added by copying the currently extracted sequence once per path;
step 2.3: when all statements of the function have been analyzed, output the extracted API call sequences as the API call sequence set S.
The input of step 3 is the API misuse pattern set P and the API call sequence set S of the target code under test;
step 3 specifically includes:
step 3.1: each pattern p in the API misuse pattern set P is a sequence whose elements are character strings; for example, "java.io.file.open java.io.file.close" is a sequence pattern; a prefix tree is built from all patterns in P using the conventional prefix tree construction method, where each node g of the prefix tree represents a sequence of strings and each edge e corresponds to one string element;
step 3.2: let the node set of the prefix tree be G and its edge set be E; let A be the set of covered nodes, which contains all nodes in the pattern-matching-success state and is initialized to the root node; let B be the set of edges to be analyzed, initialized to the set of edges leaving the root node;
step 3.3: analyze the string elements of the API call sequence set S one at a time; if B is empty, the analysis is finished and the process goes to step 3.7; if B is not empty, go to step 3.4;
step 3.4: for each edge e in B, judge whether the comparison unit corresponding to e matches the current string element of S; if not, continue with the next edge e; if yes, add the node g that e points to into set A, remove e from set B, add the edges leaving g into set B, and go to step 3.5;
step 3.5: judge whether all edges in B have been analyzed; if yes, go to step 3.6; if not, go to step 3.3;
step 3.6: judge whether the scan of the API call sequence set S is finished; if yes, go to step 3.7; if not, go to step 3.3;
step 3.7: output, as the matched patterns, the patterns corresponding to all nodes in A that are in the pattern-matching-success state.
Like the conventional AC algorithm, the improved technical solution of the invention finds all matching patterns in a single scan of the sequence; unlike it, the improved solution detects non-contiguous subsequences.
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.

Claims (6)

1. An API misuse defect detection system based on sequence pattern matching, the system comprising: an API misuse pattern analysis module, an API call sequence extraction module, and an association judgment module;
the API misuse pattern analysis module is used for extracting, for each historically known API misuse defect in the API misuse defect database, the defect code and the patch code of that defect, and analyzing the API misuse pattern of the defect;
the API call sequence extraction module is used for analyzing the multiple program paths of the target code under test in a program code library and extracting API call sequences;
the association judgment module is used for judging, by a sequence pattern matching method, whether an API call sequence extracted from the target code under test is associated with an API misuse pattern determined from the historically known API misuse defects, and reporting similar defects.
2. The API misuse defect detection system based on sequence pattern matching according to claim 1, wherein the API misuse pattern analysis module analyzes the API misuse pattern of the API misuse defect as follows:
if the historical known API misuse defect is a redundant-call defect, API misuse pattern analysis is performed for the redundant-call defect; let the API misuse pattern of the redundant-call defect be p, and judge whether the target code under test T conforms to p;
specifically, it is judged whether there exists an API call sequence s in the target code under test T that satisfies both of the following conditions:
1) s calls the key method of p;
2) p is a non-contiguous subsequence of s;
when both conditions are satisfied, an API misuse defect similar to p is considered to exist in the target code under test T.
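The two conditions of claim 2 can be illustrated with a minimal Python sketch. It assumes that a pattern and a call sequence are both lists of method-name strings and that the key method of p is supplied together with the pattern. The function names `is_subsequence` and `has_redundant_call`, and the example method names, are hypothetical and do not come from the patent.

```python
from typing import Sequence


def is_subsequence(pattern: Sequence[str], calls: Sequence[str]) -> bool:
    """Return True if `pattern` occurs in `calls` in order, gaps allowed."""
    remaining = iter(calls)
    # `element in remaining` advances the iterator up to the first match,
    # so each later element can only match further to the right.
    return all(element in remaining for element in pattern)


def has_redundant_call(pattern: Sequence[str], key_method: str,
                       calls: Sequence[str]) -> bool:
    """Condition 1: s calls the key method of p.
    Condition 2: p is a non-contiguous subsequence of s."""
    return key_method in calls and is_subsequence(pattern, calls)


# Hypothetical redundant-call pattern: the same method invoked twice.
p = ["java.util.Iterator.next", "java.util.Iterator.next"]
s = ["java.util.List.iterator", "java.util.Iterator.next",
     "java.io.PrintStream.println", "java.util.Iterator.next"]
print(has_redundant_call(p, "java.util.Iterator.next", s))  # True
```

A sequence that calls the key method only once fails condition 2, so it is not reported.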
3. The API misuse defect detection system based on sequence pattern matching according to claim 2, wherein the API misuse pattern analysis module analyzes the API misuse pattern of the API misuse defect as follows:
if the historical known API misuse defect is one of a missing call, a missing prefix, missing exception handling, or an unchecked method return result, API misuse pattern analysis is performed for these defect types; let the misuse pattern be p, and judge whether the target code under test T conforms to the pattern p;
specifically, it is judged whether there exists an API call sequence s in the target code under test T that satisfies all three of the following conditions:
1) the API call sequence s calls the key method of p;
2) p is a non-contiguous subsequence of s;
3) the correct call sequence for p is not a non-contiguous subsequence of s;
when all three conditions are satisfied, an API misuse defect similar to p is considered to exist in T.
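The three conditions of claim 3 can be sketched in the same way. This is only an illustration under stated assumptions: p is taken to be the call sequence observed in the defect code, the "correct call sequence for p" is taken to be the sequence observed in the patch code, and both are lists of strings. The name `has_missing_element` and the method name `java.io.file.read` are hypothetical; `java.io.file.open` and `java.io.file.close` follow the example given in claim 6.

```python
from typing import Sequence


def is_subsequence(pattern: Sequence[str], calls: Sequence[str]) -> bool:
    """Return True if `pattern` occurs in `calls` in order, gaps allowed."""
    remaining = iter(calls)
    return all(element in remaining for element in pattern)


def has_missing_element(pattern: Sequence[str], correct: Sequence[str],
                        key_method: str, calls: Sequence[str]) -> bool:
    """Conditions 1 to 3 of claim 3, checked against one call sequence s."""
    return (
        key_method in calls                      # 1) s calls the key method of p
        and is_subsequence(pattern, calls)       # 2) p is a subsequence of s
        and not is_subsequence(correct, calls)   # 3) the correct sequence is not
    )


p = ["java.io.file.open"]                              # assumed defect-side pattern
correct = ["java.io.file.open", "java.io.file.close"]  # assumed patch-side sequence
leaky = ["java.io.file.open", "java.io.file.read"]
clean = ["java.io.file.open", "java.io.file.read", "java.io.file.close"]
print(has_missing_element(p, correct, "java.io.file.open", leaky))  # True
print(has_missing_element(p, correct, "java.io.file.open", clean))  # False
```

Condition 3 is what separates this check from the redundant-call case: a sequence that already contains the corrected call order is not reported.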
4. The API misuse defect detection system based on sequence pattern matching according to claim 3, wherein the process by which the API call sequence extraction module extracts API call sequences specifically comprises:
parsing the Java target code under test with the JavaParser tool to form an abstract syntax tree;
analyzing the statement block of each function in the abstract syntax tree and each statement in the statement block, and judging whether the statement contains a control node:
1) if the statement contains no control node and no API call, no sequence is extracted; if the statement contains no control node but contains an API call, the sequence is extracted;
2) if no API call exists on the branch paths under the control node, no sequence is extracted; if an API call exists on a branch path under the control node, a new extraction sequence is added, and the currently extracted sequence is copied according to the number of paths;
when all statements of the function have been analyzed, the extracted API call sequences are output as the API call sequence set S.
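The path-forking idea in claim 4 can be sketched without JavaParser by using a toy statement model. Everything below is an assumption made for illustration: `Call` stands for a statement containing one API call, `Branch` stands for a control node with explicit branch paths (an `if` without `else` is modeled with an empty second path), and the short method names are placeholders. It is a simplified sketch, not the patented extraction procedure.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Call:
    name: str  # fully qualified name of the called API method


@dataclass
class Branch:
    # One statement list per branch path under the control node.
    paths: List[List["Stmt"]] = field(default_factory=list)


Stmt = Union[Call, Branch]


def _walk(block: List[Stmt]) -> List[List[str]]:
    """Return one call sequence per execution path through `block`."""
    sequences: List[List[str]] = [[]]
    for stmt in block:
        if isinstance(stmt, Call):
            # No control node: append the call to every current sequence.
            sequences = [seq + [stmt.name] for seq in sequences]
            continue
        # Control node: copy the current sequences once per branch path.
        forked: List[List[str]] = []
        for path in stmt.paths:
            for tail in _walk(path):
                for seq in sequences:
                    candidate = seq + tail
                    if candidate not in forked:  # paths without calls collapse
                        forked.append(candidate)
        if forked:
            sequences = forked
    return sequences


def extract_api_sequences(function_body: List[Stmt]) -> List[List[str]]:
    """Build the set S for one function; paths without API calls are dropped."""
    return [seq for seq in _walk(function_body) if seq]


body = [Call("open"), Branch(paths=[[Call("read")], []]), Call("close")]
print(extract_api_sequences(body))
# [['open', 'read', 'close'], ['open', 'close']]
```

Keeping the path that skips the branch is a design choice of this sketch: it lets a later matching step see the execution in which a call is absent.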
5. The API misuse defect detection system based on sequence pattern matching according to claim 4, wherein:
the input of the association judgment module is an API misuse pattern set P and an API call sequence set S of the target code under test;
the process by which the association judgment module judges, by a sequence pattern matching method, whether an API call sequence extracted from the target code under test is associated with an analyzed API misuse pattern of a historical known API misuse defect specifically comprises the following steps:
step 3.1: each pattern p in the API misuse pattern set P is a sequence whose elements are character strings; a prefix tree is constructed from all patterns in the API misuse pattern set P according to the traditional prefix tree construction method, wherein each node g of the prefix tree represents a sequence of character strings and each edge e corresponds to one character string element;
step 3.2: let the node set of the prefix tree be G and the edge set be E; let A be the set of nodes in the covered state, which contains all nodes in the pattern-matching-success state and is initialized to the root node; let B be the set of edges to be analyzed, initialized to the set of edges leading out of the root node;
step 3.3: analyze each character string element in the API call sequence set S; if B is empty, the analysis ends and the process goes to step 3.7; if B is not empty, go to step 3.4;
step 3.4: analyze each edge e in B: judge whether the comparison unit corresponding to edge e matches the character string element of the API call sequence set S; if not, continue with the next edge e; if so, add the node g that e points to into A, remove e from B, add the edges leading out of g into B, and go to step 3.5;
step 3.5: judge whether all edges in B have been analyzed; if so, go to step 3.6; if not, go to step 3.3;
step 3.6: judge whether the scan of the API call sequence set S is finished; if so, go to step 3.7; if not, go to step 3.3;
step 3.7: take the patterns corresponding to all nodes in A that are in the pattern-matching-success state as the matched patterns and output them.
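Steps 3.1 to 3.7 can be read as a single left-to-right scan that keeps a set of covered prefix-tree nodes (A) and a frontier of edges still waiting for a match (B). The Python sketch below follows that reading for one call sequence at a time. The names `Node`, `build_prefix_tree` and `match_patterns` are hypothetical, and the handling of details that the claim leaves open (for example, that edges opened while processing an element are first tested against the next element) is an assumption of this sketch.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class Node:
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}           # one edge per string element
        self.pattern: Optional[Tuple[str, ...]] = None  # set if a pattern ends here


def build_prefix_tree(patterns: Sequence[Sequence[str]]) -> Node:
    """Step 3.1: ordinary prefix tree whose edges are whole strings."""
    root = Node()
    for pattern in patterns:
        node = root
        for element in pattern:
            node = node.children.setdefault(element, Node())
        node.pattern = tuple(pattern)
    return root


def match_patterns(patterns: Sequence[Sequence[str]],
                   calls: Sequence[str]) -> List[Tuple[str, ...]]:
    root = build_prefix_tree(patterns)
    covered = [root]                        # set A (step 3.2)
    pending = list(root.children.items())   # set B: (label, target node) pairs
    for element in calls:                   # step 3.3: single scan of the sequence
        if not pending:
            break
        still_pending, opened = [], []
        for label, target in pending:       # step 3.4: test each edge in B
            if label == element:
                covered.append(target)      # target node becomes covered
                opened.extend(target.children.items())
            else:
                still_pending.append((label, target))
        # Newly opened edges are only tested against later elements, so one
        # call can never satisfy two consecutive positions of a pattern.
        pending = still_pending + opened
    # Step 3.7: report the patterns whose end nodes are covered.
    return [node.pattern for node in covered if node.pattern is not None]


patterns = [["open", "close"], ["open", "read"], ["lock", "unlock"]]
calls = ["open", "log", "read", "close"]
print(match_patterns(patterns, calls))
# [('open', 'read'), ('open', 'close')]
```

Because an edge stays in the frontier until its label appears, unrelated calls between two pattern elements (here `log`) do not break a match, which is how non-contiguous subsequences are detected in one pass.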
6. The API misuse defect detection system based on sequence pattern matching according to claim 5, wherein:
in step 3.1, each pattern p in the API misuse pattern set P is a sequence whose elements are character strings;
for example, "java.io.file.open java.io.file.close" is a sequence pattern.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010974385.7A CN112214399B (en) 2020-09-16 2020-09-16 API misuse defect detection system based on sequence pattern matching

Publications (2)

Publication Number Publication Date
CN112214399A (en) 2021-01-12
CN112214399B (en) 2023-01-10

Family

ID=74049512

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010974385.7A Active CN112214399B (en) 2020-09-16 2020-09-16 API misuse defect detection system based on sequence pattern matching

Country Status (1)

Country Link
CN (1) CN112214399B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101286132A (en) * 2008-06-02 2008-10-15 北京邮电大学 Test method and system based on software defect mode
US20130061211A1 (en) * 2011-09-01 2013-03-07 Infosys Limited Systems, methods, and computer-readable media for measuring quality of application programming interfaces
CN108932192A (en) * 2017-05-22 2018-12-04 南京大学 A kind of Python Program Type defect inspection method based on abstract syntax tree
CN109408371A (en) * 2018-09-18 2019-03-01 深圳壹账通智能科技有限公司 Software defect analyzes input method, device, computer equipment and storage medium
CN109857648A (en) * 2019-01-14 2019-06-07 复旦大学 A kind of change mode excavation method of API misuse
CN110362968A (en) * 2019-07-16 2019-10-22 腾讯科技(深圳)有限公司 Information detecting method, device and server
CN111400724A (en) * 2020-05-08 2020-07-10 中国人民解放军国防科技大学 Operating system vulnerability detection method, system and medium based on code similarity analysis
CN112115053A (en) * 2020-09-16 2020-12-22 北京京航计算通讯研究所 API misuse defect detection method based on sequence pattern matching

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Sven Amann et al.: "MUBench: A Benchmark for API-Misuse Detectors", 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR) *
Wang Xin et al.: "API Misuse Bug Detection Based on Deep Learning", Journal of Software *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115053A (en) * 2020-09-16 2020-12-22 北京京航计算通讯研究所 API misuse defect detection method based on sequence pattern matching
CN113392016A (en) * 2021-06-25 2021-09-14 中债金科信息技术有限公司 Protocol generation method, device, equipment and medium for processing program abnormal condition
CN113900962A (en) * 2021-12-10 2022-01-07 广州易方信息科技股份有限公司 Code difference detection method and device
CN113900962B (en) * 2021-12-10 2022-03-18 广州易方信息科技股份有限公司 Code difference detection method and device

Also Published As

Publication number Publication date
CN112214399B (en) 2023-01-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant