CN115858405A - Grammar perception fuzzy test method and system for code test - Google Patents

Publication number
CN115858405A
Authority
CN
China
Prior art keywords
syntax tree
abstract syntax
segment
code
test
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310194536.0A
Other languages
Chinese (zh)
Inventor
毛得明
茹凯琪
吴春明
曹夕
卞绪杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 30 Research Institute
Original Assignee
CETC 30 Research Institute
Application filed by CETC 30 Research Institute
Priority to CN202310194536.0A
Publication of CN115858405A

Landscapes

  • Devices For Executing Special Programs (AREA)

Abstract

The invention relates to the technical field of grammar-aware fuzz testing and discloses a grammar-aware fuzz testing method and system for code testing. The method uses deep learning to learn the grammar of vulnerable source code, generates new code segments, and uses the new code segments to fuzz-test code, where vulnerable source code means source code with publicly disclosed security defects. The invention addresses prior-art problems such as the difficulty of generating code segments automatically and of accurately checking conformance to the programming language specification.

Description

Grammar perception fuzzy test method and system for code test
Technical Field
The invention relates to the technical field of grammar-aware fuzz testing, and in particular to a grammar-aware fuzz testing method and system for code testing.
Background
Grammar-aware fuzz testing is mainly a method for automated vulnerability discovery in key basic software such as interpreters and compilers. It automatically generates test cases by recognizing syntax and semantics and attempts to make the target application behave abnormally, thereby discovering security defects.
Software such as interpreters and compilers is key infrastructure in the software field, and a defect in it creates a serious security risk. The behavior of the generated executable program may be inconsistent with the semantics of the source program, so that the program fails in unexpected ways; such errors are hard for application developers to detect and can easily cause serious production incidents. The reliability of the interpreter (or compiler) is therefore critical. However, fuzz testing an interpreter or compiler is difficult because it requires highly structured input data. A test case for an interpreter or compiler is a piece of code that must conform to the programming language specification. If a generated test case does not conform, the interpreter (or compiler) reports a syntax error early, and deeper testing cannot proceed.
Conventional fuzz testing methods can be divided into mutation-based and generation-based fuzz testing. Mutation-based fuzz testing creates test cases by applying mutations to existing data samples; generation-based fuzz testing generates test cases from scratch by modeling the target protocol or file format. With either approach it is difficult to generate highly structured data.
To solve the test case generation problem, one existing scheme extracts code segments (variables, expressions, operators and the like), combines them, and adds a language validity check, producing test code with controllable length, nesting depth and number of passed parameters and with correct syntax. However, the combinations of code segments generated in this way are relatively simple and make no use of data-flow analysis of the code. A second scheme builds a test case library; when the library is large enough, certain security risks can be eliminated. The difficulty is that the test cases for a programming language are always finite and cannot cover an infinite state space. Because programming languages have many specifications, general test case generation methods struggle to cover a large state space, and some current research fuzz-tests specific language features instead.
The technical problems of the prior art are mainly the following two:
1. Automatic generation of code segments:
Code segments produced by combining existing segments differ little from the originals, and building the various code libraries consumes a great deal of manpower and material resources. A method that generates code segments automatically is therefore needed to fuzz-test interpreters and compilers efficiently.
2. Programming language specification checking:
After code segments are generated automatically, they must be checked against the programming language specification and adjusted if they do not conform. Combining code segments predicted by a deep learning model with the original code segments easily causes variable reference errors.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a grammar-aware fuzz testing method and system for code testing, which solve prior-art problems such as the difficulty of generating code segments automatically and of accurately checking conformance to the programming language specification.
The technical solution adopted by the invention to solve these problems is as follows:
A grammar-aware fuzz testing method for code testing uses deep learning to learn the grammar of vulnerable source code, generates new code segments, and uses the new code segments to fuzz-test code; here, vulnerable source code means source code with publicly disclosed security defects.
As a preferred technical solution, the method comprises the following steps:
S1, data preprocessing: converting the collected vulnerable source code into a data type that the deep learning model can recognize, thereby parsing the vulnerable source code;
S2, model training: training the deep learning model by a deep learning method, using the preprocessed vulnerable source code as the training set;
S3, code generation and checking: generating new code segments with the trained deep learning model, and then checking the generated code segments for conformance to the language specification;
S4, fuzz testing: inputting the generated new code segments into an interpreter or compiler for fuzz testing.
As a preferred technical solution, the step S1 includes the following steps:
S11, running the test code with an interpreter and checking whether the vulnerable source code conforms to the grammar specification;
S12, converting the vulnerable source code into an abstract syntax tree;
S13, replacing the variable names and function names in the vulnerable source code with custom-defined variable names and function names;
S14, converting the complete abstract syntax tree into abstract syntax tree subtrees;
S15, converting the abstract syntax tree subtrees into input that the deep learning model can recognize.
As a preferred technical solution, in S14, the abstract syntax tree subtrees are formed by recursively traversing the abstract syntax tree and replacing the subtree of the current node with the root node of that subtree.
As a preferred technical solution, in S15, the dependency relationship between the abstract syntax tree subtrees is determined by the order of the abstract syntax tree fragments.
As a preferred technical solution, the step S2 includes the following steps:
S21, building a statistical language model from each fragment sequence, so that the statistical language model can predict the next fragment from the context fragments; the training objective of the statistical language model is: given the abstract syntax tree sequence [Figure SMS_1] of a piece of code, predict the next abstract syntax tree fragment [Figure SMS_3] from [Figure SMS_2];
S22, defining a loss function that rewards locating type-related fragments;
the loss function is:
[Figure SMS_4]
[Figure SMS_5]
wherein [Figure SMS_8] represents an abstract syntax tree fragment of the abstract syntax tree sequence [Figure SMS_12]; [Figure SMS_16] represents the next abstract syntax tree fragment predicted with the greatest likelihood; [Figure SMS_9] represents a normalization function; [Figure SMS_13] represents the distribution probability of the next abstract syntax tree fragment; [Figure SMS_18] represents the objective metric function; [Figure SMS_20] represents [Figure SMS_6]; [Figure SMS_10] represents the set of the current abstract syntax tree fragment and the next abstract syntax tree fragment; [Figure SMS_14] represents the number of elements in the set [Figure SMS_17]; [Figure SMS_7] represents the probability distribution of the next abstract syntax tree fragment, [Figure SMS_11]; [Figure SMS_15] represents the cross-entropy loss function; and [Figure SMS_19] represents prioritizing abstract syntax tree fragments that have the same type as the real abstract syntax tree fragment;
S23, reducing [Figure SMS_21] and [Figure SMS_22] during training, so that the statistical language model achieves the training objective.
As a preferred technical solution, in S22,
[Figure SMS_23]
[Figure SMS_24]
wherein [Figure SMS_26] represents the index of an abstract syntax tree fragment; [Figure SMS_28] represents the total number of abstract syntax tree fragments; [Figure SMS_31] represents the probability distribution of the next abstract syntax tree fragment of the i-th abstract syntax tree fragment; [Figure SMS_27] represents the predicted probability distribution of the next abstract syntax tree fragment; [Figure SMS_30] represents the index of an abstract syntax tree fragment of the correct type; [Figure SMS_32] represents the probability distribution of the correct type when the next abstract syntax tree fragment is predicted; [Figure SMS_33] represents the number of abstract syntax tree fragments of the same type as the real abstract syntax tree fragment; [Figure SMS_25] represents the n most likely abstract syntax tree fragments in the set of predicted next abstract syntax tree fragments; and [Figure SMS_29] represents returning the set of abstract syntax tree fragments of the correct type.
As a preferred technical solution, the step S3 includes the following steps:
S31, randomly selecting a test case from the test case set and randomly deleting one abstract syntax tree fragment from the selected test case, to obtain the deleted abstract syntax tree fragment; the test case set is a collection of multiple pieces of vulnerable source code;
S32, synthesizing an abstract syntax tree: obtaining a complete abstract syntax tree using the deleted abstract syntax tree fragment obtained in step S31;
S33, performing a syntax check on the complete abstract syntax tree to determine whether it conforms to the grammar specification.
As a preferred technical solution, the step S32 includes the following steps:
S321, inputting the deleted abstract syntax tree fragment into the trained deep learning model to obtain candidate abstract syntax tree fragments;
S322, selecting a suitable abstract syntax tree fragment from the candidates and combining it with the deleted abstract syntax tree fragment to generate a complete abstract syntax tree; a suitable abstract syntax tree fragment is one that satisfies both of the following conditions: A. it has the highest probability in the set of predicted next abstract syntax tree fragments; B. it matches the type required by the current abstract syntax tree fragment;
S323, traversing the abstract syntax tree to find positions at which to add abstract syntax tree fragments, until no node can be added, thereby completing the synthesis of the abstract syntax tree; a position is found as follows: if the current node has no child nodes and is not a terminal node, the abstract syntax tree fragment is attached to the current node; otherwise, the search method is called recursively on the child nodes of the current node, performing a pre-order traversal.
A grammar-aware fuzz testing system for code testing, used to implement the above grammar-aware fuzz testing method for code testing, comprises the following modules connected in sequence:
a data preprocessing module, used to convert the collected vulnerable source code into a data type that the deep learning model can recognize, thereby parsing the vulnerable source code;
a model training module, used to train the deep learning model by a deep learning method, using the preprocessed vulnerable source code as the training set;
a code generation and checking module, used to generate new code segments with the trained deep learning model and then check the generated code segments for conformance to the language specification;
a fuzz testing module, used to input the generated new code segments into an interpreter or compiler for fuzz testing.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a method that uses deep learning to generate new code segments automatically; the generated segments retain the grammatical characteristics of the original code segments while adding new structures, which increases the diversity of code segments. In addition, by inferring variable types, the invention resolves syntax errors in newly generated code segments and improves the validity of the generated code to a certain extent. Through this automatic generation and error correction mechanism, the invention improves the efficiency of code segment generation, increases the diversity of test cases, and improves grammar-aware fuzz testing.
Drawings
FIG. 1 is a schematic diagram illustrating steps of a syntax-aware fuzz testing method for code testing according to the present invention;
FIG. 2 is a schematic diagram of the detailed step S1 of the present invention;
FIG. 3 is a diagram illustrating a specific step of step S3 according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited to these examples.
Example 1
As shown in FIG. 1 to FIG. 3, the invention uses deep learning to train a code segment prediction model on the grammatical structure of existing code segments, so that the model automatically guides the generation of new code segments.
The invention supplies correct variable references by static type inference, thereby improving the correctness of the generated code.
The technical solution starts from an observation about code segments that trigger security vulnerabilities. Collecting proof-of-concept (PoC) samples of interpreter (or compiler) exploits and summarizing their characteristics shows that more than 90% of the code segments in them overlap syntactically. If the grammar of existing security-defect code is learned and code segments are generated directly from it, the grammatical characteristics of the PoC code segments are retained while new code segments are produced, so that security defects of the interpreter (or compiler) are triggered more efficiently.
The method uses deep learning to learn the grammar of the collected vulnerable source code, generates new code segments, and performs automated fuzz testing. The basic flow of the invention is shown in FIG. 1.
The first stage parses the source code, converting the collected vulnerable source code into a data type that the deep learning model can recognize. The basic flow is shown in FIG. 2.
The invention focuses on grammar learning. First, an interpreter runs the test code to check whether it conforms to the grammar specification. The code is then converted into an abstract syntax tree (AST), which represents the code as a tree structure and omits some code details. As a result, different code segments have similar structures, which makes recognition and generation by the deep learning model easier.
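The syntax check and AST conversion can be illustrated with a minimal sketch. It assumes Python as a stand-in target language and uses Python's standard `ast` module in place of running an interpreter; the patent does not name a target language, and the helper name `parse_if_valid` is hypothetical.

```python
import ast

def parse_if_valid(source):
    """Return the AST of `source`, or None if it violates the grammar.

    Illustrative stand-in for steps S11/S12; not the patent's implementation.
    """
    try:
        return ast.parse(source)
    except SyntaxError:
        return None

good = parse_if_valid("x = [1, 2]\nprint(x[0])")
bad = parse_if_valid("x = = 1")

# A sample that fails the check would be dropped before training.
print(good is not None, bad is None)
```

Samples for which the parse fails would be discarded, so that only grammatically valid code reaches the later preprocessing steps.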
Code normalization replaces variable names, function names and the like in the code, so that training focuses on statement structure and the influence of variable names is reduced. By traversing the abstract syntax tree, user-defined variable names, function names and similar information are collected and replaced with normalized names within the scope of each user-defined variable or function. Variable and function names that belong to the code library do not need to be replaced, which facilitates generating AST fragments that contain library code. Specifically, each variable name is replaced with v plus a number that increases with each distinct variable encountered during the traversal, and each function name is replaced with f plus a number that increases with each distinct function name encountered.
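The renaming scheme can be sketched as follows, again assuming Python as a stand-in language (Python 3.9 or later for `ast.unparse`). The class `Normalizer` and the function `normalize` are hypothetical names. As a simplification, the sketch uses one global name mapping instead of per-scope mappings, treats Python built-ins as the "code library", and ignores imports and attribute names.

```python
import ast
import builtins

class Normalizer(ast.NodeTransformer):
    """Rename user-defined variables to v0, v1, ... and functions to f0, f1, ..."""

    def __init__(self, func_names):
        self.func_names = func_names   # original function name -> fN
        self.var_names = {}            # original variable name -> vN

    def _var(self, name):
        # Assign the next free number the first time a name is seen.
        return self.var_names.setdefault(name, f"v{len(self.var_names)}")

    def visit_FunctionDef(self, node):
        node.name = self.func_names[node.name]
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._var(node.arg)
        return node

    def visit_Name(self, node):
        if node.id in self.func_names:
            node.id = self.func_names[node.id]
        elif not hasattr(builtins, node.id):   # leave library names untouched
            node.id = self._var(node.id)
        return node

def normalize(source):
    tree = ast.parse(source)
    # First pass: number function names in the order ast.walk yields them,
    # so that calls placed before a definition are still recognized.
    func_names = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            func_names.setdefault(node.name, f"f{len(func_names)}")
    return ast.unparse(Normalizer(func_names).visit(tree))

print(normalize("def add(a, b):\n    total = a + b\n    return total\nprint(add(1, 2))"))
```

In this example `add` becomes `f0`, the names `a`, `b` and `total` become `v0`, `v1` and `v2`, and the built-in `print` is left unchanged.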
AST fragmentation converts a complete AST into AST subtrees of height 1, which facilitates training the deep learning model. The AST is traversed recursively, and the subtree of the current node is replaced with the root node of that subtree, forming an AST subtree of height 1. The root node of each fragment is an internal node of the AST and also appears as a leaf node in another fragment. For a given AST, one unit subtree is extracted from each internal node, so the number of extracted unit subtrees equals the number of internal nodes of the AST.
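A minimal sketch of the fragmentation step, assuming Python's `ast` node types as the tree representation: each fragment is written as a pair of a node type and the types of its direct children, and fragments are emitted in pre-order. The function name `fragments` and this tuple encoding are illustrative assumptions.

```python
import ast

def fragments(node):
    """Return the height-1 fragments of an AST in pre-order.

    Each fragment is (node type, tuple of child node types); leaf nodes
    contribute no fragment of their own.
    """
    children = list(ast.iter_child_nodes(node))
    if not children:
        return []
    result = [(type(node).__name__, tuple(type(c).__name__ for c in children))]
    for child in children:
        result.extend(fragments(child))
    return result

tree = ast.parse("x = 1 + 2")
for frag in fragments(tree):
    print(frag)
```

One fragment is produced per internal node, which matches the statement that the number of unit subtrees equals the number of internal nodes.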
AST sequence vectorization converts the AST subtrees into input that the deep learning model can recognize. The composition relationship between fragments is modeled as an ordering of the fragments, so that the deep learning model can predict the next fragment from the fragments that precede it.
The second stage performs training. The fragment set contains some fragments with no significant meaning, which degrade training. Therefore, before training begins, fragments that occur infrequently are labeled.
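Vectorization and the labeling of infrequent fragments can be sketched as a vocabulary lookup. The placeholder label `<rare>`, the threshold `min_count`, and the function names are assumptions made for illustration; the patent does not specify how rare fragments are labeled.

```python
from collections import Counter

UNK = "<rare>"  # hypothetical label shared by all infrequent fragments

def build_vocab(sequences, min_count=2):
    """Map each sufficiently frequent fragment to an integer index."""
    counts = Counter(frag for seq in sequences for frag in seq)
    vocab = {UNK: 0}
    for frag, n in counts.items():
        if n >= min_count:
            vocab[frag] = len(vocab)
    return vocab

def encode(seq, vocab):
    """Turn one fragment sequence (one file) into a list of indices."""
    return [vocab.get(frag, vocab[UNK]) for frag in seq]

sequences = [["A", "B", "C"], ["A", "B", "D"]]
vocab = build_vocab(sequences)
print(encode(sequences[0], vocab))  # [1, 2, 0]
```

Here the fragments "C" and "D" each occur once, fall below the threshold, and are mapped to the shared index 0.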
Each fragment sequence represents one file and serves as one training input. From each input a statistical language model is built that predicts the next fragment from the context fragments. Training objective: given the AST sequence X of a piece of code, predict the next AST fragment Y from X, where [Figure SMS_34], [Figure SMS_35], [Figure SMS_36], and [Figure SMS_37] represents the i-th AST fragment of the r-th AST in a piece of code. The prediction outputs the most probable code fragment, giving priority to fragments of the same type as the true fragment over fragments of other types.
To achieve the training objective, a new loss function is defined that rewards locating type-related fragments, so that the objective is optimized continually. The loss function is applied to the training set D:
[Figure SMS_38]
[Figure SMS_39]
[Figure SMS_40]
In the above formulas, [Figure SMS_41] and [Figure SMS_42] are defined as follows:
[Figure SMS_43]
[Figure SMS_44]
In the above formulas, [Figure SMS_45] denotes the number of fragments of the same type as the real fragment; [Figure SMS_46] and [Figure SMS_47] return the top [Figure SMS_48] fragments and the fragments of the correct type, respectively; and [Figure SMS_49] is the reward model, which prioritizes fragments that have the same type as the real fragment.
Reducing [Figure SMS_50] and [Figure SMS_51] during training lets the model achieve the training objective. In the end, the model can not only predict the correct fragment for a given context but also locate, among its recommendations, fragments of the same type as the correct fragment.
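The idea of combining a cross-entropy term with a type-related term can be illustrated with a small sketch. This is not the patent's formula, whose images are not reproduced in this text; the function `type_aware_loss`, the use of the top-n probability mass, and the smoothing constant `eps` are assumptions chosen only to show how the two terms could behave.

```python
import math

def type_aware_loss(probs, true_idx, frag_types, n=3, eps=1e-9):
    """Return (cross-entropy term, type term) for one prediction.

    probs      -- predicted distribution over candidate fragments
    true_idx   -- index of the real next fragment
    frag_types -- type label of each candidate fragment
    """
    # Term 1: cross entropy of the real next fragment.
    ce = -math.log(probs[true_idx] + eps)
    # Term 2 (assumed form): reward the probability mass that the top-n
    # candidates place on fragments with the same type as the real fragment.
    top_n = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:n]
    same_type = [i for i in top_n if frag_types[i] == frag_types[true_idx]]
    type_loss = -math.log(sum(probs[i] for i in same_type) + eps)
    return ce, type_loss

probs = [0.5, 0.3, 0.15, 0.05]
ce, tl = type_aware_loss(probs, true_idx=2,
                         frag_types=["Expr", "Stmt", "Expr", "Stmt"], n=2)
print(ce > tl)  # True: the top candidates already have the right type
```

In this sketch a prediction that misses the exact fragment but ranks fragments of the right type highly receives a small type term, while a prediction whose top candidates all have the wrong type receives a large one.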
The third stage generates and checks the code. The basic flow is shown in FIG. 3.
First, a test case is selected at random from the test case set, and one AST fragment is deleted at random from the selected test case. The deleted AST fragment is used as input to the trained deep learning model, which predicts candidate AST fragments. A Top-K algorithm selects a suitable fragment from the candidates, and the selected fragment is combined with the deleted AST fragment to generate a complete AST. This combination is the inverse of the AST fragmentation of the first stage. The AST is traversed to find the position at which to add a fragment: if the current node has no child nodes and is not a terminal node, the fragment is attached to the current node; otherwise, the search calls itself recursively on the child nodes, performing a pre-order traversal. AST synthesis is complete when no node can be added.
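The search for an attach point and the type-matched Top-K selection can be sketched as follows. The node types, the set `TERMINALS`, the `Node` class and the table-driven `predict` stub are all hypothetical; in the described method the candidates would come from the trained model and depend on the preceding fragments.

```python
from dataclasses import dataclass, field

TERMINALS = {"Identifier", "Literal"}  # hypothetical terminal node types

@dataclass
class Node:
    type: str
    children: list = field(default_factory=list)

def find_open(node):
    """Pre-order search for a childless non-terminal node (an attach point)."""
    if not node.children:
        return None if node.type in TERMINALS else node
    for child in node.children:
        found = find_open(child)
        if found is not None:
            return found
    return None

def pick_fragment(candidates, needed_type, k=3):
    """Among the k most probable candidates, return the most probable one
    whose root type matches the node being expanded."""
    top_k = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    for _, frag in top_k:
        if frag[0] == needed_type:
            return frag
    return None

def synthesize(root, predict, max_steps=100):
    """Attach fragments until no open node remains (or no candidate fits)."""
    for _ in range(max_steps):
        spot = find_open(root)
        if spot is None:
            break
        frag = pick_fragment(predict(spot), spot.type)
        if frag is None:
            break
        spot.children = [Node(t) for t in frag[1]]
    return root

def predict(spot):
    # Stub standing in for the trained model: (probability, fragment) pairs,
    # where a fragment is (root type, child types).
    table = {
        "Program": [(0.7, ("Program", ["ExpressionStatement"])),
                    (0.3, ("ExpressionStatement", ["Literal"]))],
        "ExpressionStatement": [(0.6, ("BinaryExpression", ["Identifier", "Literal"])),
                                (0.4, ("ExpressionStatement", ["BinaryExpression"]))],
        "BinaryExpression": [(0.9, ("BinaryExpression", ["Identifier", "Literal"]))],
    }
    return table[spot.type]

tree = synthesize(Node("Program"), predict)
print(tree)
```

The second expansion shows why the type condition matters: the most probable candidate has the wrong root type for the open node, so the next candidate is chosen instead.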
Then a syntax check is performed on the complete AST to determine whether it conforms to the grammar specification. Because different AST fragments are combined, reference errors arise easily. The invention uses the context in which an AST fragment is generated and statically infers the type of each newly introduced variable from how it is used: if the variable appears in a binary or ternary operator, its type is taken to be consistent with the types of the other variables in the expression; if the variable appears in a unary operator or with no operator, it is taken to be an integer. According to the inferred types, the corresponding variables are declared at the start of the variable's scope, which resolves the variable reference errors.
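The inference rule can be sketched for single expressions, with Python again as a stand-in language. The function names, the type labels and the table `DEFAULTS` of initial values are hypothetical, and the sketch covers only `ast.BinOp` and `ast.IfExp` as the binary and ternary cases.

```python
import ast

def infer_type(expr_src, new_var, known_types):
    """Infer the type of `new_var` from its use in one expression.

    Binary or ternary expression: take the type of another known variable
    in the expression. Unary operator or no operator: assume integer.
    """
    expr = ast.parse(expr_src, mode="eval").body
    if isinstance(expr, (ast.BinOp, ast.IfExp)):
        for node in ast.walk(expr):
            if (isinstance(node, ast.Name) and node.id != new_var
                    and node.id in known_types):
                return known_types[node.id]
    return "int"

DEFAULTS = {"int": "0", "str": "''", "list": "[]"}  # hypothetical initial values

def declare(new_var, inferred):
    """Build the declaration to place at the start of the variable's scope."""
    return f"{new_var} = {DEFAULTS[inferred]}"

t = infer_type("v0 + v9", "v9", {"v0": "str"})
print(declare("v9", t))                           # v9 = ''
print(declare("v9", infer_type("-v9", "v9", {})))  # v9 = 0
```

Prepending the generated declaration to the enclosing scope gives the undefined variable a binding, which is the effect the paragraph above describes.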
The fourth stage performs the fuzz testing. The generated code segments are finally sent to the interpreter (or compiler) to complete the fuzz testing.
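A minimal harness for this last stage might look like the sketch below. It assumes the target is the Python interpreter that runs the harness itself; the function name `run_case`, the outcome labels and the timeout value are illustrative. The negative return code check applies to POSIX systems, where it indicates termination by a signal.

```python
import subprocess
import sys

def run_case(code, timeout=10):
    """Run one generated test case in a fresh interpreter and classify it."""
    try:
        proc = subprocess.run([sys.executable, "-c", code],
                              capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timeout"
    if proc.returncode < 0:      # POSIX: killed by a signal such as SIGSEGV
        return "crash"
    return "ok" if proc.returncode == 0 else "error"

print(run_case("print(1)"))
print(run_case("raise ValueError('boom')"))
```

Cases classified as "crash" or "timeout" are the ones worth keeping for manual analysis, since an ordinary language-level exception is expected behavior of the interpreter rather than a defect.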
The invention provides a method that uses deep learning to generate new code segments automatically; the generated segments retain the grammatical characteristics of the original code segments while adding new structures, which increases the diversity of code segments. In addition, by inferring variable types, the invention resolves syntax errors in newly generated code segments and improves the validity of the generated code to a certain extent. Through this automatic generation and error correction mechanism, the method improves the efficiency of code segment generation, increases the diversity of test cases, and improves grammar-aware fuzz testing.
As described above, the present invention can be preferably realized.
All features disclosed in all embodiments of the present specification, or all methods or process steps implicitly disclosed, may be combined and/or expanded, or substituted, in any way, except for mutually exclusive features and/or steps.
The foregoing is only a preferred embodiment of the present invention, and the present invention is not limited thereto in any way, and any simple modification, equivalent replacement and improvement made to the above embodiment within the spirit and principle of the present invention still fall within the protection scope of the present invention.

Claims (10)

1. A grammar-aware fuzz testing method for code testing, characterized in that deep learning is used to learn the grammar of vulnerable source code, new code segments are generated, and the new code segments are used to fuzz-test code; the vulnerable source code is source code with publicly disclosed security defects.
2. The syntax-aware fuzz testing method for code testing according to claim 1, characterized by comprising the following steps:
S1, data preprocessing: converting the collected vulnerable source code into a data type that the deep learning model can recognize, thereby parsing the vulnerable source code;
S2, model training: training the deep learning model by a deep learning method, using the preprocessed vulnerable source code as the training set;
S3, code generation and checking: generating new code segments with the trained deep learning model, and then checking the generated code segments for conformance to the language specification;
S4, fuzz testing: inputting the generated new code segments into an interpreter or compiler for fuzz testing.
3. The syntax-aware fuzz testing method for code testing according to claim 2, wherein the step S1 comprises the following steps:
S11, running the test code with an interpreter and checking whether the vulnerable source code conforms to the grammar specification;
S12, converting the vulnerable source code into an abstract syntax tree;
S13, replacing the variable names and function names in the vulnerable source code with custom-defined variable names and function names;
S14, converting the complete abstract syntax tree into abstract syntax tree subtrees;
S15, converting the abstract syntax tree subtrees into input that the deep learning model can recognize.
4. The syntax-aware fuzz testing method for code testing as claimed in claim 3, wherein in S14, the sub-trees of the abstract syntax tree are constructed by recursively traversing the abstract syntax tree to replace the sub-trees of the current node with the root nodes of the sub-trees.
5. The syntax aware fuzzing method for code testing according to claim 4, wherein in S15, the dependency relationship between the sub-trees of the abstract syntax tree depends on the order of the abstract syntax tree fragments.
6. The syntax-aware fuzzing method for code testing according to any one of claims 2 to 5, wherein the step S2 comprises the following steps:
S21, building a statistical language model from each fragment sequence, so that the statistical language model can predict the next fragment from the context fragments; the training objective of the statistical language model is: given the abstract syntax tree sequence [Figure QLYQS_1] of a piece of code, predict the next abstract syntax tree fragment [Figure QLYQS_3] from [Figure QLYQS_2];
S22, defining a loss function that rewards locating type-related fragments;
the loss function is:
[Figure QLYQS_4]
[Figure QLYQS_5]
wherein [Figure QLYQS_7] represents an abstract syntax tree fragment of the abstract syntax tree sequence [Figure QLYQS_11]; [Figure QLYQS_17] represents the next abstract syntax tree fragment predicted with the greatest likelihood; [Figure QLYQS_9] represents a normalization function; [Figure QLYQS_13] represents the distribution probability of the next abstract syntax tree fragment; [Figure QLYQS_18] represents the objective metric function; [Figure QLYQS_20] represents [Figure QLYQS_6]; [Figure QLYQS_10] represents the set of the current abstract syntax tree fragment and the next abstract syntax tree fragment; [Figure QLYQS_14] represents the number of elements in the set [Figure QLYQS_16]; [Figure QLYQS_8] represents the probability distribution of the next abstract syntax tree fragment, [Figure QLYQS_12]; [Figure QLYQS_15] represents the cross-entropy loss function; and [Figure QLYQS_19] represents prioritizing abstract syntax tree fragments that have the same type as the real abstract syntax tree fragment;
S23, reducing [Figure QLYQS_21] and [Figure QLYQS_22] during training, so that the statistical language model achieves the training objective.
7. The syntax-aware fuzz testing method for code testing according to claim 6, wherein, in S22,
[Figure QLYQS_23]
[Figure QLYQS_24]
wherein [Figure QLYQS_27] represents the index of an abstract syntax tree segment; [Figure QLYQS_30] represents the total number of abstract syntax tree segments; [Figure QLYQS_32] represents the probability distribution of the next abstract syntax tree segment following the i-th abstract syntax tree segment; [Figure QLYQS_26] represents the predicted probability distribution of the next abstract syntax tree segment; [Figure QLYQS_29] represents the index of an abstract syntax tree segment of the correct type; [Figure QLYQS_31] represents the probability distribution of the correct type; [Figure QLYQS_33] represents the number of abstract syntax tree segments of the same type as the real abstract syntax tree segment; [Figure QLYQS_25] represents the n most probable abstract syntax tree segments in the set of predicted next abstract syntax tree segments; and [Figure QLYQS_28] represents a function that returns the set of abstract syntax tree segments of the correct type.
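The formulas of claim 7 are given as figures that are not reproduced in this text, so their exact form cannot be recovered here. The Python sketch below shows one plausible shape for the two loss terms under stated assumptions: the first term is taken to be an ordinary mean cross entropy over the N segments, and the second is a hypothetical type-aware penalty based on the probability mass that the top-n predictions place on segments of the correct type. The function names, the `eps` smoothing constant, and the form of `type_loss` are illustrative assumptions and not the claimed formulas.

```python
import math


def cross_entropy(true_dists, pred_dists, eps=1e-12):
    """Mean cross entropy over N segments: -(1/N) * sum_i sum_k p_i[k] * log(q_i[k])."""
    total = 0.0
    for p, q in zip(true_dists, pred_dists):
        total -= sum(pk * math.log(qk + eps) for pk, qk in zip(p, q))
    return total / len(true_dists)


def top_n(pred, n):
    """Indices of the n most probable segments in one predicted distribution."""
    return sorted(range(len(pred)), key=lambda k: pred[k], reverse=True)[:n]


def type_loss(pred, segment_types, true_type, n, eps=1e-12):
    """Assumed type-aware term (hypothetical form).

    Sums the predicted probability of the top-n segments whose type equals the
    type of the real segment and returns the negative log of that mass, so the
    term shrinks as type-correct segments are ranked higher.
    """
    mass = sum(pred[k] for k in top_n(pred, n) if segment_types[k] == true_type)
    return -math.log(mass + eps)


def total_loss(true_dists, pred_dists, segment_types, true_types, n):
    """Sum of both terms, averaged over segments (assumed combination)."""
    type_term = sum(
        type_loss(q, segment_types, t, n) for q, t in zip(pred_dists, true_types)
    ) / len(pred_dists)
    return cross_entropy(true_dists, pred_dists) + type_term
```

Under these assumptions, training in the sense of step S23 would minimize `total_loss`, which lowers both terms together.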
8. The syntax-aware fuzz testing method for code testing according to claim 7, wherein step S3 comprises the following steps:
S31, randomly selecting a test case from the test case set, and randomly deleting one abstract syntax tree segment from the selected test case to obtain a deleted abstract syntax tree segment, wherein the test case set is a collection of a plurality of fragile source codes;
S32, synthesizing an abstract syntax tree: obtaining a complete abstract syntax tree from the deleted abstract syntax tree segment obtained in step S31;
S33, performing a syntax check on the complete abstract syntax tree to determine whether it conforms to the syntax specification.
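Steps S31 and S33 can be sketched with Python's standard `ast` module. This is a minimal illustration and makes several assumptions: the test cases are Python snippets rather than the fragile source codes of the method, only top-level statements are candidates for deletion, and the syntax check is approximated by unparsing the tree and parsing the result again (`ast.unparse` needs Python 3.9 or later). The helper names `delete_random_statement` and `passes_syntax_check` are hypothetical.

```python
import ast
import random


def delete_random_statement(source, rng):
    """S31 stand-in: parse a test case and remove one randomly chosen top-level statement.

    Returns the remaining tree and the removed subtree.
    """
    tree = ast.parse(source)
    if not tree.body:
        raise ValueError("test case has no statements to delete")
    index = rng.randrange(len(tree.body))
    removed = tree.body.pop(index)
    return tree, removed


def passes_syntax_check(tree):
    """S33 stand-in: the tree passes if its unparsed text can be parsed again."""
    try:
        ast.parse(ast.unparse(tree))
        return True
    except (SyntaxError, ValueError, AttributeError, TypeError):
        return False


# Toy test case set; in the method this would hold the fragile source codes.
test_cases = [
    "x = 1\ny = x + 2\nprint(y)\n",
    "def f(a):\n    return a * 2\nf(3)\n",
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
source = rng.choice(test_cases)
tree, removed = delete_random_statement(source, rng)
print(type(removed).__name__, passes_syntax_check(tree))
```

The synthesis of step S32 is left out here on purpose; in the method it is driven by the trained model rather than by a fixed rule.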
9. The syntax-aware fuzz testing method for code testing according to claim 8, wherein step S32 comprises the following steps:
S321, inputting the deleted abstract syntax tree segment into the trained deep learning model to obtain candidate abstract syntax tree segments;
S322, selecting a suitable abstract syntax tree segment from the candidate segments and combining it with the deleted abstract syntax tree segment to generate a complete abstract syntax tree, wherein a suitable abstract syntax tree segment is one that satisfies both of the following conditions: A. it has the highest probability in the set of predicted next abstract syntax tree segments; B. it matches the type required by the current abstract syntax tree segment;
S323, traversing the abstract syntax tree to search for positions at which an abstract syntax tree segment can be added, until no node can be extended, thereby completing the synthesis of the abstract syntax tree; wherein the position is searched as follows: if the current node has no child node and is not a terminal node, the abstract syntax tree segment is attached to the current node; otherwise, the search method is invoked on the child nodes of the current node, which are traversed in a preset order.
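The selection rule of S322 and the position search of S323 can be sketched as follows. This is an illustrative sketch under assumptions, not the claimed implementation: a segment is modelled as a node plus its direct children, the "preset order" is taken to be left-to-right pre-order, condition A is read as "highest probability among the type-matching candidates", and `toy_predict` is a hypothetical stand-in for the trained deep learning model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    type: str
    terminal: bool = False
    children: List["Node"] = field(default_factory=list)


def find_attach_point(node: Node) -> Optional[Node]:
    """S323: return the first node that has no children and is not a terminal."""
    if not node.children and not node.terminal:
        return node
    for child in node.children:  # assumed preset order: left to right
        found = find_attach_point(child)
        if found is not None:
            return found
    return None


def pick_segment(candidates: List[Tuple[float, Node]], required_type: str) -> Optional[Node]:
    """S322: keep candidates of the required type, then take the most probable one."""
    matching = [c for c in candidates if c[1].type == required_type]
    if not matching:
        return None
    return max(matching, key=lambda c: c[0])[1]


def synthesize(root: Node, predict: Callable[[Node], List[Tuple[float, Node]]],
               max_steps: int = 100) -> Node:
    """Attach segments until no node can be extended (or a step limit is hit)."""
    for _ in range(max_steps):
        target = find_attach_point(root)
        if target is None:
            break  # tree is complete
        segment = pick_segment(predict(target), target.type)
        if segment is None or not segment.children:
            break  # no usable candidate; leave the tree incomplete
        target.children = [Node(c.type, c.terminal) for c in segment.children]
    return root


def toy_predict(node: Node) -> List[Tuple[float, Node]]:
    """Hypothetical model output: (probability, segment) pairs for a node."""
    table = {
        "Program": [(0.9, Node("Program", children=[Node("ExprStmt")]))],
        "ExprStmt": [
            (0.7, Node("IfStmt", children=[Node("Literal", terminal=True)])),
            (0.6, Node("ExprStmt", children=[Node("Literal", terminal=True)])),
            (0.4, Node("ExprStmt", children=[Node("Identifier", terminal=True)])),
        ],
    }
    return table.get(node.type, [])
```

Calling `synthesize(Node("Program"), toy_predict)` expands the root and then the `ExprStmt` node; the `IfStmt` candidate is skipped despite its higher probability because its type does not match.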
10. A syntax-aware fuzz testing system for code testing, characterized in that it implements the syntax-aware fuzz testing method for code testing according to any one of claims 1 to 9 and comprises the following modules connected in sequence:
a data preprocessing module, configured to convert the collected fragile source code into a data type that the deep learning model can recognize, thereby parsing the fragile source code;
a model training module, configured to train the deep learning model by a deep learning method, using the preprocessed fragile source code as the training set;
a code generation and inspection module, configured to generate new code segments using the trained deep learning model and then check whether the generated code segments conform to the language specification;
a fuzz testing module, configured to input the generated new code segments into an interpreter or a compiler for fuzz testing.
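The last module in the chain, which feeds generated code to an interpreter, can be sketched as below. This is a minimal sketch under assumptions: the target is the running Python interpreter (`sys.executable`) rather than the interpreter or compiler actually under test, the helper name `run_fuzz_case` and its outcome labels are hypothetical, and a negative return code is treated as a crash, which only holds on POSIX systems where it signals termination by a signal.

```python
import subprocess
import sys


def run_fuzz_case(code, interpreter=(sys.executable, "-c"), timeout=5.0):
    """Run one generated code segment in a child interpreter and classify the outcome."""
    try:
        proc = subprocess.run(
            [*interpreter, code],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    if proc.returncode < 0:
        return "crash"  # POSIX: child was terminated by a signal
    return "ok" if proc.returncode == 0 else "error"


def fuzz(code_segments):
    """Run every segment and keep the ones that crashed or hung the interpreter."""
    findings = []
    for code in code_segments:
        outcome = run_fuzz_case(code)
        if outcome in ("crash", "timeout"):
            findings.append((outcome, code))
    return findings
```

Only crashes and timeouts are reported because an ordinary error exit usually means the interpreter rejected or handled the input as intended.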
CN202310194536.0A 2023-03-03 2023-03-03 Grammar perception fuzzy test method and system for code test Pending CN115858405A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310194536.0A CN115858405A (en) 2023-03-03 2023-03-03 Grammar perception fuzzy test method and system for code test

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310194536.0A CN115858405A (en) 2023-03-03 2023-03-03 Grammar perception fuzzy test method and system for code test

Publications (1)

Publication Number Publication Date
CN115858405A true CN115858405A (en) 2023-03-28

Family

ID=85659847

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310194536.0A Pending CN115858405A (en) 2023-03-03 2023-03-03 Grammar perception fuzzy test method and system for code test

Country Status (1)

Country Link
CN (1) CN115858405A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117762415A (en) * 2023-11-17 2024-03-26 北京计算机技术及应用研究所 Fuzzy test-oriented abstract syntax tree mutation method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170212829A1 (en) * 2016-01-21 2017-07-27 American Software Safety Reliability Company Deep Learning Source Code Analyzer and Repairer
CN111367815A (en) * 2020-03-24 2020-07-03 中国电子科技网络信息安全有限公司 Man-machine cooperation based software vulnerability fuzzy test method
CN113157565A (en) * 2021-03-23 2021-07-23 西北大学 Feedback type JS engine fuzzy test method and device based on seed case mutation
CN114048464A (en) * 2022-01-12 2022-02-15 北京大学 Ethereum smart contract security vulnerability detection method and system based on deep learning
CN114385491A (en) * 2021-12-30 2022-04-22 大连理工大学 JS translator defect detection method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
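Suyoung Lee et al.: "Montage: A neural network language model-guided JavaScript engine fuzzer", SEC '20: Proceedings of the 29th USENIX Conference on Security Symposium, pages 2613 *
刘文倩: "Research on fuzz testing technology guided by deep learning and code coverage" (基于深度学习和代码覆盖引导的模糊测试技术研究), China Master's Theses Full-text Database, Information Science and Technology, pages 139 - 72 *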

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230328
