CN116701235B - Fuzzy test method based on grammar correct variation and semantic effective instantiation - Google Patents

Publication number
CN116701235B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310976573.7A
Other languages
Chinese (zh)
Other versions
CN116701235A (en
Inventor
汪毅
贾鹏
郭嵩
李晓冉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Anban Information Technology Co ltd
Original Assignee
Shanghai Anban Information Technology Co ltd
Application filed by Shanghai Anban Information Technology Co ltd
Priority to CN202310976573.7A
Publication of CN116701235A
Application granted
Publication of CN116701235B

Classifications

    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites
    • G06F11/3676 Test management for coverage analysis
    • G06F16/2433 Query languages
    • G06F16/24534 Query rewriting; Transformation
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G06F40/253 Grammatical analysis; Style critique
    • G06F40/30 Semantic analysis
    • H04L63/1433 Vulnerability analysis
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a fuzz testing method and device based on syntax-correct mutation and semantically valid instantiation. The method comprises: importing grammar information, a first group of SQL statements and a second group of SQL statements; parsing the first group of SQL statements into an abstract syntax tree using bison and converting it into a first IR tree; traversing all nodes in the first IR tree and applying a syntax-correct mutation to each node using the grammar information; assigning correct values to the child nodes of the first IR tree to obtain a second IR tree, and converting the second IR tree into a third group of SQL statements; and inputting the third group of SQL statements into the DBMS and, if the DBMS crashes or terminates, retaining the corresponding SQL statements. Because the SQL statements are produced by syntax-correct mutation and semantically valid instantiation, the efficiency of fuzz testing is greatly improved.

Description

Fuzz testing method based on syntax-correct mutation and semantically valid instantiation
Technical Field
Embodiments of the invention relate to the technical field of fuzz testing, and in particular to a fuzz testing method based on syntax-correct mutation and semantically valid instantiation.
Background
A database management system (DBMS) is a large piece of software for creating, using and maintaining databases. Most enterprises use a DBMS to manage their data and to provide services to clients, so the DBMS is among the most important software in information technology. Structured Query Language (SQL) is a special-purpose programming language for accessing data and for querying, updating and managing relational database systems. Issuing SQL statements is the most common way for users to operate a DBMS. Like other complex systems, a DBMS often contains many undiscovered vulnerabilities. At best these vulnerabilities crash the program; at worst they lead to privilege escalation, unauthorized access or arbitrary code execution, with serious consequences if exploited by an attacker. Fuzz testing is a technique that continuously feeds randomly generated test cases to a program and, by observing how the program behaves, identifies the inputs that trigger bugs and thereby locates those bugs. Because it is efficient and automated, it is widely used in many fields. Compared with traditional file-format fuzz testing, fuzz testing of a DBMS must generate input that is both syntactically and semantically correct, so that the generated SQL statements pass the syntax and semantic checks of the database and reach deeper code.
Existing DBMS fuzz testing techniques lack the necessary guidance mechanism and can only generate SQL statements by blind random mutation. The mutation process is therefore likely to produce SQL statements with incorrect syntax or invalid semantics, which fail even the most basic checks of the DBMS and cannot reach deeper code, seriously reducing the efficiency of fuzz testing.
With the development of information technology, information technology products play an important role in production and daily life. Whether a vulnerability is exploited to launch a DoS attack that prevents normal use of the DBMS, or to escalate privileges and steal user information, it causes serious losses to enterprises and users, so DBMS security has become a critical problem.
Therefore, there is a need for a fuzz testing method based on syntax-correct mutation and semantically valid instantiation that effectively solves the above problems.
Disclosure of Invention
The invention provides a fuzz testing method and device based on syntax-correct mutation and semantically valid instantiation, which greatly improve the efficiency of fuzz testing by generating SQL statements through syntax-correct mutation and semantically valid instantiation.
An embodiment of the invention provides a fuzz testing method based on syntax-correct mutation and semantically valid instantiation, comprising:
an initialization stage: importing grammar information, initial input and initial mutation material, wherein the initial input comprises a first group of SQL statements and the initial mutation material comprises a second group of SQL statements;
a parsing stage: parsing the first group of SQL statements into an abstract syntax tree using bison, recording the node types of the abstract syntax tree, and converting the abstract syntax tree into a first IR tree, wherein the first IR tree has a binary tree structure;
a mutation stage: traversing all nodes in the first IR tree and applying a syntax-correct mutation to each node using the grammar information;
an instantiation stage: assigning correct values to the child nodes of the first IR tree to obtain a second IR tree, and converting the second IR tree into a third group of SQL statements;
and a verification stage: inputting the third group of SQL statements into a DBMS and, if the DBMS crashes or terminates, retaining the corresponding SQL statements.
Preferably, the grammar information is extracted by:
performing normal-form conversion on a context-free grammar, wherein each production in the converted context-free grammar comprises a number of non-terminal symbols and a number of terminal symbols;
and encoding the converted context-free grammar, wherein the non-terminal symbols are numbered with a first group of numbers and the terminal symbols with a second group of numbers, the first group and the second group being different.
Preferably, the odd positions in each production hold terminal symbols and the even positions hold non-terminal symbols.
Preferably, if the productions do not all contain the same number of symbols, a preset number is used as a placeholder.
Preferably, when the terminal symbols are numbered, the terminal symbol corresponding to each number is stored as a character string in an array whose index is the number of that terminal symbol, and the array is a component of the grammar information.
Preferably, assigning correct values to the child nodes of the first IR tree comprises deriving the dependency relationships between the child nodes of the first IR tree and assigning correct values to the child nodes according to those dependency relationships.
Preferably, applying a syntax-correct mutation to each node using the grammar information comprises looking up the corresponding entries in the grammar information according to the number of the node and, if there are several corresponding entries, randomly selecting one of them.
Preferably, the corresponding entry comprises terminal symbol numbers and non-terminal symbol numbers; each terminal symbol number, which occupies an odd position, is assigned to the child node at the corresponding position; for each non-terminal symbol number, an IR tree node having the same number is randomly selected from the second group of SQL statements according to that number, and the selected node is used for replacement.
Preferably, whether the DBMS has crashed or terminated is determined from the execution information returned by the C++ interface provided by the DBMS;
if the DBMS has crashed or terminated, the corresponding SQL statement is retained;
if the DBMS has not crashed or terminated, it is determined whether coverage has increased, and if coverage has not increased, the corresponding SQL statement is discarded;
if coverage has increased, it is determined whether a syntax or semantic error occurred; if so, the corresponding SQL statement is discarded, and otherwise it is retained.
An embodiment of the invention also provides a fuzz testing device based on syntax-correct mutation and semantically valid instantiation, comprising:
an initialization module for importing grammar information, initial input and initial mutation material, wherein the initial input comprises a first group of SQL statements and the initial mutation material comprises a second group of SQL statements;
a parsing module for parsing the first group of SQL statements into an abstract syntax tree using bison, recording the node types of the abstract syntax tree, and converting the abstract syntax tree into a first IR tree, wherein the first IR tree has a binary tree structure;
a mutation module for traversing all nodes in the first IR tree and applying a syntax-correct mutation to each node using the grammar information;
an instantiation module for assigning correct values to the child nodes of the first IR tree to obtain a second IR tree, and converting the second IR tree into a third group of SQL statements;
and a verification module for inputting the third group of SQL statements into the DBMS and, if the DBMS crashes or terminates, retaining the corresponding SQL statements.
Compared with the prior art, the technical solution of the embodiments of the invention has the following beneficial effects:
the fuzz testing method and device based on syntax-correct mutation and semantically valid instantiation in the embodiments of the invention comprise: an initialization stage of importing grammar information, initial input and initial mutation material, wherein the initial input comprises a first group of SQL statements and the initial mutation material comprises a second group of SQL statements; a parsing stage of parsing the first group of SQL statements into an abstract syntax tree using bison, recording the node types of the abstract syntax tree, and converting the abstract syntax tree into a first IR tree having a binary tree structure; a mutation stage of traversing all nodes in the first IR tree and applying a syntax-correct mutation to each node using the grammar information; an instantiation stage of assigning correct values to the child nodes of the first IR tree to obtain a second IR tree and converting the second IR tree into a third group of SQL statements; and a verification stage of inputting the third group of SQL statements into a DBMS and, if the DBMS crashes or terminates, retaining the corresponding SQL statements. Because mutation is guided by the grammar information, the SQL statements obtained by mutation are always syntactically correct, which improves the efficiency of fuzz testing;
further, more complex static derivation rules are used and the IR tree structure is adjusted where necessary, so that semantically valid SQL statements are generated with higher probability during instantiation, which further improves the efficiency of fuzz testing;
further, whether the DBMS has crashed or terminated is determined from the execution information returned by the C++ interface provided by the DBMS; if the DBMS has crashed or terminated, the corresponding SQL statement is retained; if the DBMS has not crashed or terminated, it is determined whether coverage has increased, and if coverage has not increased, the corresponding SQL statement is discarded; if coverage has increased, it is determined whether a syntax or semantic error occurred, and the corresponding SQL statement is discarded if so and retained otherwise. Because this screening mechanism based on the correctness of SQL statements discards erroneous SQL statements early, the probability of generating correct SQL statements increases, which further improves the efficiency of fuzz testing.
Drawings
To illustrate the embodiments of the invention or the prior art more clearly, the drawings are briefly described below. The drawings described below show some, but not all, embodiments of the invention; a person of ordinary skill in the art may derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of a fuzz testing method based on syntax-correct mutation and semantically valid instantiation according to an embodiment of the invention;
FIG. 2 is another flow chart of a fuzz testing method based on syntax-correct mutation and semantically valid instantiation according to an embodiment of the invention;
FIG. 3 is a schematic diagram of the grammar information processing of a fuzz testing method based on syntax-correct mutation and semantically valid instantiation according to an embodiment of the invention;
FIG. 4 is a schematic diagram of the parsing stage of a fuzz testing method based on syntax-correct mutation and semantically valid instantiation according to an embodiment of the invention;
FIG. 5 is a schematic diagram of the mutation stage of a fuzz testing method based on syntax-correct mutation and semantically valid instantiation according to an embodiment of the invention;
FIG. 6 is a schematic diagram of the verification stage of a fuzz testing method based on syntax-correct mutation and semantically valid instantiation according to an embodiment of the invention;
FIG. 7 is a schematic block diagram of a fuzz testing device based on syntax-correct mutation and semantically valid instantiation according to an embodiment of the invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention.
The technical solution of the invention is described in detail below through specific embodiments. The following embodiments may be combined with one another, and identical or similar concepts or processes may not be repeated in some embodiments.
In view of the problems in the prior art, the invention provides a fuzz testing method and device based on syntax-correct mutation and semantically valid instantiation, which greatly improve the efficiency of fuzz testing by generating SQL statements through syntax-correct mutation and semantically valid instantiation.
FIG. 1 is a flow chart of a fuzz testing method based on syntax-correct mutation and semantically valid instantiation according to an embodiment of the invention; FIG. 2 is another flow chart of the same method. Referring to FIG. 1 and FIG. 2, an embodiment of the invention provides a fuzz testing method based on syntax-correct mutation and semantically valid instantiation, comprising:
step S101: an initialization stage of importing grammar information, initial input and initial mutation material, wherein the initial input comprises a first group of SQL statements and the initial mutation material comprises a second group of SQL statements;
step S102: a parsing stage of parsing the first group of SQL statements into an abstract syntax tree using bison, recording the node types of the abstract syntax tree, and converting the abstract syntax tree into a first IR tree, wherein the first IR tree has a binary tree structure;
step S103: a mutation stage of traversing all nodes in the first IR tree and applying a syntax-correct mutation to each node using the grammar information;
step S104: an instantiation stage of assigning correct values to the child nodes of the first IR tree to obtain a second IR tree, and converting the second IR tree into a third group of SQL statements;
step S105: a verification stage of inputting the third group of SQL statements into a DBMS and, if the DBMS crashes or terminates, retaining the corresponding SQL statements.
In a specific implementation, the grammar information is extracted by:
performing normal-form conversion on a context-free grammar, wherein each production in the converted context-free grammar comprises a number of non-terminal symbols and a number of terminal symbols;
and encoding the converted context-free grammar, wherein the non-terminal symbols are numbered with a first group of numbers and the terminal symbols with a second group of numbers, the first group and the second group being different.
In a specific implementation, the productions are part of a context-free grammar that describes the syntax of SQL statements; the odd positions in each production hold terminal symbols and the even positions hold non-terminal symbols.
In a specific implementation, if the productions do not all contain the same number of symbols, a preset number is used as a placeholder.
In a specific implementation, when the terminal symbols are numbered, the terminal symbol corresponding to each number is stored as a character string in an array whose index is the number of that terminal symbol, and the array is a component of the grammar information.
In a specific implementation, assigning correct values to the child nodes of the first IR tree comprises deriving the dependency relationships between the child nodes of the first IR tree and assigning correct values to the child nodes according to those dependency relationships. Here, for any node in the first IR tree, all nodes extending from that node are its child nodes.
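The source does not spell out the derivation rules, so the following is only a minimal sketch of dependency-driven value assignment under assumed rules: a slot that uses a table or column may only receive a name that an earlier slot defined. The slot kinds (`define_table`, `define_column`, `use_table`, `use_column`), the naming scheme and the function `instantiate` are all hypothetical and chosen for illustration.

```python
import random

def instantiate(slots, rng):
    """Assign names to slots so that every use refers to an earlier definition.

    `slots` is a list of hypothetical slot kinds in statement order.
    Returns the list of assigned names, one per slot.
    """
    tables = {}      # table name -> column names defined so far
    current = None   # table that the next column slot belongs to
    values = []
    for kind in slots:
        if kind == "define_table":
            current = f"t{len(tables)}"
            tables[current] = []
            values.append(current)
        elif kind == "define_column":
            if current is None:
                raise ValueError("column defined before any table")
            column = f"c{len(tables[current])}"
            tables[current].append(column)
            values.append(column)
        elif kind == "use_table":
            if not tables:
                raise ValueError("table used before any definition")
            current = rng.choice(sorted(tables))
            values.append(current)
        elif kind == "use_column":
            if current is None or not tables[current]:
                raise ValueError("column used without a defined column")
            values.append(rng.choice(tables[current]))
        else:
            raise ValueError(f"unknown slot kind: {kind}")
    return values

# Slots of "CREATE TABLE <t> (<c>, <c>); SELECT <c> FROM <t>", with the
# table slot of the SELECT listed before its column slot (dependency order).
names = instantiate(
    ["define_table", "define_column", "define_column", "use_table", "use_column"],
    random.Random(1),
)
```

Processing slots in dependency order rather than textual order is the point of the sketch: the column in the SELECT can only be chosen after its table is known.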
In a specific implementation, applying a syntax-correct mutation to each node using the grammar information comprises looking up the corresponding entries in the grammar information according to the number of the node and, if there are several corresponding entries, randomly selecting one of them. Specifically, a production must be encoded into an entry before use: the information in a production cannot be used directly in the mutation stage, and becomes usable only after it has been encoded as an entry.
In a specific implementation, the corresponding entry comprises terminal symbol numbers and non-terminal symbol numbers; each terminal symbol number, which occupies an odd position, is assigned to the child node at the corresponding position; for each non-terminal symbol number, an IR tree node having the same number is randomly selected from the second group of SQL statements according to that number, and the selected node is used for replacement.
In a specific implementation, whether the DBMS has crashed or terminated is determined from the execution information returned by the C++ interface provided by the DBMS;
if the DBMS has crashed or terminated, the corresponding SQL statement is retained;
if the DBMS has not crashed or terminated, it is determined whether coverage has increased, and if coverage has not increased, the corresponding SQL statement is discarded;
if coverage has increased, it is determined whether a syntax or semantic error occurred; if so, the corresponding SQL statement is discarded, and otherwise it is retained.
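The screening rules above can be sketched as a small decision function. The three boolean inputs stand in for the execution information and coverage feedback; the names `Verdict` and `triage` are hypothetical and are not taken from the source.

```python
from enum import Enum

class Verdict(Enum):
    KEEP_AS_CRASH = "retain: DBMS crashed or terminated"
    KEEP_AS_SEED = "retain: coverage increased and no error"
    DISCARD = "discard"

def triage(crashed: bool, coverage_increased: bool, had_sql_error: bool) -> Verdict:
    """Decide what to do with one executed SQL statement."""
    if crashed:
        # A crash or abnormal termination is always retained.
        return Verdict.KEEP_AS_CRASH
    if not coverage_increased:
        # No new coverage: the statement is not worth keeping.
        return Verdict.DISCARD
    if had_sql_error:
        # New coverage, but only through a syntax or semantic error path.
        return Verdict.DISCARD
    return Verdict.KEEP_AS_SEED
```

Checking the crash condition first means a crashing statement is retained regardless of the coverage and error flags, which matches the order of the rules in the text.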
In a specific implementation, the fuzz testing method based on syntax-correct mutation and semantically valid instantiation comprises an initialization stage, a parsing stage, a mutation stage, an instantiation stage and a verification stage. Adjacent stages are sequential: the output of each stage serves as the input of the next. The main process is as follows. First, the program is initialized and the grammar information, initial input and initial mutation material are imported. In the parsing stage, a first group of SQL statements is selected from the seed set and parsed; the node types of the AST (abstract syntax tree) are labelled during parsing, and once the AST is obtained it is converted into a first IR tree, which enters the mutation stage. In the mutation stage, the program selects a node in the first IR tree and, according to the imported grammar information, selects an IR tree from the second group of SQL statements to replace the subtree of that node. In the instantiation stage, the program adjusts the structure of the mutated IR tree by rule according to the types of its nodes and uses preset rules to statically derive the relationships among all child nodes in the IR tree; that is, the child nodes are assigned correct values to obtain the second IR tree, which is converted into a third group of SQL statements in string form. Finally, in the verification stage, the third group of SQL statements in string form is input into the DBMS, the running state of the DBMS is monitored, and it is determined whether a vulnerability has been triggered and whether the corresponding SQL statement should be saved as new input.
The main purpose of the initialization stage is to import the grammar information, the initial input and the initial mutation material into the program for later use. The initial input consists of several SQL statements, namely the first group of SQL statements, which together form a complete context. To improve efficiency, the initial input is selected manually, generally by combining SQL statements with complex structure and SQL statements with complex context; a good initial input improves the efficiency of the early phase of fuzz testing. The initial mutation material is a group of SQL statements, namely the second group of SQL statements, which are parsed and converted into IR trees in the initialization stage, stored in a mutation material library and indexed by node type. In the mutation stage, nodes are randomly selected from this library according to node type and used to mutate the first IR tree; the process is detailed in the description of the mutation stage.
FIG. 3 is a schematic diagram of the grammar information processing of a fuzz testing method based on syntax-correct mutation and semantically valid instantiation according to an embodiment of the invention. Referring to FIG. 3, the grammar information is the information obtained by applying normal-form conversion and encoding to the SQL grammar, and it can be used directly to guide mutation. The SQL grammar is initially defined as a context-free grammar and is converted by normal-form conversion into a form in which each production contains at most 2 non-terminal symbols. Finally, the non-terminal symbols and the terminal symbols are numerically encoded with the first group of numbers and the second group of numbers respectively to generate the grammar information. Each entry is fixed at 5 numbers: the odd positions (1st, 3rd and 5th) hold terminal symbol numbers and the even positions (2nd and 4th) hold non-terminal symbol numbers; if there are not enough terminal or non-terminal symbols, a number representing a null value is used as a placeholder. The first group of numbers and the second group of numbers are different and use separate coding tables. When the terminal symbols are numbered, the terminal symbol corresponding to each number is stored as a character string in an array whose index is the terminal symbol number; when the second IR tree is converted into the third group of SQL statements, this array is used to convert terminal symbol numbers into terminal symbol strings.
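As an illustration of the 5-number entry format, the sketch below encodes one already-normalised production. The symbol tables, the example production and the choice of 0 as the null placeholder are assumptions made for this example; the source does not give concrete numbers.

```python
EMPTY = 0  # assumed placeholder number for a missing symbol

# Assumed coding tables. Terminals and non-terminals are numbered
# separately, so the same number may appear in both tables.
terminal_strings = ["", "SELECT", "FROM", "WHERE"]  # index = terminal number
nonterminal_numbers = {"select_stmt": 1, "select_list": 2, "table_ref": 3}

def encode_production(slots):
    """Encode a production body into a fixed 5-number entry.

    `slots` lists the symbols by position: positions 1, 3, 5 are terminals,
    positions 2, 4 are non-terminals, and None marks a missing symbol.
    """
    if len(slots) != 5:
        raise ValueError("a normalised production body must have 5 slots")
    entry = []
    for position, symbol in enumerate(slots, start=1):
        if symbol is None:
            entry.append(EMPTY)
        elif position % 2 == 1:
            entry.append(terminal_strings.index(symbol))
        else:
            entry.append(nonterminal_numbers[symbol])
    return entry

# select_stmt -> SELECT select_list FROM table_ref
grammar_info = {
    nonterminal_numbers["select_stmt"]: [
        encode_production(["SELECT", "select_list", "FROM", "table_ref", None]),
    ],
}
```

Keying `grammar_info` by the number of the left-hand non-terminal lets a mutator look up every entry that can legally expand a node of that type.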
Fig. 4 is a schematic diagram of the specific process of the parsing stage of the fuzzy test method based on grammar correct mutation and semantic valid instantiation according to an embodiment of the present invention. Referring to Fig. 4, the main purpose of the parsing stage is to convert the first group of SQL statements from text form into the first IR tree format for use by the next stage. The first part parses the first group of SQL statements in text form into an AST. Specifically, the statements can be parsed with bison, an open-source tool that automatically generates parser programs. A parser generated by bison can parse languages described by a context-free grammar and executes user-defined code at each reduction; here that code mainly marks the type of each non-terminal node with a number that is consistent with the number used for that non-terminal in the grammar information. The second part converts the AST into the first IR tree. The conversion modifies the AST structure so that every node has exactly 5 child positions: from left to right, positions 1, 3 and 5 represent terminals and positions 2 and 4 represent non-terminals, and empty nodes are used where there are not enough terminals or non-terminals. This structure is consistent with the entry structure in the grammar information. The specific non-terminals and terminals are then replaced with numbers, which are the same as the numbers used in the grammar information.
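The second part of the parsing stage, fitting AST children into five alternating positions, might look like the following sketch. The AST representation used here (a tuple of node type and child list, with terminals given as plain integers) is an assumption made for illustration, as are the names `IRNode`, `to_ir` and the placeholder `EMPTY = 0`; the patent does not prescribe this data layout.

```python
from dataclasses import dataclass, field

EMPTY = 0  # assumed placeholder for an empty terminal position


@dataclass
class IRNode:
    """Hypothetical IR node with five positions: indices 0, 2, 4 hold terminal
    numbers and indices 1, 3 hold non-terminal child nodes (or None)."""
    node_type: int
    slots: list = field(default_factory=lambda: [EMPTY, None, EMPTY, None, EMPTY])


def to_ir(ast):
    """Convert an AST given as (node_type, children) into an IRNode.

    Each child is either an int (a terminal number) or a nested
    (node_type, children) tuple (a non-terminal).
    """
    node_type, children = ast
    ir = IRNode(node_type)
    pos = 0
    for child in children:
        is_terminal = isinstance(child, int)
        # Even indices accept terminals, odd indices accept non-terminals;
        # skip one position when the kinds do not match, leaving it empty.
        if (pos % 2 == 0) != is_terminal:
            pos += 1
        if pos > 4:
            raise ValueError("production does not fit the five-position form")
        ir.slots[pos] = child if is_terminal else to_ir(child)
        pos += 1
    return ir
```

A production that does not fit the five positions raises an error here, which reflects the assumption that the grammar has already been converted to the normal form before parsing.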
Fig. 5 is a schematic diagram of the specific process of the mutation stage of the fuzzy test method based on grammar correct mutation and semantic valid instantiation according to an embodiment of the present invention. Referring to Fig. 5, the main purpose of the mutation stage is to traverse all nodes in the first IR tree and to apply a syntactically correct mutation to each node using the grammar information. For each selected node, the mutation is performed in two steps. The first step queries the corresponding entries in the grammar information according to the number of the node and randomly selects one of them to guide the mutation. The second step modifies the child nodes of the node according to the selected entry: for a child representing a terminal, the number recorded in the child is simply replaced; for a child representing a non-terminal, an IR node with the same number is randomly selected from the variant material library, and the IR subtree rooted at the selected node replaces the IR subtree rooted at the child being modified. Because the SQL grammar information is used during mutation, the second IR tree obtained in this way is necessarily syntactically correct after conversion into the third group of SQL statements.
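The two mutation steps can be sketched as follows. This is an illustrative sketch, not the patented implementation: `IRNode`, `mutate_node`, the dictionary-based `grammar_info` and `library`, and the placeholder `EMPTY = 0` are hypothetical, and copying the selected subtree with `copy.deepcopy` is a design choice of this sketch to avoid sharing nodes with the library.

```python
import copy
import random
from dataclasses import dataclass, field

EMPTY = 0  # assumed placeholder number


@dataclass
class IRNode:
    """Hypothetical IR node: indices 0, 2, 4 of slots hold terminal numbers,
    indices 1, 3 hold non-terminal child nodes (or None)."""
    node_type: int
    slots: list = field(default_factory=lambda: [EMPTY, None, EMPTY, None, EMPTY])


def mutate_node(node, grammar_info, library, rng=random):
    """Rewrite the children of `node` according to a randomly chosen grammar entry.

    grammar_info: dict mapping a non-terminal number to a list of 5-number entries.
    library: dict mapping a non-terminal number to a list of stored IRNode subtrees.
    Returns True if the node was mutated, False if it was left unchanged.
    """
    entries = grammar_info.get(node.node_type)
    if not entries:
        return False
    entry = rng.choice(entries)  # step 1: pick one entry to guide the mutation
    new_slots = []
    for i, number in enumerate(entry):  # step 2: rebuild the children
        if i % 2 == 0:
            new_slots.append(number)  # terminal position: copy the number
        elif number == EMPTY:
            new_slots.append(None)  # empty non-terminal position
        else:
            candidates = library.get(number)
            if not candidates:
                return False  # no material of this type; keep the node as it was
            new_slots.append(copy.deepcopy(rng.choice(candidates)))
    node.slots = new_slots
    return True
```

Because every child is derived from a grammar entry for the node's own type, the rewritten node still corresponds to a valid production, which is the property the mutation stage relies on.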
The main purpose of the instantiation stage is to adjust the structure of the second IR tree and to assign values to its leaf nodes so that it can be converted into syntactically correct and semantically valid SQL statements, after which the second IR tree is converted into the third group of SQL statements for use in the next stage. The adjustment and assignment are mainly implemented through a series of manually written rules. First, rules are selected according to the type of the SQL statement and used to check the nodes of the second IR tree, and nodes that do not conform to the rules are adjusted. Then the dependency relationships among the leaf nodes of the second IR tree are statically derived according to the rules, and the leaf nodes are assigned values according to the derived dependency relationships. Finally, the second IR tree is traversed to obtain the corresponding third group of SQL statements. Specifically, the second IR tree is traversed depth-first from left to right: when a node representing a terminal is reached, the string form of the terminal is fetched by number from the string array that records all terminals and appended to the output string; when a leaf node of the second IR tree is reached, its node value is appended to the output string in string form; when a node representing a non-terminal is reached, the traversal continues into it. Here, a leaf node is a node without child nodes.
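The final conversion from the second IR tree to SQL text can be sketched as a left-to-right depth-first traversal. The sketch assumes a hypothetical `IRNode` in which instantiated leaf nodes carry a `value` string, a placeholder `EMPTY = 0`, and tokens joined by single spaces; none of these details are fixed by the patent.

```python
from dataclasses import dataclass, field
from typing import Optional

EMPTY = 0  # assumed placeholder number


@dataclass
class IRNode:
    """Hypothetical IR node: indices 0, 2, 4 of slots hold terminal numbers,
    indices 1, 3 hold child nodes (or None). Leaf nodes carry a value."""
    node_type: int
    slots: list = field(default_factory=lambda: [EMPTY, None, EMPTY, None, EMPTY])
    value: Optional[str] = None  # assigned during instantiation for leaf nodes


def to_sql(root, terminal_names):
    """Traverse the IR tree depth-first, left to right, and build the SQL text.

    terminal_names: list indexed by terminal number, giving each terminal's string.
    """
    parts = []

    def visit(node):
        if node.value is not None:
            parts.append(node.value)  # leaf node: emit its assigned value
            return
        for i, slot in enumerate(node.slots):
            if i % 2 == 0:
                if slot != EMPTY:
                    parts.append(terminal_names[slot])  # terminal: look up its string
            elif slot is not None:
                visit(slot)  # non-terminal: continue the traversal

    visit(root)
    return " ".join(parts)
```

For example, with `terminal_names = ["", "SELECT", "FROM"]`, a root whose slots are `[1, IRNode(2, value="c1"), 2, IRNode(3, value="t1"), 0]` is rendered as `SELECT c1 FROM t1`.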
FIG. 6 is a schematic diagram of the specific process of the verification stage of the fuzzy test method based on grammar correct variation and semantic valid instantiation according to an embodiment of the present invention. Referring to FIG. 6, the main purpose of the verification stage is to input SQL statements into the DBMS and to handle them according to the results fed back by the DBMS. The third group of SQL statements is input into the DBMS while the running state of the DBMS is monitored and coverage information is collected. If the DBMS crashes or terminates operation, the SQL statement is saved for subsequent manual analysis. If the DBMS neither crashes nor terminates operation, it is checked whether the SQL statement input this time increased the coverage; if not, the SQL statement is discarded. If the coverage increased, it is checked whether the SQL statement is syntactically correct and semantically valid; if not, the SQL statement is discarded. If it is, the statement is added to the seed set so that it can be mutated in later rounds, and it is also added to the variant material library with a corresponding index for use in later mutation. Finally, the next SQL statement is selected from the seed set and the next round of testing begins.
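The decision order of the verification stage can be summarized in a small function. This is a sketch only: the names `Verdict` and `triage` are hypothetical, and the three boolean inputs stand in for whatever the DBMS monitoring, coverage collection and error checking actually report.

```python
from enum import Enum


class Verdict(Enum):
    SAVE_CRASH = "save for manual analysis"
    DISCARD = "discard"
    ADD_TO_SEEDS = "add to seed set and variant material library"


def triage(crashed, coverage_increased, is_valid):
    """Decide what to do with one executed SQL statement.

    crashed: the DBMS crashed or terminated operation.
    coverage_increased: the statement reached new coverage.
    is_valid: the statement was syntactically correct and semantically valid.
    """
    if crashed:
        return Verdict.SAVE_CRASH  # crashes are always kept
    if not coverage_increased:
        return Verdict.DISCARD  # nothing new was exercised
    if not is_valid:
        return Verdict.DISCARD  # new coverage came only from an erroneous statement
    return Verdict.ADD_TO_SEEDS
```

The crash check comes first, so a crashing statement is kept regardless of its coverage or validity.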
Fig. 7 is a schematic block diagram of a fuzzy test apparatus based on grammar correct mutation and semantic valid instantiation according to an embodiment of the present invention. Referring to Fig. 7, an embodiment of the present invention further provides a fuzzy test apparatus based on grammar correct variation and semantic valid instantiation, including:
an initialization module 71 for importing grammar information, an initial input and an initial variant material, the initial input comprising a first set of SQL statements, the initial variant material comprising a second set of SQL statements;
a parsing module 72 for parsing the first set of SQL statements into an abstract syntax tree using bison, recording node types of the abstract syntax tree, converting the abstract syntax tree into a first IR tree, the first IR tree having a binary tree structure;
a mutation module 73, configured to traverse all nodes in the first IR tree, and perform a grammar-correct mutation on each node using the grammar information;
an instantiation module 74 for assigning correct values to child nodes of the first IR tree to obtain a second IR tree, converting the second IR tree into a third set of SQL statements;
a checking module 75, configured to input the third set of SQL statements into the DBMS, and if the DBMS crashes or terminates the operation, reserve the corresponding SQL statements.
In summary, the fuzzy test method and device based on grammar correct variation and semantic effective instantiation provided by the embodiments of the invention comprise the following stages: an initialization stage of importing grammar information, an initial input and initial variation materials, wherein the initial input comprises a first group of SQL statements and the initial variation materials comprise a second group of SQL statements; an analysis stage of parsing the first group of SQL statements into an abstract syntax tree using bison, recording the node types of the abstract syntax tree, and converting the abstract syntax tree into a first IR tree having a binary tree structure; a mutation stage of traversing all nodes in the first IR tree and applying a grammar-correct mutation to each node using the grammar information; an instantiation stage of assigning correct values to child nodes of the first IR tree to obtain a second IR tree and converting the second IR tree into a third group of SQL statements; and a checking stage of inputting the third group of SQL statements into a DBMS and, if the DBMS crashes or terminates operation, reserving the corresponding SQL statements. Because mutation is guided by the grammar information, the mutated SQL statements are always grammatically correct, which improves the efficiency of the fuzzy test.
Further, more complex static derivation rules are used and the necessary adjustments are made to the IR tree structure, so that semantically valid SQL statements are generated with higher probability during instantiation, which further improves the efficiency of the fuzzy test.
Further, whether the DBMS crashes or terminates operation is determined from the execution information returned by the C++ interface provided by the DBMS. If the DBMS crashes or terminates operation, the corresponding SQL statement is reserved. If the DBMS does not crash or terminate operation, it is judged whether the coverage rate has increased, and if it has not, the corresponding SQL statement is discarded. If the coverage rate has increased, it is judged whether a grammar or semantic error occurred; if so, the corresponding SQL statement is discarded, and otherwise it is reserved. Because a screening mechanism based on the correctness of SQL statements is added, erroneous SQL statements are discarded early, the probability of generating correct SQL statements increases, and the efficiency of the fuzzy test is further improved.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (8)

1. A fuzzy test method based on grammar correct variation and semantic effective instantiation, comprising:
in an initialization stage, importing grammar information, initial input and initial variation materials, wherein the initial input comprises a first group of SQL sentences, and the initial variation materials comprise a second group of SQL sentences;
in the analysis stage, the first group of SQL sentences are analyzed into abstract syntax trees by using bison, the node types of the abstract syntax trees are recorded, the abstract syntax trees are converted into first IR trees, and the first IR trees have a binary tree structure;
a mutation stage, namely traversing all nodes in the first IR tree, and carrying out grammar correct mutation on each node by using the grammar information;
an instantiation stage, namely giving a correct value to the child nodes of the first IR tree to obtain a second IR tree, deducing a dependency relationship among the child nodes of the first IR tree, giving the correct value to the child nodes of the first IR tree according to the dependency relationship, and converting the second IR tree into a third group of SQL sentences;
a verification stage, inputting the third group of SQL sentences into a DBMS, and if the DBMS crashes or terminates operation, reserving the corresponding SQL sentences;
the syntax information is extracted by:
performing normal form conversion on the context-free grammar, wherein each generated form in the context-free grammar after the normal form conversion comprises a plurality of non-terminal symbols and a plurality of terminal symbols;
and encoding the context-free grammar after the paradigm conversion, wherein the plurality of non-terminal symbols and the plurality of terminal symbols respectively use a first group number and a second group number, and the first group number and the second group number are different.
2. The fuzzy test method of claim 1, wherein the odd-numbered positions in each of the generated formulas represent terminals, and the even-numbered positions in each of the generated formulas represent non-terminals.
3. The fuzzy test method of claim 2, wherein if the number of symbols in the generated formula is insufficient, a predetermined number is used as a placeholder.
4. The fuzzy test method of claim 1, wherein when numbering the terminals, the terminal corresponding to each number is stored in an array in the form of a character string, wherein the index of the array is the number of the terminal, and the array is a component of the grammar information.
5. The fuzzy test method of claim 2, wherein said syntactically correct mutation of each node using said syntactical information includes querying a corresponding entry in said syntactical information according to the number of the node, and if there are a plurality of said corresponding entries, randomly selecting one of them.
6. The fuzzy test method of claim 5, wherein the corresponding entry includes terminal numbers and non-terminal numbers; for each terminal number, the number is assigned to the child node at the corresponding odd-numbered position; and for each non-terminal number, a node having the same number is randomly selected from the second group of SQL statements according to the number, and the selected node is used to replace the corresponding child node of the first IR tree.
7. The fuzzy test method based on grammar correct variation and semantic valid instantiation of claim 1, wherein whether the DBMS crashes or terminates operation is determined from the execution information returned by the C++ interface provided by the DBMS;
if the DBMS crashes or terminates operation, the corresponding SQL statement is reserved;
if the DBMS does not crash or terminate operation, judging whether the coverage rate is increased, and if the coverage rate is not increased, discarding the corresponding SQL statement;
if the coverage rate is increased, judging whether grammar or semantic errors occur, discarding the corresponding SQL statement if the grammar or semantic errors occur, otherwise, reserving the corresponding SQL statement.
8. A fuzzy test apparatus based on grammar correct variation and semantic valid instantiation, comprising:
the initialization module is used for importing grammar information, initial input and initial variation materials, wherein the initial input comprises a first group of SQL sentences, and the initial variation materials comprise a second group of SQL sentences;
the analysis module is used for analyzing the first group of SQL sentences into abstract syntax trees by using bison, recording node types of the abstract syntax trees, and converting the abstract syntax trees into first IR trees, wherein the first IR trees have a binary tree structure;
the mutation module is used for traversing all nodes in the first IR tree and carrying out grammar correct mutation on each node by using the grammar information;
an instantiation module, configured to assign a correct value to child nodes of the first IR tree to obtain a second IR tree, derive a dependency relationship between child nodes of the first IR tree, assign a correct value to child nodes of the first IR tree according to the dependency relationship, and convert the second IR tree into a third set of SQL statements;
the checking module is used for inputting the third group of SQL sentences into the DBMS, and if the DBMS crashes or terminates operation, the corresponding SQL sentences are reserved;
the syntax information is extracted by:
performing normal form conversion on the context-free grammar, wherein each generated form in the context-free grammar after the normal form conversion comprises a plurality of non-terminal symbols and a plurality of terminal symbols;
and encoding the context-free grammar after the paradigm conversion, wherein the plurality of non-terminal symbols and the plurality of terminal symbols respectively use a first group number and a second group number, and the first group number and the second group number are different.
CN202310976573.7A 2023-08-04 2023-08-04 Fuzzy test method based on grammar correct variation and semantic effective instantiation Active CN116701235B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310976573.7A CN116701235B (en) 2023-08-04 2023-08-04 Fuzzy test method based on grammar correct variation and semantic effective instantiation

Publications (2)

Publication Number Publication Date
CN116701235A CN116701235A (en) 2023-09-05
CN116701235B true CN116701235B (en) 2023-10-31

Family

ID=87826161

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310976573.7A Active CN116701235B (en) 2023-08-04 2023-08-04 Fuzzy test method based on grammar correct variation and semantic effective instantiation

Country Status (1)

Country Link
CN (1) CN116701235B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105335366A (en) * 2014-05-30 2016-02-17 北大方正信息产业集团有限公司 SQL statement processing method and apparatus and server
US10114624B1 (en) * 2017-10-12 2018-10-30 Devfactory Fz-Llc Blackbox matching engine
CN113238937A (en) * 2021-05-11 2021-08-10 西北大学 Compiler fuzzy test method based on code compaction and false alarm filtering
CN113961930A (en) * 2021-10-19 2022-01-21 北京天融信网络安全技术有限公司 SQL injection vulnerability detection method and device and electronic equipment
CN114490353A (en) * 2022-01-06 2022-05-13 清华大学 Database management system fuzzy test method and device and electronic equipment
CN115237760A (en) * 2022-07-08 2022-10-25 中国人民解放军战略支援部队信息工程大学 JavaScript engine directional fuzzy test method and system based on natural language processing
CN116126830A (en) * 2023-02-16 2023-05-16 厦门大学 Method and device for detecting logic defects of database management system and readable medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10705943B2 (en) * 2017-09-08 2020-07-07 Devfactory Innovations Fz-Llc Automating identification of test cases for library suggestion models
US11151018B2 (en) * 2018-04-13 2021-10-19 Baidu Usa Llc Method and apparatus for testing a code file

Also Published As

Publication number Publication date
CN116701235A (en) 2023-09-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant