CN116932389A - Solver defect detection method based on large pre-training language model - Google Patents

Solver defect detection method based on large pre-training language model

Info

Publication number
CN116932389A
CN116932389A · Application CN202310869091.1A
Authority
CN
China
Prior art keywords
solver
model
smt
defects
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310869091.1A
Other languages
Chinese (zh)
Inventor
杨已彪
孙茂林
许沂聪
卢红敏
周毓明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202310869091.1A
Publication of CN116932389A
Legal status: Pending

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3668Software testing
    • G06F11/3672Test management
    • G06F11/3684Test management for test design, e.g. generating new test cases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3668Software testing
    • G06F11/3672Test management
    • G06F11/3688Test management for test execution, e.g. scheduling of test suites
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a solver defect detection method based on a large pre-trained language model. The method comprises the following main steps: first, augmenting the formulas in solver test benchmarks and the formulas that triggered historical defects to obtain a training set; second, performing customized training of the pre-trained large model on this training set with a "retraining-fine tuning" framework so that it can generate solver test inputs; and finally, generating solver test cases with the trained model and verifying multiple solvers by differential testing. The method addresses two key challenges in solver defect detection: efficiently generating test cases and generating diverse test inputs. The proposed "retraining-fine tuning" framework lets a pre-trained large language model learn the knowledge contained in solver test benchmarks and historical defect cases, so that it can generate legal, effective test inputs with high fault-revealing power. The invention thus offers a new solution for solver defect detection.

Description

Solver defect detection method based on large pre-training language model
Technical Field
The invention relates to the field of software defect detection, and in particular to defect detection for SMT solvers: a defect detection method for SMT solvers based on a large pre-trained language model.
Background
An SMT (Satisfiability Modulo Theories) solver is an automated reasoning tool used to check the satisfiability of logical formulas. SMT solvers are applied in many important areas, including software verification, test case generation, and program synthesis. However, SMT solvers contain hidden defects, and the erroneous results they cause can have serious consequences in these areas. It is therefore important to ensure the reliability and robustness of SMT solvers. While many testing methods have been proposed for SMT solvers, generating effective test formulas that test them comprehensively remains an important challenge. To address this problem, the invention proposes retraining and fine-tuning a Large Pre-trained Language Model (LLM) so that the model can generate a large number of test cases, i.e., formulas to be solved, that can be fed to an SMT solver. A large pre-trained language model is a deep-learning-based natural language processing model that learns language representations through self-supervised learning on a large corpus. Large language models typically adopt the Transformer architecture and learn representations of input sequences through multi-layer self-attention. The advent of large pre-trained language models greatly improved performance on natural language processing tasks, because such models can be fine-tuned on top of large-scale pre-training, avoiding training from scratch. Large pre-trained language models have shown excellent results on a variety of natural language processing and code-related tasks.
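As a minimal illustration of what "checking satisfiability" means, the propositional core of the problem can be brute-forced in a few lines. This is a sketch only: real SMT solvers decide rich theories (integers, reals, strings) with dedicated decision procedures, not enumeration, and `is_satisfiable` and the sample formula are illustrative names, not part of the invention.

```python
from itertools import product

def is_satisfiable(formula, variables):
    """Brute-force satisfiability check for a propositional formula.

    `formula` maps an assignment (a dict of booleans) to a boolean.
    Returns (True, satisfying_assignment) or (False, None).
    """
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if formula(assignment):
            return True, assignment
    return False, None

# (x or y) and (not x or not y): satisfiable, e.g. x=False, y=True
sat, model = is_satisfiable(
    lambda a: (a["x"] or a["y"]) and (not a["x"] or not a["y"]), ["x", "y"])
```

A "sat" verdict comes with a model (an assignment); an SMT solver returning a model that does not actually satisfy the formula is exactly the "invalid model" defect class discussed later.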
The invention applies a large pre-trained language model to test case generation for SMT solvers, customizing, retraining, and fine-tuning the pre-trained model through the proposed "retraining-fine tuning" framework so that it is suited to generating solver test inputs and thereby detecting defects in solvers.
Disclosure of Invention
The invention provides an SMT solver defect detection method based on a pre-trained language model, aiming to solve the key problem that high-quality, diverse test inputs are lacking in SMT solver testing. The method collects different types of training data (normal SMT formulas and cases that triggered historical defects), augments them with dedicated data enhancement techniques, and customizes and retrains a pre-trained language model with the "retraining-fine tuning" framework so that it can generate legal and effective SMT formulas; finally, the SMT formulas generated by the model are used as test inputs to detect defects in SMT solvers and thereby improve solver reliability and quality.
In order to detect defects in a solver, the invention discloses an SMT solver defect detection method based on a pre-trained language model, which specifically comprises the following steps:
step 1, collecting a data set for model training;
step 2, augmenting the data set collected in step 1 with a diversity-oriented mutation technique and a semantics-preserving mutation technique;
step 3, training the pre-trained language model on the enhanced data set with the "retraining-fine tuning" framework;
step 4, generating SMT formulas with the trained model and, after instantiation, using them as test inputs;
and step 5, solving the generated test inputs with different SMT solvers and recording their outputs; inconsistent results from different solvers on the same test input, as well as solver crashes or memory errors, are regarded as potential defects.
In step 1, two types of SMT formulas are considered when collecting the training data set: formulas from solver test benchmarks, and formulas that triggered solver defects. The benchmark formulas are high-quality, semantically correct SMT formulas representative of the target domain; they contain rich formula-related knowledge from which the model can learn the syntax and semantics of SMT formulas. The historical defect-triggering cases contain the key elements that trigger solver defects, helping the model learn how to generate formulas that effectively trigger them.
In step 2, to obtain better model performance, the model must be given a diverse, high-quality training data set, so the data collected in step 1 is augmented. The augmentation techniques include a diversity-oriented mutation technique and a semantics-preserving mutation technique. For benchmark formulas, the diversity-oriented mutation performs sub-formula mutation and operator mutation, increasing the diversity of the training set as much as possible. The semantics-preserving mutation aims to increase the diversity of the data set while preserving, as much as possible, the defect-triggering ability of the defect-triggering cases. Applying the two mutation strategies to the two types of data respectively yields a higher-quality data set for training the model.
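The operator-mutation part of the diversity-oriented technique can be sketched as follows. The operator groups and the whitespace tokenizer are illustrative assumptions: the patent does not specify its mutation tables, only that operators are mutated while keeping the formula syntactically valid.

```python
import random

# Hypothetical groups of interchangeable operators (illustrative only).
OPERATOR_GROUPS = [
    {"and", "or", "xor"},          # Boolean connectives
    {"+", "-", "*"},               # arithmetic operators
    {"<", "<=", ">", ">=", "="},   # comparison operators
]

def mutate_operator(formula, rng=random):
    """Diversity-oriented mutation: swap one operator for another from the
    same group, producing a syntactically valid variant of the formula."""
    tokens = formula.replace("(", " ( ").replace(")", " ) ").split()
    candidates = [i for i, t in enumerate(tokens)
                  for g in OPERATOR_GROUPS if t in g]
    if not candidates:
        return formula  # nothing to mutate
    i = rng.choice(candidates)
    group = next(g for g in OPERATOR_GROUPS if tokens[i] in g)
    tokens[i] = rng.choice(sorted(group - {tokens[i]}))
    # Re-assemble with SMT-LIB-style parentheses
    return " ".join(tokens).replace("( ", "(").replace(" )", ")")

mutant = mutate_operator("(assert (and (> x 0) (< y 10)))", random.Random(0))
```

Sub-formula mutation would operate analogously on whole parenthesized subtrees instead of single operator tokens; the semantics-preserving variant would restrict itself to rewrites that leave satisfiability unchanged.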
In step 3, the pre-trained model is first retrained on the augmented benchmark formulas, giving it the ability to generate legal SMT formulas. The retrained model is then fine-tuned on the augmented historical defect-triggering formulas, so that it can generate formulas that easily trigger solver defects. During fine-tuning, only the parameters of the model's two fully connected layers are updated.
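The constraint that fine-tuning updates only the two fully connected layers can be expressed as a parameter mask; the parameter names (`fc1`, `fc2`, `embed`, etc.) below are hypothetical, since the patent does not name the layers. In a PyTorch setting one would equivalently set `requires_grad = False` on every parameter outside those two layers.

```python
def trainable_mask(param_names, tuned_layers=("fc1", "fc2")):
    """Mark which parameters the fine-tuning stage updates.

    Parameters outside the two fully connected layers stay frozen,
    preserving the knowledge acquired during retraining.
    """
    def layer_of(name):            # "decoder.fc1.weight" -> "fc1"
        parts = name.split(".")
        return parts[-2] if len(parts) > 1 else parts[0]
    return {name: layer_of(name) in tuned_layers for name in param_names}

params = ["embed.weight", "attn.query.weight", "decoder.fc1.weight",
          "decoder.fc1.bias", "decoder.fc2.weight"]
mask = trainable_mask(params)
```

Freezing most parameters keeps fine-tuning cheap and prevents the small set of defect-triggering formulas from overwriting the SMT-LIB syntax learned in the retraining stage.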
In step 4, the trained model is used to generate SMT formulas, whose outputs are processed and instantiated into legal test inputs.
In step 5, the instantiated test cases are solved with different SMT solvers, and the results returned by the solvers are compared to check for discrepancies, thereby finding defects. The defects this method can detect fall into three main types: 1) soundness defects: the solver gives an erroneous verdict on the satisfiability of a formula; 2) invalid model defects: the solver gives the correct satisfiability verdict but provides an erroneous model, i.e., a solution that does not satisfy the formula; 3) crashes: an assertion violation or other error causes the solver to terminate abnormally. Whenever a potential defect is found, the input that triggered it (the test case and the corresponding command) and the corresponding output are saved for later review.
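The three defect types can be told apart mechanically from the solvers' outputs. The sketch below assumes a simplified result record per solver; the `status` and `model_ok` field names are illustrative, not the patent's data format.

```python
def classify(results):
    """Differential check over solver outputs for one test input.

    `results` maps a solver name to a dict with keys:
      'status'   -- 'sat', 'unsat', 'unknown', or 'crash'
      'model_ok' -- for 'sat', whether the returned model satisfies the formula
    """
    statuses = {r["status"] for r in results.values()}
    if "crash" in statuses:
        return "crash"                  # assertion violation / abnormal exit
    decided = statuses - {"unknown"}
    if len(decided) > 1:
        return "soundness-defect"       # solvers disagree on satisfiability
    if any(r["status"] == "sat" and not r.get("model_ok", True)
           for r in results.values()):
        return "invalid-model-defect"   # correct verdict, wrong model
    return "consistent"

verdict = classify({"first": {"status": "sat", "model_ok": True},
                    "second": {"status": "unsat"}})
```

Note that invalid-model defects are detectable even with a single solver, since the returned model can be checked against the formula itself; soundness defects, by contrast, need a disagreeing reference.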
The invention uses a pre-trained large language model as a generator of solver test inputs, performing data augmentation and customized training of the model with the "retraining-fine tuning" framework so that it can generate effective SMT solver test inputs and thereby detect defects in solvers. Using the pre-trained model, the invention can generate a large number of diverse test formulas, addressing the lack of effective test inputs in solver testing and effectively improving solver quality and reliability.
Beneficial effects: the method can effectively detect SMT solver defects. By using a large pre-trained language model as a test input generator, it addresses the key challenge that legal and effective test inputs are lacking in SMT solver testing, providing a new solution for solver defect detection.
Drawings
The foregoing and other advantages of the invention will become more apparent from the following detailed description taken in conjunction with the accompanying drawings.
FIG. 1 is a flow chart of a solver defect detection method based on a large pre-trained language model.
FIG. 2 is a flow chart for enhancing a data set and training a model.
FIG. 3 is a flow chart of differential testing of different SMT solvers and collection of the corresponding results.
Detailed Description
For a clearer description of the objects, technical solutions, and advantages of the present invention, the invention is described in further detail below with reference to the accompanying drawings.
FIG. 1 shows the flowchart of the invention's SMT solver defect detection method empowered by historical defect cases; it comprises 5 steps, as follows:
step 1, collecting a data set for model training, wherein the data set comprises formulas in a solver test benchmark and solver historical defect use cases;
step 2, enhancing the collected data set, using diversity-oriented mutation and semantics-preserving mutation;
step 3, training the model, including two stages of retraining and fine tuning;
step 4, generating test input by using the obtained model, and instantiating the test input;
step 5, the inconsistent results obtained by different SMT solvers for the same test input and the crashes or memory errors generated by the solvers are considered as potential defects.
FIG. 2 shows a flow chart of step 2 and step 3 for constructing a dataset and training a model from the SMT formulas collected in step 1, as follows:
Step 3-1: for data enhancement, the benchmark formulas undergo diversity-oriented mutation, including sub-formula mutation and operator mutation, while the historical defect-triggering formulas undergo semantics-preserving mutation. When this step is complete, proceed to step 3-2.
Step 3-2: preprocess the enhanced formulas and filter out those unsuitable as training data. Then retrain the model with the enhanced benchmark formulas as the training set. When this step is complete, proceed to step 3-3.
Step 3-3: fine-tune the model trained in step 3-2 with the enhanced historical defect-triggering formulas to obtain the final test case generator.
FIG. 3 shows the flowchart of step 5, which performs differential analysis over different SMT solvers on the test inputs generated in step 4, as follows:
Step 5-1: solve the same test input with multiple SMT solvers; the two solvers under test are denoted the first SMT solver and the second SMT solver. After solving, proceed to step 5-2.
Step 5-2: record the outputs of the different solvers, including but not limited to the following information: the solving result for the test formula (returned on standard output) and solver error information (invalid model reports or abort signals). Then proceed to step 5-3.
Step 5-3: perform differential analysis on the solver outputs obtained in step 5-2. If the solvers' results are identical and there is no anomaly information, end this round and process the next test input starting from step 5-1. Otherwise, go to step 5-4.
Step 5-4: record the information relevant to the solver defect, including but not limited to: the test input that triggered the defect, the commands used to invoke the solvers, and the information returned by the different solvers. Then end this round and process the next test input starting from step 5-1.
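Steps 5-1 through 5-4 amount to one solve-record-compare-report round. In this sketch the solvers are stand-in callables (real use would invoke Z3, cvc5, etc. as subprocesses), and the `(status, info)` output shape is an assumption, not the patent's format.

```python
def differential_round(solvers, test_input):
    """One round of steps 5-1 to 5-4.

    5-1/5-2: run every solver on the same input and record its output;
    5-3: compare the recorded results;
    5-4: if they disagree (or a solver errored), return a defect record
         for later review; otherwise return None and move on.
    """
    outputs = {name: solve(test_input) for name, solve in solvers.items()}
    statuses = {status for status, _info in outputs.values()}
    if len(statuses) == 1 and "error" not in statuses:
        return None                                   # consistent: no defect
    return {"input": test_input, "outputs": outputs}  # step 5-4 record

# Stand-in solvers that disagree on the same formula.
report = differential_round(
    {"first": lambda f: ("sat", "x = 1"), "second": lambda f: ("unsat", "")},
    "(assert (> x 0))",
)
```

The saved record intentionally keeps both the triggering input and every solver's raw output, since reproducing and minimizing the case later requires the exact input-output pair.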
The invention provides an SMT solver defect detection method empowered by historical defect cases, together with ways of realizing the technical scheme. The foregoing is only a preferred embodiment of the present invention. It should be noted that modifications and adaptations may occur to those skilled in the art without departing from the principles of the invention and are intended to fall within its scope. Components not explicitly described in this embodiment can be implemented with the prior art.

Claims (6)

1. A solver defect detection method based on a large pre-trained language model, characterized in that the solver is tested using the large pre-trained language model as a test case generator: new legal and effective test inputs are generated, and defects in the SMT solver are detected by differential verification. The SMT solvers used in the differential verification are typically different implementations, but the same solver may also be used with different solving options; in other words, the method supports cross-verification of multiple solvers as well as self-verification of a single solver. The method mainly comprises the following steps:
1) Model training dataset construction:
two types of SMT formulas are considered when collecting the training data set: formulas from solver test benchmarks, and formulas that triggered solver defects. The benchmark formulas are high-quality, semantically correct SMT formulas representative of the target domain; they contain rich formula-related knowledge from which the model can learn the syntax and semantics of SMT formulas. The historical defect-triggering cases contain the key elements that trigger solver defects, helping the model learn how to generate formulas that effectively trigger them.
2) Data enhancement:
for the model to achieve better performance, it must be given a diverse, high-quality training data set, so the data collected in step 1 is augmented. The augmentation techniques include a diversity-oriented mutation technique and a semantics-preserving mutation technique. For benchmark formulas, the diversity-oriented mutation performs sub-formula mutation and operator mutation, increasing the diversity of the training set as much as possible. The semantics-preserving mutation aims to increase the diversity of the data set while preserving, as much as possible, the defect-triggering ability of the defect-triggering cases. Applying the two mutation strategies to the two types of data respectively yields a higher-quality data set for training the model.
3) "retraining-fine tuning" of a pre-trained model:
first, the pre-trained model is retrained on the augmented benchmark formulas so that it can generate legal SMT formulas. The retrained model is then fine-tuned on the augmented historical defect-triggering formulas, so that it can generate formulas that easily trigger solver defects. During fine-tuning, only the parameters of the model's two fully connected layers are updated.
4) Differential verification compares the outputs of different SMT solvers:
the solution results of the different SMT solvers are compared to check whether they differ, thereby finding defects in the solvers. The defects this method can detect fall into three main types: 1) soundness defects: the solver gives an erroneous verdict on the satisfiability of a formula; 2) invalid model defects: the solver gives the correct satisfiability verdict but provides an erroneous model, i.e., a solution that does not satisfy the formula; 3) crashes: an assertion violation or other error causes the SMT solver to terminate abnormally. If any of the three types of defects is found, the solver is determined to have a defect.
2. The method for detecting defects in a solver based on a large pre-trained language model according to claim 1, characterized in that in step 1) the method supports collecting SMT formulas from test benchmarks and defect cases from the defect trackers of commonly used SMT solvers. The test benchmark here is the official benchmark provided by SMT-LIB, and the commonly used SMT solvers include Z3, cvc5, and Yices 2. The method can also be extended to use SMT formulas from other sources and defect cases of other SMT solvers as training data.
3. The method for detecting defects in a solver based on a large pre-trained language model according to claim 1, wherein in step 2) the method defines diversity-oriented mutation and semantics-preserving mutation for data enhancement. The semantics-preserving mutation is implemented with functionality of the solver Z3, and the data enhancement techniques used by the method can be replaced with other strategies.
4. The method for detecting defects in a solver based on a large pre-trained language model according to claim 1, wherein in step 3) the pre-trained language model used in the method is GPT-2. The model may be replaced by other pre-trained models, such as GPT-3.5, GPT-4, BERT, etc.
5. The method for detecting defects in a solver based on a large pre-trained language model according to claim 1, wherein in step 3) the method trains the model with the "retraining-fine tuning" framework to obtain a customized model serving as the solver test case generator. Other training strategies may also be employed to train the model.
6. The method for detecting defects in a solver based on a large pre-trained language model according to claim 1, wherein in step 4) the method mainly detects three types of defects in the solver: soundness defects, invalid model defects, and crashes. Other types of solver defects, such as completeness defects and performance defects, may also be targeted.
CN202310869091.1A 2023-07-14 2023-07-14 Solver defect detection method based on large pre-training language model Pending CN116932389A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310869091.1A CN116932389A (en) 2023-07-14 2023-07-14 Solver defect detection method based on large pre-training language model


Publications (1)

Publication Number Publication Date
CN116932389A true CN116932389A (en) 2023-10-24

Family

ID=88390448

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310869091.1A Pending CN116932389A (en) 2023-07-14 2023-07-14 Solver defect detection method based on large pre-training language model

Country Status (1)

Country Link
CN (1) CN116932389A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117130943A (en) * 2023-10-26 2023-11-28 北京一平方科技有限公司 Test case generation and operation and maintenance data analysis method based on large language model
CN117130943B (en) * 2023-10-26 2024-02-20 北京一平方科技有限公司 Test case generation and operation and maintenance data analysis method based on large language model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination