CN116431477A

CN116431477A - JS engine differential fuzzy test method based on deep learning

Info

Publication number: CN116431477A
Application number: CN202310240372.0A
Authority: CN
Inventors: 汤战勇; 弋雯; 赵晶莹; 范镇业; 车小康; 叶贵鑫
Original assignee: NORTHWEST UNIVERSITY
Current assignee: NORTHWEST UNIVERSITY
Priority date: 2023-03-14
Filing date: 2023-03-14
Publication date: 2023-07-14

Abstract

The invention belongs to the field of automated software testing, and particularly relates to a JS engine differential fuzzing method based on deep learning. First, high-quality model training data and test suites are collected. Next, the data are preprocessed and used to fine-tune a model. The fine-tuned model is then used to continue (complete) the cases in the test suites. Finally, the generated test cases are executed by every JS engine in the differential test system; each suspicious case found after execution is judged again on an execution platform corresponding to the standard version that the engine supports, and the results are analyzed to determine whether the JS engine is in error. The method uses a deep learning text generation model to generate JavaScript test cases and, compared with traditional generation methods, offers a high degree of automation and high generation efficiency.

Description

JS engine differential fuzzing method based on deep learning

Technical Field

The invention relates to the field of automated software testing, in particular to a JS engine differential fuzzing method based on deep learning.

Background

JavaScript (JS) is one of the most widely used scripting languages. It is used to build intelligent and personalized applications and systems, for example in website front-end development, server development, command-line tool development (Node.js), desktop application development, mobile development, plug-in development, and game development. The JS engine is the core component that interprets JavaScript code and converts it into binary code that a computer can process, and it mainly exists as part of a browser or of Node.js. An incorrect JS engine implementation can prevent users from obtaining correct output, and if an attacker learns of and exploits a more serious engine vulnerability, users' private information and property can be put at risk.

Currently, commonly used defect detection methods fall into two main categories: static analysis and dynamic testing. Static analysis scans the program using techniques such as lexical and syntactic analysis, and commonly suffers from a high false-positive rate. Dynamic testing comprises white-box testing and black-box testing. White-box testing designs test cases with knowledge of the program's internal design. Black-box testing is performed when the internal structure of the software is unknown. Among black-box techniques, fuzz testing has become the mainstream method for testing JS engines because of its high degree of automation, low labor cost, and easily processed results.

Fuzz testing (fuzzing) is an automated software testing technique that discovers vulnerabilities and errors in a program by feeding large amounts of test data to the system under test and observing whether the system executes the test cases correctly. To keep the test effective, the input data must be highly versatile. The focus of fuzz testing is the test case generation stage, and its final effect depends on the quality of the generated test cases. Test case generation methods are roughly classified into two types: generation-based and mutation-based. Generation-based methods build new test cases from scratch according to predefined rules; writing the rules requires a great deal of prior knowledge, so the cost is high. Mutation-based methods generate new test cases by randomly modifying existing seed inputs; the mutation process is highly random and easily introduces syntax errors.

Disclosure of Invention

In order to solve the problems described in the background, the invention provides a JS engine differential fuzzing method based on deep learning, which has the advantages that high-level abstract features can be learned automatically and that the results can be improved gradually until the optimal result is reached.

To achieve the above purpose, the invention provides the following technical solution. Aiming at the high labor cost and low syntactic accuracy of existing fuzzing test case generation techniques, the invention provides a JS engine differential fuzzing method based on deep learning. Deep learning is a machine learning technique that learns from data and makes predictions. Its main feature is that high-level abstract features can be learned automatically and the results can be improved gradually until the optimal result is reached. Unlike traditional machine learning, deep learning does not require manual feature extraction, and it can therefore handle data and complex tasks more efficiently. A text generation model is based on a deep learning algorithm and can generate coherent, fluent text by learning how humans write. The invention therefore generates JavaScript test cases with a text generation model. The method mainly comprises a data collection stage, a model construction stage, a use case generation stage and a differential test stage.

In the data collection stage, the method first selects higher-quality JavaScript project repositories on GitHub, crawls them, and collects the JavaScript files they contain. The engine test suites provided on the official website of each JS engine are then collected.

In the model training stage, the JavaScript files from the JavaScript projects crawled in the data collection stage are preprocessed and used as model training data. The text generation model DistilGPT2 is then fine-tuned on this training data to obtain a deep learning model that generates JavaScript code with higher syntactic accuracy.

In the test case generation stage, the engine test suites from the data collection stage are used as initial cases. The generation model obtained in the model training stage continues each initial case, and the resulting test cases are stored as the test case set.

In the differential test stage, commonly used JS engines are installed and a differential test environment is built. The test case set obtained in the test case generation stage is then used for testing: each test case is given to all installed JS engines for execution, and the execution results are captured, mainly the return code of the run, the test output, and similar information. The results are compared, and an engine whose result differs from that of most engines is marked as an engine with a problem. Finally, the result is analyzed manually to judge whether it is a real problem.

Drawings

The accompanying drawings provide a further understanding of the invention and constitute a part of this specification; together with the embodiments, they serve to explain the invention. In the drawings:

Fig. 1 is a schematic diagram of the overall flow of the invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Examples

According to Fig. 1, the present embodiment provides a JS engine differential fuzzing method based on deep learning, which specifically includes a data collection stage, a model construction stage, a use case generation stage, and a differential test stage.

1. Data collection phase

To obtain a good training effect for the text generation model, and high-quality JavaScript test cases when the model is later used for continuation, well-regarded projects are selected in the data collection stage.

For model training data, a large amount of data with broad coverage is required, so the invention selects frequently updated JavaScript code repositories with many stars on GitHub as the data source. The initial data for the generation stage must give the test a certain direction. Before an engine is formally released, its developers carry out unit tests, regression tests, and the like; the test cases written for these purposes are stored as test suites and are therefore strongly targeted. These test cases are selected as the initial data.

For data collection, the web pages to be crawled are gathered, corresponding shell scripts are written as needed, and the scripts are executed on a server to store the collected data.

2. Model construction stage

The task of the model construction stage is to create a text generation model with good JavaScript writing capability. In the implementation, the model training data collected in the previous stage is first preprocessed. The main steps are:

(1) extracting and storing the JavaScript files in the collected JavaScript projects;

(2) parsing each JavaScript file from (1) into an AST using the JavaScript parsing tool esprima;

(3) removing comments from the AST, extracting the functions in it, backfilling variables, and processing the JavaScript files into function files, each of which contains only one function;

(4) performing syntax filtering, de-duplication, formatting and the like on the extracted function files;

(5) writing the resulting files into a database for storage.
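
Step (4) can be illustrated with a minimal Python sketch. It is not the patented implementation: the `is_valid` callback stands in for a real syntax check (for example, a parse attempt with esprima), and the whitespace-collapsing `normalize` helper is a hypothetical, deliberately crude way of recognizing duplicates.

```python
import hashlib
import re
from typing import Callable, Iterable, List


def normalize(source: str) -> str:
    """Collapse whitespace runs; used only as a comparison key (assumption)."""
    return re.sub(r"\s+", " ", source).strip()


def filter_and_deduplicate(
    functions: Iterable[str],
    is_valid: Callable[[str], bool],
) -> List[str]:
    """Keep each syntactically valid function once, in first-seen order."""
    seen = set()
    kept = []
    for source in functions:
        if not is_valid(source):
            continue  # syntax filtering
        digest = hashlib.sha256(normalize(source).encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # de-duplication
        seen.add(digest)
        kept.append(source.strip())
    return kept
```

Hashing a normalized key rather than the raw text means that two copies of a function differing only in indentation count as one sample, while the stored text itself is left unchanged for a later formatting pass.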

After the training data is obtained, the text generation model must be fine-tuned. The text generation model selected in the invention is DistilGPT2, a smaller and faster generation model pre-trained under the supervision of GPT-2 using knowledge distillation. While maintaining high performance on a variety of natural language processing tasks, its speed is improved threefold, which improves efficiency and allows more seamless deployment on devices with limited computing resources.

For DistilGPT2, the fine-tuning input is a text file, so the training data obtained through preprocessing must be written into a text file; DistilGPT2 then tokenizes the training data automatically. When the text is written, each function file is terminated with the separator "<|endoftext|>", which marks it as a single sample rather than part of one long text, and text generation stops at this separator. When fine-tuning the model, max_length is set to 512 and the number of training rounds is set to 160,000, yielding the final text generation model.
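
As an illustration of how the training text could be assembled, the following Python sketch writes each function followed by the separator. The helper names and the one-separator-per-line layout are assumptions made for this example; only the separator token itself comes from the description.

```python
from pathlib import Path
from typing import Iterable

END_OF_TEXT = "<|endoftext|>"  # GPT-2's end-of-text token


def build_training_text(functions: Iterable[str]) -> str:
    """Join function sources so that each one is followed by the separator."""
    return "".join(f"{source.strip()}\n{END_OF_TEXT}\n" for source in functions)


def write_training_file(functions: Iterable[str], path: str) -> None:
    """Write the fine-tuning corpus as a single UTF-8 text file (hypothetical helper)."""
    Path(path).write_text(build_training_text(functions), encoding="utf-8")
```

Because every sample ends with the same token, the fine-tuned model learns to emit it at the end of a function, which gives generation a natural stopping point.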

3. Use case generation stage

At this stage, the official test suites collected from each JavaScript engine's official website in the data collection stage are first preprocessed. The preprocessing is the same as in the model construction stage, so that the data have the same format as the data generated by the model.

Two strategies are used for generation. In the first, the first line of an initial case is used as the generation prefix for the text model, and the model continues it, generating the rest of the function file. In the second, the original test case is truncated at a random point, the text from the beginning of the test case to the truncation point is used as the prefix, and the model continues it to form a complete function.
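
The two prefix strategies might be sketched in Python as follows. The function names are hypothetical, and cutting at a character offset (rather than at a token or statement boundary) is an assumption, since the description does not fix the granularity of the truncation point.

```python
import random
from typing import Optional


def first_line_prefix(test_case: str) -> str:
    """Strategy 1: use only the first line of the seed as the prompt."""
    lines = test_case.strip().splitlines()
    return lines[0] if lines else ""


def random_cut_prefix(test_case: str, rng: Optional[random.Random] = None) -> str:
    """Strategy 2: keep everything up to a random cut point as the prompt."""
    rng = rng or random.Random()
    text = test_case.strip()
    # Cut strictly inside the text so that something is left for the model.
    cut = rng.randint(1, max(1, len(text) - 1))
    return text[:cut]
```

Either prefix would then be passed to the fine-tuned model as the prompt, with generation stopping at the end-of-text separator.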

After continuation, a test case is only an anonymous function with no function call. To make it executable, a variable declaration, parameter passing, and a function call must be added to turn it into a complete test case. The variable declaration is added at the source code level by simple concatenation of source code. Parameter passing applies to functions with parameters: the required number of arguments is passed to the function so that no error occurs when it is called. First, the type of each parameter is inferred from how the parameter is used in the function body. Once the type is known, a value of that type is generated and added as an argument. In addition to passing arguments of the inferred types, the method also generates arguments of random types for all functions with parameters, for two main purposes: first, the parameter types of some functions cannot be inferred accurately, so values of random types are needed to keep the call syntactically correct; second, the ECMAScript standard contains a large number of type conversion operations on parameters, so passing arguments of different types checks whether the engines implement this part correctly. Finally, a function call is added using the function name given in the variable declaration step and the arguments generated in the parameter passing step.
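
A simplified Python sketch of this wrapping step is given below. It covers only the random-type argument strategy; type inference from the function body is omitted. The literal pool `RANDOM_VALUES`, the regular expression, and the default name `f` are illustrative assumptions, and the regular expression handles only plain parameter lists without default values or destructuring.

```python
import random
import re
from typing import List, Optional

# Hypothetical pool of JavaScript literals covering several types.
RANDOM_VALUES = ["0", "-1.5", '"str"', "true", "null", "undefined", "[1, 2]", "{}"]

PARAMS_RE = re.compile(r"^\s*function\s*\(([^)]*)\)")


def parameter_names(anonymous_function: str) -> List[str]:
    """Return the parameter names of `function (a, b) {...}`."""
    match = PARAMS_RE.match(anonymous_function)
    if match is None:
        raise ValueError("expected an anonymous function expression")
    return [p.strip() for p in match.group(1).split(",") if p.strip()]


def make_test_case(anonymous_function: str, name: str = "f",
                   rng: Optional[random.Random] = None) -> str:
    """Splice a variable declaration and a call around the function source."""
    rng = rng or random.Random()
    # One randomly typed argument per declared parameter.
    args = [rng.choice(RANDOM_VALUES) for _ in parameter_names(anonymous_function)]
    body = anonymous_function.strip().rstrip(";")
    return f"var {name} = {body};\n{name}({', '.join(args)});\n"
```

For a function without parameters the result is simply the declaration followed by `f();`. A fuller version would first try the inferred types and fall back to the random pool.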

4. Differential test phase

The main purpose of the differential test stage is to have the JS engines execute the obtained test cases and to judge whether the results are correct. In the implementation, the JS engine configuration tool jsvu is first used to install and configure the engines; the specific engines are shown in Table 1 below.

Table 1:

Engine number	Engine name	Engine version
1	v8	9.9.1
2	spiderMonkey	JavaScript-C96.0
3	chakra	chversion1.11.24.0
4	quickjs	v2021-03-27
5	jerryscript	Version:2.4.0
6	graaljs	21.3.0
7	hermes	0.10.0
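
With engines such as those in Table 1 installed, running one test case on every engine and capturing its output and return code could look roughly like the Python sketch below. The `engines` mapping from engine name to command line is a placeholder that depends on the local installation, and the timeout handling is an assumption not stated in the description.

```python
import subprocess
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ExecutionResult:
    returncode: Optional[int]  # None when the run timed out
    stdout: str
    stderr: str
    timed_out: bool = False


def run_on_engines(test_file: str, engines: Dict[str, List[str]],
                   timeout: float = 10.0) -> Dict[str, ExecutionResult]:
    """Execute one test file on every engine and capture output and return code."""
    results = {}
    for name, command in engines.items():
        try:
            done = subprocess.run(command + [test_file], capture_output=True,
                                  text=True, timeout=timeout)
            results[name] = ExecutionResult(done.returncode, done.stdout, done.stderr)
        except subprocess.TimeoutExpired:
            results[name] = ExecutionResult(None, "", "", timed_out=True)
    return results
```

Because `ExecutionResult` is a frozen dataclass, its instances are hashable and can be compared directly across engines.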

After the engines are configured, all test cases in the fuzzing seed pool, obtained through continuation by the text generation model DistilGPT2 in the case generation stage, are submitted in turn to all engines for execution. The results are captured and compared; they mainly include the execution output of the test case and the return code of the run. The results are then compared using a voting mechanism, and a case on which some engine's result differs from the majority of execution results is preliminarily marked as a suspicious case.
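
The voting mechanism can be sketched in Python as below. How ties are handled is not specified in the description; treating a case without a strict majority as suspicious for every engine is an assumption of this sketch.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Tuple


def vote(results: Dict[str, Hashable]) -> Tuple[Optional[Hashable], List[str]]:
    """Return the majority result and the engines that deviate from it."""
    if not results:
        return None, []
    counts = Counter(results.values())
    majority, votes = counts.most_common(1)[0]
    if votes * 2 <= len(results):
        # No strict majority: flag every engine (assumption of this sketch).
        return None, sorted(results)
    return majority, sorted(name for name, r in results.items() if r != majority)
```

Each value in `results` can be any hashable summary of a run, for example a `(return code, output)` tuple. A test case is preliminarily suspicious whenever the returned list of deviating engines is not empty.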

Because of the dynamic and interactive nature of JavaScript, JS engines iterate quickly to meet new application requirements. In recent years the standard for JS engines, ECMAScript (ECMA-262), has released a new version almost every year, and the standard version supported may differ from one JS engine to another. If the differential test technique is used only to compare the execution results of JS engines developed by different vendors, a large number of false positives is inevitable, which greatly increases the burden of manual result analysis. Therefore, on top of cross-engine differential fuzzing, the invention adds a comparison with the execution result of an execution platform derived directly from the language standard, thereby improving the accuracy of the differential test results.

During the evolution of the ECMAScript standard, two versions differ substantially: ECMAScript 5.1 (ECMAScript 2011) and ECMAScript 6 (ECMAScript 2015). Since ECMAScript 6 the standard has been updated every year, but the changes are smaller, so a JS engine generally supports either ECMAScript 5.1 or the latest version. The standard-conversion execution platforms selected accordingly are KJS, which supports ECMAScript 5.1, and JISET, which supports the latest ECMAScript standard, ECMAScript 12; JISET is used as the standard test result for comparison.

KJS is a JavaScript formal analysis tool developed on the K Framework [50], and can be used for formal analysis and verification of JavaScript programs. Tested against the ECMAScript 5.1 conformance test suite, KJS passes all 2,782 tests for the core language and may therefore conform to the standard more closely than production JavaScript engines (e.g., Safari JavaScriptCore and Firefox SpiderMonkey). The tool is used as the standard execution platform for ECMAScript 5.1: the execution result of an engine that conforms to ECMAScript 5.1 is compared again with that of KJS, and if the two still disagree, the case is marked as suspicious.

JISET, a JavaScript IR-based semantics extraction toolchain, is the first tool to automatically synthesize a parser and an AST-IR translator directly from a given language specification, ECMAScript. It aims to ease the understanding of and reasoning about JavaScript programs by providing ECMAScript with IR-based formal semantics. JISET differs from existing approaches to defining JavaScript formal semantics in that it proposes semi-automatic synthesis of the AST-IR translator assisted by compile rules. These compile rules describe how each step of an abstract algorithm is converted into an Intermediate Representation (IR) designed for ECMAScript. JISET also automatically generates parsers for all versions of ECMAScript and automatically compiles 95.03% of abstract algorithm steps on average. Furthermore, JISET is executable, which allows it to bridge the gap between a specification written in natural language and executable tests. The tool is used as the standard execution platform for ECMAScript 12: the execution result of an engine that conforms to ECMAScript 12 is compared again with that of JISET, and if the two still disagree, the case is marked as suspicious.
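
The second judgment against a reference platform might be expressed in Python as follows. All names are hypothetical: `engine_standard` maps each engine to the standard version it supports, and `reference_results` holds the result produced by the reference platform for that version (KJS for ECMAScript 5.1, JISET for ECMAScript 12). Which engine supports which version is configuration supplied by the user of the sketch.

```python
from typing import Dict, Hashable, List


def recheck_with_reference(
    deviating: List[str],
    engine_results: Dict[str, Hashable],
    engine_standard: Dict[str, str],
    reference_results: Dict[str, Hashable],
) -> List[str]:
    """Keep only the deviating engines that also disagree with their reference."""
    still_suspicious = []
    for engine in deviating:
        reference = reference_results[engine_standard[engine]]
        if engine_results[engine] != reference:
            still_suspicious.append(engine)
    return still_suspicious
```

An engine that differs from the majority only because it implements an older standard agrees with its own reference platform and is therefore dropped from the suspicious list, which is how this step reduces false positives.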

After the differential test, the suspicious results must be analyzed and filtered. Filtering mainly considers four pieces of information: the standard API contained in the test case, the name of the engine showing abnormal behavior, the return code of that engine when executing the test case, and the related output of that engine when executing the test case. The combination of these four pieces of information forms an error type. When a new test case is to be filtered, it is matched against the known error types and filtered out if it matches. Test cases that cannot be filtered are analyzed manually to determine whether they reveal a problem.
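
One way to represent an error type and filter against known ones is sketched below in Python. The field names are illustrative, and exact equality on all four fields is an assumption; a practical filter might normalize the output text before matching.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class ErrorType:
    api: str         # standard API exercised by the test case
    engine: str      # engine showing the abnormal behavior
    returncode: int  # that engine's return code for the test case
    output: str      # that engine's related output


def filter_known(cases: Iterable[ErrorType], known: Set[ErrorType]) -> List[ErrorType]:
    """Drop cases matching a known error type; the rest go to manual analysis."""
    return [case for case in cases if case not in known]
```

Since the dataclass is frozen, instances are hashable and membership in the `known` set is a constant-time lookup.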

The foregoing describes only preferred embodiments of the invention and is not intended to limit the invention to the precise form disclosed; any modifications, equivalents, and alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims

1. A JS engine differential fuzzing method based on deep learning, characterized in that the overall operation process comprises a data collection stage, a model construction stage, a use case generation stage and a differential test stage, and the method comprises the following specific steps:

step 1, selecting higher-quality JavaScript project repositories from GitHub and the engine test suites provided on the official website of each JS engine, and crawling and collecting the data;

step 2, preprocessing the data obtained in the data collection stage, using this data as model training data, and then fine-tuning the text generation model DistilGPT2 with the training data to obtain a model capable of generating correct JavaScript code;

step 3, using the engine test suites from the data collection stage as initial use cases, then using the generation model obtained in the model training stage to continue all the initial use cases, and storing the resulting test cases as a test case set;

step 4, executing the test case set obtained in the test case generation stage on the installed JS engines, capturing the execution results, comparing the test results, marking an engine whose result differs from that of most engines as an engine with a problem, judging again in combination with a standard-conversion execution platform, and manually analyzing the final suspicious results to further determine the problem.

2. The JS engine differential fuzzing method based on deep learning as set forth in claim 1, wherein the specific code preprocessing process in step 2 includes:

(1) extracting and storing the JavaScript files in the collected JavaScript projects;

(2) analyzing each JavaScript file in the step (1) into AST by using a JavaScript analyzing tool esprima;

(5) and writing the finally obtained file into a database for storage.