CN117331808A - Test data processing method, device, computer equipment and storage medium - Google Patents

Test data processing method, device, computer equipment and storage medium

Info

Publication number
CN117331808A
Authority
CN
China
Prior art keywords
test case
initial
case
code
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210726982.7A
Other languages
Chinese (zh)
Inventor
叶贵鑫
弋雯
车小康
张博
钱文祥
何林书
陈志凯
杨勇
王巨宏
汤战勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NORTHWEST UNIVERSITY
Tencent Technology Shenzhen Co Ltd
Original Assignee
NORTHWEST UNIVERSITY
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NORTHWEST UNIVERSITY, Tencent Technology Shenzhen Co Ltd filed Critical NORTHWEST UNIVERSITY
Priority to CN202210726982.7A priority Critical patent/CN117331808A/en
Publication of CN117331808A publication Critical patent/CN117331808A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/368 Test management for test version control, e.g. updating test cases to a new software version
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present application relates to a test data processing method, apparatus, computer device, storage medium, and computer program product. The method comprises: acquiring an initial test case for an object to be tested; determining a mutation point based on the initial test case, intercepting the code content of the initial test case up to the mutation point, and continuing the intercepted code content with code that conforms to a grammar standard to obtain an initial variant case; determining target code content to be replaced in the initial test case; determining, as replacement code content, the code content in the initial variant case whose structure attribute matches the structure attribute of the target code content; and replacing the target code content in the initial test case with the replacement code content to obtain a target variant case, the target variant case being used for testing the object to be tested. The method can improve the accuracy of test cases obtained by mutation.

Description

Test data processing method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technology, and in particular, to a test data processing method, apparatus, computer device, storage medium, and computer program product.
Background
With the development of computer technology, fuzz testing (fuzzing) has emerged. Its core idea is to feed a large number of automatically generated test cases to an object to be tested, monitor the object for abnormal behavior during execution, analyze any abnormality, and finally determine whether the object to be tested has a defect. Fuzz testing is highly automated, efficient, and does not depend on the source code of the target program, and it is currently one of the most effective means of defect detection. The key step in fuzz testing is test case generation: the quality of the generated test cases determines the final effect of the test.
In the conventional technology, a large number of test cases are generated by randomly mutating collected original data samples. However, test cases obtained by random mutation have low accuracy.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a test data processing method, apparatus, computer device, computer readable storage medium, and computer program product that can improve the accuracy of the resulting test cases.
In one aspect, the present application provides a test data processing method. The method comprises: acquiring an initial test case for an object to be tested; determining a mutation point based on the initial test case, intercepting the code content of the initial test case up to the mutation point, and continuing the intercepted code content with code that conforms to a grammar standard to obtain an initial variant case; determining target code content to be replaced in the initial test case; determining, as replacement code content, the code content in the initial variant case whose structure attribute matches the structure attribute of the target code content; and replacing the target code content in the initial test case with the replacement code content to obtain a target variant case, the target variant case being used for testing the object to be tested.
In another aspect, the present application provides a test data processing apparatus. The apparatus comprises: an initial case acquisition module, configured to acquire an initial test case for an object to be tested; a code continuation module, configured to determine a mutation point based on the initial test case, intercept the code content of the initial test case up to the mutation point, and continue the intercepted code content with code that conforms to a grammar standard to obtain an initial variant case; a target code determining module, configured to determine target code content to be replaced in the initial test case; a replacement code determining module, configured to determine, as replacement code content, the code content in the initial variant case whose structure attribute matches the structure attribute of the target code content; and a code replacement module, configured to replace the target code content in the initial test case with the replacement code content to obtain a target variant case, the target variant case being used for testing the object to be tested.
In another aspect, the present application provides a computer device. The computer device comprises a memory storing a computer program and a processor which, when executing the computer program, performs the following steps: acquiring an initial test case for an object to be tested; determining a mutation point based on the initial test case, intercepting the code content of the initial test case up to the mutation point, and continuing the intercepted code content with code that conforms to a grammar standard to obtain an initial variant case; determining target code content to be replaced in the initial test case; determining, as replacement code content, the code content in the initial variant case whose structure attribute matches the structure attribute of the target code content; and replacing the target code content in the initial test case with the replacement code content to obtain a target variant case, the target variant case being used for testing the object to be tested.
In another aspect, the present application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, performs the following steps: acquiring an initial test case for an object to be tested; determining a mutation point based on the initial test case, intercepting the code content of the initial test case up to the mutation point, and continuing the intercepted code content with code that conforms to a grammar standard to obtain an initial variant case; determining target code content to be replaced in the initial test case; determining, as replacement code content, the code content in the initial variant case whose structure attribute matches the structure attribute of the target code content; and replacing the target code content in the initial test case with the replacement code content to obtain a target variant case, the target variant case being used for testing the object to be tested.
In another aspect, the present application provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, performs the following steps: acquiring an initial test case for an object to be tested; determining a mutation point based on the initial test case, intercepting the code content of the initial test case up to the mutation point, and continuing the intercepted code content with code that conforms to a grammar standard to obtain an initial variant case; determining target code content to be replaced in the initial test case; determining, as replacement code content, the code content in the initial variant case whose structure attribute matches the structure attribute of the target code content; and replacing the target code content in the initial test case with the replacement code content to obtain a target variant case, the target variant case being used for testing the object to be tested.
According to the above test data processing method, apparatus, computer device, storage medium, and computer program product, a mutation point is determined based on the initial test case, the code content of the initial test case up to the mutation point is intercepted, and the intercepted code content is continued with code that conforms to a grammar standard to obtain an initial variant case. Target code content to be replaced is determined in the initial test case, the code content in the initial variant case whose structure attribute matches that of the target code content is determined as replacement code content, and the target code content in the initial test case is replaced with the replacement code content to obtain a target variant case. Because the target variant case is obtained by replacing target code content in the initial test case, and the structure attribute of the replacement code content matches that of the target code content, the target variant case retains as much of the semantic information of the initial test case as possible, avoids damaging the structure of the initial test case, and remains syntactically correct. The accuracy of the test cases obtained by mutation is thereby improved.
Drawings
FIG. 1 is a diagram of an application environment for a test data processing method in one embodiment;
FIG. 2 is a flow chart of a test data processing method in one embodiment;
FIG. 3 is a schematic diagram of intercepting code content in one embodiment;
FIG. 4 is a flow chart of a test data processing method according to another embodiment;
FIG. 5 is a flow chart of a test data processing method according to another embodiment;
FIG. 6 is a flow chart of a test data processing method according to another embodiment;
FIG. 7 is a flow chart of a test data processing method according to yet another embodiment;
FIG. 8 is a general flow diagram of a test data processing method in one embodiment;
FIG. 9 is a block diagram of a test data processing device in one embodiment;
FIG. 10 is an internal block diagram of a computer device in one embodiment;
FIG. 11 is an internal block diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The test data processing method provided by the embodiments of the present application can be applied in the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. A data storage system may store data that the server 104 needs to process, for example initial test cases and target variant cases. The data storage system may be integrated on the server 104 or located on the cloud or on other servers. In one embodiment, after receiving a test request for an object to be tested from the terminal 102, the server 104 may: obtain an initial test case for the object to be tested; determine a mutation point based on the initial test case; intercept the code content of the initial test case up to the mutation point; continue the intercepted code content with code that conforms to a grammar standard to obtain an initial variant case; determine target code content to be replaced in the initial test case; determine, as replacement code content, the code content in the initial variant case whose structure attribute matches that of the target code content; and replace the target code content in the initial test case with the replacement code content to obtain a target variant case. Based on the target variant case, the server 104 may test the object to be tested and return a test result to the terminal 102.
The server 104 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, and big data and artificial intelligence platforms. The terminal 102 includes, but is not limited to, smart phones, tablet computers, notebook computers, desktop computers, intelligent voice interaction devices, smart appliances, vehicle terminals, aircraft, and the like. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited herein.
The embodiments of the present application may be applied to various scenarios including, but not limited to, cloud technology, artificial intelligence, intelligent transportation, assisted driving, and the like.
In one embodiment, as shown in fig. 2, a test data processing method is provided, and the method is applied to a computer device for example, where the computer device may be the terminal 102 in fig. 1, the server 104 in fig. 1, or a system formed by the terminal 102 and the server 104. Specifically, the test data processing method comprises the following steps:
Step 202, an initial test case for a subject to be tested is obtained.
The object to be tested is the program that needs to be tested. It may be any of various compilers or language runtimes, for example a JavaScript engine, a Java virtual machine (JVM), or GCC (GNU Compiler Collection). The initial test case of the object to be tested is an original test case. The test data processing method provided by the embodiments of the present application can generate and mutate original test cases to obtain a large number of test cases with which the object to be tested can be tested continuously. The initial test case may contain only a function body (Function), or it may be a complete executable file.
Specifically, the computer device may obtain one or more initial test cases for the object to be tested, so as to execute a subsequent mutation process based on the initial test cases, so as to generate a target mutation case.
In one embodiment, when the object to be tested has a plurality of different versions, the computer device may obtain initial test cases for each version separately. The different versions of the object to be tested may come from different developers. For example, if the object to be tested is a JavaScript engine, and the JavaScript engine has JavaScriptCore, JerryScript, and QuickJS versions, then for each version the computer device may obtain one or more initial test cases for that version of the JavaScript engine. In this application, "a plurality of" means at least two.
Random test cases easily steer testing toward code modules unrelated to any defect or vulnerability, which reduces test efficiency. Existing research shows that continuing to test the code modules covered by test cases that have already triggered a defect or vulnerability is more likely to trigger a defect or vulnerability of the object to be tested again. Therefore, in the embodiments of the present application, initial test cases are obtained by making full use of historical scripts that triggered defects or vulnerabilities of the object to be tested and of standard test suites, so that testing is guided toward code modules prone to vulnerabilities or defects, improving both test efficiency and the probability of finding vulnerabilities. On this basis, the present application provides the following embodiments for acquiring an initial test case for the object to be tested:
In one embodiment, the computer device may obtain the initial test case from a first test case set. A first test case is a test case obtained from a standard test suite, that is, a test suite written according to the grammar standard. In this embodiment, the computer device may obtain standard test suites in advance from the official website of the object to be tested or from the official test case library corresponding to the object to be tested, and obtain the first test cases from those suites.
In another embodiment, the computer device may obtain the initial test case from a second test case set. A second test case is obtained from a historical vulnerability script. In this embodiment, the computer device may collect in advance historical scripts that triggered a defect or vulnerability of the object to be tested to obtain the second test cases.
Further, because the number of initial test cases collected in the above ways is limited, in other embodiments the computer device may also perform code continuation based on the test cases in the first test case set or the second test case set to obtain a plurality of third test cases, which form a third test case set. During code continuation, the computer device may select a mutation point in a test case from the first or second test case set, intercept the code content of that test case up to the mutation point, continue the intercepted code content either by inputting it into a trained neural network model or by using a model generated based on the grammar standard, and splice the continuation onto the intercepted code content to obtain a third test case. The computer device can then take any test case from the third test case set as an initial test case.
Step 204, determining a mutation point based on the initial test case, intercepting the code content of the initial test case up to the mutation point, and continuing the intercepted code content with code that conforms to a grammar standard to obtain an initial variant case.
The mutation point may be the index number of a character in the initial test case; the corresponding character can be located in the test case from the index number. The code content up to the mutation point is the portion of code from the beginning of the initial test case to the character at the mutation point index. The mutation point can be selected randomly or according to a specific rule. For example, FIG. 3 is a schematic diagram of intercepting code content: FIG. 3(a) shows an initial test case, FIG. 3(b) shows the selected mutation point, which is the index number of a character in that test case, and FIG. 3(c) shows the intercepted code content, that is, the code content up to that character. The grammar standard is a document that regulates how code is written for the object to be tested; for a JavaScript engine it may be, for example, ECMA-262, the ECMAScript specification.
Specifically, the computer device may determine the mutation point based on the initial test case, intercept the code content from the initial test case up to the mutation point, and write the code meeting the grammar standard with the intercepted code content as the prefix, so as to obtain the initial mutation case.
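The truncate-and-continue step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the function names, the JavaScript sample string, and `stub_continuation` (a placeholder standing in for a trained model or a grammar-based template) are all hypothetical.

```python
import random
from typing import Callable, Tuple


def truncate_at_mutation_point(case: str, point: int) -> str:
    """Return the code content from the start of the case up to and
    including the character at index `point` (the mutation point)."""
    if not 0 <= point < len(case):
        raise ValueError("mutation point must index a character of the case")
    return case[: point + 1]


def make_initial_variant(
    case: str,
    continue_code: Callable[[str], str],
    rng: random.Random,
) -> Tuple[int, str]:
    """Pick a random mutation point, keep the prefix, and splice on a continuation."""
    point = rng.randrange(len(case))
    prefix = truncate_at_mutation_point(case, point)
    return point, prefix + continue_code(prefix)


def stub_continuation(prefix: str) -> str:
    # Hypothetical placeholder: a real system would call a trained model
    # or a grammar template here to produce standard-conforming code.
    return "\n  return 0;\n}\n"


if __name__ == "__main__":
    sample = "function f(a) {\n  return a + 1;\n}\n"
    point, variant = make_initial_variant(sample, stub_continuation, random.Random(0))
    print(point)
    print(variant)
```

Passing the continuation in as a callable keeps the truncation logic independent of whichever generator (model or template) is used.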
In one embodiment, after intercepting the code content, the computer device may input it as a prefix into a pre-trained neural network model, have the model continue it with code that conforms to the grammar standard, and splice the continuation onto the intercepted code content to obtain the initial variant case. The neural network model may be a text generation model that treats code as text. During training, the model learns information related to the grammar standard from code, such as code format, syntactic structure, coding rules, and the calling relationships of various APIs, so that it can continue the intercepted code content with code that conforms to the grammar standard.
In a specific embodiment, the neural network model used for continuation may be a Generative Pre-Training (GPT) model, for example GPT-2. GPT-2 is based on the Transformer architecture. Unlike conventional recurrent neural networks, which suffer from long-range dependency problems and are costly to compute, the Transformer uses an attention mechanism to give different attention to different segments of the input sequence at each processing step. Compared with a recurrent neural network, the layers of a Transformer model are highly parallelizable, have low computational cost and stronger feature extraction capability, and perform very well in natural language processing. GPT-2 is trained in two stages, pre-training and fine-tuning. The pre-training stage performs unsupervised word prediction on a very large monolingual corpus to train a generative language model; the fine-tuning stage further trains the model on a target dataset. In this embodiment, the base GPT-2 model with 117M parameters is used. Test cases are collected as training data to form the fine-tuning dataset, the weights of the last two fully connected layers of the pre-trained GPT-2 model are updated to fine-tune the model, and the training duration is determined according to the quality of the generated samples, finally yielding a generative model that fits the training data. It will be appreciated that in other embodiments an LSTM (Long Short-Term Memory network), an RNN (Recurrent Neural Network), or the like may be used as the neural network model for continuation; the present application does not limit the model structure used.
To ensure the accuracy of continuation, the training data used to train the neural network model has the same data format as the initial test case. For example, when the initial test case is a complete executable file, complete test cases can be used directly to train the neural network model. When the initial test case contains only a function, independent functions must be extracted from the collected data to train the neural network model. Because a function is simpler than a complete test case, a model trained on functions can better learn the syntactic structure, coding rules, and API calling relationships in the training data, so the code it produces conforms better to the grammar standard.
In another embodiment, the computer device may obtain a template written based on the grammar standard, and based on the obtained template, write code conforming to the grammar standard on the basis of the intercepted code content.
In one embodiment, for the obtained initial test case, the computer device may further test the object to be tested through the initial test case, and determine the mutation point from the initial test case to perform mutation only when the obtained test result is an abnormal test result, so that the obtained target mutation case may improve the test efficiency and the vulnerability discovery probability.
Step 206, determining the target code content to be replaced in the initial test case.
Step 208, determining, as the replacement code content, the code content in the initial variant case whose structure attribute matches the structure attribute of the target code content.
The target code content is the code content in the initial test case that needs to be replaced. To retain as much of the semantic information of the initial test case as possible, the target code content may be only part of the code content of the initial test case. In one embodiment, the target code content may be a small number of code blocks in the initial test case. In other embodiments, the target code content may be one or more variable values in the initial test case. The structure attribute describes the syntactic structure that the code content represents in the test case.
Specifically, the computer device may determine the target code content to be replaced from the initial test case, and determine the code content with the structure attribute consistent with the structure attribute of the target code content in the initial variant case as the replacement code content.
Step 210, replacing the target code content in the initial test case with the replacement code content to obtain a target variant case, the target variant case being used for testing the object to be tested.
Specifically, after obtaining the replacement code content, the computer device may replace the target code content in the initial test case based on the replacement code content to obtain a target variant case, where the syntax structure of the obtained target variant case is matched with the syntax structure of the initial test case.
In one embodiment, a plurality of pieces of target code content to be replaced are determined in the initial test case, and a plurality of pieces of replacement code content are determined in the initial variant case. For each piece of target code content, the computer device then performs the replacement using the replacement code content whose structure attribute matches that of the target code content. For example, if the target code content includes one code block and a plurality of variable values, and the replacement code content also includes one code block and a plurality of variable values, the computer device may replace the code block in the target code content with the code block in the replacement code content, and replace the variable values in the target code content with the variable values in the replacement code content.
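The structure-matched replacement can be illustrated with a short sketch. It makes two loud assumptions: Python source stands in for the JavaScript test cases discussed in the text, and the "structure attribute" is approximated by the AST node type of a first-level statement (`For`, `Assign`, and so on). Neither choice is stated in the application, and `replace_matching` is a hypothetical name. `ast.unparse` requires Python 3.9 or later.

```python
import ast


def structure_attribute(node: ast.stmt) -> str:
    """Assumed stand-in for the structure attribute: the AST node type name."""
    return type(node).__name__


def replace_matching(initial_src: str, variant_src: str, target_index: int) -> str:
    """Replace the first-level statement at `target_index` of the initial case
    with a statement of the same node type taken from the variant case."""
    initial = ast.parse(initial_src)
    variant = ast.parse(variant_src)
    target = initial.body[target_index]
    wanted = structure_attribute(target)
    for candidate in variant.body:
        same_kind = structure_attribute(candidate) == wanted
        # Skip candidates identical to the target; they would change nothing.
        if same_kind and ast.dump(candidate) != ast.dump(target):
            initial.body[target_index] = candidate
            return ast.unparse(initial)
    return initial_src  # no structurally matching code found; leave unchanged


if __name__ == "__main__":
    initial_case = "x = 1\nfor i in range(3):\n    print(i)\n"
    variant_case = (
        "x = 1\nwhile x < 3:\n    x += 1\nfor j in range(5):\n    print(j * 2)\n"
    )
    print(replace_matching(initial_case, variant_case, 1))
```

Because a `for` loop is only ever swapped for another `for` loop, the result still parses, which is the property the text attributes to structure-matched replacement.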
It can be understood that, because the target variant case is obtained by replacing target code content in the initial test case, its format is consistent with that of the initial test case. When the initial test case is a complete executable file, the resulting target variant case is a complete test case; when the initial test case contains only a function, the resulting target variant case is a function-only case. A function-only case produces syntax errors if executed directly, so before it is used to test the object to be tested it must be assembled into a complete test case. Specifically, a variable declaration, a function name, randomly chosen function arguments, a function call, and output of the call result are added to the function. Passing random arguments increases the randomness and diversity of the test cases, and the function call and result output make the test result easier to analyze.
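A sketch of the assembly step, assuming a JavaScript target. The wrapper name `fuzzTarget`, the pool of candidate argument literals, and the use of a shell-level `print` function for output are all hypothetical choices made for illustration; the application does not specify them.

```python
import random
from typing import List


def assemble_test_case(
    function_body: str,
    param_names: List[str],
    rng: random.Random,
    func_name: str = "fuzzTarget",  # hypothetical wrapper name
) -> str:
    """Wrap a function-only case into a complete JavaScript test case:
    function name, random arguments, a call, and output of the result."""
    # Assumed pool of argument literals; a real fuzzer would use a richer set.
    candidates = ["0", "-1", "3.14", '"a"', "null", "undefined", "[]", "{}", "true"]
    args = [rng.choice(candidates) for _ in param_names]
    lines = [
        f"function {func_name}({', '.join(param_names)}) {{",
        function_body,
        "}",
        # Variable declaration holding the call result.
        f"var result = {func_name}({', '.join(args)});",
        # Assumes the engine's shell exposes print(); adjust per engine.
        "print(result);",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(assemble_test_case("  return a + b;", ["a", "b"], random.Random(1)))
```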
In one embodiment, after obtaining the target variant case, the computer device may take the target variant case as a new initial test case and iteratively execute steps 204 to 210 until an iteration stop condition is satisfied. A plurality of target variant cases can thus be obtained through multiple rounds of mutation, improving the utilization of the initial test case.
According to the above test data processing method, an initial test case for the object to be tested is obtained, a mutation point is determined based on the initial test case, the code content of the initial test case up to the mutation point is intercepted, and the intercepted code content is continued with code that conforms to the grammar standard to obtain an initial variant case. Target code content to be replaced is determined in the initial test case, the code content in the initial variant case whose structure attribute matches that of the target code content is determined as the replacement code content, and the target code content in the initial test case is replaced with the replacement code content to obtain a target variant case. Because the target variant case is obtained by replacing target code content in the initial test case, and the structure attribute of the replacement code content matches that of the target code content, the target variant case retains as much of the semantic information of the initial test case as possible, avoids damaging the structure of the initial test case, and remains syntactically correct, which improves the accuracy of the test cases obtained by mutation.
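The iteration can be sketched as a simple loop that feeds each target variant case back in as the input of the next round. The stop condition here is a fixed round count, which is an assumption; the text does not say what the iteration stop condition is. `mutate_once` is a hypothetical callable standing in for one pass of steps 204 to 210.

```python
from typing import Callable, List


def iterate_mutations(
    initial_case: str,
    mutate_once: Callable[[str], str],
    max_rounds: int,
) -> List[str]:
    """Run `mutate_once` repeatedly, using each output as the next input,
    and collect every target variant case produced along the way."""
    variants: List[str] = []
    current = initial_case
    for _ in range(max_rounds):  # assumed stop condition: fixed number of rounds
        current = mutate_once(current)
        variants.append(current)
    return variants


if __name__ == "__main__":
    # Toy mutation for demonstration only: append a marker comment.
    demo = iterate_mutations("var a = 1;", lambda c: c + " // m", 3)
    for case in demo:
        print(case)
```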
In one embodiment, as shown in fig. 4, there is provided a test data processing method including the steps of:
Step 402, an initial test case for a subject to be tested is obtained.
Step 404, determining a mutation point based on the initial test case, intercepting the code content of the initial test case up to the mutation point, and continuing the intercepted code content with code that conforms to a grammar standard to obtain an initial variant case.
Step 406, among the code blocks of the initial test case that do not contain intercepted code content, determining the code blocks whose position rank is before a ranking threshold as the target code content to be replaced.
A code block is the complete semantic unit formed by a first-level statement inside the test case. For example, in the following function, the second line is the first code block (hereinafter referred to as code block 1), the third and fourth lines form the second code block (hereinafter referred to as code block 2), and the sixth line is the third code block (hereinafter referred to as code block 3). The fourth line, although a complete semantic unit in itself, belongs to second-level code and therefore cannot be split off as a separate code block.
The code blocks that do not contain intercepted code content, namely the code blocks that lie entirely outside the code content intercepted up to the mutation point, can be sorted according to their positions: the closer a code block is to the mutation point, the earlier it is ranked, and the code block ranked first is the code block containing the first line that is not part of the intercepted code content. In the above example, assuming that the first line of code is the mutation point, code block 1, code block 2 and code block 3 are all code blocks that do not contain intercepted code content, and they are sorted as follows: code block 1, code block 2, code block 3.
Specifically, the closer a code block obtained by continuation is to the mutation point, the more strongly it is related to the initial test case and the higher its semantic matching degree. Therefore, in this embodiment, the code blocks in the initial test case that do not contain intercepted code content and whose sorting position is before the sorting threshold can be determined as target code blocks, and the target code blocks are the target code content to be replaced. The sorting threshold can be set as needed; the smaller the sorting threshold, the higher the semantic matching degree between the target variant case obtained by replacement and the initial test case. For example, in the above example, assuming that the first line of code is the mutation point and the sorting threshold is 3, code block 1 and code block 2 may be determined as target code blocks; assuming that the sorting threshold is 2, code block 1 may be determined as the target code block.
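The selection of target code blocks by sorting position can be illustrated with a minimal sketch. The block representation (a name plus a 1-based line range) and the function name selectTargetBlocks are assumptions made for illustration and are not taken from the embodiment; the sketch also assumes that sorting positions are 1-based and that "before the sorting threshold" means a position strictly smaller than the threshold, which is consistent with the example above.

```javascript
// Hypothetical representation: each first-level code block is described by
// the 1-based line range it occupies in the initial test case.
function selectTargetBlocks(blocks, mutationLine, sortThreshold) {
  return blocks
    // Keep only blocks that lie entirely after the intercepted code content.
    .filter((block) => block.startLine > mutationLine)
    // Blocks closer to the mutation point are ranked earlier.
    .sort((a, b) => a.startLine - b.startLine)
    // Positions are 1-based; keep the blocks ranked before the threshold.
    .slice(0, Math.max(sortThreshold - 1, 0));
}

const blocks = [
  { name: "code block 1", startLine: 2, endLine: 2 },
  { name: "code block 2", startLine: 3, endLine: 4 },
  { name: "code block 3", startLine: 6, endLine: 6 },
];

// The mutation point is assumed to be on the first line, as in the example.
const withThreshold3 = selectTargetBlocks(blocks, 1, 3).map((b) => b.name);
const withThreshold2 = selectTargetBlocks(blocks, 1, 2).map((b) => b.name);
console.log(withThreshold3, withThreshold2);
```

With a threshold of 3 the sketch selects code block 1 and code block 2, and with a threshold of 2 it selects only code block 1, matching the example in the text.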
Step 408, among the code blocks in the initial mutation case that do not contain intercepted code content, determining the code block corresponding to the sorting position of the target code block as the replacement code block corresponding to the target code block.
Specifically, among the code blocks in the initial mutation case that do not contain intercepted code content, the computer device may determine the code block whose sorting position is the same as that of the target code block as the replacement code block. In the above example, assuming that code block 1 is the target code block, the code block ranked first among the code blocks in the initial mutation case that do not contain intercepted code content, namely the code block containing the first line that is not part of the intercepted code content, is the replacement code block.
Step 410, replacing the target code block in the initial test case with the replacement code block corresponding to the target code block to obtain a target variant case.
The target variant use case is used for testing the object to be tested.
In order to preserve as much of the semantic information of the initial test case as possible, in a specific embodiment, the code block ranked first among the code blocks in the initial test case that do not contain intercepted code content, namely the code block containing the first line that is not part of the intercepted code content, may be determined as the target code block. Similarly, the replacement code content is the code block ranked first among the code blocks in the initial variant case that do not contain intercepted code content. For example, assume that the initial test case is Function1, and Function1 is as follows:
Assume that the third line of code in Function1, "var tips = [];", is determined as the mutation point, and that code continuation is performed based on this mutation point. The resulting initial variant case is Function2, and Function2 is as follows:
In Function1, the first line that does not contain intercepted code content is the fourth line, and the code block in which it is located consists of the fourth and fifth lines. In the initial variant case Function2 obtained by continuation, the first line that does not contain intercepted code content is the fourth line, and the code block in which it is located consists of the fourth, fifth and sixth lines. The code block ranked first in the initial test case Function1 is replaced with the code block ranked first in the initial variant case Function2, and Function3 is obtained after the replacement, as follows:
Considering that there may be a plurality of target code blocks, in one embodiment, for each target code block, the computer device determines, from the code blocks in the initial variant use case that do not include the intercepted code content, a code block corresponding to a position of each target code block, to obtain each replaced code block corresponding to each target code block, and further replaces each corresponding target code block in the initial test case with each replaced code block, to obtain the target variant use case. In the above example, assuming that the code block 1 and the code block 2 are target code blocks, the code block 1 in the initial test case is replaced by the replacement code block corresponding to the code block 1, and the code block 2 in the initial test case is replaced by the replacement code block corresponding to the code block 2, so as to obtain the target mutation case corresponding to the mutation point in the first row.
In this embodiment, the computer device divides the initial test case and the initial mutation case by taking the code blocks as units, so that after determining the target code blocks, the code blocks corresponding to the positions of the target code blocks in the code blocks which do not include mutation points in the initial mutation case can be determined as replacement code blocks, and then the target code blocks in the initial test case are replaced by the replacement code blocks.
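The position-based replacement can be sketched as follows. This is only an illustration under stated assumptions: the test cases are held as arrays of lines, each case carries a hypothetical list of blocks that excludes the intercepted code content and is already sorted by position, and the names blockLines and replaceBlocksByRank as well as the sample function bodies are invented for the example.

```javascript
// Hypothetical helper: returns the lines of a block (1-based, inclusive range).
function blockLines(lines, block) {
  return lines.slice(block.startLine - 1, block.endLine);
}

// Replaces the blocks at the given 0-based ranks in the initial test case with
// the blocks at the same ranks in the initial variant case.
function replaceBlocksByRank(initial, variant, ranks) {
  const result = initial.lines.slice();
  // Work from the bottom up so that earlier line numbers stay valid.
  const ordered = [...ranks].sort((a, b) => b - a);
  for (const rank of ordered) {
    const target = initial.blocks[rank];
    const replacement = variant.blocks[rank];
    if (!target || !replacement) continue; // no block at this rank
    result.splice(
      target.startLine - 1,
      target.endLine - target.startLine + 1,
      ...blockLines(variant.lines, replacement)
    );
  }
  return result;
}

// Both cases share the intercepted first line; the blocks start at line 2.
const blockRanges = [
  { startLine: 2, endLine: 2 },
  { startLine: 3, endLine: 5 },
  { startLine: 6, endLine: 6 },
];
const initialCase = {
  lines: ["function f(a) {", "  var x = a + 1;", "  if (x > 1) {", "    x = 0;", "  }", "  return x;", "}"],
  blocks: blockRanges,
};
const variantCase = {
  lines: ["function f(a) {", "  var x = a * 2;", "  while (x > 9) {", "    x -= 1;", "  }", "  return a;", "}"],
  blocks: blockRanges,
};

// Replace only the block ranked first (rank 0).
const targetVariant = replaceBlocksByRank(initialCase, variantCase, [0]);
console.log(targetVariant.join("\n"));
```

Only the first-ranked block changes; every other line of the initial test case is kept, which is the property the embodiment relies on to preserve the original structure.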
In one embodiment, determining a mutation point based on an initial test case, intercepting code content cut to the mutation point in the initial test case, and writing a code meeting a grammar standard on the basis of the intercepted code content to obtain an initial mutation case, wherein the method comprises the following steps: traversing a plurality of lines of code contents in an initial test case, and sequentially determining mutation points based on the traversed lines of code contents; and intercepting the code content cut-off to each mutation point in the initial test case respectively, and writing the code meeting the grammar standard on the basis of each intercepted code content respectively so as to obtain the initial mutation case corresponding to each intercepted code content.
Specifically, the computer device may traverse the initial test case from its first line of code to its second-to-last line of code and determine a mutation point in each traversed line in turn; for example, it may randomly select one character of the traversed line as the mutation point, or take the last character of the traversed line as the mutation point. The computer device then intercepts the code content of the initial test case up to each mutation point and writes code meeting the grammar standard as a continuation of each intercepted code content, obtaining an initial mutation case for each intercepted code content. The resulting initial mutation cases retain the code content of the initial test case to different degrees. Because every line from the first to the second-to-last is used as a mutation point, the initial test case is utilized to the greatest extent, and as many target mutation cases as possible are obtained by mutation.
Further, determining a code block corresponding to the sorting position of the target code block as a replacement code block corresponding to the target code block from code blocks not containing intercepted code content in the initial mutation use case, including: for each initial variation case, determining a code block corresponding to the position of a target code block as a replacement code block corresponding to the target code block in the code blocks which do not contain the intercepted code content corresponding to the initial variation case, so as to obtain a plurality of replacement code blocks corresponding to the target code block; replacing the target code block in the initial test case with a replacement code block corresponding to the target code block to obtain a target variant case, wherein the method comprises the following steps: and replacing the target code blocks in the initial test cases with the replacement code blocks corresponding to the target code blocks respectively to obtain a plurality of target variant cases.
For example, assume that an initial test case contains 4 lines of code and that one mutation point is selected in each traversed line, so that 3 mutation points are obtained from the first line to the second-to-last line. For each mutation point, continuing from the code content up to that mutation point yields a corresponding initial mutation case, so the initial test case yields 3 initial mutation cases: taking the first line as the mutation point gives code content 1, and continuing from code content 1 gives initial mutation case 1; taking the second line as the mutation point gives code content 2, and continuing from code content 2 gives initial mutation case 2; taking the third line as the mutation point gives code content 3, and continuing from code content 3 gives initial mutation case 3. A replacement code block corresponding to the position of the target code block is then determined from the code blocks of initial mutation case 1 that do not contain code content 1, from the code blocks of initial mutation case 2 that do not contain code content 2, and from the code blocks of initial mutation case 3 that do not contain code content 3. Three replacement code blocks are thus obtained, and replacing the target code block in the initial test case with each of them in turn yields three target mutation cases.
In the above embodiment, by traversing the multiple lines of code content in the initial test case, multiple mutation points are obtained, multiple target mutation cases can be obtained based on the multiple mutation points, and the utilization degree of the initial test case is improved, so that as many target mutation cases as possible are obtained through mutation.
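The traversal of mutation points can be sketched as follows, assuming that the mutation point of each traversed line is its last character, so that the intercepted code content is simply the first k lines. The function name interceptPrefixes and the sample source are illustrative assumptions; the code continuation step itself is not shown.

```javascript
// Returns the code content intercepted at each mutation point, taking the
// last character of every line from the first to the second-to-last line.
function interceptPrefixes(source) {
  const lines = source.split("\n");
  const prefixes = [];
  for (let i = 0; i < lines.length - 1; i++) {
    // Intercepted content: everything up to the end of line i + 1.
    prefixes.push(lines.slice(0, i + 1).join("\n"));
  }
  return prefixes;
}

const fourLineCase = [
  "function f(a) {",
  "  var x = a + 1;",
  "  return x;",
  "}",
].join("\n");

const prefixes = interceptPrefixes(fourLineCase);
console.log(prefixes.length);
```

A test case of 4 lines gives 3 intercepted prefixes, one per mutation point, each of which would be handed to the continuation step to produce one initial mutation case.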
In one embodiment, as shown in fig. 5, there is provided a test data processing method including the steps of:
Step 502, obtaining an initial test case for an object to be tested.
Step 504, determining a mutation point based on the initial test case, intercepting the code content of the initial test case up to the mutation point, and writing code that meets the grammar standard as a continuation of the intercepted code content to obtain an initial mutation case.
In a specific embodiment, variable declarations are generally placed at the beginning of a function. If the mutation point were selected at random, it could fall after the variable definitions of the function body, and the continuation would then not declare any variables again. Therefore, in this embodiment, the computer device may determine the last character of the first line of code of the initial test case as the mutation point and perform code continuation based on that mutation point. For example, assuming that the initial test case is Function1 above, one initial variant case obtained by continuation may be Function4, where Function4 is as follows:
Step 506, determining at least one variable value in the initial test case as the target code content to be replaced.
In particular, the computer device may determine one or more variable values in the initial test case as target code content to be replaced.
Step 508, extracting the variable values in the initial variant case to obtain a variable value set, and selecting a target variable value from the variable value set.
Step 510, replacing the variable value in the target code content with the target variable value to obtain a target variant case.
The target variant use case is used for testing the object to be tested.
Specifically, the computer device may extract all the variable values in the initial variation case to obtain a variable value set and store the set in a variable list. It may then randomly select from the set as many variable values as the target code content contains, use them as target variable values, and replace the variable values in the target code content of the initial test case with the target variable values to obtain the target variation case. For example, for Function4 above, the variable "array.length" defined in the second line and the variable "len" defined in the third line are saved in the variable list; values are then randomly selected from the variable list and used, in different orders, to replace the variable values defined in the initial test case Function1. The resulting target variant cases are shown in Function5 and Function6, wherein:
Function5 is as follows:
Function6 is as follows:
In the above embodiment, by using variable values as the target code content, the variable values in the initial test case are replaced with the variable values in the initial variation case, so that the generated target variation case retains the semantic information and grammar structure of the initial test case to the greatest extent, and the grammar correctness of the target variation case is further improved.
In one embodiment, determining a mutation point based on an initial test case, intercepting code content cut to the mutation point in the initial test case, and writing a code meeting a grammar standard on the basis of the intercepted code content to obtain an initial mutation case, wherein the method comprises the following steps: determining mutation points based on the first line of code content of the initial test case, intercepting the code content from the initial test case to the mutation points, and writing codes meeting grammar standards for a plurality of times on the basis of the intercepted code content so as to obtain a plurality of initial variation cases; extracting variable values in the initial variation use case to obtain a variable value set, and selecting a target variable value from the variable value set, wherein the method comprises the following steps: extracting variable values from a plurality of initial variation cases to obtain a variable value set; combining variable values in the variable value set according to the variable value quantity contained in the object code content to obtain a plurality of replacement variable groups; and sequentially replacing the variable values corresponding to the target code content with the variable values in each replacement variable group to obtain a plurality of target variation cases.
Specifically, referring to fig. 6, after an initial test case is obtained, the computer device checks whether it contains a variable definition. If the initial test case does not contain a variable definition, the mutation flow for this initial test case ends, and the next initial test case is obtained and checked in the same way. If the initial test case contains a variable definition, on the one hand, the last character of the first line of the initial test case is determined as the mutation point, and the code content of the first line is intercepted and continued multiple times to obtain a plurality of initial mutation cases. Continuing with the first line as the prefix makes the parameter information of the initial test case available, so that variables related to the parameters can be generated when variables are declared, which strengthens the connection between the generated variables and the initial test case. The computer device can extract all the variable values in the initial variation cases to obtain a variable value set. To prevent the mutation from taking too long because of too many combinations in the subsequent variable value combination process, the computer device can further judge whether the number of variable values in the variable value set is larger than a target number: if it is, the target number of variable values are selected from the variable value set and stored in the variable list; otherwise, all the variable values in the variable value set are stored in the variable list. The target number may be set as needed and may be, for example, 10.
On the other hand, the computer device can extract the variable values in the initial test case, combine the variable values in the variable list according to the number of variable values in the initial test case to obtain a plurality of replacement variable groups, and replace the variable values corresponding to the target code content with the variable values in each replacement variable group in turn to obtain a plurality of target variation cases.
In the above embodiment, since the code is rewritten for a plurality of times, a plurality of initial variant cases are obtained, and the variable values extracted from the initial variant cases are arranged and combined to obtain a plurality of replacement variable groups, the utilization rate of the initial test cases can be improved, and as many target variant cases as possible are obtained.
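The construction of replacement variable groups can be sketched as follows. The function names capVariableList and replacementGroups are hypothetical, and two points are assumptions rather than statements of the embodiment: the sketch keeps the first values when the set exceeds the target number (the text does not say how they are chosen), and it treats "arranged and combined" as ordered selections without repetition.

```javascript
// Limits the variable list to at most targetNumber values (assumption: the
// first values are kept; the selection strategy is not specified in the text).
function capVariableList(values, targetNumber) {
  return values.length > targetNumber ? values.slice(0, targetNumber) : values.slice();
}

// Builds every ordered group of groupSize distinct entries from values.
function replacementGroups(values, groupSize) {
  if (groupSize === 0) return [[]];
  const groups = [];
  values.forEach((value, index) => {
    const rest = values.filter((_, i) => i !== index);
    for (const tail of replacementGroups(rest, groupSize - 1)) {
      groups.push([value, ...tail]);
    }
  });
  return groups;
}

// Variable values extracted from the initial variation cases (illustrative).
const variableList = capVariableList(["array.length", "len", "0"], 10);
// The initial test case is assumed to define two variable values.
const groups = replacementGroups(variableList, 2);
console.log(groups.length);
```

Three extracted values and two values to replace give six replacement variable groups, each of which would produce one target variation case. The cap matters because the number of groups grows quickly with the size of the list.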
In one embodiment, as shown in fig. 7, a test data processing method is provided. Referring to fig. 7, after the computer device obtains an initial test case, it first checks whether the initial test case contains a variable definition. If the initial test case does not contain a variable definition, the flow directly enters the step of replacing code blocks: a mutation point is determined from the initial test case, the code content of the initial test case up to the mutation point is intercepted and continued to obtain an initial variation case, the initial variation case is taken as a target variation case, target code blocks are determined from the initial test case, and the target code blocks in the initial test case are replaced with the replacement code blocks at the corresponding positions in the initial variation case to obtain a target variation case. If the initial test case contains a variable definition, the variable definition line is first kept unchanged and the step of replacing code blocks is entered: the line following the variable definition is taken as the mutation point for code continuation to obtain an initial mutation case, and the target code blocks in the initial test case are replaced with the replacement code blocks at the corresponding positions in the initial mutation case; then the code blocks are kept unchanged and the variables are replaced to obtain the target mutation case. For the step of replacing code blocks, reference may be made to the description of fig. 4, and for the step of replacing variables, reference may be made to the description of fig. 5; the details are not repeated here.
In one embodiment, determining the mutation point based on the initial test case includes: testing the object to be tested based on the initial test case to obtain a test result corresponding to the initial test case; under the condition that the test result is an abnormal test result, determining mutation points from the initial test case; and under the condition that the test result is a normal test result, re-acquiring the test case different from the initial test case, determining the re-acquired test case as the initial test case, and entering the step of testing the object to be tested based on the initial test case to continue to execute.
A test case different from the initial test case is any test case other than the one currently used as the initial test case. In this embodiment, after the computer device obtains the initial test case, it may enter a test flow, test the object to be tested with the initial test case to obtain the corresponding test result, and judge whether the test result is a normal test result or an abnormal test result. An abnormal test result indicates that the code block in the object to be tested that corresponds to the initial test case behaves abnormally and is likely to be a code block prone to bugs or defects. In that case, the computer device may determine a mutation point from the initial test case and perform the subsequent mutation steps to obtain a target mutation case, and may then keep testing the object to be tested with the target mutation case, thereby improving the code coverage of the test process and increasing the probability of discovering vulnerabilities.
When the test result is a normal test result, the code block in the object to be tested that corresponds to the initial test case is a correct code block. The computer device can then filter out the initial test case, obtain a test case different from the initial test case, determine the newly obtained test case as the initial test case, and return to the step of testing the object to be tested based on the initial test case. To prevent this step from looping indefinitely, a stop condition may be set as needed, for example, the loop duration reaching a preset duration or the number of loops reaching a preset number.
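The filtering loop with a stop condition can be sketched as follows. The names findCaseToMutate and runTest are hypothetical, the test results are modelled as the strings "normal" and "abnormal", and the stop condition is assumed to be a preset number of rounds.

```javascript
// Returns the first candidate whose test result is abnormal, or null when the
// candidates run out or the preset number of rounds is reached.
function findCaseToMutate(candidates, runTest, maxRounds) {
  let rounds = 0;
  for (const candidate of candidates) {
    if (rounds >= maxRounds) return null; // stop condition reached
    rounds += 1;
    if (runTest(candidate) === "abnormal") return candidate;
    // Normal result: filter this case out and try the next one.
  }
  return null;
}

// Stand-in for running a test case against the object to be tested.
const runTest = (testCase) => (testCase.startsWith("bad") ? "abnormal" : "normal");
const candidates = ["ok-1", "ok-2", "bad-1", "bad-2"];

const selected = findCaseToMutate(candidates, runTest, 10);
const stoppedEarly = findCaseToMutate(candidates, runTest, 2);
console.log(selected, stoppedEarly);
```

In the first call the two normal cases are filtered out and the third case is selected for mutation; in the second call the round limit is reached before any abnormal case is found.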
It can be appreciated that in one embodiment, when the initial test case is a test case including only functions, the functions need to be assembled first to obtain a complete test case, so that the test can be performed on the object to be tested.
In the above embodiment, when the mutation point is determined based on the initial test case, a test flow may be entered to test the object to be tested. On the one hand, the object to be tested is tested continuously; on the other hand, test cases for which the code is correct are filtered out by their test results, so that the test process is steered toward code modules that are prone to abnormality, which improves the test efficiency and the vulnerability discovery probability.
In one embodiment, testing an object to be tested based on an initial test case to obtain a test result corresponding to the initial test case includes: testing a plurality of test objects with different versions based on the initial test case to obtain respective corresponding test results of each test object to be tested; under the condition that the test results corresponding to all the objects to be tested are consistent, determining the test result of the initial test case as a normal test result; under the condition that the test results corresponding to the objects to be tested are inconsistent, determining the test result of the initial test case as an abnormal test result.
Specifically, the computer device may test a plurality of objects to be tested of different versions based on the initial test case, obtain the test result corresponding to each object to be tested, and compare the test results. If the test results corresponding to all the objects to be tested are consistent, the test result of the initial test case is a normal test result. If the test results of one or more objects to be tested are inconsistent with those of the other objects to be tested, this indicates that some object to be tested has a defect or a bug, and the computer device may determine that the test result of the initial test case is an abnormal test result. The initial test case is then mutated to obtain more target mutation cases with which the objects to be tested are tested continuously, thereby steering the test toward the code module that has the defect or bug.
In the embodiment, the initial test case can be accurately and rapidly judged to be the abnormal test case by comparing the differentiated behaviors of a plurality of to-be-tested objects with different versions to the same initial test case, so that the test efficiency and the vulnerability discovery probability are improved.
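The comparison across versions can be sketched as follows. Each object to be tested is modelled as a plain function, which is an assumption made so that the sketch is self-contained; a real setup would invoke separate engine builds. The name classifyByVersions is hypothetical.

```javascript
// Runs one test input against several versions and reports whether their
// observable results agree. Thrown errors are compared by error name.
function classifyByVersions(testInput, versions) {
  const results = versions.map((version) => {
    try {
      return "value:" + String(version(testInput));
    } catch (error) {
      return "throws:" + error.name;
    }
  });
  const consistent = results.every((result) => result === results[0]);
  return consistent ? "normal" : "abnormal";
}

// Three stand-in versions of the same operation; the third one is faulty.
const versionA = (text) => text.toUpperCase();
const versionB = (text) => text.toUpperCase();
const versionC = (text) => {
  if (text.length > 3) throw new RangeError("simulated defect");
  return text.toUpperCase();
};

const agreeing = classifyByVersions("abcdef", [versionA, versionB]);
const diverging = classifyByVersions("abcdef", [versionA, versionB, versionC]);
console.log(agreeing, diverging);
```

The advantage of this comparison is that no expected output is needed: a disagreement between versions is itself the signal that the test case is worth mutating.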
In one embodiment, the test data processing method further includes: acquiring a first preset variation rule; and mutating the initial test case based on a first preset mutation rule to obtain a first rule mutation case, wherein the first rule mutation case is used for testing the object to be tested.
The first preset mutation rule is obtained based on target priori knowledge, the target priori knowledge is obtained based on historical vulnerability information summary, and the historical vulnerability information is a use case or script which historically triggers the vulnerability. In this embodiment, the computer device may perform mutation on the initial test case according to the obtained first preset mutation rule to obtain a first rule mutation case, and further may perform a test on the object to be tested according to the first rule mutation case.
In one embodiment, the computer device may test the initial test case to obtain a test result, and only if the test result is an abnormal test result, the step of mutating the initial test case based on the first preset mutation rule to obtain a first rule mutation case is performed, so that the correct test case can be filtered, and the test efficiency is improved.
In one embodiment, the computer device may generate a grammar tree for the initial test case, traverse each node in the grammar tree, and perform a corresponding mutation operation on the nodes that meet the first preset mutation rule, the mutation operation including at least one of modifying the nodes, adding the nodes, deleting the nodes, and replacing the nodes.
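The traversal of the grammar tree can be sketched as follows. To stay self-contained, the sketch uses a small hand-written tree in an ESTree-like shape instead of calling a real parser, and the rule shown (renaming a called method) is only one illustrative instance of a node modification. The names mutateTree and renameExecRule are hypothetical.

```javascript
// Visits every node bottom-up and applies the rule to the nodes it matches.
// The input tree is left untouched; a mutated copy is returned.
function mutateTree(node, rule) {
  if (Array.isArray(node)) return node.map((child) => mutateTree(child, rule));
  if (node === null || typeof node !== "object") return node;
  const copy = {};
  for (const [key, value] of Object.entries(node)) {
    copy[key] = mutateTree(value, rule);
  }
  return rule.matches(copy) ? rule.apply(copy) : copy;
}

// Illustrative rule: modify member accesses named "exec" to "test".
const renameExecRule = {
  matches: (node) =>
    node.type === "MemberExpression" && node.property.name === "exec",
  apply: (node) => ({ ...node, property: { ...node.property, name: "test" } }),
};

// Hand-written tree for the expression: re.exec("abc")
const tree = {
  type: "CallExpression",
  callee: {
    type: "MemberExpression",
    object: { type: "Identifier", name: "re" },
    property: { type: "Identifier", name: "exec" },
  },
  arguments: [{ type: "Literal", value: "abc" }],
};

const mutatedTree = mutateTree(tree, renameExecRule);
console.log(mutatedTree.callee.property.name);
```

Adding, deleting or replacing nodes would follow the same pattern, with the rule returning a different node, an extended list, or nothing for the matched position.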
In the above embodiment, since the first preset mutation rule is based on a priori knowledge summarized by the use cases in which the vulnerability is triggered historically, such mutation policy is more directional, and can perform continuous high-strength test on the potential defect module, so that the probability of vulnerability discovery can be improved. The first preset variation rule will be illustrated by a plurality of embodiments.
In one embodiment, mutating the initial test case based on a first preset mutation rule includes: acquiring a plurality of preset application program interface pairs; the application program interface pair comprises two semantically matched application program interfaces; when the initial test case includes any one of the application program interfaces in the application program interface pair, the included application program interface is replaced with the other application program interface in the application program interface pair.
Considering that semantically similar APIs (Application Programming Interfaces) often call the same underlying code module in the developers' implementation, replacing one such API with another can steer the mutation toward the same code module and test the engine more comprehensively. The computer device may obtain a plurality of preset application program interface pairs, each of which comprises two semantically matched application program interfaces, and compare the application program interfaces in the initial test case with those in the application program interface pairs. When the initial test case includes either application program interface of a pair, the included application program interface is replaced with the other application program interface of that pair.
In a specific embodiment, the preset plurality of application program interface pairs include one or more of the following:
RegExp.prototype.exec and RegExp.prototype.test; RegExp.prototype[@@match] and String.prototype.match; RegExp.prototype[@@matchAll] and String.prototype.matchAll; RegExp.prototype[@@search] and String.prototype.search; RegExp.prototype[@@replace] and String.prototype.replace; RegExp.prototype[@@split] and String.prototype.split.
For example, assuming that the preset plurality of application program interface pairs include RegExp.prototype.exec and RegExp.prototype.test, when the initial test case includes RegExp.prototype.exec, RegExp.prototype.exec may be replaced with RegExp.prototype.test.
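The pair-based replacement can be sketched as follows for the exec and test pair. The sketch works on the source text with a regular expression rather than on a syntax tree, which is a simplification chosen for brevity; it would also rename unrelated methods that happen to share a name. The names API_PAIRS and swapPairedApis are hypothetical.

```javascript
// Pairs of method names that are treated as semantically matched.
const API_PAIRS = [["exec", "test"]];

// Replaces each call to a paired method with a call to its partner.
function swapPairedApis(source) {
  const partner = new Map();
  for (const [first, second] of API_PAIRS) {
    partner.set(first, second);
    partner.set(second, first);
  }
  // A single pass, so a replaced name is never swapped back.
  return source.replace(/\.(\w+)\s*\(/g, (match, name) =>
    partner.has(name) ? "." + partner.get(name) + "(" : match
  );
}

const originalLine = 'var result = /a+/.exec("caaat");';
const swappedLine = swapPairedApis(originalLine);
console.log(swappedLine);
```

Pairs whose two members take their arguments in a different order, such as a RegExp method and its String counterpart, would additionally require the receiver and the argument to be exchanged, which this sketch does not attempt.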
In one embodiment, considering that the grammar correctness of the test case can be ensured by replacing the APIs with the same return value, and the forms of the APIs with the same return value are more various, the test range can be enlarged, and the preset plurality of application program interface pairs acquired by the computer equipment can also be interface pairs comprising two application program interfaces with the same return value. Such as String () and object. Similarly, the computer device may compare the application program interfaces in the initial test case with those of the application program interface pairs, and when the initial test case includes any one of the application program interfaces in the application program interface pairs, replace the included application program interface with the other application program interface in the application program interface pairs.
In one embodiment, mutating the initial test case based on a first preset mutation rule includes: identifying an instance object in the initial test case; for the identified at least one instance object, prototype-chain properties of the instance object are modified to prototype-chain pollute the instance object.
Prototype chain pollution is a vulnerability type in which the __proto__ attribute of an instance object is modified, so that attributes on the prototype chain of the object are changed and the prototype chain is polluted. In this embodiment, the computer device may analyze the syntax tree corresponding to the initial test case, identify the Object instance objects in the initial test case, and modify the prototype chain attributes of the objects in the case through __proto__ and Object.
By way of example, assume that the partial code of the initial test case is as follows:
Then, after the prototype chain attribute corresponding to the instance object is modified, the partial code of the resulting first rule variation case is as follows:
It can be seen that, in the first rule variation case, the prototype chain attribute is modified by adding the code line "c.__proto__.foo = 'G';", so that the prototype chain pollution vulnerability type can be tested.
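The effect of such an added line can be illustrated with a short sketch. It assumes that c is a plain object, so that its prototype is Object.prototype; the surrounding code of the original example is not reproduced here, and the variable names other than c are invented for the illustration.

```javascript
const c = {}; // stand-in for the instance object identified in the test case

// The added line: writes to the prototype of c, not to c itself.
c.__proto__.foo = "G";

// An object that never touched foo now sees it through the prototype chain.
const unrelated = {};
const leakedValue = unrelated.foo;
const ownProperty = Object.prototype.hasOwnProperty.call(unrelated, "foo");

// Remove the polluted property again so that later code is not affected.
delete Object.prototype.foo;
const valueAfterCleanup = unrelated.foo;

console.log(leakedValue, ownProperty, valueAfterCleanup);
```

The value appears on an object that was created independently of c, which is what makes this kind of modification useful for probing how the object to be tested handles a polluted prototype chain.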
In one embodiment, mutating the initial test case based on a first preset mutation rule includes: identifying an array in the initial test case; and modifying the length attribute of the array into one of a boundary value or a target value for the identified at least one array, wherein the target value is a value larger than a preset value threshold.
Here, integer overflow is also a vulnerability type: when the system calculates too small a size for the space to be allocated, data is put into a storage space smaller than itself, and an overflow occurs. Such vulnerabilities exist in array operations. For example, when JavaScript performs an array expansion operation, the function first calculates the length of the output array and then performs space allocation and initialization. However, the array length calculated at this point may no longer be valid at the time of execution, and no boundary check is performed at that time, so an overflow is likely to occur. Based on this, in this embodiment, the computer device may analyze the syntax tree corresponding to the initial test case to identify array definitions, and then use get/set, __defineGetter__(prop, func) and __defineSetter__(prop, func) to modify the array-related attributes in the case, mainly by modifying the array length to a boundary value or a target value, where the target value is a value greater than a preset value threshold, and the preset value threshold may be set as required.
By way of example, assume that the partial code of the initial test case is as follows:
The partial code of the first rule variation case obtained after modifying the length attribute of the array is as follows:
In the above example, the array length is modified to -4294967295 by o.__defineGetter__(0, function () { o.length = -4294967295; }), so that the integer overflow vulnerability type can be tested.
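A minimal sketch of this array length mutation is given below. The helper name mutateArrayLength, the line-based detection of array definitions, and the seed program are assumptions for illustration; the value -4294967295 is the one quoted in the text. Under ECMA-262, assigning a negative length is expected to raise a RangeError, and whether each engine actually rejects it is what the mutated case probes.

```javascript
// Hypothetical helper (not from the source): after each array definition,
// install a getter on index 0 that rewrites the array's length.
function mutateArrayLength(source, newLength) {
  // Naive match for `var|let|const <name> = [` or `= new Array`.
  const arrayPattern =
    /^\s*(?:var|let|const)\s+([A-Za-z_$][\w$]*)\s*=\s*(?:\[|new\s+Array\b)/;
  const out = [];
  for (const line of source.split("\n")) {
    out.push(line);
    const match = arrayPattern.exec(line);
    if (match) {
      const name = match[1];
      out.push(
        `${name}.__defineGetter__(0, function () { ${name}.length = ${newLength}; });`
      );
    }
  }
  return out.join("\n");
}

// Assumed seed; reading o[0] later triggers the inserted getter.
const arraySeed = "var o = [1, 2, 3];\nprint(o[0]);";
const arrayMutated = mutateArrayLength(arraySeed, -4294967295);
console.log(arrayMutated);
```

Other boundary values, such as 0 or 4294967295, can be passed as the second argument in the same way.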
In one embodiment, mutating the initial test case based on a first preset mutation rule includes: identifying a function in the initial test case; and, for at least one identified function, adding loop calls to the function a preset number of times, where the preset number of times is larger than a preset count threshold.
Specifically, to improve performance, some objects to be tested introduce a JIT (Just-In-Time) mechanism: for functions that are called many times in succession within a short period, the JS code is dynamically compiled into machine code at runtime instead of being executed conventionally by an interpreter, which increases running speed. However, this part also contains a large number of vulnerabilities caused by imperfect implementation, such as inconsistent type checking of code between normal compiled execution and JIT execution. In this embodiment, the functions in the case are identified from the syntax tree corresponding to the initial test case, and a loop is then added for each function so that it is called a preset number of times, triggering JIT optimization and thereby exposing related problems. The preset number of times may be set as needed, for example, 1000 times.
By way of example, assume that the partial code of the initial test case is as follows:
The partial code of the first rule variation case obtained by identifying the function therein is as follows:
It can be seen that in the above example, 1000 loop calls are added to the function func() identified in the initial test case, so that JIT-type vulnerabilities can be tested.
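The loop-call mutation can be sketched as below. The helper name addHotLoops, the regular expression used to find function declarations, and the seed program are assumptions for illustration; the loop count of 1000 is the example value from the text, and calls are generated without arguments for simplicity.

```javascript
// Example loop count taken from the text; any value above the threshold works.
const LOOP_COUNT = 1000;

// Hypothetical helper (not from the source): append a hot loop that calls
// every declared function LOOP_COUNT times.
function addHotLoops(source, loopCount = LOOP_COUNT) {
  const declaration = /function\s+([A-Za-z_$][\w$]*)\s*\(/g;
  const names = [];
  let match;
  while ((match = declaration.exec(source)) !== null) {
    if (!names.includes(match[1])) names.push(match[1]);
  }
  const loops = names.map(
    (name) => `for (var i = 0; i < ${loopCount}; i++) { ${name}(); }`
  );
  return [source, ...loops].join("\n");
}

// Assumed seed program.
const jitSeed = "function func() { return 1 + 1; }\nfunc();";
const jitMutated = addHotLoops(jitSeed);
console.log(jitMutated);
```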
In one embodiment, the method further comprises: mutating the initial test case based on a second preset mutation rule to obtain a second rule variation case.
Specifically, the computer device may mutate the initial test case according to the second preset mutation rule to obtain a second rule variation case, and may then test the object to be tested with the second rule variation case. The second preset mutation rule is illustrated below by a plurality of embodiments.
In one embodiment, the computer device may replace an operator in the initial test case with an operator that takes the same number of operands. Specifically, the computer device may identify the nodes in the code from the syntax tree corresponding to the initial test case and, if a node is an operator node, replace it, where replacement takes place only between operators with the same number of operands. Operator nodes include, but are not limited to, the arithmetic operators BinaryExpression and UpdateExpression, the assignment operator AssignmentExpression, the comparison operator BinaryExpression, the logical operators LogicalExpression and UnaryExpression, and the conditional operator ConditionalExpression. For example, addition (+) is a binary arithmetic operator that accepts exactly two operands, as is multiplication (*), so addition and multiplication may replace each other, for example 1+3 becomes 1*3.
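The operator replacement can be sketched on an ESTree-style syntax tree as follows. The tree here is built by hand so that the example runs without any package; in the described toolchain a parser would produce it. The swap table BINARY_SWAPS and the helper names are assumptions for illustration.

```javascript
// Assumed swap table: each operator maps to another operator of the same arity.
const BINARY_SWAPS = { "+": "*", "*": "+", "-": "/", "/": "-" };

// Visit every object node in the tree, depth first.
function visit(node, fn) {
  if (Array.isArray(node)) {
    node.forEach((child) => visit(child, fn));
    return;
  }
  if (node === null || typeof node !== "object") return;
  fn(node);
  for (const key of Object.keys(node)) visit(node[key], fn);
}

// Return a mutated copy of the tree with binary operators swapped.
function swapBinaryOperators(ast) {
  const copy = JSON.parse(JSON.stringify(ast));
  visit(copy, (node) => {
    if (node.type === "BinaryExpression" && node.operator in BINARY_SWAPS) {
      node.operator = BINARY_SWAPS[node.operator];
    }
  });
  return copy;
}

// Hand-built tree for the statement `1 + 3;`.
const additionAst = {
  type: "ExpressionStatement",
  expression: {
    type: "BinaryExpression",
    operator: "+",
    left: { type: "Literal", value: 1 },
    right: { type: "Literal", value: 3 },
  },
};

const multiplicationAst = swapBinaryOperators(additionAst);
console.log(multiplicationAst.expression.operator);
```

Working on a copy keeps the seed tree intact, so the same seed can be mutated again by other rules.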
In one embodiment, a conditional statement in the initial test case is replaced with a conditional statement that semantically matches it. Specifically, the computer device may preset conditional statements that are semantically matched and can replace one another, match the nodes of the initial test case against them according to its syntax tree, and, if a node matches one of the preset conditional statements, replace it with the semantically matched conditional statement. For example, the conditions of a while statement (WhileStatement) and of an if statement (IfStatement) without else are both Boolean values, and the two are syntactically similar and semantically close; if either of the two node types is matched in the syntax tree of the initial test case, it can be replaced by the other conditional statement to form a new syntax tree.
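The while/if replacement can be sketched on ESTree-style nodes as below. The node shapes follow the ESTree convention (WhileStatement has test and body; IfStatement has test, consequent, and alternate); the helper name swapConditional and the hand-built nodes are assumptions for illustration.

```javascript
// Hypothetical helper (not from the source): exchange a while statement and an
// else-less if statement, keeping the condition and the body.
function swapConditional(node) {
  if (node.type === "WhileStatement") {
    return {
      type: "IfStatement",
      test: node.test,
      consequent: node.body,
      alternate: null,
    };
  }
  if (node.type === "IfStatement" && node.alternate == null) {
    return { type: "WhileStatement", test: node.test, body: node.consequent };
  }
  return node; // any other node is left untouched
}

// Hand-built node for `while (x) {}`.
const whileNode = {
  type: "WhileStatement",
  test: { type: "Identifier", name: "x" },
  body: { type: "BlockStatement", body: [] },
};

const ifNode = swapConditional(whileNode);
const roundTrip = swapConditional(ifNode);
console.log(ifNode.type, roundTrip.type);
```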
In one embodiment, a code block other than a variable declaration is randomly deleted from the initial test case. Specifically, after parsing the initial test case into a syntax tree, the computer device may randomly select, among the nodes in a function body, a node other than a variable declaration (VariableDeclaration) and delete it.
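A minimal sketch of the random deletion follows. The function body is represented as an array of ESTree-style statement nodes; the helper name deleteRandomStatement and the injectable random source (used to make the example reproducible) are assumptions for illustration.

```javascript
// Hypothetical helper (not from the source): delete one randomly chosen
// statement that is not a VariableDeclaration from a function body.
function deleteRandomStatement(functionBody, random = Math.random) {
  const candidates = [];
  functionBody.forEach((node, index) => {
    if (node.type !== "VariableDeclaration") candidates.push(index);
  });
  if (candidates.length === 0) return functionBody.slice();
  const victim = candidates[Math.floor(random() * candidates.length)];
  return functionBody.filter((_, index) => index !== victim);
}

// Assumed function body: one declaration followed by two other statements.
const bodyNodes = [
  { type: "VariableDeclaration" },
  { type: "ExpressionStatement" },
  { type: "ReturnStatement" },
];

// A fixed random source makes the choice deterministic for the example.
const shortened = deleteRandomStatement(bodyNodes, () => 0);
console.log(shortened.map((node) => node.type));
```

Keeping variable declarations avoids introducing undefined variables, so the mutated case is more likely to remain executable.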
In one embodiment, a code block is randomly selected from the initial test case, and the selected code block is wrapped with a statement whose condition is true. Specifically, wrapping a random code block of the initial test case in a statement whose condition is always true changes the control flow of the test case and adds a degree of variation while keeping the mutated test case syntactically correct; moreover, inserting an infinite loop can expose certain memory and optimization problems.
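The wrapping step can be sketched as follows, using an if statement with a literal true condition. The helper name wrapInTrueCondition and the hand-built nodes are assumptions for illustration; a while (true) wrapper would be built the same way with a WhileStatement node.

```javascript
// Hypothetical helper (not from the source): wrap the statement at `index`
// in `if (true) { ... }`, leaving the other statements as they are.
function wrapInTrueCondition(body, index) {
  const wrapped = {
    type: "IfStatement",
    test: { type: "Literal", value: true, raw: "true" },
    consequent: { type: "BlockStatement", body: [body[index]] },
    alternate: null,
  };
  return body.map((node, i) => (i === index ? wrapped : node));
}

// Assumed statement list of a test case.
const statements = [
  { type: "VariableDeclaration" },
  { type: "ExpressionStatement" },
];

const wrappedStatements = wrapInTrueCondition(statements, 1);
console.log(wrappedStatements[1].type);
```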
In one embodiment, code is added to the initial test case. Specifically, code is added in two main ways. The first is live code addition, in which the computer device adds code related to the test case to the initial test case, thereby changing its data flow and control flow. The second is dead code addition, whose main purpose is to test optimization problems in the object to be tested; here the computer device may add code that is completely unrelated to the initial test case, or code that is related to the initial test case but does not affect the final execution result.
In the above embodiment, the initial test cases are mutated according to the second preset mutation rule, so that a large number of second rule mutation cases can be rapidly generated, and the test efficiency is improved.
In a specific embodiment, the application further provides an application scenario in which the test data processing method is applied to testing a JavaScript engine (hereinafter abbreviated as JS engine) for security, functionality, and optimization problems. In this embodiment, JavaScript scripts that have historically triggered defects or vulnerabilities in JS engines, together with the official JS engine test suites, are collected as the initial test case seed pool, and deep learning is used to expand this seed pool. Based on prior knowledge of JavaScript vulnerability types, and combining guided mutation rules with general test case mutation methods, the test cases that produce abnormal test results during differential testing are mutated in a targeted manner to generate guided test cases, so that the code modules in the JS engine that are prone to defects or vulnerabilities are tested continuously and intensively. Compared with traditional test methods, the test data processing method provided by this embodiment steers testing toward code modules that are prone to vulnerabilities or defects while preserving the validity and diversity of the test cases, thereby improving test efficiency and the probability of discovering vulnerabilities.
As shown in fig. 8, the overall flow of the test data processing method provided in this embodiment may be divided into four modules: a data collection and preprocessing module, a test case initial seed pool expansion module, a differential test module, and an iterative mutation module based on abnormal behaviors. The functions of the modules are as follows:
Data collection and preprocessing module: this module mainly collects two parts of data. One part is JavaScript programs from open-source code repositories, used to train the test case generation model; the other part is the initial test scripts from each JavaScript engine test suite and vulnerability database, used to construct the initial test case seed pool.
Test case initial seed pool expansion module: the number of initially collected official engine test suites and JavaScript scripts that have historically triggered JS engine defects or vulnerabilities is limited, so this module expands the data using the generative mutation method of the mutation module, increasing the data volume and diversity of the seed pool.
Differential test module: this module runs the test cases in the seed pool on the JS engines of multiple vendors, saves and compares the test results, filters out and marks the test cases with abnormal test results, and submits them for manual analysis and confirmation.
Iterative mutation module based on abnormal behaviors: starting from the test cases with abnormal test results, this module generates new test cases using the guided mutation rules and adds them to the seed pool; iterative testing then continues with the newly generated test cases until no further abnormal behavior is produced. The next round of testing then randomly selects a test case from the seed pool.
The following is a specific implementation process of each module:
1. Data collection and preprocessing module
The module first crawls a large number of JavaScript projects from GitHub, ranked by number of stars, and then obtains JavaScript scripts that have historically triggered JavaScript engine defects or vulnerabilities from the official website of each JavaScript engine and from the official JavaScript test suite Test262. It then searches both bodies of data, traverses them and keeps the JavaScript scripts, and through data preprocessing extracts the scripts in units of Function. The former body of data serves as model training data, and the latter as the raw data of the initial test case seed pool.
1.1 Model training data acquisition
This embodiment uses GPT-2, a Transformer-based natural language generation model. It is an unsupervised deep learning model; using it amounts to treating code as text, so that the model generates new text according to the text format and interface calls of the code in the training data. The training data must therefore be correctly formatted and broad in content.
GitHub is the largest open-source social programming and code hosting website in the world. It hosts a wide variety of full-featured open-source code repositories, which can be categorized by the language of the project source code. In this embodiment, the JavaScript category is selected according to GitHub's language categories, the projects are sorted by number of stars, and the top 4000 real-world projects are selected as the initial data for model training. Through data preprocessing, the initial data are then processed into independent Functions, which serve as model training data for fine-tuning the model.
1.2 Raw data acquisition for the initial test case seed pool
If GPT-2 alone is used to generate test cases, the generated cases are too random, so testing is easily steered toward code modules unrelated to defects or vulnerabilities, which reduces test efficiency. Existing research has shown that continuing to test the engine code modules covered by test cases that have already triggered a bug (fault or defect) is more likely to trigger engine bugs again. Therefore, this embodiment makes full use of JavaScript scripts that have historically triggered JS engine defects or vulnerabilities and of the official test cases of the major engines, and constructs the initial test case seed pool from them as initial test cases. The seed pool is then preprocessed into Functions in the same data format as the data of the generation model, and the test case generation model obtained through deep learning training is used to apply generative mutation to these Functions, increasing the data volume of the seed pool and enriching its code content.
1.3 Data preprocessing
To ensure the learning effect of the language model and to facilitate subsequent test analysis, the module processes the raw training data collected from GitHub into independent Functions. The module mainly uses three tools: Esprima, a high-performance, standard-compliant ECMAScript parser used for lexical or syntactic analysis of JS code; JSHint, a commonly used JS code quality checking tool that offers users nearly 60 optional configurations and is highly flexible; and UglifyJS, a utility that integrates a JS parser, code minification, code compression, and code beautification. First, all JavaScript projects are traversed and the JavaScript files in them are extracted. If a file has more than 1000 lines, its Functions are extracted by regular expression matching; otherwise, a syntax tree is generated with Esprima and the Functions are extracted from the syntax tree. After a Function is extracted, variables must be backfilled into it so that the Function retains the information of the original file and contains no undefined variables. Once the complete Function is obtained, JSHint is used for syntax filtering to ensure the validity of the training data. To further improve the quality of the data set, the valid test cases are deduplicated, stripped of comments, and filtered by rules formulated manually according to the characteristics of the data set. Finally, UglifyJS is used to unify the code format, and the result is stored in a database for later use.
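The choice between the two extraction paths, and the regular-expression path itself, can be sketched as follows. The helper names, the header pattern, and the naive brace counting (which ignores braces inside strings and comments) are assumptions for illustration; the 1000-line threshold is the one stated in the text, and the syntax-tree path is only named, not implemented.

```javascript
// Threshold from the text: larger files are handled by regular expression matching.
const LINE_THRESHOLD = 1000;

// Decide which extraction path a file takes.
function chooseExtractor(fileContent) {
  return fileContent.split("\n").length > LINE_THRESHOLD ? "regex" : "syntax-tree";
}

// Hypothetical regex-based extractor: find `function name(...) {` headers and
// count braces to locate the end of each function. Braces inside strings or
// comments are not handled in this sketch.
function extractFunctionsByRegex(fileContent) {
  const functions = [];
  const header = /function\s+[A-Za-z_$][\w$]*\s*\([^)]*\)\s*\{/g;
  let match;
  while ((match = header.exec(fileContent)) !== null) {
    let depth = 1;
    let i = header.lastIndex;
    while (i < fileContent.length && depth > 0) {
      if (fileContent[i] === "{") depth++;
      else if (fileContent[i] === "}") depth--;
      i++;
    }
    if (depth === 0) {
      functions.push(fileContent.slice(match.index, i));
      header.lastIndex = i; // continue after the extracted function
    }
  }
  return functions;
}

// Assumed sample file content.
const sampleFile =
  "var x = 1;\n" +
  "function add(a, b) { if (a) { return a + b; } return b; }\n" +
  "function noop() {}\n";

const extractedFunctions = extractFunctionsByRegex(sampleFile);
console.log(chooseExtractor(sampleFile), extractedFunctions.length);
```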
The model training data is used for fine tuning of the test case generation model GPT-2, and the original data of the initial test case seed pool is used for expansion of the test case seed pool.
1.4 Model training
The training process of GPT-2 is divided into two stages, pre-training and fine-tuning. The pre-training stage performs unsupervised word prediction on a huge monolingual corpus to train a generative language model; the fine-tuning stage performs further training on the target data set. In this embodiment, the base GPT-2 model with 117M parameters is used. The preprocessed JS code serves as the fine-tuning data set, and the pre-trained GPT-2 model is fine-tuned by updating the weights of its last two fully connected layers. The training duration is determined according to the quality of the generated samples, and the result is a generation model that conforms to the training data.
2. Test case initial seed pool expansion module
Fuzz testing requires a sufficient volume of data to ensure the comprehensiveness of the test range and the effectiveness of the test. The official engine test suites obtained from the official website of each JavaScript engine, and the JavaScript scripts collected from the CVE official website that have historically triggered JS engine defects or vulnerabilities, are few in number after data preprocessing and can hardly meet the needs of fuzz testing, so the data must be expanded. Expanding the data manually is time-consuming and labor-intensive, whereas the fine-tuned GPT-2 model writes JavaScript code well and can replace manual work. Generative mutation is therefore applied to the Functions extracted from the initial test case seed pool to expand the scale of the seed pool. The specific mutation procedure is described in detail in section 4.1.
3. Differential test module
3.1 Test case assembly
A Function extracted from an original file would produce syntax errors if executed directly by an engine, so a complete test case must be assembled first. Variable declarations, a function name, randomly passed function arguments, a function call, and output of the call result are added to each extracted Function to form a complete test case. Randomly passing function arguments increases the randomness and diversity of the test cases, while the function call and result output make it easier to compare the results across engines.
For the random passing of function arguments, the number of parameters is determined from the parameter definitions in the AST, each parameter is assigned a value randomly chosen from seven basic JavaScript types (Object, Boolean, Number, String, Array, Null, Undefined), and each test case is assembled ten times.
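The assembly step can be sketched as follows. The concrete argument values in ARGUMENT_POOL, the helper name assembleTestCase, the use of print for output (available in typical engine shells), and the injectable random source are assumptions for illustration; the seven types are the ones listed above.

```javascript
// One assumed sample value per basic type named in the text.
const ARGUMENT_POOL = {
  Object: "{}",
  Boolean: "true",
  Number: "1",
  String: '"a"',
  Array: "[1, 2]",
  Null: "null",
  Undefined: "undefined",
};

// Hypothetical helper (not from the source): append a call with randomly
// typed arguments and an output statement to an extracted Function.
function assembleTestCase(functionSource, functionName, paramCount, random = Math.random) {
  const types = Object.keys(ARGUMENT_POOL);
  const args = [];
  for (let i = 0; i < paramCount; i++) {
    const type = types[Math.floor(random() * types.length)];
    args.push(ARGUMENT_POOL[type]);
  }
  return [
    functionSource,
    `var result = ${functionName}(${args.join(", ")});`,
    "print(result);",
  ].join("\n");
}

// A fixed random source makes the example reproducible.
const assembled = assembleTestCase(
  "function add(a, b) { return a + b; }",
  "add",
  2,
  () => 0
);
console.log(assembled);
```

Calling the helper ten times with the default random source would give the ten assemblies per test case mentioned above.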
3.2 Differential testing
Differential testing automatically infers which engine is likely to be in error by comparing the behavior of multiple engines on the same test case, and reduces human involvement as far as possible by simplifying and localizing test cases. If the execution result of one engine is inconsistent with the results of the other engines, a bug probably exists. After screening, the following eight engines are selected for differential testing, see Table 1. The engines are installed with jsvu, a JavaScript engine installation and update tool, which makes it easy to install the latest version of each JavaScript engine without compiling from source.
TABLE 1
ID | Object under test | Version
1 | V8 | v9.9.1
2 | SpiderMonkey | JavaScript-C96.0
3 | ChakraCore | v1.11.24.0
4 | JavaScriptCore | v286936
5 | QuickJS | v2021-03-27
6 | JerryScript | v3.0.0 (fea10bb7)
7 | GraalJS | v21.3.0
The initial test cases in the seed pool are differentially tested one by one: each is compiled and executed on all engines, and the execution results are put to a vote. A test case for which an engine's execution result is inconsistent with the result of the majority of engines is marked as a suspicious case, that is, a test case with an abnormal test result, and is stored in a suspicious case database together with related information such as its source Function and test results. Once marked, a suspicious case undergoes iterative mutation based on abnormal behaviors, and the test cases obtained by mutation re-enter the differential test module in a loop. In addition, test cases with abnormal test results must be analyzed manually, step by step, combining the execution results of each engine and the differential test results with the ECMA-262 standard, to determine whether the suspicious case is a false positive or a bug. If it is determined to be a bug, it is submitted to the developers of the corresponding engine for confirmation.
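The voting step can be sketched as follows. The helper name findDeviatingEngines and the recorded outputs are assumptions for illustration; running the engines themselves is outside the sketch, which only compares outputs that have already been collected.

```javascript
// Hypothetical helper (not from the source): given each engine's output for
// one test case, return the engines that disagree with the majority output.
function findDeviatingEngines(results) {
  const counts = new Map();
  for (const output of Object.values(results)) {
    counts.set(output, (counts.get(output) || 0) + 1);
  }
  let majorityOutput = null;
  let majorityCount = 0;
  for (const [output, count] of counts) {
    // On a tie, the output seen first keeps the majority position.
    if (count > majorityCount) {
      majorityOutput = output;
      majorityCount = count;
    }
  }
  return Object.keys(results).filter((engine) => results[engine] !== majorityOutput);
}

// Assumed outputs recorded for one test case.
const recorded = { V8: "3", SpiderMonkey: "3", JavaScriptCore: "3", QuickJS: "NaN" };
const deviating = findDeviatingEngines(recorded);
const isSuspicious = deviating.length > 0;
console.log(deviating, isSuspicious);
```

A test case with a non-empty list of deviating engines would be stored as a suspicious case and passed on to mutation and manual analysis.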
4. Iterative variation module based on abnormal behaviors
4.1 Generative mutation
Generative mutation refers to using the fine-tuned GPT-2 model to continue writing Function test cases, so that incomplete Functions become complete Functions. There are three main ways of generative mutation:
1) Direct code continuation
An initial test case is determined, and mutation points are determined from the code content of each line from the first line to the second-to-last line of the initial test case. For each mutation point, the code content of the initial test case up to that mutation point is intercepted, code conforming to the grammar standard is written as a continuation of the intercepted code content, and the intercepted code content is spliced with its continuation to obtain the initial variant case corresponding to that intercepted code content. The computer device takes these initial variant cases as target variant cases.
The initial test case can be selected from an initial test case seed pool, or can be a test case with abnormal test results obtained in a differential test module.
2) Code block replacement
The first code block of the initial test case that does not contain the intercepted code content is determined as the target code block. For each initial variant case, the first code block of that initial variant case that does not contain the intercepted code content is determined as a replacement code block, and the target code block of the initial test case is replaced with each replacement code block in turn to obtain a plurality of target variant cases.
3) Variable replacement
At least one variable value in the initial test case is determined as the target code content to be replaced, and the last character of the first line of code of the initial test case is determined as the mutation point. The code content of the initial test case up to the mutation point is intercepted, and code conforming to the grammar standard is written as a continuation of it multiple times to obtain a plurality of initial variant cases. Variable values are extracted from these initial variant cases to obtain a variable value set, the values in the set are combined according to the number of variable values contained in the target code content to obtain a plurality of replacement variable groups, and the variable values of the target code content are replaced with the values of each replacement variable group in turn to obtain a plurality of target variant cases.
After the target variant cases obtained in these three ways are assembled into complete test cases through test case assembly, they can be added to the initial test case seed pool to expand it.
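The first of the three ways, direct code continuation, can be sketched as follows. The fine-tuned language model is replaced by a caller-supplied continueCode function, which is a stand-in assumed for this example, as are the helper names and the seed Function.

```javascript
// Intercept the seed at every mutation point: one prefix per line,
// from the first line up to the second-to-last line.
function truncationPrefixes(source) {
  const lines = source.split("\n");
  const prefixes = [];
  for (let end = 1; end < lines.length; end++) {
    prefixes.push(lines.slice(0, end).join("\n"));
  }
  return prefixes;
}

// Splice each intercepted prefix with the continuation produced for it.
// `continueCode` stands in for the fine-tuned language model.
function generateVariants(source, continueCode) {
  return truncationPrefixes(source).map(
    (prefix) => prefix + "\n" + continueCode(prefix)
  );
}

// Assumed seed Function with four lines, hence three mutation points.
const seedFunction = [
  "function f(a) {",
  "  var b = a + 1;",
  "  return b;",
  "}",
].join("\n");

// Trivial stand-in continuation that only closes the function body.
const variants = generateVariants(seedFunction, () => "}");
console.log(variants.length);
```

In the described method, each continuation would come from the model and would be checked for conformance to the grammar standard before the variant is kept.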
4.2 Rule mutation
Rule mutation applies a second round of mutation to the test cases with abnormal test results obtained from the differential test module, producing a large number of test cases with similar code content and widening the test coverage of the module that triggered the problem. Unlike generative mutation, whose mutation object is a Function, the mutation object of rule mutation is a complete, already assembled test case.
Mutation is implemented with the JavaScript parsing tool Esprima, the AST node traversal tool Estraverse, and the AST-to-source code generation tool Escodegen. Esprima parses the source code into an AST; Estraverse traverses the AST and matches its nodes against the mutation rule, applying the corresponding operation to matched nodes, such as adding or replacing a node; and once the modified AST is obtained, Escodegen generates JS source code from it, completing the mutation process. To replace an AST node, the main steps are: initialize the AST node, parse the input file into an AST, and check whether the AST contains a replaceable method node; if so, select the corresponding node for replacement, save the method name and variable names of the original node, and finally substitute them for the corresponding method name and variable names in the replacement node. To add an AST node, the main steps are: initialize the AST node, parse the input file into an AST, and match the node that precedes the insertion position; save the variable names and function names in that node and the position of its parent node, add the new node after the matched node, and then replace the corresponding variable names and function names once the addition is complete.
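The node addition steps can be sketched on hand-built ESTree-style nodes as follows. No parser or code generator is used here, so the example runs on its own; the helper names, the placeholder identifier TARGET, and the choice of variable declarations as the matched node are assumptions for illustration.

```javascript
// Copy a node tree, renaming every Identifier called `from` to `to`.
function renameIdentifiers(node, from, to) {
  if (Array.isArray(node)) {
    return node.map((child) => renameIdentifiers(child, from, to));
  }
  if (node === null || typeof node !== "object") return node;
  const copy = {};
  for (const [key, value] of Object.entries(node)) {
    copy[key] = renameIdentifiers(value, from, to);
  }
  if (copy.type === "Identifier" && copy.name === from) copy.name = to;
  return copy;
}

// After each variable declaration, add a copy of `template` in which the
// placeholder identifier is replaced by the declared variable's name.
function addNodeAfterDeclarations(body, template, placeholder) {
  const out = [];
  for (const node of body) {
    out.push(node);
    if (node.type === "VariableDeclaration") {
      const declaredName = node.declarations[0].id.name;
      out.push(renameIdentifiers(template, placeholder, declaredName));
    }
  }
  return out;
}

// Hand-built body for `var arr = [];`.
const programBody = [
  {
    type: "VariableDeclaration",
    kind: "var",
    declarations: [
      {
        type: "VariableDeclarator",
        id: { type: "Identifier", name: "arr" },
        init: { type: "ArrayExpression", elements: [] },
      },
    ],
  },
];

// Template node for `print(TARGET);`.
const callTemplate = {
  type: "ExpressionStatement",
  expression: {
    type: "CallExpression",
    callee: { type: "Identifier", name: "print" },
    arguments: [{ type: "Identifier", name: "TARGET" }],
  },
};

const extendedBody = addNodeAfterDeclarations(programBody, callTemplate, "TARGET");
console.log(extendedBody[1].expression.arguments[0].name);
```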
4.2.1 Rule mutation tools
1. Esprima
Esprima is a high-performance, standard-compliant ECMAScript parser used for lexical or syntactic analysis of JS code, and is a widely used JS parsing tool. It has the following characteristics: full support for ECMAScript 2016; a sensibly structured syntax tree with optional tracking of syntax node locations; heavy testing; and support for JSX. Esprima mainly provides two APIs, parseScript and parseModule, chosen according to whether the code contains statements such as import or export. After JS code is parsed by Esprima into a well-formed syntax tree, the test case can be modified at the syntax tree level.
2. Estraverse
Estraverse is a widely used JS syntax tree node traversal tool, typically used together with Esprima to traverse the syntax tree. Estraverse contains two APIs, estraverse.traverse and estraverse.replace: estraverse.traverse is mainly used for node traversal, and estraverse.replace is mainly used for node replacement.
3. Escodegen
Escodegen is a JavaScript code generator that works from a syntax tree in the Mozilla Parser API format; it is mainly used to generate JS source code back from a JS syntax tree produced by parsing. Escodegen can be used in a browser or installed using npm.
4.2.2 Main content of rule mutation
Rule mutation mainly comprises two categories: first, general mutation rules (that is, the second preset mutation rule) applicable to most test cases, comprising several common mutation methods; and second, special mutation rules (that is, the first preset mutation rule) obtained by summarizing historical vulnerability information, which mutate the test case in a targeted manner.
1. General mutation rules. The general mutation rules include one or more of the following:
1) Replacing an operator in the initial test case with an operator that takes the same number of operands;
2) Replacing a conditional statement in the initial test case with a conditional statement that semantically matches it;
3) Randomly deleting a code block other than a variable declaration from the initial test case;
4) Randomly selecting a code block from the initial test case and wrapping the selected code block with a statement whose condition is true;
5) Adding code to the initial test case.
For details, reference may be made to the description of the embodiments above, which is not repeated here.
2. Special mutation rules. The special mutation rules include one or more of the following:
1) Acquiring a plurality of preset application program interface pairs, where each application program interface pair comprises two semantically matched application program interfaces or two application program interfaces with the same return value; when the initial test case includes either application program interface of a pair, replacing the included application program interface with the other application program interface of the pair.
2) Identifying an instance object in the initial test case; for at least one identified instance object, modifying the prototype chain attribute of the instance object to perform prototype chain pollution on the instance object.
3) Identifying an array in the initial test case; for at least one identified array, modifying the length attribute of the array to either a boundary value or a target value, where the target value is a value larger than a preset value threshold.
4) Identifying a function in the initial test case; for at least one identified function, adding loop calls to the function a preset number of times, where the preset number of times is larger than a preset count threshold.
For details, reference may be made to the description of the embodiments above, which is not repeated here.
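The first special rule, replacement within application program interface pairs, can be sketched as follows. The pairs in API_PAIRS are hypothetical examples chosen for this illustration and are not taken from the source; the identifier-level text replacement is also a simplification, since it does not distinguish identifiers inside string literals or comments.

```javascript
// Hypothetical interface pairs, assumed only for this example.
const API_PAIRS = [
  ["substr", "substring"],
  ["parseFloat", "Number"],
];

// Replace every identifier that belongs to a pair with its partner.
// A single pass over the identifiers avoids swapping a name back again.
function swapApis(source, pairs = API_PAIRS) {
  const partner = new Map();
  for (const [first, second] of pairs) {
    partner.set(first, second);
    partner.set(second, first);
  }
  return source.replace(/[A-Za-z_$][\w$]*/g, (identifier) =>
    partner.has(identifier) ? partner.get(identifier) : identifier
  );
}

// Assumed seed line.
const apiSeed = 'var n = parseFloat("1.5");';
const apiMutated = swapApis(apiSeed);
console.log(apiMutated);
```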
4.3 Iterative differential testing
Test cases produced by generative mutation and by rule mutation must go through differential testing again. The test cases obtained by generative mutation are Functions, which must be assembled as described in 3.1 before entering the differential test module, whereas test cases produced by rule mutation can be tested directly. Through differential testing, the test cases that produce abnormal test results are saved together with their execution information, and those test cases then re-enter the "iterative mutation module based on abnormal behaviors", achieving continuous iterative differential testing.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in sequence as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the steps are not strictly limited in their order of execution and may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the application also provides a test data processing device for realizing the above-mentioned test data processing method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation of one or more embodiments of the test data processing device provided below may refer to the limitation of the test data processing method hereinabove, and will not be repeated herein.
In one embodiment, as shown in FIG. 9, there is provided a test data processing apparatus 900 comprising:
an initial case acquisition module 902, configured to acquire an initial test case for an object to be tested;
a code writing module 904, configured to determine a mutation point based on the initial test case, intercept the code content of the initial test case up to the mutation point, and write code conforming to the grammar standard as a continuation of the intercepted code content, so as to obtain an initial variant case;
a target code determining module 906, configured to determine target code content to be replaced from the initial test case;
a replacement code determining module 908, configured to determine, as replacement code content, code content in the initial variant case whose structure attribute matches the structure attribute of the target code content;
a code replacement module 910, configured to replace the target code content in the initial test case based on the replacement code content, so as to obtain a target variant case, where the target variant case is used for testing the object to be tested.
According to the above test data processing device, an initial test case for the object to be tested is obtained; a mutation point is determined based on the initial test case; the code content of the initial test case up to the mutation point is intercepted, and code conforming to the grammar standard is written as a continuation of the intercepted code content to obtain an initial variant case; target code content to be replaced is determined from the initial test case; code content in the initial variant case whose structure attribute matches that of the target code content is determined as replacement code content; and the target code content in the initial test case is replaced based on the replacement code content to obtain a target variant case. Because the target variant case is obtained by replacing the target code content in the initial test case with replacement code content whose structure attribute matches that of the target code content, the target variant case retains the semantic information of the initial test case as far as possible, avoids damaging the structure of the initial test case, and remains syntactically correct, which improves the accuracy of the test cases obtained by mutation.
In one embodiment, the target code determining module is further configured to determine, from among the code blocks of the initial test case that do not contain the intercepted code content, the target code blocks whose sorting positions precede a sorting threshold, as the target code content to be replaced; the replacement code determining module is further configured to determine, from among the code blocks of the initial variant case that do not contain the intercepted code content, the code block corresponding to the sorting position of the target code block as the replacement code block corresponding to the target code block; and the code replacement module is further configured to replace the target code block in the initial test case with the replacement code block corresponding to the target code block to obtain the target variant case.
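As a non-limiting illustration, the position-based replacement of this embodiment might be sketched in Python as follows. The sketch assumes that each test case has already been split into an ordered list of code blocks and that the first `prefix_len` blocks are the intercepted code content shared by both cases; the function name `replace_blocks` and its parameters are hypothetical and do not appear in the application.

```python
from typing import List


def replace_blocks(initial: List[str], variant: List[str],
                   prefix_len: int, threshold: int) -> List[str]:
    """Replace the first `threshold` non-intercepted blocks of `initial`
    with the blocks at the same positions in `variant`."""
    initial_tail = initial[prefix_len:]   # blocks not containing the intercepted content
    variant_tail = variant[prefix_len:]   # newly written blocks of the variant case
    # Only positions that exist in both tails and precede the threshold are replaced.
    count = min(threshold, len(initial_tail), len(variant_tail))
    return initial[:prefix_len] + variant_tail[:count] + initial_tail[count:]


initial_case = ["var a = [1, 2];", "a.push(3);", "print(a.length);"]
variant_case = ["var a = [1, 2];", "a.length = 0;", "a.reverse();"]
target_case = replace_blocks(initial_case, variant_case, prefix_len=1, threshold=1)
```

Here only the first block after the shared prefix is swapped, so the remaining blocks of the initial test case are carried over unchanged.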
In one embodiment, the code writing module is further configured to traverse a plurality of lines of code content in the initial test case, and determine mutation points in sequence based on the traversed lines of code content; and to intercept, for each mutation point, the code content of the initial test case up to that mutation point, and write code conforming to the grammar standard on the basis of each piece of intercepted code content, so as to obtain an initial variant case corresponding to each piece of intercepted code content. The replacement code determining module is further configured to determine, for each initial variant case, from among the code blocks of that initial variant case that do not contain its corresponding intercepted code content, the code block corresponding to the position of the target code block as a replacement code block corresponding to the target code block, so as to obtain a plurality of replacement code blocks corresponding to the target code block. The code replacement module is further configured to replace the target code block in the initial test case with each of the replacement code blocks corresponding to the target code block, respectively, to obtain a plurality of target variant cases.
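The traversal of mutation points might, purely as a sketch, look like the following Python. It assumes each line of the initial test case is used in turn as a mutation point, and it takes the code writer as a caller-supplied callback; `continue_code` is a hypothetical stand-in for whatever component writes grammar-conforming code, and is not an interface defined by the application.

```python
from typing import Callable, List, Sequence


def variants_per_mutation_point(
    lines: Sequence[str],
    continue_code: Callable[[List[str]], List[str]],
) -> List[List[str]]:
    """Build one initial variant case per mutation point (one per line)."""
    variants = []
    for index in range(len(lines)):
        prefix = list(lines[: index + 1])        # code content up to the mutation point
        continuation = list(continue_code(prefix))  # newly written code
        variants.append(prefix + continuation)
    return variants


def stub_writer(prefix: List[str]) -> List[str]:
    # Placeholder continuation; a real writer would generate new statements.
    return [f"// continuation after {len(prefix)} line(s)"]


seed = ["var a = [];", "a.push(1);", "print(a);"]
all_variants = variants_per_mutation_point(seed, stub_writer)
```

Each resulting variant shares a progressively longer prefix with the seed, which is what allows position-matched replacement code blocks to be drawn from every variant.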
In one embodiment, the target code determining module is further configured to determine at least one variable value in the initial test case as the target code content to be replaced; the replacement code determining module is further configured to extract the variable values in the initial variant case to obtain a variable value set, and select a target variable value from the variable value set; and the code replacement module is further configured to replace the variable value in the target code content with the target variable value to obtain the target variant case.
In one embodiment, the code writing module is further configured to determine a mutation point based on the first line of code content of the initial test case, intercept the code content of the initial test case up to the mutation point, and write code conforming to the grammar standard multiple times on the basis of the intercepted code content, so as to obtain a plurality of initial variant cases; the replacement code determining module is further configured to extract variable values from the plurality of initial variant cases to obtain a variable value set, and combine the variable values in the variable value set according to the number of variable values contained in the target code content to obtain a plurality of replacement variable groups; and the code replacement module is further configured to replace, for each replacement variable group in turn, the variable values of the target code content with the variable values in that replacement variable group, to obtain a plurality of target variant cases.
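The variable value replacement described above can be sketched in Python under several stated assumptions: a "variable value" is taken to be the right-hand side of a simple single-statement `var`/`let`/`const` declaration, replacement variable groups are formed as ordered selections without repetition, and all function names are hypothetical. The application does not fix any of these choices.

```python
import itertools
import re
from typing import List, Sequence, Tuple

# Assumption: only simple declarations of the form `var name = value;` are handled.
DECL = re.compile(r"(\b(?:var|let|const)\s+\w+\s*=\s*)([^;]+)(;)")


def extract_values(source: str) -> List[str]:
    """Collect the declared values of a test case, in source order."""
    return [m.group(2).strip() for m in DECL.finditer(source)]


def replacement_groups(variants: Sequence[str], group_size: int) -> List[Tuple[str, ...]]:
    """Pool the distinct values of all variant cases and group them by size."""
    pool: List[str] = []
    for variant in variants:
        for value in extract_values(variant):
            if value not in pool:
                pool.append(value)
    return list(itertools.permutations(pool, group_size))


def apply_group(target: str, group: Sequence[str]) -> str:
    """Substitute the values of one group into the target's declarations, in order."""
    values = iter(group)
    return DECL.sub(lambda m: m.group(1) + next(values) + m.group(3), target)


def target_variants(target: str, variants: Sequence[str]) -> List[str]:
    size = len(extract_values(target))   # number of variable values in the target
    return [apply_group(target, g) for g in replacement_groups(variants, size)]


initial_variants = ["var a = 1;\nvar b = 'x';", "var a = 1;\nvar c = 2.5;"]
target_code = "var p = 0;\nvar q = 9;\nprint(p + q);"
mutated_cases = target_variants(target_code, initial_variants)
```

Because only the declared values change, the statements that consume the variables are left intact, which is consistent with the stated aim of preserving the structure of the initial test case.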
In one embodiment, the code writing module is further configured to: test the object to be tested based on the initial test case to obtain a test result corresponding to the initial test case; when the test result is an abnormal test result, determine the mutation point from the initial test case; and when the test result is a normal test result, acquire a test case different from the initial test case, take the newly acquired test case as the initial test case, and return to the step of testing the object to be tested based on the initial test case.
In one embodiment, the code writing module is further configured to: test a plurality of different versions of the object to be tested based on the initial test case to obtain a test result corresponding to each object to be tested; when the test results corresponding to all the objects to be tested are consistent, determine that the test result of the initial test case is a normal test result; and when the test results corresponding to the objects to be tested are inconsistent, determine that the test result of the initial test case is an abnormal test result.
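The consistency check across versions, together with the selection of an initial test case whose result is abnormal, might be sketched as follows. Each version of the object to be tested is modeled here as a Python callable that maps a test case to an output string; this modeling, and the names `classify` and `first_abnormal`, are assumptions made only for illustration.

```python
from typing import Callable, Dict, Iterable, Optional

Engine = Callable[[str], str]   # hypothetical: runs a test case and returns its output


def classify(test_case: str, engines: Dict[str, Engine]) -> str:
    """Return "normal" if every version agrees on the output, else "abnormal"."""
    outputs = {run(test_case) for run in engines.values()}
    return "normal" if len(outputs) <= 1 else "abnormal"


def first_abnormal(candidates: Iterable[str], engines: Dict[str, Engine]) -> Optional[str]:
    """Keep acquiring test cases until one yields an abnormal result."""
    for case in candidates:
        if classify(case, engines) == "abnormal":
            return case          # this case becomes the initial test case to mutate
    return None


# Two stand-in "versions": the second one misbehaves on inputs containing "bug".
versions = {
    "v1": lambda src: str(len(src)),
    "v2": lambda src: "crash" if "bug" in src else str(len(src)),
}
chosen = first_abnormal(["print(1)", "bug()", "print(2)"], versions)
```

In practice the callables would wrap real engine builds; comparing outputs rather than relying on crashes alone lets silent divergences between versions be treated as abnormal.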
In one embodiment, the apparatus further comprises a first rule mutation module, configured to acquire a first preset mutation rule, where the first preset mutation rule is obtained based on target prior knowledge, and the target prior knowledge is obtained based on historical vulnerability information; and to mutate the initial test case based on the first preset mutation rule to obtain a first rule variant case, where the first rule variant case is used for testing the object to be tested.
In one embodiment, the first rule mutation module is further configured to acquire a plurality of preset application program interface pairs, where each application program interface pair comprises two semantically matched application program interfaces or two application program interfaces with the same return value; and, when the initial test case includes either application program interface of an application program interface pair, to replace the included application program interface with the other application program interface of that pair.
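A minimal sketch of the application program interface pair replacement is given below. The pairs listed in `API_PAIRS` are hypothetical placeholders chosen for illustration and are not taken from the application; the matching is a plain textual match on whole identifiers, which is an assumption of the sketch rather than a described technique.

```python
import re
from typing import List, Sequence, Tuple

# Hypothetical example pairs; a real rule set would be derived from prior knowledge.
API_PAIRS: Sequence[Tuple[str, str]] = (
    ("substring", "slice"),
    ("Math.floor", "Math.trunc"),
)


def swap_api(source: str, pairs: Sequence[Tuple[str, str]] = API_PAIRS) -> List[str]:
    """Return one mutant per interface found, with it replaced by its partner."""
    mutants = []
    for first, second in pairs:
        for old, new in ((first, second), (second, first)):
            # Match `old` only when it is not part of a longer identifier.
            pattern = re.compile(r"(?<![\w$])" + re.escape(old) + r"(?![\w$])")
            if pattern.search(source):
                mutants.append(pattern.sub(new, source))
    return mutants


api_mutants = swap_api("var n = Math.floor(x.substring(1).length / 2);")
```

Generating one mutant per matched interface, rather than applying every swap at once, keeps each resulting case close to the initial test case.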
In one embodiment, the first rule mutation module is further configured to identify instance objects in the initial test case; and, for at least one identified instance object, to modify the prototype chain properties of the instance object so as to pollute the prototype chain of the instance object.
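One way such a mutation could be emitted is sketched below. The sketch assumes that instance objects are recognized from single-line declarations of the form `var name = new Type(...)` and that the prototype chain is altered with the standard `Object.setPrototypeOf` call; the default `payload` object and the function name are hypothetical.

```python
import re

# Assumption: an instance object is declared on one line with `new`.
NEW_DECL = re.compile(r"^\s*(?:var|let|const)\s+(\w+)\s*=\s*new\s+[\w.]+")


def pollute_prototypes(
    source: str,
    payload: str = "{ valueOf: function () { return 1; } }",
) -> str:
    """Insert a prototype-chain modification after each instance declaration."""
    out = []
    for line in source.splitlines():
        out.append(line)
        match = NEW_DECL.match(line)
        if match:
            # Re-point the object's prototype at the (hypothetical) payload object.
            out.append(f"Object.setPrototypeOf({match.group(1)}, {payload});")
    return "\n".join(out)


polluted = pollute_prototypes("var m = new Map();\nprint(m.size);")
```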
In one embodiment, the first rule mutation module is further configured to identify arrays in the initial test case; and, for at least one identified array, to modify the length attribute of the array to either a boundary value or a target value, where the target value is a value greater than a preset value threshold.
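The array length mutation might be sketched as follows, assuming arrays are recognized from single-line array literal declarations. The boundary values 0 and 2**32 - 1 (the largest array length permitted by ECMAScript) are illustrative choices made for this sketch; the application does not specify which boundary or target values are used.

```python
import re
from typing import List, Sequence

# Assumption: an array is declared on one line as `var name = [ ... ];`.
ARRAY_DECL = re.compile(r"^\s*(?:var|let|const)\s+(\w+)\s*=\s*\[.*\];?\s*$")
BOUNDARY_LENGTHS: Sequence[int] = (0, 2**32 - 1)


def array_length_mutants(source: str,
                         lengths: Sequence[int] = BOUNDARY_LENGTHS) -> List[str]:
    """Return one mutant per (array, length) pair with `length` reassigned."""
    lines = source.splitlines()
    mutants = []
    for index, line in enumerate(lines):
        match = ARRAY_DECL.match(line)
        if not match:
            continue
        for length in lengths:
            assignment = f"{match.group(1)}.length = {length};"
            mutants.append("\n".join(lines[: index + 1] + [assignment] + lines[index + 1:]))
    return mutants


length_mutants = array_length_mutants("var a = [1, 2, 3];\nprint(a[0]);")
```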
In one embodiment, the first rule mutation module is further configured to identify functions in the initial test case; and, for at least one identified function, to add code that calls the function in a loop a preset number of times, where the preset number of times is greater than a preset count threshold.
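A sketch of the loop-call mutation is shown below. It assumes functions are recognized from `function name(` declarations and are called without arguments; the default count of 10000 and the loop variable name `__i` are hypothetical values chosen for the example.

```python
import re
from typing import List

FUNC_DECL = re.compile(r"\bfunction\s+(\w+)\s*\(")


def add_hot_loops(source: str, times: int = 10000) -> str:
    """Append, for each declared function, a loop that calls it `times` times."""
    names: List[str] = []
    for name in FUNC_DECL.findall(source):
        if name not in names:          # one loop per distinct function
            names.append(name)
    loops = [
        f"for (var __i = 0; __i < {times}; __i++) {{ {name}(); }}"
        for name in names
    ]
    return "\n".join([source] + loops)


looped = add_hot_loops("function f() { return 1; }")
```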
In one embodiment, the apparatus further comprises a second rule mutation module, configured to mutate the initial test case according to a second preset mutation rule to obtain a second rule variant case, where the second rule variant case is used for testing the object to be tested. The second preset mutation rule includes at least one of the following: replacing an operator in the initial test case with an operator that takes the same number of operands; replacing a conditional statement in the initial test case with a conditional statement that semantically matches it; randomly deleting from the initial test case a code block other than a variable declaration; randomly selecting a code block from the initial test case and wrapping the selected code block in a statement whose condition is true; and adding code to the initial test case.
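Two of the listed rules, operator replacement and wrapping a code block in an always-true condition, can be sketched in Python as follows. The sketch assumes binary arithmetic operators that are surrounded by whitespace, and both function names are hypothetical.

```python
import random
import re

BINARY_OPS = ["+", "-", "*", "/", "%"]   # operators taking two operands
# Assumption: operators are written with whitespace on both sides.
OP_PATTERN = re.compile(r"(?<=\s)[+\-*/%](?=\s)")


def replace_operator(source: str, rng: random.Random) -> str:
    """Swap one randomly chosen binary operator for a different binary operator."""
    matches = list(OP_PATTERN.finditer(source))
    if not matches:
        return source
    chosen = rng.choice(matches)
    new_op = rng.choice([op for op in BINARY_OPS if op != chosen.group(0)])
    return source[: chosen.start()] + new_op + source[chosen.end():]


def wrap_in_true_condition(block: str) -> str:
    """Wrap a code block in a statement whose condition is always true."""
    return "if (true) {\n" + block + "\n}"


mutated_expr = replace_operator("var c = a + b;", random.Random(0))
wrapped = wrap_in_true_condition("print(1);")
```

Passing an explicit `random.Random` instance makes a mutation run reproducible, which helps when a variant case that exposes an abnormal result needs to be regenerated.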
In one embodiment, the initial test case includes at least one of a first test case, a second test case, or a third test case; the first test case is a test case obtained from a standard test suite, the second test case is a test case obtained from a historical vulnerability script, and the third test case is a test case obtained by code writing based on the first test case or the second test case.
Each module in the test data processing apparatus described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded, in hardware form, in a processor of the computer device or may be independent of that processor, or may be stored, in software form, in a memory of the computer device, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 10. The computer device includes a processor, a memory, an input/output (I/O) interface, and a communication interface. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store test case data. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a test data processing method.
In one embodiment, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 11. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface, the display unit, and the input device are connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for wired or wireless communication with an external terminal, where the wireless communication may be implemented through Wi-Fi, a mobile cellular network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements a test data processing method. The display unit of the computer device is used to form a visible picture and may be a display screen, a projection device, or a virtual reality imaging device, where the display screen may be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device may be a touch layer covering the display screen, may be a key, a trackball, or a touch pad arranged on the housing of the computer device, or may be an external keyboard, touch pad, or mouse.
It will be appreciated by those skilled in the art that the structures shown in fig. 10 and 11 are block diagrams of only some of the structures associated with the present application and are not intended to limit the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory having a computer program stored therein and a processor that implements the steps of the test data processing method described above when the computer program is executed.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, implements the steps of the test data processing method described above.
In an embodiment a computer program product is provided comprising a computer program which, when executed by a processor, implements the steps of the test data processing method described above.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data are required to comply with the related laws and regulations and standards of the related countries and regions.
Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware, where the computer program may be stored on a non-volatile computer readable storage medium and, when executed, may include the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric random access memory (Ferroelectric Random Access Memory, FRAM), phase change memory (Phase Change Memory, PCM), graphene memory, and the like. The volatile memory may include random access memory (Random Access Memory, RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases. The non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors referred to in the embodiments provided herein may be, but are not limited to, general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, or data processing logic units based on quantum computing.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of these technical features that involves no contradiction should be considered to be within the scope of this description.
The above embodiments represent only a few implementations of the present application. They are described specifically and in detail, but are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art may make various modifications and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (18)

1. A test data processing method, the method comprising:
acquiring an initial test case for an object to be tested;
determining a mutation point based on the initial test case, intercepting the code content of the initial test case up to the mutation point, and writing code conforming to a grammar standard on the basis of the intercepted code content, so as to obtain an initial variant case;
determining target code content to be replaced from the initial test case;
determining, as replacement code content, code content in the initial variant case whose structure attribute matches the structure attribute of the target code content; and
replacing the target code content in the initial test case based on the replacement code content to obtain a target variant case, wherein the target variant case is used for testing the object to be tested.
2. The method of claim 1, wherein the determining target code content to be replaced from the initial test case comprises:
determining, from among the code blocks of the initial test case that do not contain the intercepted code content, target code blocks whose sorting positions precede a sorting threshold, as the target code content to be replaced;
the determining, as replacement code content, code content in the initial variant case whose structure attribute matches the structure attribute of the target code content comprises:
determining, from among the code blocks of the initial variant case that do not contain the intercepted code content, the code block corresponding to the sorting position of the target code block as the replacement code block corresponding to the target code block; and
the replacing the target code content in the initial test case based on the replacement code content to obtain a target variant case comprises:
replacing the target code block in the initial test case with the replacement code block corresponding to the target code block to obtain the target variant case.
3. The method according to claim 2, wherein the determining a mutation point based on the initial test case, intercepting the code content of the initial test case up to the mutation point, and writing code conforming to a grammar standard on the basis of the intercepted code content, so as to obtain an initial variant case, comprises:
traversing a plurality of lines of code content in the initial test case, and determining mutation points in sequence based on the traversed lines of code content;
intercepting, for each mutation point, the code content of the initial test case up to that mutation point, and writing code conforming to the grammar standard on the basis of each piece of intercepted code content, so as to obtain an initial variant case corresponding to each piece of intercepted code content;
the determining, from among the code blocks of the initial variant case that do not contain the intercepted code content, the code block corresponding to the sorting position of the target code block as the replacement code block corresponding to the target code block comprises:
for each initial variant case, determining, from among the code blocks of that initial variant case that do not contain its corresponding intercepted code content, the code block corresponding to the position of the target code block as a replacement code block corresponding to the target code block, so as to obtain a plurality of replacement code blocks corresponding to the target code block; and
the replacing the target code block in the initial test case with the replacement code block corresponding to the target code block to obtain the target variant case comprises:
replacing the target code block in the initial test case with each of the replacement code blocks corresponding to the target code block, respectively, to obtain a plurality of target variant cases.
4. The method of claim 1, wherein the determining target code content to be replaced from the initial test case comprises:
determining at least one variable value in the initial test case as the target code content to be replaced;
the determining, as replacement code content, code content in the initial variant case whose structure attribute matches the structure attribute of the target code content comprises:
extracting the variable values in the initial variant case to obtain a variable value set, and selecting a target variable value from the variable value set; and
the replacing the target code content in the initial test case based on the replacement code content to obtain a target variant case comprises:
replacing the variable value in the target code content with the target variable value to obtain the target variant case.
5. The method of claim 4, wherein the determining a mutation point based on the initial test case, intercepting the code content of the initial test case up to the mutation point, and writing code conforming to a grammar standard on the basis of the intercepted code content, so as to obtain an initial variant case, comprises:
determining a mutation point based on the first line of code content of the initial test case, intercepting the code content of the initial test case up to the mutation point, and writing code conforming to the grammar standard multiple times on the basis of the intercepted code content to obtain a plurality of initial variant cases;
the extracting the variable values in the initial variant case to obtain a variable value set, and selecting a target variable value from the variable value set, comprises:
extracting variable values from the plurality of initial variant cases to obtain a variable value set;
combining the variable values in the variable value set according to the number of variable values contained in the target code content to obtain a plurality of replacement variable groups; and
the replacing the variable value in the target code content with the target variable value to obtain the target variant case comprises:
replacing, for each replacement variable group in turn, the variable values of the target code content with the variable values in that replacement variable group, to obtain a plurality of target variant cases.
6. The method of any one of claims 1 to 4, wherein the determining a mutation point based on the initial test case comprises:
testing the object to be tested based on the initial test case to obtain a test result corresponding to the initial test case;
when the test result is an abnormal test result, determining the mutation point from the initial test case; and
when the test result is a normal test result, acquiring a test case different from the initial test case, taking the newly acquired test case as the initial test case, and returning to the step of testing the object to be tested based on the initial test case.
7. The method of claim 6, wherein the testing the object to be tested based on the initial test case to obtain a test result corresponding to the initial test case comprises:
testing a plurality of different versions of the object to be tested based on the initial test case to obtain a test result corresponding to each object to be tested;
when the test results corresponding to all the objects to be tested are consistent, determining that the test result of the initial test case is a normal test result; and
when the test results corresponding to the objects to be tested are inconsistent, determining that the test result of the initial test case is an abnormal test result.
8. The method according to claim 1, wherein the method further comprises:
acquiring a first preset mutation rule, wherein the first preset mutation rule is obtained based on target prior knowledge, and the target prior knowledge is obtained based on historical vulnerability information; and
mutating the initial test case based on the first preset mutation rule to obtain a first rule variant case, wherein the first rule variant case is used for testing the object to be tested.
9. The method of claim 8, wherein the mutating the initial test case based on the first preset mutation rule comprises:
acquiring a plurality of preset application program interface pairs, wherein each application program interface pair comprises two semantically matched application program interfaces or two application program interfaces with the same return value; and
when the initial test case includes either application program interface of an application program interface pair, replacing the included application program interface with the other application program interface of that pair.
10. The method of claim 8, wherein the mutating the initial test case based on the first preset mutation rule comprises:
identifying instance objects in the initial test case; and
for at least one identified instance object, modifying the prototype chain properties of the instance object so as to pollute the prototype chain of the instance object.
11. The method of claim 8, wherein the mutating the initial test case based on the first preset mutation rule comprises:
identifying arrays in the initial test case; and
for at least one identified array, modifying the length attribute of the array to either a boundary value or a target value, wherein the target value is a value greater than a preset value threshold.
12. The method of claim 8, wherein the mutating the initial test case based on the first preset mutation rule comprises:
identifying functions in the initial test case; and
for at least one identified function, adding code that calls the function in a loop a preset number of times, wherein the preset number of times is greater than a preset count threshold.
13. The method according to claim 1, wherein the method further comprises:
mutating the initial test case according to a second preset mutation rule to obtain a second rule variant case, wherein the second rule variant case is used for testing the object to be tested;
the second preset mutation rule includes at least one of:
replacing an operator in the initial test case with an operator that takes the same number of operands;
replacing a conditional statement in the initial test case with a conditional statement that semantically matches it;
randomly deleting from the initial test case a code block other than a variable declaration;
randomly selecting a code block from the initial test case, and wrapping the selected code block in a statement whose condition is true;
adding code to the initial test case.
14. The method of any one of claims 1 to 13, wherein the initial test case comprises at least one of a first test case, a second test case, or a third test case;
the first test case is a test case obtained from a standard test suite, the second test case is a test case obtained from a historical vulnerability script, and the third test case is a test case obtained by code writing based on the first test case or the second test case.
15. A test data processing apparatus, the apparatus comprising:
an initial case acquisition module, configured to acquire an initial test case for an object to be tested;
a code writing module, configured to determine a mutation point based on the initial test case, intercept the code content of the initial test case up to the mutation point, and write code conforming to a grammar standard on the basis of the intercepted code content, so as to obtain an initial variant case;
a target code determining module, configured to determine target code content to be replaced from the initial test case;
a replacement code determining module, configured to determine, as replacement code content, code content in the initial variant case whose structure attribute matches the structure attribute of the target code content; and
a code replacement module, configured to replace the target code content in the initial test case based on the replacement code content to obtain a target variant case, wherein the target variant case is used for testing the object to be tested.
16. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 14 when the computer program is executed.
17. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 14.
18. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 14.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210726982.7A CN117331808A (en) 2022-06-24 2022-06-24 Test data processing method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117331808A true CN117331808A (en) 2024-01-02

Family

ID=89290745

Country Status (1)

Country Link
CN (1) CN117331808A (en)

Similar Documents

Publication Publication Date Title
US11797298B2 (en) Automating identification of code snippets for library suggestion models
US11354225B2 (en) Automating identification of test cases for library suggestion models
CN108614707B (en) Static code checking method, device, storage medium and computer equipment
US11494181B2 (en) Automating generation of library suggestion engine models
US20240126543A1 (en) Library Model Addition
WO2019075390A1 (en) Blackbox matching engine
US11775414B2 (en) Automated bug fixing using deep learning
US11327722B1 (en) Programming language corpus generation
CN111475820A (en) Binary vulnerability detection method and system based on executable program and storage medium
CN108563561B (en) Program implicit constraint extraction method and system
Stephan et al. MuMonDE: A framework for evaluating model clone detectors using model mutation analysis
CN116578980A (en) Code analysis method and device based on neural network and electronic equipment
CN113157565B (en) Feedback JS engine fuzzy test method and device based on seed case mutation
CN112131120B (en) Source code defect detection method and device
CN112131122B (en) Method and device for source code defect detection tool misinformation evaluation
CN117113347A (en) Large-scale code data feature extraction method and system
Zhou et al. Deeptle: Learning code-level features to predict code performance before it runs
CN113778852B (en) Code analysis method based on regular expression
CN117331808A (en) Test data processing method, device, computer equipment and storage medium
Utkin et al. Evaluating the impact of source code parsers on ML4SE models
CN112162777B (en) Source code feature extraction method and device
Avitan et al. Assembly Function Recognition in Embedded Systems as an Optimization Problem
CN117971675A (en) Evaluation method and device for code large model, computer equipment and storage medium
CN117827629A (en) Code detection method and device for flutter, computer equipment and storage medium
CN116431477A (en) JS engine differential fuzzy test method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination