CN116702738A

CN116702738A - Operator defect detection method based on QL-UE algorithm and multi-test prediction

Info

Publication number: CN116702738A
Application number: CN202310625586.XA
Authority: CN
Inventors: 单纯; 王余阳; 胡立国; 廖书妍; 吴沛丰
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2023-05-30
Filing date: 2023-05-30
Publication date: 2023-09-05

Abstract

The invention discloses an operator defect detection method based on the QL-UE algorithm and multiple test prediction schemes. A set number of operator instances are taken as input data and added to a seed queue. The operator instances are taken out of the seed queue in turn, a parameter space is constructed for the operator parameters, mutation strategies are designed for the operator parameters based on the parameter space, and the QL-UE algorithm mutates the parameters of each operator instance according to the mutation strategies, yielding mutated operator instance samples. The operator instances are then tested with three test prediction schemes (similar parameter test, same-library differential test and document test), which generate potential defect files and success files; defects may be found by analyzing the potential defect files. Finally, it is judged whether the number of mutations has reached the upper limit, and if so, a certain number of potential defect files and success files are selected for the version test and the environment test. The invention generates as many effective test samples as possible and improves test efficiency.

Description

Operator defect detection method based on QL-UE algorithm and multi-test prediction

Technical Field

The invention belongs to the technical field of operator defect detection for deep learning libraries, and particularly relates to an operator defect detection method based on the QL-UE algorithm and multiple test prediction schemes.

Background

Existing work on testing deep learning libraries can be roughly classified into two types. The first type is model-based testing, which tests a deep learning library from the perspective of models. In this approach, model construction, training and inference are all carried out on top of a deep learning library, so if defects are found when a model is tested, it can be indirectly inferred that the deep learning library contains defects. Model-based testing requires a series of models and their input data; the models are run on different deep learning libraries, and whether the libraries have defects is judged from the results. The second type is testing based on the interfaces of the deep learning library. A deep learning library exposes a series of packaged interfaces to users, and if these interfaces are defective, models built with them may also be defective. Interface-based testing does not need a model; it only needs test samples as input, and directly judges from the results whether the deep learning library has defects.

The existing solution closest to the present invention is FreeFuzz, an automated testing method for deep learning interfaces. FreeFuzz collects code that uses deep learning interfaces from three sources (deep learning library documentation, the library developers' test code, and deep learning models), traces the dynamic information of each interface, including the types, values and shapes of input tensors, forms a value space and a parameter value space from this information, and stores them in a database. It then fuzz-tests each interface. Specifically, several records are selected from the database as seed test samples, one mutation strategy is randomly selected for these seeds, new test samples are generated with a fixed probability, the newly generated test samples are fed to the deep learning library interface and run on different back ends, and the results are compared to judge whether the deep learning library has defects.

Model-driven methods have several drawbacks in triggering deep learning library defects. First, triggering library defects effectively requires a large number of models; although existing models are widely applied across fields and problems, the library code exercised when building, training and running inference on these models is relatively fixed. Second, training a model to obtain information such as weights and gradients takes a considerable amount of time. Third, defects observed in a model only indirectly reflect defects of the deep learning library: to trace a defect to its source, the model must be inspected to locate the corresponding interface, and then the code of that interface must be inspected. Existing interface-based testing methods, for their part, have low test efficiency: a large number of test samples is needed for each interface to expose defects in the library code, and the judgment of whether the result of the tested interface under a given test input meets expectations is not accurate enough.

Disclosure of Invention

In view of the above, the invention provides an operator defect detection method based on the QL-UE algorithm and multiple test prediction schemes. The method steers test sample generation toward samples that amplify the interface defects of a deep learning library, so that as many effective test samples as possible are generated within a limited number of generation attempts, improving test efficiency. It also provides a new approach to the test prediction problem that improves the accuracy of judging test results and reduces false alarms produced while the program runs.

The technical scheme for realizing the invention is as follows:

In the operator defect detection method based on the QL-UE algorithm and multiple test prediction schemes, test cases, operator function document descriptions and parameter data types are collected in an experiment preparation stage to form a plurality of operator instances, which are stored in a database. In the experiment stage, a set number of operator instances are taken out as input data and added to a seed queue. The operator instances are taken out of the seed queue in turn, and a parameter space is constructed for the operator parameters; the parameter space comprises an operator value space, a type space and a parameter value space. Mutation strategies for the operator parameters are designed on the basis of these three spaces, and the QL-UE algorithm mutates the parameters of each operator instance according to the mutation strategies to obtain mutated operator instance samples. The operator instances are then tested with three test prediction schemes (similar parameter test, same-library differential test and document test), potential defect files and success files are generated, and the Q table is updated; defects may be found by analyzing the potential defect files. Finally, it is judged whether the number of mutations has reached the upper limit. If not, the QL-UE algorithm selects a mutation strategy again and the subsequent steps are repeated; if the upper limit has been reached, a certain number of potential defect files and success files are selected for two further test prediction schemes, the version test and the environment test, to detect potential defects of the deep learning library.

Further, the similar parameter test is a test prediction scheme built on the device test: the device test detects that a certain operator has a defect, operators whose parameters are similar to those of that operator are then searched for, and the similar operators are tested so as to find their defects. The specific process is as follows:

Step1: test the operator instance with the device test; if the operator behaves inconsistently on the CPU and the GPU, generate a potential defect file, which is analyzed later to determine whether the operator has a defect;

Step2: search for other operators whose parameters are similar to those of the original operator. Two kinds of operators are mainly searched for. The first kind is a similar operator that covers the parameter data types of the original operator instance: the two operators must have the same number and order of parameters, and for each parameter the data types accepted by the similar operator must include the data type of the corresponding parameter in the original operator instance. The second kind has more parameters in total than the original operator and fewer required parameters than the original operator, and its data types cover the parameter data types of the original operator instance; this condition allows the optional parameters of the similar operator to be deleted so that the parameters of the two operators match completely;

Step3: replace the name and parameter names of the original operator with those of the similar operator while keeping the parameter values unchanged, obtaining a similar operator instance;

Step4: run the similar operator instance on both the CPU and the GPU, and input the two results into the bimodal comparator for comparison to find whether the similar operator has a defect.

Further, in the same-library differential test, once the device test finds that a certain operator generates a potential defect file, a new operator with the same function as that operator is searched for, and both the parameters and the input data are substituted into it; the new operator produces two results on the CPU and the GPU, which are input into the bimodal comparator for comparison to find whether the new operator exposes a defect under the same parameter values and input data;

the search for operators with the same function specifically comprises: calculating the similarity of the document description information of all operators; when the similarity is greater than a specified threshold, preliminarily judging that the two operators have the same function; and finally taking the union of the results matched by the two methods, so that each operator obtains a set of operators with the same function, checking whether the operators in the set meet the requirement, and removing operators with different functions from the set.

Further, the document test is divided into two parts: the first part tests whether all data types that the document describes for a parameter can actually run; the second part verifies whether the parameter data types described in the exception information thrown by the interpreter are consistent with the description in the document;

the interpreter throws different exception information under different triggering conditions. To make the exception information describe the data types of the parameters as far as possible, a set containing all data types is first prepared; the difference between this set and the data types of a parameter is then taken to obtain the set of data types that do not belong to the parameter; the operator is then tested with the data types in this set in place of the parameter's own types, so that exception information related to data types is thrown with high probability.

Further, the version test runs the same operator instance on different versions of the deep learning library and compares whether the results are consistent. Consistent results indicate that the operator instance most likely has no defect, although there is also some probability that both versions share the same defect and therefore agree. If the results are inconsistent, it must be judged whether the new version has repaired a defect of the old version, has introduced a new defect, or has modified the function of the operator.

Further, the environment test firstly selects a part of potential defect files and successful files to run in the Windows operating system environment, and stores all parameter values and result values in a new file; transferring the new file to a Linux operating system for operation, and storing an operation result; finally, inputting the two results into a bimodal comparator for comparison, and judging whether an operator contains a defect or not;

the environment test cannot switch the operating system environment by using the script, and only after all files are completely run in the Windows operating system, the newly generated files are transferred to the Linux operating system and then run again.

The beneficial effects are that:

1. The invention uses the QL-UE algorithm to guide operator parameters in selecting mutation strategies, generating a large number of test samples and increasing the probability of finding defects.

2. The invention designs five test prediction schemes for running test samples (similar parameter test, same-library differential test, document test, version test and environment test), and found a total of 97 defects in the two deep learning libraries PyTorch and TensorFlow.

3. The invention designs a bimodal comparator which can process both numerical and text input.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

FIG. 2 is a flow chart of a bimodal comparator of the present invention.

FIG. 3 is a flow chart of a similar parameter test according to the present invention.

FIG. 4 is a flow chart of the document testing of the present invention.

FIG. 5 is a version test flowchart of the present invention.

Detailed Description

The invention will now be described in detail by way of example with reference to the accompanying drawings.

The invention provides an operator defect detection method integrating a QL-UE algorithm and a multi-test prediction, and the flow is shown in figure 1.

In the experiment preparation stage, data are collected from the official PyTorch and TensorFlow websites, mainly test cases, operator function document descriptions, parameter data types and the like; these form operator instances, which are stored in a database. In the experiment stage, a certain number of operator instances are first taken out as input data and added to a seed queue. The operator instances are then taken out of the queue in turn, and a parameter space is constructed for the operator parameters; the parameter space comprises an operator value space, a type space and a parameter value space, and the subsequent mutation strategies use these three spaces to mutate the parameters. The QL-UE algorithm then selects a mutation strategy and mutates the parameters of the operator instance to obtain a mutated operator instance sample. Next, the operator instances are tested with three test prediction schemes (similar parameter test, same-library differential test and document test), potential defect files, success files and the like are generated, and the Q table is updated. Defects may be found by analyzing the potential defect files. Finally, it is judged whether the number of mutations has reached the upper limit. If not, the QL-UE algorithm selects a mutation strategy again and the subsequent steps are repeated; if the upper limit has been reached, a certain number of potential defect files and success files are selected from the files generated above for two further test prediction schemes, the version test and the environment test, to detect potential defects of the deep learning library.

QL-UE driven test sample generation

Detecting deep learning library defects at the operator level with fuzz testing requires a large number of mutated operator samples, which can be generated by applying mutation strategies to the parameters of operator instances. The three mutation strategies proposed by FreeFuzz, value mutation (Vm), type mutation (Tm) and parameter substitution mutation (Pm), are introduced to mutate operator parameters. When the $i$-th parameter $op_i$ of an operator selects a mutation strategy $\mu$, a new parameter $op'_i$ is obtained (put simply, the value or the type of the parameter is changed). If testing the operator with $op'_i$ exposes a defect under a test prediction scheme, $\mu$ is effective for $op_i$, and in subsequent mutations the probability of selecting $\mu$ when $op_i$ is mutated should be increased. For this purpose, the QL-UE optimization algorithm is designed to guide parameters in selecting mutation strategies, so that as many effective mutated operator samples as possible are generated within a limited number of mutations.
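As an illustration only, the three strategies can be sketched in Python as follows. The type space, the sample values and all function names are hypothetical; the bodies are simplified assumptions and do not reproduce FreeFuzz's implementation.

```python
import random

# Hypothetical type space: each type name maps to a factory that draws a value.
TYPE_SPACE = {
    "int": lambda rng: rng.randint(-8, 8),
    "float": lambda rng: rng.uniform(-1.0, 1.0),
    "bool": lambda rng: rng.random() < 0.5,
    "str": lambda rng: rng.choice(["same", "valid", "zeros"]),
}


def type_name(value):
    return type(value).__name__


def value_mutation(value, rng):
    """Vm: keep the data type and draw a new value of that type."""
    factory = TYPE_SPACE.get(type_name(value))
    return factory(rng) if factory else value


def type_mutation(value, rng):
    """Tm: replace the value with one of a different data type."""
    others = [name for name in TYPE_SPACE if name != type_name(value)]
    return TYPE_SPACE[rng.choice(others)](rng)


def parameter_substitution(value, rng, value_space):
    """Pm: replace the value with one recorded earlier in the value space."""
    return rng.choice(value_space) if value_space else value


def mutate(instance, param, strategy, rng, value_space=()):
    """Return a copy of the operator instance with one parameter mutated."""
    mutated = dict(instance)
    if strategy == "Vm":
        mutated[param] = value_mutation(instance[param], rng)
    elif strategy == "Tm":
        mutated[param] = type_mutation(instance[param], rng)
    elif strategy == "Pm":
        mutated[param] = parameter_substitution(
            instance[param], rng, list(value_space))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return mutated


rng = random.Random(0)
seed_instance = {"kernel_size": 3, "padding": "same"}  # hypothetical instance
print(mutate(seed_instance, "kernel_size", "Tm", rng))
```

Passing the random generator explicitly keeps each mutation reproducible, which helps when a potential defect file has to be regenerated later.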

The Q-learning with Upper Confidence Bound and Experience Replay algorithm (QL-UE) is obtained by optimizing Q-learning with the UCB algorithm and an experience replay strategy. The UCB algorithm is introduced to optimize action selection in Q-learning, so that Q-learning considers both the mean reward and its confidence interval when selecting an action and dynamically adjusts the degree of exploration over time, which speeds up convergence. The experience replay strategy is introduced to break the correlation between adjacent data, improve generalization ability and improve data utilization.

When an operator parameter $op_i$ selects a mutation strategy $\mu$, a new parameter $op_j$ is obtained. If testing the operator with $op_j$ exposes a defect under a test prediction scheme, $\mu$ is effective for $op_i$, and the probability of selecting $\mu$ should be increased the next time $op_i$ is mutated. For this purpose, the QL-UE algorithm is designed to guide parameters in selecting mutation strategies. The mutation strategies apply to operators that have parameters; for operators without parameters, parameter mutation is skipped and the test prediction schemes are run directly. The purpose of mutating parameter instances is to generate more parameter instances, that is, operator test samples, so that the test prediction schemes can be exercised more thoroughly with a large number of test samples and as many operator defects as possible can be detected. An operator with $n$ parameters is represented as $\{p_1, p_2, \ldots, p_n\}$ and the strategy set as $\{Vm, Tm, Pm\}$, so that both the actions and the states of the QL-UE algorithm can be abstracted as combinations of a parameter and a strategy, for example $p_1$-Vm. Interpreted as an action, $p_1$-Vm means that the operator's next action is to mutate parameter $p_1$ with the Vm strategy; once the strategy is executed, the operator enters the $p_1$-Vm state. Interpreted as a state, it means that in the operator's current state parameter $p_1$ has just been mutated with the Vm strategy.

At initialization every Q value defaults to 0, so all strategies are equally likely to be selected at first. The operator then repeatedly selects strategies and the Q values are continuously updated until the algorithm converges or the number of iterations reaches the upper limit. After each mutation of the operator parameters, the operator is run through the test prediction schemes to check whether a defect file is generated; a defect file indicates that the operator may contain a potential defect. If a defect file is generated, the Q value is updated with a positive reward; otherwise it is updated with a negative reward.
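The selection and update loop described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the text gives no formulas, so the standard Q-learning update, a UCB1-style exploration bonus, rewards of +1 and -1, and the hyperparameters `alpha`, `gamma`, `c`, `buffer_size` and `batch_size` are all assumed, and the class name `QLUE` is hypothetical.

```python
import math
import random
from collections import defaultdict, deque


class QLUE:
    """Q-learning with UCB action selection and experience replay (sketch)."""

    def __init__(self, params, strategies=("Vm", "Tm", "Pm"), alpha=0.1,
                 gamma=0.9, c=1.0, buffer_size=100, batch_size=4, seed=0):
        # Actions and states are both (parameter, strategy) combinations.
        self.actions = [(p, s) for p in params for s in strategies]
        self.q = defaultdict(float)      # Q[(state, action)], initially 0
        self.visits = defaultdict(int)   # how often each pair was tried
        self.steps = 0
        self.alpha, self.gamma, self.c = alpha, gamma, c
        self.buffer = deque(maxlen=buffer_size)
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def select(self, state):
        """Pick the action with the highest upper confidence bound."""
        self.steps += 1
        untried = [a for a in self.actions if self.visits[(state, a)] == 0]
        if untried:
            return self.rng.choice(untried)

        def ucb(action):
            bonus = self.c * math.sqrt(
                math.log(self.steps) / self.visits[(state, action)])
            return self.q[(state, action)] + bonus

        return max(self.actions, key=ucb)

    def _update(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error

    def learn(self, state, action, defect_found):
        """Update Q with a positive or negative reward, then replay."""
        reward = 1.0 if defect_found else -1.0
        next_state = action  # executing p-Vm puts the operator in state p-Vm
        self.visits[(state, action)] += 1
        self.buffer.append((state, action, reward, next_state))
        self._update(state, action, reward, next_state)
        k = min(self.batch_size, len(self.buffer))
        for experience in self.rng.sample(list(self.buffer), k):
            self._update(*experience)
        return next_state


# Simulated run: pretend only the ("p1", "Vm") mutation yields a defect file.
agent = QLUE(params=["p1", "p2"])
state = "start"
for _ in range(30):
    action = agent.select(state)
    state = agent.learn(state, action, defect_found=(action == ("p1", "Vm")))
```

In a real run, `defect_found` would come from the test prediction schemes rather than from a fixed rule.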

Bimodal comparator

The test prediction schemes frequently need to compare two result values, which may be numerical or textual. For this purpose a bimodal comparator is designed, that is, a comparator that can handle both numerical and text input. The specific flow of the bimodal comparator is shown in FIG. 2:

(1) The bimodal comparator receives two input parameters. If one parameter is numerical and the other is text, the comparator directly returns "inconsistent".

(2) If both parameters are numerical, it is first judged whether their data types are consistent; for composite types, it is also judged whether the data types of their elements are consistent. Only when all data types are consistent is the following formula used to check whether the two parameters are equal within the specified relative and absolute error ranges:

$$|input_i - input_j| \le atol + rtol \times |input_j|$$

where $input_i$ and $input_j$ are the two parameters, $atol$ is the absolute error and $rtol$ is the relative error. In special cases, such as a single bool parameter, the formula cannot be used, and it can only be judged whether the two parameters are identical.

(3) If both parameters are text, a pre-trained Sentence-BERT model converts them into fixed-length vectors $u$ and $v$, and the cosine similarity of the two vectors is calculated. Its value lies between -1 and 1, and a larger value means the two vectors are more similar; when the similarity is greater than a specified threshold, the two parameters are considered similar. The cosine similarity is calculated as follows:

$$\cos(u, v) = \frac{u \cdot v}{\|u\| \, \|v\|}$$

where $u \cdot v$ denotes the dot product of the vectors and $\|u\|$ and $\|v\|$ denote the norms of $u$ and $v$.
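The three cases of the comparator can be sketched in one function. This is a sketch under assumptions: the tolerance defaults and the threshold are illustrative values not given in the text, `embed` is a hypothetical callable standing in for the Sentence-BERT encoder, and `letter_counts` is only a toy embedding so that the example runs without a model. Cosine similarity is computed as the dot product divided by the product of the vector norms.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def numeric_equal(a, b, rtol, atol):
    """Case (2): types must match, then |a - b| <= atol + rtol * |b|."""
    if type(a) is not type(b):
        return False
    if isinstance(a, (list, tuple)):
        # Composite type: compare element by element, including element types.
        return len(a) == len(b) and all(
            numeric_equal(x, y, rtol, atol) for x, y in zip(a, b))
    if isinstance(a, bool):
        return a == b  # the tolerance formula does not apply to bool
    if isinstance(a, (int, float)):
        return abs(a - b) <= atol + rtol * abs(b)
    return a == b


def bimodal_compare(a, b, embed, rtol=1e-5, atol=1e-8, threshold=0.8):
    """Return True when the two results are judged consistent."""
    a_is_text, b_is_text = isinstance(a, str), isinstance(b, str)
    if a_is_text != b_is_text:   # case (1): one numerical, one text
        return False
    if a_is_text:                # case (3): both text
        return cosine_similarity(embed(a), embed(b)) > threshold
    return numeric_equal(a, b, rtol, atol)


def letter_counts(text):
    """Toy stand-in for a sentence embedding: counts of the letters a-z."""
    text = text.lower()
    return [text.count(chr(ord("a") + i)) for i in range(26)]


print(bimodal_compare(1.0, 1.0 + 1e-9, letter_counts))
print(bimodal_compare("shape mismatch", 3.0, letter_counts))
```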

Test prediction schemes

A test prediction scheme (test oracle) is a mechanism that determines whether a program produces the correct output for a given input. The QL-UE algorithm generates a large number of test samples, and finding operator defects from these samples requires effective test prediction schemes. To make the fullest use of the samples, five test prediction schemes are designed: similar parameter test, same-library differential test, document test, version test and environment test.

Similar parameter testing

A deep learning library contains many operators that share the same parameter names, number of parameters, parameter order and data types; such operators are mainly concentrated under a particular module, such as the tf. If a defect is found when testing an operator sample, other operators with similar parameters can be searched for and tested with the same parameter values, so that defects in those operators can be found as well. For this case, this section designs the similar parameter test prediction scheme.

FreeFuzz proposes the device test, a differential testing method in which the same operator instance is executed on two hardware devices, the CPU and the GPU. In theory, once the influence of factors such as the floating-point precision of the CPU and GPU is excluded, the operator should behave consistently on both devices. When its behavior differs considerably, it is necessary to analyze whether the operator is defective.

The similar parameter test is a test prediction scheme built on the device test: the device test detects that a certain operator has a defect, operators whose parameters are similar to those of that operator are then searched for, and the similar operators are tested so as to find their defects. The flow is shown in FIG. 3, and the specific steps are as follows:

Step1: test the operator instance with the device test. If the operator behaves inconsistently on the CPU and the GPU, generate a potential defect file, which is analyzed later to determine whether the operator has a defect.

Step2: search for all other operators whose parameters are similar to those of the original operator. Two kinds of operators are mainly searched for. The first kind is a similar operator that covers the parameter data types of the original operator instance: the two operators must have the same number and order of parameters, and for each parameter the data types accepted by the similar operator must include the data type of the corresponding parameter in the original operator instance. The second kind has more parameters in total than the original operator and fewer required parameters than the original operator (the required parameters of the similar operator can be replaced), and its data types cover the parameter data types of the original operator instance; this condition allows the optional parameters of the similar operator to be deleted so that the parameters of the two operators match completely.

Step3: replace the name and parameter names of the original operator with those of the similar operator while keeping the parameter values unchanged, obtaining a similar operator instance.
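Step2 and Step3 can be sketched as follows. This is a simplified reading, not the patented procedure: the second kind of similar operator is interpreted here as a candidate whose extra trailing parameters are all optional and can therefore be dropped, and the `Param` record, the function names and the operator names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    name: str
    types: frozenset       # data types the parameter accepts
    required: bool = True


def covers(params, instance_types):
    """True if each parameter accepts the type used in the original instance."""
    return len(params) == len(instance_types) and all(
        t in p.types for p, t in zip(params, instance_types))


def match_similar(candidate, instance_types):
    """Return the candidate parameters to bind, or None if not similar."""
    n = len(instance_types)
    if len(candidate) == n:
        # First kind: same number and order of parameters, types covered.
        return list(candidate) if covers(candidate, instance_types) else None
    if len(candidate) > n:
        # Second kind: extra parameters must be optional so they can be dropped.
        kept, dropped = candidate[:n], candidate[n:]
        if all(not p.required for p in dropped) and covers(kept, instance_types):
            return list(kept)
    return None


def build_similar_instance(op_name, kept_params, values):
    """Step3: new operator name and parameter names, original values."""
    return {"op": op_name,
            "kwargs": {p.name: v for p, v in zip(kept_params, values)}}


# Hypothetical original instance: parameter types and values in order.
original_types = ["tensor", "int"]
original_values = ["<tensor data>", 1]
candidate = [
    Param("x", frozenset({"tensor"})),
    Param("dim", frozenset({"int", "tuple"})),
    Param("keepdim", frozenset({"bool"}), required=False),
]
kept = match_similar(candidate, original_types)
if kept is not None:
    print(build_similar_instance("lib.other_op", kept, original_values))
```

The resulting instance would then be run on both devices and the two results compared, as in the device test.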

Same-library differential testing

A deep learning library contains many operators with the same function. For example, in the PyTorch library the torch.nn module and the torch.nn.functional module contain many operators with the same function; the former are called as classes and the latter as functions. In theory, when operators with the same function are selected from the two modules and given the same parameters, they should produce the same output for the same input data. For this case, this section designs the same-library differential test prediction scheme, which tests operators with the same function within one deep learning library.

The flow of the same-library differential test is similar to that of the similar parameter test: once the device test finds that a certain operator generates a potential defect file, a new operator with the same function as that operator is searched for, and both the parameters and the input data are substituted into it. The new operator produces two results on the CPU and the GPU, which are input into the bimodal comparator for comparison to find whether the new operator exposes a defect under the same parameter values and input data.

The difference is that replacing parameter values and input data is more complex in the same-library differential test than in the similar parameter test, because all parameters of the two operators must be matched exactly. It is difficult to design a general method that maps parameters one to one and replaces them, especially when multiple parameters are involved, so for many operators the parameter replacement has to be designed individually. Operators whose parameters are not complex, for example operators with no parameter or a single parameter, can be replaced directly. For operators with multiple parameters, operators with the same function are searched for, and the parameter values and input data of the original operator are substituted into the new operator for testing, only after the original operator has been confirmed to truly have a defect rather than merely having generated a potential defect file.

There are two methods for finding operators with the same function. The first method relies on the fact that many operators with the same function in a deep learning library have the same name suffix: sets of operators are screened out by matching name suffixes, the similarity of the document description information of the operators in each set is then compared, and when the similarity is greater than a specified threshold the two operators are preliminarily judged to have the same function. However, some operators have the same function but different name suffixes, so this method misses some matches. To make up for these omissions, the second method directly calculates the similarity of the document description information of all operators and preliminarily judges two operators to have the same function when the similarity is greater than a specified threshold. Finally, the union of the results matched by the two methods is taken, so that each operator obtains a set of operators with the same function; the operators in the set are then checked, and operators with different functions are removed from the set.
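The two matching methods and their union can be sketched as follows. Several points are assumptions made for illustration: the text does not name the similarity measure, so `difflib.SequenceMatcher` from the Python standard library stands in for it; the name suffix is taken to be the last dotted component, compared case-insensitively; the two thresholds are separate parameters with illustrative values; and the operator names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def doc_similarity(doc_a, doc_b):
    """Stand-in similarity score in [0, 1] between two descriptions."""
    return SequenceMatcher(None, doc_a, doc_b).ratio()


def name_suffix(name):
    """Last dotted component of an operator name, lower-cased."""
    return name.rsplit(".", 1)[-1].lower()


def same_function_pairs(docs, suffix_threshold=0.6, global_threshold=0.9):
    """Union of suffix-based matches and description-only matches."""
    by_suffix, by_docs = set(), set()
    for a, b in combinations(sorted(docs), 2):
        score = doc_similarity(docs[a], docs[b])
        # Method 1: same name suffix, then compare descriptions.
        if name_suffix(a) == name_suffix(b) and score > suffix_threshold:
            by_suffix.add((a, b))
        # Method 2: compare descriptions of all operator pairs directly.
        if score > global_threshold:
            by_docs.add((a, b))
    return by_suffix | by_docs


# Hypothetical operator descriptions keyed by operator name.
docs = {
    "lib.nn.ReLU": "Applies the rectified linear unit function element-wise.",
    "lib.nn.functional.relu": "Applies the rectified linear unit function element-wise to the input.",
    "lib.abs": "Computes the absolute value of each element.",
}
for pair in sorted(same_function_pairs(docs)):
    print(pair)
```

The returned pairs are only preliminary matches; as described above, each resulting set still has to be checked and operators with different functions removed.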

Document testing

The official documentation's description of operators is not always correct, especially regarding the data types of parameters. Such documentation problems may arise because the writer made a mistake or an omission when describing the parameter data types, or because the documentation was not updated in time. For this case, this section designs the document test prediction scheme, which detects operators whose parameter descriptions in the documentation are incorrect.

The document test is mainly divided into two parts. The first part tests whether all data types that the document describes for a parameter can actually run. The second part verifies whether the parameter data types described in the exception information thrown by the interpreter are consistent with those described in the document.

The interpreter throws different exception information under different triggering conditions. To make the exception information describe the data types of the parameters as far as possible, a set containing all data types is first prepared; the difference between this set and the data types of a parameter is then taken to obtain the set of data types that do not belong to the parameter; the operator is then tested with the data types in this set in place of the parameter's own types, so that exception information related to data types is thrown with high probability. The document test flow is shown in FIG. 4, and the specific steps are as follows:

Step1: initialize the operator instance and the parameter data types, then test in two parts; the first part corresponds to Step2 and the second part to Step3 and Step4.

Step2: test all documented data types of each parameter in turn until an operator crash is found, and analyze the cause of the crash to check whether the information described in the operator document is correct.

Step3: take the difference set for the data types of each parameter in turn to obtain the set of data types that do not belong to the parameter, then test the operator with these types so as to make the operator crash as often as possible, and collect the exception information thrown by the interpreter.

Step4: analyze the exception information and extract the data types that the interpreter allows for the parameter. Then compare these data types one by one with the data types described in the operator document; if a mismatch is found during the comparison, further analyze whether the operator document is defective.
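The steps above can be sketched for a single parameter as follows. The universe of data types, the callable `run_with_type` and the fake operator are hypothetical, and extracting the allowed types by searching the exception text for known type names is an assumption that only works when the message lists those types.

```python
import re

# Hypothetical universe of data types; a real run would list the library's own.
ALL_TYPES = frozenset({"int", "float", "bool", "str", "list", "tensor"})
TYPE_PATTERN = re.compile(r"\b(" + "|".join(sorted(ALL_TYPES)) + r")\b")


def document_test(run_with_type, documented):
    """Run both parts of the document test for one parameter.

    run_with_type(type_name) executes the operator with a value of that type
    and raises an exception if the operator rejects it.
    """
    failed_documented = {}        # Step2: documented types that crash
    allowed_by_messages = set()   # Step4: types named in exception text
    for type_name in sorted(documented):
        try:
            run_with_type(type_name)
        except Exception as exc:
            failed_documented[type_name] = str(exc)
    # Step3: the difference set gives the types the document does not list.
    for type_name in sorted(ALL_TYPES - set(documented)):
        try:
            run_with_type(type_name)
        except Exception as exc:
            allowed_by_messages.update(TYPE_PATTERN.findall(str(exc)))
    # Step4: flag a mismatch between the messages and the document.
    mismatch = bool(allowed_by_messages) and allowed_by_messages != set(documented)
    return failed_documented, allowed_by_messages, mismatch


def fake_operator(type_name):
    """Stand-in operator that really accepts only int and float."""
    if type_name not in {"int", "float"}:
        raise TypeError("expected argument of type int or float")


# The (hypothetical) document claims that bool is accepted as well.
print(document_test(fake_operator, {"int", "float", "bool"}))
```

A flagged mismatch is only a lead: as Step4 states, it still has to be analyzed to decide whether the document is actually defective.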

Version testing

Version iteration is an important part of deep learning library development. A new version generally introduces new functions, repairs defects of old operators, solves software and hardware compatibility problems, and so on. However, it may also introduce new defects, for three common reasons. First, when the code of old operators is refactored and modified to improve operator performance, insufficient consideration may introduce new defects. Second, when a new version adds new functions and features to old operators, the new-version operators may become incompatible with the libraries they depend on, since operators typically depend on multiple versioned libraries. Third, test coverage of the new version may not be comprehensive enough, leaving defects undiscovered. For this case, this section designs the version test prediction scheme to detect defects across different versions of the deep learning library.

The core idea of the version test is to run the same operator instance on different versions of the deep learning library and compare whether the results are consistent. If the results are consistent, the operator is most likely free of defects, although it is also possible that both versions share the same defect. If the results are inconsistent, it must be determined whether the new version repaired a defect of the old version, whether the new version introduced a new defect, whether the operator's function was modified in the new version, and so on.

After each operator has undergone the two test prediction schemes of similar parameter test and same library differential test, a large number of potential defect files, successful files and so on are generated. The version test finds defects by running these files on different versions of the deep learning library. The specific flow is shown in fig. 5, and the steps are as follows:

Step1: Run the potential defect files and successful files in the old-version deep learning library, and save each parameter value and the result value result₁ to a new file. The new file is a Python script file, which can be interpreted and run independently, and its content contains all the information required to run the operator, such as the operator name and the concrete parameter values.

Step2: Switch the startup script to the new-version deep learning library, run the new file generated in Step1, and save the running result result₂.

Step3: Input the two result values result₁ and result₂ into the bimodal comparator for comparison, and judge whether the operator contains a defect.
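The idea of a standalone script that carries the operator name, the concrete parameter values and result₁ can be sketched as follows. This is only an illustration under stated assumptions: `save_case`, `rerun_case` and `SCRIPT_TEMPLATE` are hypothetical names, `math.hypot` stands in for a deep learning operator, and switching library versions is modelled by passing the path of a different Python interpreter (for example, one from another virtual environment); here the current interpreter is reused for both runs.

```python
import importlib
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Template of the standalone script; all names are illustrative.
SCRIPT_TEMPLATE = """\
import importlib, json
op = getattr(importlib.import_module({module!r}), {name!r})
args = json.loads({args_json!r})
result_1 = json.loads({result_json!r})  # recorded under the old version
print(json.dumps({{"result_1": result_1, "result_2": op(*args)}}))
"""

def save_case(path, module, name, args):
    """Step1: run the operator in the current (old) environment and save the
    operator name, parameter values and result_1 as a standalone script."""
    op = getattr(importlib.import_module(module), name)
    result_1 = op(*args)
    path.write_text(SCRIPT_TEMPLATE.format(
        module=module, name=name,
        args_json=json.dumps(args), result_json=json.dumps(result_1)))
    return result_1

def rerun_case(path, interpreter):
    """Step2: run the saved script under another interpreter and collect
    result_1 and result_2, or the crash text if the script fails."""
    proc = subprocess.run([interpreter, str(path)], capture_output=True, text=True)
    if proc.returncode != 0:
        return {"crash": proc.stderr.strip()}
    return json.loads(proc.stdout)

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        case = Path(tmp) / "case_hypot.py"
        save_case(case, "math", "hypot", [3, 4])
        # sys.executable stands in for the new-version environment.
        return rerun_case(case, sys.executable)

results = demo()
```

Because the script embeds everything it needs, it can be copied to any environment and run without access to the original seed queue; the two values it prints are what Step3 would hand to the comparator. The sketch only handles JSON-serializable parameters and results, so tensors would need a separate serialization step.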

Some operators have no return value after they finish running; for these operators, only whether they crash during running is detected. For example, if an operator runs normally in the old version but crashes directly in the new version, the cause of the crash must be found from the exception information thrown by the interpreter in order to judge whether the operator contains a defect. For an operator with a return value, it is necessary to detect both whether the operator crashes when run on different deep learning library versions and whether its return values are consistent.
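The distinction between crash-only checking and crash-plus-value checking can be sketched as below. This is a simplified comparison written for illustration, not the bimodal comparator itself; `run_operator`, `compare_runs`, `old_sqrt` and `new_sqrt` are hypothetical names, and the relative tolerance is an assumed value.

```python
import math

def run_operator(op, *args):
    """Return ('ok', value) on success or ('crash', exception text) on failure."""
    try:
        return "ok", op(*args)
    except Exception as exc:
        return "crash", f"{type(exc).__name__}: {exc}"

def compare_runs(old, new, has_return_value=True, rel_tol=1e-6):
    """Compare one run on the old version with one run on the new version."""
    old_status, old_value = old
    new_status, new_value = new
    if old_status != new_status:
        # For example: runs normally in the old version, crashes in the new one.
        return "crash-mismatch"
    if old_status == "crash":
        return "both-crash"
    if not has_return_value:
        # Operators without a return value are only checked for crashes.
        return "consistent"
    if isinstance(old_value, float) and isinstance(new_value, float):
        same = math.isclose(old_value, new_value, rel_tol=rel_tol)
    else:
        same = old_value == new_value
    return "consistent" if same else "value-mismatch"

# Hypothetical old and new versions of the same operator.
def old_sqrt(x):
    return math.sqrt(abs(x))

def new_sqrt(x):
    return math.sqrt(x)  # raises ValueError for negative input

verdict_negative = compare_runs(run_operator(old_sqrt, -4.0), run_operator(new_sqrt, -4.0))
verdict_positive = compare_runs(run_operator(old_sqrt, 4.0), run_operator(new_sqrt, 4.0))
```

A "crash-mismatch" verdict does not by itself prove a defect: as described above, the exception text still has to be analyzed to decide whether the new version fixed, introduced or deliberately changed the behaviour.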

Environmental testing

Deep learning libraries typically rely on the underlying library functions of the operating system to implement algorithms and data structures, but the implementation of these library functions may differ between operating systems. In addition, operators that run normally on one operating system may have problems on another because of differences in underlying architecture, file system and permission management. To address this situation, this section designs an environment test prediction scheme to find defects of the deep learning library on the Windows and Linux operating systems.

The overall flow of the environment test is similar to that of the version test. First, a portion of the potential defect files and successful files is selected and run in the Windows operating system environment, and all parameter values and result values are saved to a new file. The new file is then transferred to the Linux operating system and run, and the running result is saved. Finally, the two results are input into the bimodal comparator for comparison to judge whether the operator contains a defect.
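One way to carry results from one operating system to the other is a small portable record file, sketched below. The record layout, the function names `save_record` and `compare_with_record`, and the use of JSON are assumptions for illustration; `math.hypot` again stands in for a deep learning operator, and the sketch only covers scalar numeric results.

```python
import json
import math
import platform
import tempfile
from pathlib import Path

def save_record(path, operator_name, params, result):
    """On the first operating system: save parameter values and the result."""
    record = {
        "os": platform.system(),  # e.g. "Windows" or "Linux"
        "operator": operator_name,
        "params": params,
        "result": result,
    }
    Path(path).write_text(json.dumps(record), encoding="utf-8")
    return record

def compare_with_record(path, op, rel_tol=1e-6):
    """On the second operating system: rerun with the saved parameters
    and compare the local result with the recorded one."""
    record = json.loads(Path(path).read_text(encoding="utf-8"))
    local_result = op(*record["params"])
    return {
        "recorded_on": record["os"],
        "run_on": platform.system(),
        "consistent": math.isclose(local_result, record["result"], rel_tol=rel_tol),
    }

# Both halves run on the same machine here purely to show the data flow.
with tempfile.TemporaryDirectory() as tmp:
    record_file = Path(tmp) / "hypot_record.json"
    save_record(record_file, "math.hypot", [3, 4], math.hypot(3, 4))
    report = compare_with_record(record_file, math.hypot)
```

Tagging the record with `platform.system()` makes it easy to confirm afterwards that the two results really came from different operating systems.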

In summary, the above embodiments are only preferred embodiments of the present invention, and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. An operator defect detection method based on the QL-UE algorithm and multi-test prediction, characterized in that: in the experiment preparation stage, test cases, operator function document descriptions and parameter data types are collected to form a plurality of operator instances, which are stored in a database; in the experiment stage, a set number of operator instances are taken out as input data and added to a seed queue; all operator instances are taken out of the seed queue in sequence, and a parameter space is constructed for the parameters of each operator, the parameter space comprising an operator value space, a type space and a parameter value space; mutation strategies for the operator parameters are designed based on these three spaces, and the QL-UE algorithm mutates the parameters of the operator instances according to the mutation strategies to obtain mutated operator instance samples; the operator instances are tested with the three test prediction schemes of similar parameter test, same library differential test and document test, potential defect files and successful files are generated, and the Q table is updated; defects may be found by analyzing the potential defect files; finally, it is judged whether the number of mutations has reached the upper limit; if not, the QL-UE algorithm is used to select a mutation strategy for subsequent work; if the upper limit has been reached, a certain number of potential defect files and successful files are selected for the two test prediction schemes of version test and environment test, so as to detect potential defects of the deep learning library.

2. The operator defect detection method according to claim 1, wherein the similar parameter test is a test prediction scheme proposed on the basis of the equipment test: after a defect of a certain operator is detected by the equipment test, operators whose parameters are similar to those of that operator are searched for and tested, so as to find defects in the similar operators; the specific process is as follows:

3. The operator defect detection method according to claim 1 or 2, wherein in the same library differential test, after the equipment test finds that a certain operator generates a potential defect file, a new operator with the same function as that operator is searched for, and the same parameters and input data are passed to it; the new operator produces two results, one on the CPU device and one on the GPU device, and these are input into the bimodal comparator for comparison, so as to find whether the new operator exposes a defect under the same parameter values and input data.

4. The operator defect detection method of claim 1, wherein the document test is divided into two parts: the first part tests whether all the parameter data types described in the document can run; the second part verifies whether the parameter data types described in the exception information thrown by the interpreter are consistent with the description in the document.

5. The operator defect detection method of claim 1, wherein the version test runs the same operator instance on different versions of the deep learning library and compares whether the results are consistent; consistent results indicate that the operator is most likely free of defects, although it is also possible that both versions share the same defect; if the results are inconsistent, it must be judged whether the new version repaired a defect of the old version, whether the new version introduced a new defect, or whether the operator's function was modified in the new version.

6. The operator defect detection method of claim 1, wherein the environment test first selects a portion of the potential defect files and successful files, runs them in the Windows operating system environment, and saves all parameter values and result values to a new file; the new file is then transferred to the Linux operating system and run, and the running result is saved; finally, the two results are input into the bimodal comparator for comparison to judge whether the operator contains a defect.