CN114265764A

CN114265764A - Deep learning operator test data generation method based on weighted sampling

Info

Publication number: CN114265764A
Application number: CN202111471772.XA
Authority: CN
Inventors: 房春荣; 顾明政; 刘佳玮; 邹英龙; 林均劼; 陈振宇
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2021-11-30
Filing date: 2021-11-30
Publication date: 2022-04-01

Abstract

A deep learning operator test data generation method based on weighted sampling. A group of pre-designed mutation methods is treated as a set of mappings onto different data value domains. Monte Carlo sampling is carried out in advance to estimate the effectiveness of each mutation method, and the methods are then sampled with their effectiveness as weights, so that deep learning operator test data are generated more effectively while data diversity is preserved. The combination of mutation methods can be flexibly added to, reduced, and modified. The basic mutation methods provided by the method are byte mutation, noise mutation, and inversion mutation. Byte mutation applies byte operations to the floating-point binary encoding of the test data, including adding, deleting, negating, shifting, and randomly resetting each byte; noise mutation randomly applies one of several types of noise to the test data; inversion mutation computes the pseudo-reciprocal of the test data, using the test data as the denominator. These mutations are based on different underlying ideas and take both effectiveness and diversity into account.

Description

Deep learning operator test data generation method based on weighted sampling

Technical Field

The invention belongs to the field of deep learning testing, and in particular to accuracy testing of deep learning operators. It is intended for users of deep learning models who wish to test whether the deep learning operators on which their models depend have accuracy problems.

Background

As deep learning technology matures and demand for intelligent applications grows, a wide range of deep learning models have been proposed and applied in many scenarios across many fields. However, when researching, using, and testing a deep learning model, people tend to focus on the correctness and completeness of the model's overall function or workflow and to ignore the correctness of the specific operators on which the model depends. In fact, deep learning operators often cause accuracy problems that seriously damage the model, so accuracy testing of deep learning operators is necessary.

Unlike traditional testing, generating test cases for deep learning operator accuracy testing is difficult. The specific characteristics of an effective test case are not known in advance, so test cases cannot be designed precisely, and effective test cases can only be searched for by random sampling. Pure uniform random sampling, however, is often extremely inefficient and pays too little attention to some important value intervals, such as floating-point numbers less than 1. Borrowing the idea of traditional mutation testing, mutating randomly generated test data can effectively increase the probability that a test triggers an accuracy problem, but it remains unclear which mutation method should be used at which time. The present invention addresses these problems: it provides a more effective group of basic mutation methods that can be flexibly changed, and it uses the Monte Carlo method and weighted sampling to determine which mutation method to use and when.

Disclosure of Invention

The invention aims to solve the following problem: test cases for deep learning operator accuracy testing are difficult to design precisely, and generating them is inefficient. The specific characteristics of an effective test case are not known in advance, so test cases cannot be designed precisely, and effective test cases can only be searched for by random sampling. Pure uniform random sampling, however, is often extremely inefficient and pays too little attention to some important value intervals, such as floating-point numbers less than 1. To address this, the invention designs a series of test data mutation methods and evaluates their effectiveness through Monte Carlo sampling, so that suitable mutation methods and corresponding weights can be selected for weighted sampling of mutated data. This improves the generation efficiency, effectiveness, and diversity of test cases for deep learning operator accuracy testing.

The technical scheme of the invention is as follows. In a deep learning operator test data generation method based on weighted sampling, a group of pre-designed data mutation methods is treated as a set of mappings onto different data value domains. Monte Carlo sampling is carried out in advance to obtain the effectiveness of each mutation method, and weighted sampling is then carried out with the effectiveness as the weights, so that deep learning operator test data are generated more effectively while data diversity is taken into account. The combination of data mutation methods can be flexibly added to, reduced, and modified as needed. The basic data mutation methods provided by the method are byte mutation, noise mutation, and inversion mutation. Byte mutation applies byte operations to the floating-point binary encoding of the test data; the operations include adding, deleting, negating, shifting, and randomly resetting each byte of the encoding, and both float32 and float16 encodings are supported. Noise mutation applies Gaussian noise or uniform noise, chosen uniformly at random, to the test data. Inversion mutation computes the pseudo-reciprocal of the test data within the value domain, using the test data as the denominator. These mutations are based on different underlying ideas, for example that boundary data are more likely to cause anomalies and that uniform sampling under-represents numbers less than 1, and they take both effectiveness and diversity into account. The method comprises the following steps:

1) Mutation method definition: the invention predefines several mutation methods as the basic mutation method group, described in 1.1. In actual use, mutation methods can be added, removed, and modified as required to define a mutation method group specific to the application.

1.1) The basic mutation method group comprises byte mutation, noise mutation, and inversion mutation. Byte mutation applies byte operations to the floating-point binary encoding of the test data; the operations include adding, deleting, negating, shifting, and randomly resetting each byte of the encoding, and both float32 and float16 encodings are supported. Noise mutation applies Gaussian noise or uniform noise, chosen uniformly at random, to the test data. Inversion mutation computes the pseudo-reciprocal of the test data within the value domain, using the test data as the denominator;

2) Random sampling: obtain the post-mutation accuracy problem trigger success rate of each mutation method;

2.1) Obtain a random seed test case by uniform random sampling;

2.2) Apply each mutation method to the random seed test case and test the effect of the mutation; after multiple iterations, count the post-mutation accuracy problem trigger success rate of each mutation method;

3) Effectiveness analysis: following the basic idea of the Monte Carlo sampling method, the post-mutation accuracy problem trigger success rate of each mutation method is regarded as the effectiveness of that mutation method under the operator, and the sampling weight matrix is obtained by normalizing the trigger success rates;

4) Mutation method screening: screen the mutation methods in the mutation method group as needed;

4.1) Mutation method elimination: the mutation method group needs further screening in some special cases, for example when a mutation method has a success rate of 0 or greatly reduces the success rate, when a large number of mutation methods have success rates almost equal to that of the random method, or when the application scenario does not require diversity and urgently needs a large number of samples with a high success rate;

4.2) The screening and elimination operations mainly comprise: selecting the K most effective mutation methods in the mutation method group to form a new group; eliminating the K least effective mutation methods in the group to form a new group; and, if necessary, redefining the mutation method group and repeating the whole process;

4.3) Matrix modification: modify the effectiveness matrix according to the result of screening, eliminating, or modifying the mutation method group, and regenerate the weight matrix for weighted sampling;

5) Weighted sampling: sample according to the weight matrix obtained in step 4.3 and check whether an accuracy problem is successfully triggered;

5.1) Sort the weight matrix from largest to smallest while recording the index of each mutation method, then draw a random value uniformly from the interval between 0 and the sum of the weights and find the corresponding mutation method: the one for which the sum of its weight and the weights of all preceding methods is greater than or equal to the random value, while the sum of the weights of the preceding methods alone is less than the random value;

5.2) Obtain seed test data by random sampling, apply the sampled mutation method to the data to obtain the final test data, and input the final test data into the operator to check whether an accuracy problem is successfully triggered.
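Steps 2) to 4) can be illustrated with a minimal Python sketch. It assumes NumPy, a caller-supplied list of mutation functions, and a caller-supplied predicate `triggers_problem` that stands in for the operator accuracy check. The function names, the seed range of [-1, 1), and the toy mutations in the usage lines are hypothetical and are not taken from the invention.

```python
import numpy as np

def estimate_success_rates(mutations, triggers_problem, iterations, shape, rng):
    """Monte Carlo estimate of each mutation's accuracy problem trigger rate (step 2)."""
    hits = np.zeros(len(mutations), dtype=np.int64)
    for _ in range(iterations):
        # Step 2.1: draw a seed test case by uniform random sampling (range is assumed).
        seed = rng.uniform(-1.0, 1.0, size=shape).astype(np.float32)
        # Step 2.2: apply every mutation to the same seed so all are treated equally.
        for i, mutate in enumerate(mutations):
            if triggers_problem(mutate(seed)):
                hits[i] += 1
    return hits / iterations

def keep_top_k(rates, k):
    """Indices of the k most effective mutations, in their original order (step 4.2)."""
    order = np.argsort(-np.asarray(rates), kind="stable")
    return np.sort(order[:k])

def normalize(rates):
    """Turn success rates into sampling weights that sum to 1 (steps 3 and 4.3)."""
    rates = np.asarray(rates, dtype=np.float64)
    total = rates.sum()
    if total <= 0:
        raise ValueError("no mutation method triggered an accuracy problem")
    return rates / total

# Toy usage: three stand-in mutations and a stand-in accuracy check.
rng = np.random.default_rng(0)
mutations = [lambda x: x, lambda x: x * 10.0, lambda x: x * 0.0]
triggers_problem = lambda x: bool(np.abs(x).max() > 5.0)

rates = estimate_success_rates(mutations, triggers_problem, 2000, (4,), rng)
kept = keep_top_k(rates, k=1)          # screening: keep the single best method
weights = normalize(rates[kept])       # regenerate the weights after screening
```

In this toy setup only the second mutation can ever exceed the check's limit, so screening with `k=1` keeps it alone and its weight becomes 1. A real run would replace the stand-ins with the actual mutation group and operator check.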

The invention is characterized in that:

1. It is a novel deep learning operator test data generation method that balances generation efficiency, effectiveness, and diversity by applying multiple mutation methods to randomly generated seed test cases and evaluating the effectiveness of each mutation method separately;

2. It provides a basic mutation method group that has been shown to be effective in a large number of experiments, together with an effectiveness reference for that group, and it allows users of the method to flexibly add, delete, and modify mutation methods as needed;

3. It regards the post-mutation accuracy problem trigger success rate obtained by Monte Carlo sampling as the measure of a mutation method's effectiveness, which provides a novel, theoretically grounded way of evaluating mutation methods.

Based on these three points, the method addresses the low efficiency of test case generation for deep learning operator accuracy testing, improves the generation efficiency, effectiveness, and diversity of the test cases, and better supports subsequent deep learning operator accuracy testing.

Drawings

FIG. 1 is the overall architecture diagram of the present invention.

FIG. 2 is the architecture diagram of the mutation method definition sub-process of the present invention.

FIG. 3 is the architecture diagram of the Monte Carlo random sampling sub-process of the present invention.

FIG. 4 is the architecture diagram of the effectiveness analysis sub-process of the present invention.

FIG. 5 is the architecture diagram of the mutation method screening sub-process of the present invention.

FIG. 6 is the architecture diagram of the weighted sampling sub-process of the present invention.

Detailed Description

The key technique of the invention is to generate test data for existing deep learning operators by sampling from weighted mutation methods, and to detect accuracy problems with that data. Random tensor generation and mutation are implemented mainly with NumPy; accuracy problem detection and the deep learning operators involve the TensorFlow, PyTorch, and MNN frameworks and the MRE and MARE algorithms.

1. Tensor generation and variation

In the invention, random tensors are generated and mutated mainly with the NumPy library. NumPy is an extension library for the Python language that supports operations on large multi-dimensional arrays and matrices and provides a large collection of mathematical functions for array operations, including linear algebra and random number generation.
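As an illustration of how the three basic mutation types could be written with NumPy, here is a minimal sketch. It covers only two of the byte operations (negating and randomly resetting one byte). The function names, the noise scale of 0.1, and the convention that the pseudo-reciprocal of 0 is 0 are assumptions made for the example; the source does not specify them.

```python
import numpy as np

def _byte_view(x):
    """Return a private copy of x and a (num_elements, itemsize) uint8 view onto it."""
    raw = np.ascontiguousarray(x).reshape(-1).copy()
    return raw, raw.view(np.uint8).reshape(-1, x.dtype.itemsize)

def byte_negate(x, byte_index):
    """Byte mutation: invert all bits of one byte in every element's encoding."""
    raw, as_bytes = _byte_view(x)
    as_bytes[:, byte_index] = ~as_bytes[:, byte_index]   # writes through to raw
    return raw.reshape(x.shape)

def byte_reset(x, byte_index, rng):
    """Byte mutation: overwrite one byte of every element with a random value."""
    raw, as_bytes = _byte_view(x)
    as_bytes[:, byte_index] = rng.integers(0, 256, size=as_bytes.shape[0], dtype=np.uint8)
    return raw.reshape(x.shape)

def noise_mutation(x, rng, scale=0.1):
    """Noise mutation: add Gaussian or uniform noise, chosen with equal probability."""
    if rng.random() < 0.5:
        noise = rng.normal(0.0, scale, size=x.shape)
    else:
        noise = rng.uniform(-scale, scale, size=x.shape)
    return (x + noise).astype(x.dtype)

def inverse_mutation(x):
    """Inversion mutation: 1/x where x is nonzero, 0 elsewhere (assumed convention)."""
    out = np.zeros_like(x)
    np.divide(1.0, x, out=out, where=(x != 0))
    return out

# Usage on a random float32 seed tensor; float16 tensors work the same way.
rng = np.random.default_rng(0)
seed = rng.uniform(-1.0, 1.0, size=(2, 3)).astype(np.float32)
mutated = [
    byte_negate(seed, 0),
    byte_reset(seed, 1, rng),
    noise_mutation(seed, rng),
    inverse_mutation(seed),
]
```

Which byte index is the most significant depends on the machine's byte order. Byte mutations can produce NaN or infinite values, which a real test harness would need to decide how to handle.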

2. Deep learning operator

The invention involves operators under three frameworks: TensorFlow, PyTorch, and MNN. TensorFlow is a symbolic mathematics system based on dataflow programming, developed and maintained by Google Brain, Google's artificial intelligence team. It is widely used to implement machine learning algorithms, and its predecessor is Google's neural network library DistBelief. PyTorch is an open-source Python machine learning library based on Torch, released by Facebook AI Research (FAIR) and commonly used in machine learning applications such as natural language processing. MNN is an efficient, lightweight deep learning framework developed by Alibaba that supports inference and training of deep models and offers industry-leading performance for on-device inference and training. MNN is currently used in more than 20 Alibaba apps, including Mobile Taobao, Mobile Tmall, and Youku.

The deep learning operators involved in the invention are implemented under the three frameworks as follows:

(1) TensorFlow: tf.nn.bias_add, tf.nn.avg_pool, tf.nn.max_pool, tf.nn.softmax, tf.nn.sigmoid, tf.nn.tanh, tf.nn.relu, tf.nn.conv2d, tf.nn.reduce_mean, tf.matmul, tf.nn.reduce_max, tf.keras.layers.BatchNormalization.

(2) PyTorch: torch.add, F.avg_pool2d, F.max_pool2d, F.softmax, torch.sigmoid, torch.tanh, torch.nn.functional.relu, torch.nn.Conv2d, torch.mean, torch.matmul, torch.max, torch.nn.BatchNorm2d.

(3) MNN: MNN.expr.bias_add, MNN.expr.avg_pool, MNN.expr.max_pool, MNN.expr.softmax, MNN.expr.sigmoid, MNN.expr.tanh, MNN.expr.relu, MNN.nn.conv, MNN.expr.reduce_mean, MNN.expr.matmul, MNN.expr.reduce_max, MNN.nn.batch_norm.

3. Accuracy problem detection

The algorithm for detecting the accuracy problem mainly comprises an MRE algorithm and a MARE algorithm. The MRE and MARE algorithms are defined as follows:

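For a given operator and a given input, let the computation results of TensorFlow, PyTorch, and MNN be $f_t$, $f_p$, and $f_m$ respectively, and let the variances between these results be $\mathrm{Var}_{TM}$ (TensorFlow and MNN), $\mathrm{Var}_{TP}$ (TensorFlow and PyTorch), and $\mathrm{Var}_{MP}$ (MNN and PyTorch). The benchmark result $f_b$ of the operator computation is calculated as follows:

$$
f_b =
\begin{cases}
(f_t + f_p)/2, & \text{if } \min(\mathrm{Var}_{TP}, \mathrm{Var}_{TM}, \mathrm{Var}_{MP}) = \mathrm{Var}_{TP} \\
(f_t + f_m)/2, & \text{if } \min(\mathrm{Var}_{TP}, \mathrm{Var}_{TM}, \mathrm{Var}_{MP}) = \mathrm{Var}_{TM} \\
(f_m + f_p)/2, & \text{if } \min(\mathrm{Var}_{TP}, \mathrm{Var}_{TM}, \mathrm{Var}_{MP}) = \mathrm{Var}_{MP}
\end{cases}
$$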
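The benchmark selection rule can be sketched in a few lines of Python. The source does not define how the variance between two results is computed, so this sketch assumes the mean squared difference between the two output tensors; the function names are likewise hypothetical.

```python
import numpy as np

def pairwise_variance(a, b):
    # Assumption: "variance between two results" is their mean squared difference.
    diff = np.asarray(a, dtype=np.float64) - np.asarray(b, dtype=np.float64)
    return float(np.mean(diff ** 2))

def benchmark_result(f_t, f_p, f_m):
    """Average the two framework outputs that agree most closely.

    Returns the benchmark tensor f_b and the label of the chosen pair.
    """
    pairs = {
        "TP": (f_t, f_p),   # TensorFlow and PyTorch
        "TM": (f_t, f_m),   # TensorFlow and MNN
        "MP": (f_m, f_p),   # MNN and PyTorch
    }
    variances = {name: pairwise_variance(a, b) for name, (a, b) in pairs.items()}
    # On ties, min() keeps the first pair in the order TP, TM, MP.
    closest = min(variances, key=variances.get)
    a, b = pairs[closest]
    f_b = (np.asarray(a, dtype=np.float64) + np.asarray(b, dtype=np.float64)) / 2.0
    return f_b, closest

# Usage with stand-in outputs: TensorFlow and PyTorch agree, MNN is the outlier.
f_t = np.array([1.0, 2.0], dtype=np.float32)
f_p = np.array([1.0, 2.5], dtype=np.float32)
f_m = np.array([5.0, 5.0], dtype=np.float32)
f_b, pair = benchmark_result(f_t, f_p, f_m)
```

The idea behind the rule is that the two frameworks whose outputs are closest to each other are the least likely to be the faulty ones, so their average serves as the reference value.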

MRE and MARE are defined based on the error between the actual computation result and the benchmark result, respectively, as:

In actual use, an accuracy problem is detected by comparing the computed MRE and MARE values with preset thresholds.

4. Examples of the invention

The following uses specific examples to illustrate the steps of the present invention and to show the results.

The experimental environment is as follows: TensorFlow 2.0, PyTorch 1.8.1, MNN 1.1.4, and a GeForce GTX 1080 Ti graphics card.

The overall process of the invention is shown in fig. 1, and the specific implementation steps are as follows:

1) Define the mutation method group. The basic mutation method group is used directly in the experiment; it comprises 34 mutation methods of 3 types. The number of random sampling iterations is set to 20000, and the number of weighted samples is set to 15000;

2) Carry out experiments under MRE and MARE respectively, using 36 operators under the 3 frameworks. Two different groups of thresholds beta and gamma for the MRE and MARE algorithms are preset for the operators, so that the accuracy problem trigger rate of random sampling is below 15% for one group and between 30% and 50% for the other. This stage yields the accuracy problem trigger rates of all operators under the different conditions;

3) Regard the accuracy problem trigger rates as the effectiveness of the mutation methods and convert them into normalized weight matrices. After appropriate screening and modification, carry out weighted sampling according to the weight matrices, and test after each sample to detect whether the generated data successfully triggers an accuracy problem in the operator;

4) The experimental results of operator test data generation under each framework are shown in Table 1. The data are averages over the 4 conditions formed by the two threshold groups under MRE and MARE. The sampling success rate of the method is clearly higher than that of the random sampling algorithm for every operator under every framework.
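The weighted sampling stage described in step 3) can be sketched as follows, using the cumulative-weight selection rule of the method. This is a minimal illustration that assumes NumPy; the function names, the seed range, and the toy mutations and accuracy check are hypothetical stand-ins and do not reproduce the experiment.

```python
import numpy as np

def pick_mutation(weights, rng):
    """Draw one mutation index with probability proportional to its weight."""
    weights = np.asarray(weights, dtype=np.float64)
    if weights.sum() <= 0:
        raise ValueError("at least one weight must be positive")
    # Sort from largest to smallest while remembering the original indices.
    order = np.argsort(-weights, kind="stable")
    cumulative = np.cumsum(weights[order])
    # Draw uniformly between 0 and the sum of the weights.
    r = rng.uniform(0.0, cumulative[-1])
    # First position whose running total is >= r; all earlier totals are < r.
    pos = int(np.searchsorted(cumulative, r, side="left"))
    return int(order[min(pos, len(order) - 1)])   # guard against rounding at the top end

def weighted_success_rate(mutations, weights, triggers_problem, samples, shape, rng):
    """Fraction of weighted-sampled, mutated inputs that trigger the accuracy check."""
    hits = 0
    for _ in range(samples):
        seed = rng.uniform(-1.0, 1.0, size=shape).astype(np.float32)   # assumed seed range
        mutate = mutations[pick_mutation(weights, rng)]
        if triggers_problem(mutate(seed)):
            hits += 1
    return hits / samples

# Toy usage: the second stand-in mutation is far more effective and gets most of the weight.
rng = np.random.default_rng(0)
mutations = [lambda x: x, lambda x: x * 10.0]
triggers_problem = lambda x: bool(np.abs(x).max() > 5.0)
rate = weighted_success_rate(mutations, [0.1, 0.9], triggers_problem, 3000, (4,), rng)
```

Because less effective mutations keep a nonzero weight, they are still drawn occasionally, which is how the scheme retains diversity while favoring the methods that trigger problems most often.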

Table 1. Experimental results of operator test data generation under each framework

Claims

1. A deep learning operator test data generation method based on weighted sampling, characterized in that a mutation method group is defined on the basis of a predefined basic mutation method group; Monte Carlo random sampling is carried out on each method in advance to obtain the effectiveness of each mutation method; the mutation method group is further evaluated and refined; and weighted sampling is finally carried out, so that deep learning operator test data are generated more effectively while data diversity is taken into account. The method comprises the following steps:

1) mutation method definition: the method predefines several mutation methods as the basic mutation method group; in actual use, mutation methods can be added, removed, and modified as required to define a mutation method group specific to the application, wherein all the mutation methods are tensor mutation methods, as distinct from traditional scalar mutation;

2) Monte Carlo random sampling, with statistics collected to obtain the post-mutation accuracy problem trigger success rate of each mutation method: compared with traditional scalar mutation testing, tensor mutation is more complex and its effect is harder to predict and to evaluate directly, so the most intuitive and reasonable approach is to approximate the success rate of each mutation method through Monte Carlo random sampling;

5) weighted sampling: sampling according to the weight matrix obtained in step 4 and checking whether an accuracy problem is successfully triggered.

2. The mutation method definition sub-process according to claim 1, characterized in that:

1) a basic mutation method group is predefined, comprising byte mutation, noise mutation, and inversion mutation, wherein byte mutation applies byte operations to the floating-point binary encoding of the test data, the operations including adding, deleting, negating, shifting, and randomly resetting each byte of the encoding, with both float32 and float16 encodings supported; noise mutation applies Gaussian noise or uniform noise, chosen uniformly at random, to the test data; and inversion mutation computes the pseudo-reciprocal of the test data within the value domain, using the test data as the denominator;

2) the predefined basic mutation methods are based on different underlying ideas, for example that boundary data are more likely to cause anomalies and that uniform sampling under-represents numbers less than 1; a large number of experiments have shown that they take both effectiveness and diversity into account; their target is tensor mutation, whose operations are more complex than scalar mutation and have more statistical characteristics;

3) the basic mutation method group can be extended, reduced, and modified according to actual needs to form a new mutation method group, provided that every mutation method is applicable to tensors and to both float32 and float16 encodings.

3. The Monte Carlo random sampling sub-process according to claim 1, characterized in that:

1) a random seed test case is obtained by uniform random sampling;

2) each mutation method is applied to the random seed test case and the effect of the mutation is tested; after multiple iterations, the post-mutation accuracy problem trigger success rate of each mutation method is counted, with all mutation methods treated equally during random sampling, wherein the post-mutation accuracy problem trigger success rate equals the number of mutated samples that successfully trigger an accuracy problem divided by the total number of mutated samples.

4. The effectiveness analysis sub-process according to claim 1, characterized in that:

1) the post-mutation accuracy problem trigger success rate of each mutation method is regarded as the effectiveness of that mutation method under the operator, and the effectiveness of the mutation methods is further analyzed;

2) the effectiveness of each mutation method under a given operator is regarded as the proportion in which that method is expected to be used in weighted sampling, and the weight matrix of the mutation method group is obtained through normalization.

5. The mutation method screening sub-process according to claim 1, characterized in that:

1) the mutation method group is further screened in some special cases, for example when a mutation method has a success rate of 0 or greatly reduces the success rate, when a large number of mutation methods have success rates almost equal to that of the random method, or when the application scenario does not require diversity and urgently needs a large number of samples with a high success rate;

2) the screening and elimination operations mainly comprise: selecting the K most effective mutation methods in the mutation method group to form a new group; eliminating the K least effective mutation methods in the group to form a new group; and, if necessary, redefining the mutation method group and repeating the whole process;

3) the effectiveness matrix is modified according to the result of screening, eliminating, or modifying the mutation method group, and the weight matrix for weighted sampling is regenerated.

6. The weighted sampling sub-process according to claim 1, characterized in that:

1) the weight matrix is sorted from largest to smallest while the index of each mutation method is recorded; a random value is drawn uniformly from the interval between 0 and the sum of the weights; and the corresponding mutation method is found, being the one for which the sum of its weight and the weights of all preceding methods is greater than or equal to the random value, while the sum of the weights of the preceding methods alone is less than the random value;

2) seed test data are obtained by random sampling, the sampled mutation method is applied to the data to obtain the final test data, and the final test data are input into the operator to check whether an accuracy problem is successfully triggered.