CN112529156A - Neural network test multi-method mixed selection input method based on clustering - Google Patents

Neural network test multi-method mixed selection input method based on clustering

Info

Publication number
CN112529156A
CN112529156A
Authority
CN
China
Prior art keywords
cluster
test
clusters
neural network
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011418759.3A
Other languages
Chinese (zh)
Inventor
黄如兵
毛青青
陈锦富
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University
Original Assignee
Jiangsu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University
Priority to CN202011418759.3A
Publication of CN112529156A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a clustering-based neural network test multi-method mixed selection input method that combines traditional machine learning with traditional software testing algorithms. A clustering algorithm divides all test cases into two categories according to the clustering effect: several performance classes that can be clustered, and a noise class that cannot. Within these two categories, algorithms that stand out in traditional machine learning and in traditional software testing are applied respectively, so that global optimization of test case selection is achieved while local optimization of the two algorithms is guaranteed, and the performance of the retrained neural network model is effectively improved. In terms of selection effect, the effectiveness experiments show that CMS-DNNT outperforms ARS-DNNT regardless of whether the number of selected test cases is high or low. The experimental results show that the accuracy of the model retrained on the cases selected by CMS-DNNT is improved by at least 26 percent relative to ARS-DNNT.

Description

Neural network test multi-method mixed selection input method based on clustering
Technical Field
The invention belongs to the technical field of automatic testing in neural network testing, and particularly provides a clustering-based neural network testing multi-method mixed selection input method, namely CMS-DNNT.
Background
Today, deep neural networks are advancing rapidly and are widely applied in fields such as image classification, face recognition, and even natural language processing. However, as more safety-critical fields (such as automatic driving and medical diagnosis) begin to use deep neural networks, a new requirement, namely robustness, is placed on them. The neural network testing techniques developed in recent years do have practical significance in these respects. During neural network testing, expanding the sample set is a good strategy for improving the generalization of the model under diverse abnormal modes, but it also brings a new problem: the model under test usually faces a huge number of image inputs, so it is important to select, as test cases, the subset of test inputs that can effectively expose model failures. Meanwhile, the growing scale of deep learning systems continues to drive the diversification of neural network testing techniques. Among the many testing methods, Random Testing (RT) is widely used for quality assurance of various deep learning systems because its concept and application are simple. More specifically, RT randomly generates test cases in the input domain to drive the execution of the neural network under test and determines whether each test case causes an abnormal output. However, RT generates test cases blindly and randomly, ignoring the information carried by already-executed test cases and the properties of the neural network inputs that trigger failures. Consequently, researchers have raised concerns about the effectiveness of RT. Regarding the fault detection capability of RT, Adaptive Random Testing (ART) is one of the improved methods proposed for RT.
The advent of ART in the literature dates back to a journal paper published by Chen et al. in 2001. ART is motivated by observations about software failure patterns reported independently by researchers from many different fields: inputs that trigger program faults tend to gather in contiguous areas, referred to as failure regions. Three failure patterns are defined in the paper by Chan et al.; they are illustrated in Fig. 1, where the input domain is two-dimensional and the bounding box represents the input domain boundary. The black strips, blocks, or dots represent the distribution of fault-causing inputs. Early studies showed that the strip and block patterns are more common than the point pattern.
From the above analysis, if the failure region is contiguous, the non-failure region should also be contiguous within the input domain. Specifically: if a test case tc is an input that triggers a program failure, then its neighbors are also likely to be inputs that trigger a program failure. Based on this, the goal of ART is to achieve a uniform distribution of test cases over the input domain while preserving the randomness of RT. Two processes are typically involved: one is the random generation of test inputs, and the other ensures a uniform distribution of the generated test cases throughout the input domain.
ART is a family of testing methods guided by the intuition of diversity, among which the Fixed Candidate Set ART algorithm (FSCS-ART) is the most classical and widely accepted. The core techniques of FSCS-ART are: a notion of similarity between two test cases, namely a distance metric; and two test case sets, the executed set E (test cases that have already been executed without exposing any exception in the program under test) and the candidate set C (k test cases randomly generated in the input domain). A candidate c′ ∈ C becomes the next test case only if, for every c ∈ C:

min_{e∈E} d(c′, e) ≥ min_{e∈E} d(c, e),

where d(x, y) describes the similarity between test cases x and y and denotes the Euclidean distance between x and y.
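To make the FSCS-ART selection criterion concrete, the following is a minimal Python sketch (not part of the original disclosure); the uniform numeric input domain, the candidate-set size k, and the function names are illustrative assumptions.

```python
import math
import random

def euclidean(x, y):
    # d(x, y): Euclidean distance between two equal-length numeric inputs
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def fscs_next_case(executed, k=10, dim=2, lo=0.0, hi=1.0):
    # Generate k random candidates and keep the one whose *nearest* executed
    # case is farthest away (the max-min criterion of FSCS-ART).
    candidates = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(k)]
    return max(candidates,
               key=lambda c: min(euclidean(c, e) for e in executed))

# Usage: start from one random executed case and grow the executed set.
executed = [[random.random(), random.random()]]
for _ in range(5):
    executed.append(fscs_next_case(executed))
```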
The Euclidean distance is defined as follows: the Euclidean distance between test cases tc1 and tc2 is written d(tc1, tc2). For an m-dimensional input domain, given a test case a = {a1, a2, ..., am} and a test case b = {b1, b2, ..., bm}:

d(a, b) = √( Σ_{i=1..m} (a_i − b_i)² )
ART was introduced into deep learning systems in a journal paper published by Yan et al. in 2019. To improve the effectiveness of RT, ARTDL was proposed, combining deep learning systems with the fixed candidate set adaptive random algorithm FSCS-ART, which stands out in traditional software testing. The basic idea is to design an automatic black-box testing method for deep learning systems that improves upon and replaces RT. In the experimental design, the feature vectors output by each convolution layer of the neural network are extracted, a Feature Euclidean Distance (FED) for deep learning systems is defined on them, and the failure pattern of the DL model is then visualized with the feature vectors, i.e. a three-dimensional plot of fault-causing and non-fault-causing inputs. Figure 2 shows that the Feature Euclidean Distance can be used as a metric to compare the similarity of fault-causing and non-fault-causing inputs.
The Feature Euclidean Distance is defined as follows: the distance between test cases ta and tb is written FED(ta, tb). For m-dimensional feature vectors, given ta = va = {a1, a2, ..., am} and tb = vb = {b1, b2, ..., bm}:

FED(ta, tb) = √( Σ_{i=1..m} (a_i − b_i)² )
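As an illustration of how such per-layer feature vectors and the FED might be computed in practice, here is a hedged sketch assuming a Keras model under test; the layer name passed to conv_features is a placeholder, not something specified by the patent.

```python
import numpy as np
import tensorflow as tf

def conv_features(model, images, layer_name):
    # Truncate the model at one convolution layer and flatten its output
    # into one feature vector per input image.
    feature_model = tf.keras.Model(inputs=model.input,
                                   outputs=model.get_layer(layer_name).output)
    feats = feature_model.predict(images, verbose=0)
    return feats.reshape(len(images), -1)

def fed(va, vb):
    # Feature Euclidean Distance between two m-dimensional feature vectors.
    va, vb = np.asarray(va, float), np.asarray(vb, float)
    return float(np.sqrt(np.sum((va - vb) ** 2)))
```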
As the above analysis shows, the ARTDL method simply introduces the Fixed Candidate Set adaptive random testing method (FSCS-ART), which performs prominently in traditional software testing, into deep learning systems. As an improved version of random testing for deep learning systems, the ARTDL proposed by Min Yan certainly represents a qualitative leap in performance. However, it falls slightly short of today's demand for high robustness in safety-critical fields. Given that extracted image feature data with higher similarity correspond to more consistent testing capability, the present method builds on a traditional clustering algorithm within the deep learning system and optimizes ARTDL, further improving the performance of the deep learning system.
After test inputs with different testing capabilities are divided into different groups, test cases are selected from each group to form a small set of test inputs. For the several performance classes with more consistent testing capability, clustering cannot perfectly separate different testing capabilities, so each group contains noise, i.e. test inputs that should not have been assigned to that group. To limit the effect of the noise in each group, the set of test inputs that best represents the group's testing capability, also referred to as a prototype set, is selected so as to be interpretable. The state-of-the-art example-based Maximum Mean Discrepancy algorithm (MMD) in machine learning, which determines the selected prototypes by computing the difference between the prototype distribution and the group distribution and keeping that difference minimal, has clear advantages in increasing the interpretability of the selected test inputs. The MMD computes expected values of the prototypes and the group over a function space F and uses them as a measure of the difference between the prototypes and the group, expressed as follows:
MMD(F, P, G) = sup_{f∈F} ( E_{x∼P}[f(x)] − E_{y∼G}[f(y)] ),

where P denotes the prototype distribution and G the group distribution.
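The patent gives the MMD expression only as an image; as a point of reference, the following sketch computes a standard biased empirical MMD² between a prototype set and its group, with an RBF kernel standing in for the function space F (the kernel choice and gamma value are assumptions).

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # k(x, y) = exp(-gamma * ||x - y||^2), evaluated for all pairs
    d2 = (np.sum(X ** 2, axis=1)[:, None]
          + np.sum(Y ** 2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * d2)

def mmd2(prototypes, group, gamma=1.0):
    # Biased empirical MMD^2 between prototype set P and group G.
    P = np.asarray(prototypes, float)
    G = np.asarray(group, float)
    return (rbf_kernel(P, P, gamma).mean()
            - 2.0 * rbf_kernel(P, G, gamma).mean()
            + rbf_kernel(G, G, gamma).mean())
```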
disclosure of Invention
Aiming at the problems, the invention provides a multi-method mixed selection input method based on clustering neural network test, in order to effectively improve the efficiency of a self-adaptive random neural network test method. The technical scheme of the invention comprises the following steps:
step 1, determining a pre-trained neural network model D' to be tested and a confrontation sample set S;
step 2, extracting a confrontation sample feature set V;
step 3, two different categories are obtained: several performance classes l1, l2, ..., lj that account for the majority of the total number of samples, and a minority noise class l0;
Step 4, setting the selection proportion value alpha of the two categories to be 0.6 under the condition that the number of the selected use cases is determined to be n;
step 5, aiming at two different categories, combining the traditional machine learning with the traditional software testing algorithm by adopting a divide-and-conquer concept, and respectively selecting alpha x n test cases and (1-alpha) x n test cases;
step 6, using the selected cases as a training set T, retrain the old model D′ to obtain a new model D″;
step 7, finally return the number of prediction errors of the new model D″ on the confrontation samples, as the metric for judging the performance of the various selection algorithms.
further, the specific process of step 1 is as follows:
step 1.1, determining a tested neural network model D and a corresponding seed image database (N seed image sets I) according to a previous software design document;
step 1.2, initializing the measured neural network model according to the measured neural network model D determined in the step 1.1 and a corresponding seed image database (N seed image sets I), namely pre-training the measured neural network model D through the corresponding seed image database, and further determining a pre-trained neural network model D' to be measured;
step 1.3, according to the seed image database (N seed image sets I) determined in step 1.1, sample expansion is performed with the 33 transformation methods shown in Table 1 to generate a confrontation sample set S of 33 × N samples.
The specific process of the step 2 is as follows:
step 2.1, obtain all convolution layers cov1, cov2, ... according to the D′ determined in step 1.2;
step 2.2, based on the S determined in step 1.3, extract its feature sets V1, V2, ... at cov1, cov2, ..., and select the variable Vi that affects performance; the feature set verified by experiment to give the best effect is taken as the optimal feature set V;
the specific process of the step 3 is as follows:
step 3.1, the density-based hierarchical clustering algorithm (HDBSCAN) is used. Taking distance as the similarity measure, and following the idea that the more consistent the test performance, the more concentrated the corresponding test cases, all test cases are grouped into different categories according to test performance. The specific steps are as follows:
Step 3.1.1, transform the space according to density/sparsity. The core of the clustering algorithm is single-linkage clustering, which is very sensitive to noise: a single noise data point in the wrong location may act as a bridge between islands, bonding them together. To eliminate this effect of noise, dense points (with a lower core distance) keep their original distances from each other, while sparser points are pushed apart so that they are at least their core distance away from any other point. The mutual reachability distance is defined as follows:

d_mreach-k(a, b) = max{ core_k(a), core_k(b), d(a, b) }

where core_k(a) and core_k(b) denote the distances from points a and b to their respective k-th nearest points, and d(a, b) denotes the original distance between a and b.
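For illustration, a small NumPy sketch of the mutual reachability distance just defined; it assumes the core distance core_k(p) is the distance from p to its k-th nearest neighbour, which is how the quantity is usually computed.

```python
import numpy as np

def mutual_reachability(X, k=5):
    # Pairwise mutual-reachability distances for a feature matrix X (n x m).
    X = np.asarray(X, float)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))  # raw distances
    core = np.sort(d, axis=1)[:, k]  # distance to the k-th nearest neighbour
    # d_mreach-k(a, b) = max{core_k(a), core_k(b), d(a, b)}
    return np.maximum(np.maximum(core[:, None], core[None, :]), d)
```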
Step 3.1.2, build a minimum spanning tree of the distance-weighted graph. The data is treated as a weighted graph in which the data points are vertices and the weight of the edge between any two points equals their mutual reachability distance, i.e. the value determined in step 3.1.1. A minimal set of edges is found such that deleting any edge from the set disconnects a component; this minimum spanning tree can be built very efficiently with Prim's algorithm.
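A one-step sketch of building that tree from the mutual-reachability matrix above, using SciPy's minimum spanning tree routine instead of a hand-written Prim implementation (an implementation choice, not one the patent prescribes).

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

# X is assumed to be the feature matrix; mutual_reachability() is the sketch above.
X = np.random.rand(200, 8)
mst = minimum_spanning_tree(mutual_reachability(X, k=5)).toarray()
# Nonzero entries of `mst` are the kept edge weights of the spanning tree.
```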
Step 3.1.3, construct the cluster hierarchy of connected components. The edges of the tree are sorted by distance in increasing order and then traversed, creating a new merged cluster for each edge. Using a union-find data structure, the two clusters joined by each edge are determined. Eventually, a hierarchy of connected components (from fully connected to fully disconnected) is obtained at the different threshold levels.
Step 3.1.4, condense the cluster hierarchy according to the minimum cluster size. The notion of a minimum cluster size is introduced as a parameter of HDBSCAN. While traversing the hierarchy, at each split it is judged whether the number of points in a newly created cluster is smaller than the minimum cluster size. If it is, those points are declared "points that fall out of the cluster" and the larger cluster keeps the identity of the parent cluster; if the clusters produced by the split are at least the minimum cluster size, the split is kept in the tree. The result is a much smaller tree with a small number of nodes, each node carrying data on how the size of its cluster shrinks as the distance threshold varies.
Step 3.1.5, extract stable clusters from the condensed tree. To ensure that a selected cluster persists and has a long life cycle, once a cluster is selected none of its descendant clusters may be selected. A measure other than raw distance is used to express the persistence of clusters; the density measure of each point is defined as:

λ = 1 / distance

where distance denotes the shortest distance from the point to the rest of the cluster.

For a given cluster, the values λ_birth and λ_death are defined: λ_birth is the λ value when the cluster separates off and becomes a cluster of its own, and λ_death is the λ value (if any) when the cluster splits into smaller sub-clusters. Conversely, for each point p in a given cluster, the value λ_p is defined as the λ value at which the point "falls out of the cluster"; it lies between λ_birth and λ_death. Now, for each cluster, the stability is computed as:

S_cluster = Σ_{p∈cluster} (λ_p − λ_birth)

where λ_p is the λ value at which point p separates from the cluster. All leaf nodes are first declared selected clusters; the tree is then traversed in reverse topological order, judging for each cluster whether the sum of the stabilities of its sub-clusters is greater than its own stability. If so, the cluster's stability is set to the sum of the sub-cluster stabilities; otherwise the cluster is declared a selected cluster and all of its sub-clusters are deselected. When the root node is reached, all currently selected clusters are returned.
Step 3.2, following step 3.1.4, the minimum cluster size parameter is set to 50. It is judged whether the number of test cases in each resulting cluster is below this threshold of 50; all clusters at or below the threshold, i.e. all "points that fall out of clusters", are merged into the noise class l0, and the remaining clusters l1, l2, ..., lj, which account for the largest part of the total number of samples, are defined as the several performance classes.
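In practice, steps 3.1 and 3.2 can be realized with the Python hdbscan package (the package choice is an assumption; the patent only names the HDBSCAN algorithm). Points labelled -1 form the noise class l0, and the remaining labels form the performance classes l1, ..., lj:

```python
import numpy as np
import hdbscan

def split_classes(features, min_cluster_size=50):
    # Cluster the feature vectors and split them into performance classes and noise.
    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size)
    labels = clusterer.fit_predict(np.asarray(features))
    noise_class = np.where(labels == -1)[0]                 # l0
    performance_classes = {lab: np.where(labels == lab)[0]  # l1 ... lj
                           for lab in set(labels) if lab != -1}
    return performance_classes, noise_class
```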
The specific process of the step 5 is as follows:
Step 5.1, for the several performance classes, the Maximum Mean Discrepancy algorithm (MMD), which in machine learning excels at cases with more consistent testing capability, is selected. During selection, the distributions P1, P2, ..., Pj of the cases selected within each performance class and the class distribution G of each performance class within the set of all performance classes are computed, and the MMD between them is used as the test statistic. To make the two distributions as similar as possible, the chosen α × n test cases must keep the MMD value as small as possible. In addition, the number of test cases selected in each performance class is allocated in proportion to that class's share of the total number of test cases across all performance classes;
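One way step 5.1 could be realized is a greedy search that, inside each performance class, repeatedly picks the case whose addition keeps the empirical MMD² between the selected prototypes and the whole class as small as possible, with per-class quotas proportional to class size. This sketch reuses the mmd2 helper from the background section and is an assumed implementation, not the patent's prescribed one.

```python
import numpy as np

def proportional_quotas(class_sizes, budget):
    # Split the alpha*n picks across performance classes in proportion to their sizes
    # (rounding may leave the quotas one or two cases off the exact budget).
    total = sum(class_sizes.values())
    return {lab: round(budget * size / total) for lab, size in class_sizes.items()}

def select_prototypes(cluster, quota, gamma=1.0):
    # Greedy MMD-based prototype selection inside one performance class.
    cluster = np.asarray(cluster, float)
    chosen, remaining = [], list(range(len(cluster)))
    for _ in range(min(quota, len(remaining))):
        best = min(remaining,
                   key=lambda i: mmd2(cluster[chosen + [i]], cluster, gamma))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```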
Step 5.2, for the noise class, the fixed-candidate-set-based Adaptive Random Selection (FSCS-ARS) method, which in traditional software testing performs strongly on relatively dispersed cases, is used to select (1 − α) × n test cases;
step 5.2.1, the similarity measurement between the test cases is carried out by using Feature Euclidean Distance (FED), and the closer the distance is, the higher the similarity of the two test cases is;
Step 5.2.2, initialize the executed test case set E and the candidate test case set C according to the V obtained in step 2; the initial element t0 of E is a randomly selected feature vector v0 of a non-abnormal countermeasure sample, and the size of C is fixed to 10.
Step 5.2.3, compute the distance between each candidate test case ci in C and each element ei in E (E currently contains only the single element t0), take the minimum of these distances as the similarity value of the candidate, and select the candidate with the lowest similarity value (i.e. the largest minimum distance) as the next test case t1 to be tested.
The Feature Euclidean Distance is defined as follows: the distance between test cases ta and tb is written FED(ta, tb). For m-dimensional feature vectors, given ta = va = {a1, a2, ..., am} and tb = vb = {b1, b2, ..., bm}, where ai and bi denote the i-th components of the feature vectors of ta and tb respectively:

FED(ta, tb) = √( Σ_{i=1..m} (a_i − b_i)² )
Step 5.2.4, if t1 detects an abnormality, add t1 to E and repeat step 5.2.3 until the number of selected cases reaches (1 − α) × n.
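A compact sketch of the step 5.2 loop over the noise-class feature vectors; causes_failure is a hypothetical predicate (does the case make the model under test mispredict?), the seed case is assumed to be non-failing as in step 5.2.2, and following step 5.2.4 only failure-revealing cases are added to E and counted as selected.

```python
import random
import numpy as np

def fed(va, vb):
    # Feature Euclidean Distance between two feature vectors.
    return float(np.linalg.norm(np.asarray(va, float) - np.asarray(vb, float)))

def fscs_ars(noise_features, causes_failure, budget, k=10):
    pool = list(range(len(noise_features)))
    executed = [pool.pop(random.randrange(len(pool)))]  # random non-failing seed t0
    selected = []
    while len(selected) < budget and pool:
        candidates = random.sample(pool, min(k, len(pool)))  # |C| fixed to 10
        # Next case: the candidate farthest (in FED) from its nearest executed case.
        nxt = max(candidates,
                  key=lambda c: min(fed(noise_features[c], noise_features[e])
                                    for e in executed))
        pool.remove(nxt)
        if causes_failure(nxt):  # step 5.2.4: keep only failure-revealing cases
            executed.append(nxt)
            selected.append(nxt)
    return selected
```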
The specific process of the step 6 is as follows:
Step 6.1, take the n test cases selected in step 5 as the training set T = {t0, t1, ..., tn} for retraining the old model D′;
Step 6.2, retrain the model D′ with the T obtained in step 6.1 to obtain the final new model D″.
the specific process of the step 7 is as follows:
Step 7.1, take the confrontation samples remaining from step 1 as the test set T′ = {t′0, t′1, ..., t′n} for the retrained model D″;
Step 7.2, feed the T′ obtained in step 7.1 into the new model D″, judge whether the predicted value of D″ is consistent with the label value, and judge the quality of the test case selection algorithm from the resulting number of errors.
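To illustrate steps 6 and 7 end to end, here is a hedged sketch assuming the model under test is a Keras classifier with integer labels; the optimizer, loss, and epoch count are assumptions made only for the example.

```python
import numpy as np
import tensorflow as tf

def retrain_and_count_errors(model, x_selected, y_selected, x_rest, y_rest,
                             epochs=5, batch_size=32):
    # Step 6: retrain the old model D' on the selected cases T.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_selected, y_selected, epochs=epochs, batch_size=batch_size, verbose=0)
    # Step 7: Error_num = mispredictions of the new model D'' on the remaining
    # confrontation samples.
    preds = np.argmax(model.predict(x_rest, verbose=0), axis=1)
    return int(np.sum(preds != np.asarray(y_rest)))
```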
TABLE 1 33 transformations performed in sample expansion of the original database
(Table 1 is provided as an image in the original publication; it lists the 33 transformation operations used for sample expansion.)
The invention has the beneficial effects that:
1. The invention constructs a clustering-based neural network test multi-method mixed selection input method that improves upon the algorithmic performance of ARTDL in neural network testing when the number of selected test cases is fixed.
2. In the test case selection process, the invention builds on the clustering algorithm, adopts a divide-and-conquer concept, and appropriately combines the traditional FSCS-ARS and MMD algorithms, so that compared with the prior-art ARTDL, which uses FSCS-ARS directly, the method achieves an essential improvement in neural network test input selection.
Drawings
FIG. 1 is three failure modes for a two-dimensional case;
FIG. 2 is a failure mode of a feature vector visualization DL model under different convolution layers;
FIG. 3 is a flow chart of the FSCS-ARS method;
fig. 4 is a flow chart of the method of the present invention.
Detailed Description
The invention combines traditional machine learning with traditional software testing algorithms, and uses a clustering algorithm to divide all test cases into two categories according to the clustering effect: several performance classes that can be clustered, and a noise class that cannot. Within these two categories, algorithms that stand out in traditional machine learning and in traditional software testing are applied respectively, so that global optimization of test case selection is achieved while local optimization of the two algorithms is guaranteed, and the performance of the retrained neural network model is effectively improved. The invention mainly comprises the following steps: 1. expand the samples to generate confrontation samples; 2. extract confrontation-sample features from the different convolution layers of the neural network; 3. use a density-based hierarchical clustering algorithm to group all test cases into different categories according to testing performance, where a few test cases with peculiar performance produce categories that are too small; these small categories are merged into one large category, namely the noise class, so that two kinds of categories exist: several performance classes that account for the majority of the total, and a minority noise class; 4. with the number of selected cases fixed at n, set the selection proportion α of the two categories to 0.6; 5. for the two categories, apply a traditional machine learning algorithm and a traditional software testing algorithm respectively, i.e. for the several performance classes, select cases with the Maximum Mean Discrepancy algorithm (MMD), which in machine learning excels at cases with more consistent testing capability, and for the noise class, select cases with the fixed-candidate-set-based Adaptive Random Selection (FSCS-ARS) method, which in traditional software testing performs strongly on relatively dispersed cases; 6. use the selected cases as a training set and retrain the old model, so that a small number of confrontation samples are exploited, interference from manual labelling is reduced, and the recognition accuracy of the neural network under various abnormal environments is improved to the greatest extent. Experiments verify that, in terms of selection effect, the effectiveness of CMS-DNNT is better than that of ARS-DNNT regardless of whether the number of selected test cases is high or low. The experimental results show that the accuracy of the model retrained on the cases selected by CMS-DNNT is improved by at least 26 percent relative to ARS-DNNT.
The invention will be further explained with reference to the drawings.
The invention provides a clustering-based neural network test multi-method mixed selection input method aimed at improving the selection performance of the FSCS-ARS method. To verify the method of the invention, it is described through experiments that select 10, 20, 30, 40, and 50 cases, with MNIST as the input sample database.
As shown in fig. 4, the algorithm proposed by the present invention comprises the following steps:
step 1, determining a pre-trained neural network model D' to be tested and a confrontation sample set S;
step 2, extracting a confrontation sample feature set V;
step 3, two different categories are obtained: several performance classes l1, l2, ..., lj that account for the majority of the total number of samples, and a minority noise class l0;
Step 4, setting the selection proportion value alpha of the two categories to be 0.6 under the condition that the number of the selected use cases is determined to be n;
step 5, aiming at two different categories, combining the traditional machine learning with the traditional software testing algorithm by adopting a divide-and-conquer concept, and respectively selecting alpha x n test cases and (1-alpha) x n test cases;
step 6, using the selected cases as a training set T, retrain the old model D′ to obtain a new model D″;
step 7, finally return the number of prediction errors of the new model D″ on the confrontation samples, as the metric for judging the performance of the various selection algorithms.
the specific process of the step 1 is as follows:
step 1.1, determining a tested neural network model D and a corresponding seed image database (N seed image sets I) according to a previous software design document;
step 1.2, initializing the measured neural network model according to the measured neural network model D determined in the step 1.1 and a corresponding seed image database (N seed image sets I), namely pre-training the measured neural network model D through the corresponding seed image database, and further determining a pre-trained neural network model D' to be measured;
step 1.3, according to the seed image database (N seed image sets I) determined in step 1.1, sample expansion is performed with the 33 transformation methods shown in Table 1 to generate a confrontation sample set S of 33 × N samples.
The specific process of the step 2 is as follows:
step 2.1, obtain all convolution layers cov1, cov2, ... according to the D′ determined in step 1.2;
step 2.2, based on the S determined in step 1.3, extract its feature sets V1, V2, ... at cov1, cov2, ..., and select the variable Vi that affects performance; the feature set verified by experiment to give the best effect is taken as the optimal feature set V;
the specific process of the step 3 is as follows:
step 3.1, the density-based hierarchical clustering algorithm (HDBSCAN) is used. Taking distance as the similarity measure, and following the idea that the more consistent the test performance, the more concentrated the corresponding test cases, all test cases are grouped into different categories according to test performance. The specific steps are as follows:
Step 3.1.1, transform the space according to density/sparsity. The core of the clustering algorithm is single-linkage clustering, which is very sensitive to noise: a single noise data point in the wrong location may act as a bridge between islands, bonding them together. To eliminate this effect of noise, dense points (with a lower core distance) keep their original distances from each other, while sparser points are pushed apart so that they are at least their core distance away from any other point. The mutual reachability distance is defined as follows:

d_mreach-k(a, b) = max{ core_k(a), core_k(b), d(a, b) }

where core_k(a) and core_k(b) denote the distances from points a and b to their respective k-th nearest points, and d(a, b) denotes the original distance between a and b.
Step 3.1.2, build a minimum spanning tree of the distance-weighted graph. The data is treated as a weighted graph in which the data points are vertices and the weight of the edge between any two points equals their mutual reachability distance, i.e. the value determined in step 3.1.1. A minimal set of edges is found such that deleting any edge from the set disconnects a component; this minimum spanning tree can be built very efficiently with Prim's algorithm.
Step 3.1.3, construct the cluster hierarchy of connected components. The edges of the tree are sorted by distance in increasing order and then traversed, creating a new merged cluster for each edge. Using a union-find data structure, the two clusters joined by each edge are determined. Eventually, a hierarchy of connected components (from fully connected to fully disconnected) is obtained at the different threshold levels.
Step 3.1.4, condense the cluster hierarchy according to the minimum cluster size. The notion of a minimum cluster size is introduced as a parameter of HDBSCAN. While traversing the hierarchy, at each split it is judged whether the number of points in a newly created cluster is smaller than the minimum cluster size. If it is, those points are declared "points that fall out of the cluster" and the larger cluster keeps the identity of the parent cluster; if the clusters produced by the split are at least the minimum cluster size, the split is kept in the tree. The result is a much smaller tree with a small number of nodes, each node carrying data on how the size of its cluster shrinks as the distance threshold varies.
Step 3.1.5, extract stable clusters from the condensed tree. To ensure that a selected cluster persists and has a long life cycle, once a cluster is selected none of its descendant clusters may be selected. A measure other than raw distance is defined to express the persistence of clusters; the density measure of each point is:

λ = 1 / distance

where distance denotes the shortest distance from the point to the rest of the cluster.

For a given cluster, the values λ_birth and λ_death are defined as the λ value when the cluster separates off and becomes a cluster of its own and the λ value (if any) when the cluster splits into smaller sub-clusters, respectively. Conversely, for each point p in a given cluster, the value λ_p is defined as the λ value at which the point "falls out of the cluster"; it lies between λ_birth and λ_death. Now, for each cluster, the stability is computed as:

S_cluster = Σ_{p∈cluster} (λ_p − λ_birth)

where λ_birth is the λ value at cluster formation, λ_death is the λ value when the cluster splits into two sub-clusters, and λ_p is the λ value at which point p separates from the cluster. All leaf nodes are first declared selected clusters; the tree is then traversed in reverse topological order, judging for each cluster whether the sum of the stabilities of its sub-clusters is greater than its own stability. If so, the cluster's stability is set to the sum of the sub-cluster stabilities; otherwise the cluster is declared a selected cluster and all of its sub-clusters are deselected. When the root node is reached, all currently selected clusters are returned.
Step 3.2, following step 3.1.4, the minimum cluster size parameter is set to 50. It is judged whether the number of test cases in each resulting cluster is below this threshold of 50; all clusters at or below the threshold, i.e. all "points that fall out of clusters", are merged into the noise class l0, and the remaining clusters l1, l2, ..., lj, which account for the largest part of the total number of samples, are defined as the several performance classes.
The specific process of the step 5 is as follows:
Step 5.1, for the several performance classes, the Maximum Mean Discrepancy algorithm (MMD), which in machine learning excels at cases with more consistent testing capability, is selected. During selection, the distributions P1, P2, ..., Pj of the cases selected within each performance class and the class distribution G of each performance class within the set of all performance classes are computed, and the MMD between them is used as the test statistic. To make the two distributions as similar as possible, the chosen α × n test cases must keep the MMD value as small as possible. In addition, the number of test cases selected in each performance class is allocated in proportion to that class's share of the total number of test cases across all performance classes;
Step 5.2, for the noise class, the fixed-candidate-set-based Adaptive Random Selection (FSCS-ARS) method, which in traditional software testing performs strongly on relatively dispersed cases, is used to select (1 − α) × n test cases;
step 5.2.1, the similarity measurement between the test cases is carried out by using Feature Euclidean Distance (FED), and the closer the distance is, the higher the similarity of the two test cases is;
Step 5.2.2, initialize the executed test case set E and the candidate test case set C according to the V obtained in step 2; the initial element t0 of E is a randomly selected feature vector v0 of a non-abnormal countermeasure sample, and the size of C is fixed to 10.
Step 5.2.3, compute the distance between each candidate test case ci in C and each element ei in E (E currently contains only the single element t0), take the minimum of these distances as the similarity value of the candidate, and select the candidate with the lowest similarity value (i.e. the largest minimum distance) as the next test case t1 to be tested.
The Feature Euclidean Distance is defined as follows: the distance between test cases ta and tb is written FED(ta, tb). For m-dimensional feature vectors, given ta = va = {a1, a2, ..., am} and tb = vb = {b1, b2, ..., bm}, where ai and bi denote the i-th components of the feature vectors of ta and tb respectively:

FED(ta, tb) = √( Σ_{i=1..m} (a_i − b_i)² )
Step 5.2.4, if t1 detects an abnormality, add t1 to E and repeat step 5.2.3 until the number of selected cases reaches (1 − α) × n.
The specific process of the step 6 is as follows:
Step 6.1, take the n test cases selected in step 5 as the training set T = {t0, t1, ..., tn} for retraining the old model D′;
Step 6.2, retrain the model D′ with the T obtained in step 6.1 to obtain the final new model D″.
the specific process of the step 7 is as follows:
Step 7.1, take the confrontation samples remaining from step 1 as the test set T′ = {t′0, t′1, ..., t′n} for the retrained model D″;
Step 7.2, feed the T′ obtained in step 7.1 into the new model D″, judge whether the predicted value of D″ is consistent with the label value, and judge the quality of the test case selection algorithm from the resulting number of errors.
TABLE 1 33 transformations performed in sample expansion of the original database
(Table 1 is provided as an image in the original publication; it lists the 33 transformation operations used for sample expansion.)
Analysis of algorithm validation results
The invention aims to improve selection performance when a fixed number of test cases is selected. In neural network testing, whether the output for a test case is correct is a prominent characteristic for measuring model performance; therefore, after the selected confrontation samples have been used to retrain the neural network model, the number of prediction errors on the remaining confrontation samples is of important practical significance as a check standard. The average number of errors predicted on the remaining confrontation samples during neural network testing is used as the metric and is called Error_num. For the method of the invention, a 5% accuracy and a 95% confidence level were used, and according to the central limit theorem the required number of experiments was at least 2000.
The input sample database is set to MNIST and the number of selected cases to 10, 20, 30, 40, and 50. For these five groups of experiments, the number of prediction errors of the new model obtained by retraining on the samples selected by the RT method, the MMD method, the FSCS-ARS method, and the method of the invention, respectively, is recorded; the specific comparison results are shown in Table 2.
Experiments verify that, although MMD-DNNT performs better than the proposed method when the number of selected samples is small, the CMS selection effect is generally better than that of MMD. Compared with ARS-DNNT, in terms of selection effect the effectiveness of CMS-DNNT is better than that of ARS-DNNT regardless of whether the number of selected test cases is high or low. The experimental results show that the accuracy of the model retrained on the samples selected by CMS-DNNT is improved by at least 26 percent relative to ARS-DNNT.
TABLE 2 comparison of Error _ num Experimental results
Number of selected cases n | RT  | MMD | FSCS-ART | Method of the invention | Rate of improvement
n = 10                     | 860 | 93  | 136      | 99                      | 27.20%
n = 20                     | 524 | 67  | 68       | 49                      | 27.90%
n = 30                     | 386 | 74  | 50       | 37                      | 26.00%
n = 40                     | 297 | 89  | 41       | 30                      | 26.80%
n = 50                     | 252 | 105 | 34       | 25                      | 26.40%
The above-listed series of detailed descriptions are merely specific illustrations of possible embodiments of the present invention, and they are not intended to limit the scope of the present invention, and all equivalent means or modifications that do not depart from the technical spirit of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A clustering-based neural network test multi-method mixed selection input method, characterized by comprising the following steps:
s1, determining a pre-trained neural network model D' to be tested and a confrontation sample set S;
s2, extracting a confrontation sample feature set V;
S3, two different categories are obtained: the first category is the majority, consisting of several performance classes l1, l2, ..., lj; the second category is the minority noise class l0;
S4, setting a selection proportion value alpha of the two categories, wherein alpha is greater than 0.5;
and S5, aiming at two different categories, combining the traditional machine learning with the traditional software testing algorithm, and respectively selecting alpha x n test cases and (1-alpha) x n test cases when the number of the selected cases is n.
2. The method as claimed in claim 1, wherein the step S1 comprises the following steps:
s1.1, determining a tested neural network model D and a corresponding seed image database (N seed image sets I) according to a software design document;
s1.2, initializing the measured neural network model according to the measured neural network model D determined in the S1.1 and a corresponding seed image database (N seed image sets I), and pre-training the measured neural network model D through the corresponding seed image database to obtain a pre-trained neural network model D' to be measured;
and S1.3, expanding the samples through M transformation operations such as brightness change, translation, scaling, and blurring, according to the corresponding seed image database (N seed image sets I) determined in S1.1, to generate a countermeasure sample set S of M × N samples.
3. The method as claimed in claim 1, wherein the step S2 comprises the following steps:
S2.1, determining the different convolution layers cov1, cov2, ... according to the D′ determined in S1.2;
S2.2, based on the S determined in S1.3, extracting its feature sets V1, V2, ... at cov1, cov2, ..., selecting the variable Vi that determines how well it performs, and determining the optimal feature set V.
4. The method as claimed in claim 1, wherein the step S3 comprises the following steps:
s3.1, classifying all test cases into different categories according to test performance by using a density hierarchical clustering algorithm;
S3.2, judging whether the number of test cases in each obtained class is smaller than the threshold 50, merging all classes at or below the threshold of 50 into the noise class l0, and defining the remaining classes l1, l2, ..., lj as the several performance classes.
5. The clustering-based neural network test multi-method mixing selection input method as claimed in claim 4, wherein the specific steps of the step S3.1 include:
step 3.1.1, transform the space according to density/sparsity: the core of the clustering algorithm is single-linkage clustering, which is very sensitive to noise, since a single noise data point in the wrong location may act as a bridge between islands and bond them together; to eliminate the effect of noise, dense points keep their original distances from each other while sparser points are pushed apart so that they are at least their core distance away from any other point; the mutual reachability distance is defined as follows:

d_mreach-k(a, b) = max{ core_k(a), core_k(b), d(a, b) }

step 3.1.2, build a minimum spanning tree of the distance-weighted graph: the data is treated as a weighted graph in which the data points are vertices and the weight of the edge between any two points equals their mutual reachability distance, i.e. the value determined in step 3.1.1; a minimal set of edges is found such that deleting any edge from the set disconnects a component, i.e. the minimum spanning tree is constructed very efficiently by Prim's algorithm;

step 3.1.3, construct the cluster hierarchy of connected components: the edges of the tree are sorted by distance in increasing order and then traversed, creating a new merged cluster for each edge, and the two clusters joined by each edge are determined through a union-find data structure; finally, a hierarchy of connected components, from fully connected to fully disconnected, is obtained at the different threshold levels;

step 3.1.4, condense the cluster hierarchy according to the minimum cluster size: the notion of a minimum cluster size is introduced as a parameter of HDBSCAN, and while traversing the hierarchy it is judged at each split whether the number of points in a newly created cluster is smaller than the minimum cluster size; if so, those points are declared "points that fall out of the cluster" and the larger cluster keeps the identity of the parent cluster, while if the clusters produced by the split are at least the minimum cluster size, the split is kept in the tree; a much smaller tree is finally obtained with a small number of nodes, each node carrying data on how the size of its cluster shrinks as the distance threshold varies;

step 3.1.5, extract stable clusters from the condensed tree: to ensure that a selected cluster persists and has a long life cycle, i.e. once a cluster is selected no descendant of that cluster may be selected, a measure other than distance is defined to express the persistence of clusters, as shown below:

λ = 1 / distance

for a given cluster, the values λ_birth and λ_death are defined as the λ value when the cluster separates off and becomes a cluster of its own and the λ value when the cluster splits into smaller sub-clusters, respectively; conversely, for each point p in a given cluster, the value λ_p is defined as the λ value at which the point "falls out of the cluster", which lies between λ_birth and λ_death; now, for each cluster, the stability is computed as follows:

S_cluster = Σ_{p∈cluster} (λ_p − λ_birth)

where λ_birth is the λ value at cluster formation, λ_death is the λ value when the cluster splits into two sub-clusters, and λ_p is the λ value at which the point separates from the cluster; all leaf nodes are declared selected clusters, the tree is traversed in reverse topological order, and it is judged whether the sum of the stabilities of the sub-clusters is greater than the stability of the cluster; if so, the cluster stability is set to the sum of the sub-cluster stabilities, otherwise the cluster is declared a selected cluster and all of its sub-clusters are deselected; when the root node is reached, all currently selected clusters are returned.
6. The method as claimed in claim 1, wherein α in step S4 is 0.6.
7. The method as claimed in claim 1, wherein the specific process of S5 comprises the following steps:
S5.1, for the several performance classes, selecting the Maximum Mean Discrepancy algorithm MMD, which in machine learning excels at cases with more consistent testing capability, and selecting α × n test cases, wherein the number of test cases selected in each performance class is allocated in proportion to that class's share of the total number of test cases across all performance classes;
S5.2, for the noise class, selecting the fixed-candidate-set-based adaptive random testing FSCS-ART method, which in traditional software testing performs strongly on relatively dispersed cases, and selecting (1 − α) × n test cases; specifically:
s5.2.1, using the characteristic Euclidean distance to measure the similarity between the test cases, wherein the closer the distance is, the higher the similarity of the two test cases is;
S5.2.2, initializing the executed test case set E and the candidate test case set C according to the V obtained in step S2; the initial element t0 of E is a randomly selected feature vector v0 of a non-abnormal countermeasure sample, and the size of C is fixed to 10.
S5.2.3, computing the distance between each candidate test case ci in C and each element ei in E, taking the minimum distance as the similarity value of the candidate, and selecting the candidate with the lowest similarity value (the largest minimum distance) as the next test case t1 to be tested;
The Feature Euclidean Distance is defined as follows: the distance between test cases ta and tb is written FED(ta, tb); for m-dimensional feature vectors, given ta = va = {a1, a2, ..., am} and tb = vb = {b1, b2, ..., bm}:

FED(ta, tb) = √( Σ_{i=1..m} (a_i − b_i)² )
S5.2.4, if t1 detects an abnormality, adding t1 to E and repeating step 5.2.3 until the number of selected cases reaches (1 − α) × n.
8. The clustering-based neural network test multi-method mixed selection input method of claim 1, further comprising the following steps for testing performance:
S6, training the neural network model D′ with the selected cases as a training set T to obtain a new model D″;
S7, returning the number of prediction errors of the new model D″ on the confrontation samples, as the measure for judging the performance of the various selection algorithms.
9. The method of claim 8, wherein the step of implementing the step of S6 comprises:
S6.1, using the n test cases selected in step S5 as the training set T = {t0, t1, ..., tn} for retraining the old model D′;
S6.2, retraining the model D′ according to the T obtained in step S6.1 to obtain the final new model D″.
10. The method of claim 8, wherein the step of implementing the step of S7 comprises:
S7.1, taking the countermeasure samples remaining from step S1 as the test set T′ = {t′0, t′1, ..., t′n} of the retrained model D″;
S7.2, feeding the T′ obtained in step S7.1 into the new model D″, judging whether the predicted value of D″ is consistent with the label value, and judging the quality of the test case selection algorithm according to the resulting number of errors.
CN202011418759.3A 2020-12-07 2020-12-07 Neural network test multi-method mixed selection input method based on clustering Pending CN112529156A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011418759.3A CN112529156A (en) 2020-12-07 2020-12-07 Neural network test multi-method mixed selection input method based on clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011418759.3A CN112529156A (en) 2020-12-07 2020-12-07 Neural network test multi-method mixed selection input method based on clustering

Publications (1)

Publication Number Publication Date
CN112529156A true CN112529156A (en) 2021-03-19

Family

ID=74997904

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011418759.3A Pending CN112529156A (en) 2020-12-07 2020-12-07 Neural network test multi-method mixed selection input method based on clustering

Country Status (1)

Country Link
CN (1) CN112529156A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116484260A (en) * 2023-04-28 2023-07-25 南京信息工程大学 Semi-supervised log anomaly detection method based on bidirectional time convolution network
CN116484260B (en) * 2023-04-28 2024-03-19 南京信息工程大学 Semi-supervised log anomaly detection method based on bidirectional time convolution network

Similar Documents

Publication Publication Date Title
Yang et al. Semantically coherent out-of-distribution detection
Thomas et al. Data mining and clustering in chemical process databases for monitoring and knowledge discovery
Fred et al. Evidence accumulation clustering based on the k-means algorithm
Ibrahim et al. Cluster representation of the structural description of images for effective classification
Zeng et al. An adaptive meta-clustering approach: combining the information from different clustering results
US20050100209A1 (en) Self-optimizing classifier
CN110147321A (en) A kind of recognition methods of the defect high risk module based on software network
US5862259A (en) Pattern recognition employing arbitrary segmentation and compound probabilistic evaluation
CN112784921A (en) Task attention guided small sample image complementary learning classification algorithm
CN112633346A (en) Feature selection method based on feature interactivity
CN112836735A (en) Optimized random forest processing unbalanced data set method
Cheng et al. A local cores-based hierarchical clustering algorithm for data sets with complex structures
CN112529156A (en) Neural network test multi-method mixed selection input method based on clustering
Ghan et al. Clustering and pattern matching for an automatic hotspot classification and detection system
Hamza et al. Incremental classification of invoice documents
Rahman et al. An efficient approach for selecting initial centroid and outlier detection of data clustering
CN111832645A (en) Classification data feature selection method based on discrete crow difference collaborative search algorithm
Stein et al. Density-based cluster algorithms in low-dimensional and high-dimensional applications
CN110717547A (en) Learning algorithm based on regression hypergraph
Mylonas et al. Using k-nearest neighbor and feature selection as an improvement to hierarchical clustering
CN110504004B (en) Complex network structure controllability gene identification method
Saund A graph lattice approach to maintaining dense collections of subgraphs as image features
CN114239751A (en) Data annotation error detection method and device based on multiple decoders
CN111488903A (en) Decision tree feature selection method based on feature weight
Jha et al. Multiple Hypothesis Testing To Estimate The Number of Communities in Sparse Stochastic Block Models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination