CN115904920A - Test case recommendation method and device, terminal and storage medium

Info

Publication number
CN115904920A
CN115904920A (application CN202110943582.7A)
Authority
CN
China
Prior art keywords
test case
test
data
sample data
determining
Prior art date
Legal status
Pending
Application number
CN202110943582.7A
Other languages
Chinese (zh)
Inventor
李吉双
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Suzhou Software Technology Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202110943582.7A priority Critical patent/CN115904920A/en
Publication of CN115904920A publication Critical patent/CN115904920A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application provides a test case recommendation method, a test case recommendation device, a terminal and a computer storage medium. The method comprises the following steps: determining, by a classification network comprising a plurality of random forests, a first test case of the data to be tested in each random forest to obtain a first test case set; determining a target test case meeting a preset condition in the first test case set; and testing the data to be tested with the target test case to obtain a test result. An optimal test case can thus be obtained, so that the fewest test cases cover as many function points of the data to be tested as possible, making the test more comprehensive and accurate.

Description

Test case recommendation method and device, terminal and storage medium
Technical Field
The present application relates to the field of computer application technologies, and in particular, to a method, an apparatus, a terminal, and a computer storage medium for recommending test cases.
Background
As software products grow more complex and iteration cycles shorten under agile development, new functions must be tested while the functions delivered in earlier iterations remain available. Test cases are generated continuously over the whole life cycle of the testing process, and they accumulate as the number of test rounds grows. In the related art, when historical data is preprocessed for test case recommendation, features are extracted only from the steps of each test case, and a case with fewer than 3 test steps is not used as training data for the classification model; meanwhile, the function point test cases obtained by combining steps from different test cases may be redundant.
Disclosure of Invention
In order to solve the above technical problems, embodiments of the present application provide a test case recommendation method, apparatus, terminal, and computer storage medium, so that an optimal test case can be obtained, the fewest test cases cover as many function points of the data to be tested as possible, and the test result is more comprehensive and accurate.
The embodiment of the application provides a test case recommendation method, which comprises the following steps:
determining a first test case of data to be tested in each random forest by adopting a classification network comprising a plurality of random forests to obtain a first test case set;
determining a target test case meeting a preset condition in the first test case set;
and testing the data to be tested by adopting the target test case to obtain a test result.
The embodiment of the application provides a test case recommending terminal, which comprises:
the first determining module is used for determining a first test case of data to be tested in each random forest by adopting a classification network comprising a plurality of random forests to obtain a first test case set;
the second determination module is used for determining a target test case meeting preset conditions in the first test case set;
and the test module is used for testing the data to be tested by adopting the target test case to obtain a test result.
An embodiment of the present application provides a terminal, where the terminal at least includes: a controller and a storage medium configured to store executable instructions, wherein:
the controller is configured to execute the stored executable instructions, which are configured to perform the test case recommendation method provided above.
An embodiment of the present application provides a computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and the computer-executable instructions are configured to execute the test case recommendation method provided above.
The embodiments of the application provide a test case recommendation method, a test case recommendation device, a terminal and a computer storage medium. Test data comprising a plurality of test cases is classified by a classification network of a plurality of random forest models to obtain a first test case set for the data to be tested, so that the first test case set covers the function points of the data to be tested as completely as possible; the first test case meeting the preset condition is determined as the target test case, which avoids outlier test cases in the recommendation process; and finally the data to be tested is tested with the target test case, so that the fewest test cases cover as many function points of the data to be tested as possible, making the test more comprehensive and accurate.
Drawings
Fig. 1 is a schematic flow chart illustrating an implementation of a test case recommendation method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating an implementation flow of an improved decision tree training method provided in an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating another implementation flow of a test case recommendation method according to an embodiment of the present application;
fig. 4 is a schematic diagram of a process for recommending a merged test case of a random forest model according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a test case recommendation device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of the test case recommendation terminal provided in the embodiment of the present application.
Detailed Description
So that the manner in which the features and aspects of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings.
To facilitate understanding of the technical solutions of the embodiments of the present application, the following describes related arts of the embodiments of the present application.
In the related art, a model construction function based on machine learning decomposes each step of a use case, models the case recommendation process with the decomposed steps, and outputs the finally recombined use case. The test steps are recombined according to existing function points; there is no focused coverage of the error-prone parts of the system under test identified by historical experience data.
In this approach, the preprocessed historical test case data is formatted in units of n consecutive case steps to obtain data in a specified format; the first n-1 steps of each formatted record are used as training data and the n-th step as its label to train the constructed machine learning model, and the trained model then recommends test case steps. This combines Artificial Intelligence (AI) with testing: case steps can be recommended automatically for testers to select, without manually browsing a function library when writing a test case, which effectively improves the efficiency of writing test cases, improves test efficiency, and reduces test cost.
In this recommendation scheme for test cases, a model is built on a training set whose inputs are test steps and whose output is a new test case. The process requires strong correlation among the steps to assemble a complete test case, and each function point often requires more than 3 steps.
In the use of machine learning, although multi-model fusion improves model accuracy to some extent, it often has the opposite effect; the prior art uses as many as 3 kinds of models, and the effect achieved during training is not ideal, so the plateau in accuracy is associated with paying too much attention to model fusion.
Feature extraction from test cases is often the key focus of machine learning. The step processing in the existing scheme relies on a uniform function-processing procedure, so the obtained features are solidified into a uniform result; it does not consider the associated attribute information generated while a test case is executed or the defect information discovered during testing, although such information can be one of the key factors for deriving a reliable case in reverse.
Based on this, the embodiments of the present application provide a test case recommendation method, described below with reference to embodiments, some of which are illustrated in the appended drawings.
An embodiment of the present application provides a method for recommending test cases, and fig. 1 is a schematic flow chart illustrating an implementation of the method for recommending test cases provided in the embodiment of the present application, where as shown in fig. 1, the method for recommending test cases includes the following steps:
step S101: and determining a first test case of the data to be tested in each random forest by adopting a classification network comprising a plurality of random forests to obtain a first test case set.
Here, a random forest is a classifier comprising a plurality of decision trees; each decision tree is generated by training on sample data obtained by repeated random sampling (with replacement) from the training sample data, and during decision tree training a corrected Gini impurity coefficient is used to select a suitable feature. The classification network is a large classifier composed of a plurality of random forests. The data to be tested is a function of a software product, a newly developed software product, or the like. A test case describes a test task for one function of a software product or a newly developed software product, and embodies a test scheme, method, technique and strategy. A test case is a set of test inputs, execution conditions, and expected results, programmed for a feature object, used to verify that a particular software requirement is met.
In some embodiments, each random forest model in the classification network classifies the input test data comprising multiple test cases to obtain that model's classification result, i.e., a first test case, and the first test case set is obtained from the classification results of the multiple random forest models. The first test case comprises the classification result for each test case. In a specific example, the classification network includes 3 random forests A, B and C, and the test data includes 5 test cases 1 to 5. Random forest A classifies test cases 1 to 5 as follows: test case 1 is recommended with weight 0.85, test case 2 is not recommended with weight 0.11, test case 3 is recommended with weight 0.76, test case 4 is recommended with weight 0.65, and test case 5 is recommended with weight 0.92; the first test case obtained from random forest model A is therefore: test cases 1, 3, 4 and 5. Correspondingly, in the same way, the first test cases obtained by the other random forest models classifying test cases 1 to 5 can be obtained, and the first test cases of the three random forest models A, B and C form the first test case set.
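As an illustration of step S101, the following is a minimal Python sketch of collecting each forest's recommended cases; it assumes each random forest exposes a scoring function returning a weight in [0, 1], and the function names and the 0.5 recommendation cutoff are illustrative assumptions rather than part of the embodiment.

```python
# Minimal sketch of step S101: each forest scores every candidate test case,
# and the cases whose weight clears the cutoff form that forest's
# "first test case" result. Names and cutoff are assumptions.
from typing import Callable, Dict, List

def first_test_cases(forests: List[Callable[[str], float]],
                     cases: List[str],
                     cutoff: float = 0.5) -> List[Dict[str, float]]:
    """Return, per forest, its recommended cases with their weights."""
    per_forest = []
    for forest in forests:
        recommended = {c: forest(c) for c in cases if forest(c) >= cutoff}
        per_forest.append(recommended)
    return per_forest

# Toy forest reproducing the worked example: recommends cases 1, 3, 4 and 5.
forest_a = {"case1": 0.85, "case2": 0.11, "case3": 0.76,
            "case4": 0.65, "case5": 0.92}.get
print(first_test_cases([forest_a], [f"case{i}" for i in range(1, 6)]))
```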
Step S102: and determining a target test case meeting preset conditions in the first test case set.
Here, in the first test case set comprising a plurality of first test cases, the results of the first test cases of each random forest model are counted and weighted by voting, and the first test cases meeting the preset condition are taken as target test cases. The preset condition is that the count of identical first test cases in the first test case set is the largest, or that the cumulative frequency of identical first test cases in the first test case set is the largest.
Step S103: and testing the data to be tested by adopting the target test case to obtain a test result.
After the target test case for testing the data to be tested is determined, the data to be tested is tested with the target test case (or several target test cases) to obtain the test result. The target test cases fully cover the stable function points of the data to be tested, so the fewest target test cases can cover the function points of the data to be tested most completely.
In the embodiment of the application, test data comprising a plurality of test cases is classified by a classification network of a plurality of random forest models to obtain the first test case set of the data to be tested, so that the first test case set covers the function points of the data to be tested as completely as possible; the first test case meeting the preset condition is determined as the target test case, which avoids outliers in the recommendation process; and finally the data to be tested is tested with the target test case, so that the fewest test cases cover as many function points of the data to be tested as possible, making the test more comprehensive and accurate.
In some realizable embodiments, each random forest in the classification network that recommends test cases includes at least one decision tree. For each decision tree, features of the test data can be extracted to obtain feature information, and the first test case set is determined from this feature information. Therefore, the above step S101 can be realized by the following steps:
the method comprises the following steps: in any random forest, performing feature extraction on the data to be tested by adopting a decision tree in any random forest, and determining first feature information corresponding to each decision tree.
Here, the decision tree is a Classification And Regression Tree (CART) generated by training on sample data obtained by repeated random sampling from the training sample data. The CART decision tree classifies test data comprising a plurality of test cases to obtain classification results, from which the first test case under each random forest is obtained. After test data comprising a plurality of test cases is input into a decision tree of any random forest, the first feature information of the data to be tested is determined according to the splitting attributes of the decision tree. The first feature information is used to represent the function points of the data to be tested.
Step two: and determining the test cases of the data to be tested in each decision tree based on the first characteristic information to obtain a second test case set.
Here, based on the first feature information, a classification result of each decision tree for test data composed of a plurality of input test cases is determined, and a second test case set with weight values is obtained.
Step three: and determining the test cases with weights meeting preset weight values in the second test case set to obtain the first test case of any random forest.
Here, in the second test case set, the weight of each second test case is determined, and the test cases whose weight mean meets the preset weight value are determined as the first test cases of the corresponding random forest.
Step four: and obtaining the first test case set based on the first test cases of the plurality of random forests.
Here, the first test case of each random forest is obtained according to the second test case set which is the output result of the decision tree of each random forest of the classification network, and then the first test cases of the plurality of random forests form the first test case set.
In the embodiment of the application, the feature extraction can be performed on the data to be tested according to the decision trees of the random forests, all features of the data to be tested can be determined as much as possible, and the first test case set of the data to be tested is determined according to the features of the data to be tested, so that the first test case set covering all features of the data to be tested as much as possible can be obtained.
In some implementations, after obtaining the second set of test cases for the plurality of decision trees for any random forest, the first test case for any random forest may be determined by:
firstly: and screening the test cases with the weight meeting a first threshold value in the second test case set to obtain a third test case set.
Here, after the second test case set of the decision trees of a random forest is obtained, the weight of each second test case in the set is examined, and the test cases whose weight meets the first threshold are determined as third test cases, yielding the third test case set. In one example, the classification network includes 3 random forests A, B and C. Taking random forest A, which includes 2 decision trees a and b, the output obtained after classifying test cases 1 to 5 is, for tree a: test case 1, weight 0.6; test case 2, weight 0.02; test case 3, weight 0.15; test case 4, weight 0.07; test case 5, weight 0.16; and for tree b: test case 1, weight 0.35; test case 2, weight 0.03; test case 3, weight 0.30; test case 4, weight 0.06; test case 5, weight 0.26. If the first threshold is 0.10, the second test cases above the first threshold are screened out, giving the third test case set: {(test case 1, weight 0.6), (test case 3, weight 0.15), (test case 5, weight 0.16), (test case 1, weight 0.35), (test case 3, weight 0.30), (test case 5, weight 0.26)}.
Secondly, the method comprises the following steps: and determining a weight set corresponding to the same third test case in the third test case set.
Here, the third test case set over all decision trees of each random forest is obtained through the above step; identical third test cases in the set are counted, and their weights are collected to obtain the weight set corresponding to each identical third test case. In one example, with random forest A comprising 2 decision trees a and b, inputting test cases 1 to 5 gives the third test case set of all decision trees of random forest A: {(test case 1, weight 0.6), (test case 3, weight 0.15), (test case 5, weight 0.16), (test case 1, weight 0.35), (test case 3, weight 0.30), (test case 5, weight 0.26)}. Counting identical third test cases gives test cases 1, 3 and 5, whose weight sets are respectively: {0.6, 0.35}, {0.15, 0.30}, {0.16, 0.26}.
And finally: and determining the test case with the mean value of the weight set meeting a second threshold value as the first test case of any random forest.
Here, for each identical third test case, its weights are summed and divided by the total number of test cases, and the result is squared to obtain the mean value of the weight set of that test case. When the mean value of the weight set is larger than the second threshold, the corresponding test case is determined as a first test case of the random forest. In one example, with random forest A comprising 2 decision trees a and b and test cases 1 to 5 as input, the weight sets of the identical test cases 1, 3 and 5 are: {0.6, 0.35}, {0.15, 0.30}, {0.16, 0.26}. Summing each weight set, dividing by the total number of test cases (5), and squaring gives a weight mean of 0.0361 for test case 1, 0.0081 for test case 3, and 0.0070 for test case 5. With the second threshold set to 0.01, only 0.0361 exceeds it, and the corresponding test case is test case 1, so the first test case of random forest A is test case 1.
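A short sketch of the two screening steps above; the squared-mean score ((sum of weights) / total number of cases)^2 is inferred from the worked numbers (for example (0.6 + 0.35) / 5 = 0.19 and 0.19^2 = 0.0361), and the variable names are assumptions.

```python
# Sketch of selecting a forest's first test cases from its trees' outputs.
from collections import defaultdict

def forest_recommendation(tree_outputs, first_threshold, second_threshold,
                          total_cases):
    """tree_outputs: list of {case: weight} dicts, one per decision tree."""
    weights = defaultdict(list)               # the third test case set
    for output in tree_outputs:
        for case, w in output.items():
            if w > first_threshold:           # screen by the first threshold
                weights[case].append(w)
    recommended = {}
    for case, ws in weights.items():
        score = (sum(ws) / total_cases) ** 2  # mean of the weight set
        if score > second_threshold:          # screen by the second threshold
            recommended[case] = score
    return recommended

trees = [{"case1": 0.60, "case2": 0.02, "case3": 0.15,
          "case4": 0.07, "case5": 0.16},
         {"case1": 0.35, "case2": 0.03, "case3": 0.30,
          "case4": 0.06, "case5": 0.26}]
print(forest_recommendation(trees, 0.10, 0.01, total_cases=5))
# -> case1 with score ~0.0361: the first test case of forest A
```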
In the embodiment of the application, the output results of all decision trees in a random forest, i.e., the test cases in the second test case set, are screened by weight, filtering out interference results to obtain a third test case set that comprehensively describes the features of the data to be tested. In the third test case set, the weight mean of each identical test case is determined, and the test cases whose weight mean meets the preset threshold serve as the first test cases of the random forest. This cross-validates the test cases, describes the data to be tested from multiple aspects, and yields better recommended test cases.
In some realizable embodiments, after the first test case set composed of the first test cases of each random forest model in the classification network is obtained, the first test cases need to be counted and sorted to obtain the target test case. Therefore, the above step S102 can be realized by the following steps:
the method comprises the following steps: and counting the frequency of the same first test case in the first test case set.
Here, the first test case set includes a plurality of test cases, since it is composed of the output results of a plurality of random forest models. Test data comprising a plurality of test cases is input into the classification network comprising a plurality of random forests, each random forest classifies the test data, and the recommendation results form the first test case set. If repeated test cases exist in the first test case set, the weights of each identical first test case are accumulated to obtain the frequency of that first test case.
Step two: and determining the first test case with the maximum frequency as the target test case.
Here, in the first test case set, the frequency of each identical first test case is compared, and the first test case with the largest frequency is determined as the target test case of the classification network.
In the embodiment of the application, the frequency of each first test case is obtained by accumulating the weights of identical first test cases in the first test case set of the multiple random forests of the classification network, and the first test case with the largest frequency is determined as the target test case. This avoids outlier test cases during classification and recommendation, and the obtained target test case can comprehensively cover the function points of the data to be tested.
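A minimal sketch of this selection rule, assuming each forest's result is a mapping from test case to weight:

```python
# Sketch of step S102's selection: accumulate the weights of identical first
# test cases across forests, then keep the one with the largest frequency.
from collections import Counter

def target_test_case(first_test_case_sets):
    """first_test_case_sets: list of {case: weight} dicts, one per forest."""
    freq = Counter()
    for forest_result in first_test_case_sets:
        for case, weight in forest_result.items():
            freq[case] += weight              # cumulative frequency
    case, _ = freq.most_common(1)[0]          # maximum-frequency first case
    return case

sets_ = [{"case1": 0.85, "case3": 0.76}, {"case1": 0.90}, {"case3": 0.40}]
print(target_test_case(sets_))                # -> case1 (1.75 vs 1.16)
```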
Fig. 2 is a schematic diagram illustrating an implementation flow of an improved decision tree training method provided in an embodiment of the present application, and as shown in fig. 2, the improved decision tree training method includes the following steps:
step S201: and performing feature extraction on the sampling sample data by adopting a decision tree to be trained to obtain second feature information.
Here, the second feature information includes at least one piece of feature information of the sampling sample data; the sampling sample data is obtained by sampling the acquired sample data. When training the decision trees in a random forest, the sample data is the historical test case data summarized from the test cases generated during historical test iterations of the data to be tested. The acquired sample data is resampled N times to obtain N sets of sampling sample data, each containing a plurality of historical test cases, and the decision tree to be trained is trained with each set to obtain N decision trees.
In some embodiments, feature extraction is performed on the sampling sample data according to associated attribute information from the execution of the historical test cases and defect information of the historical test cases found during historical testing of the data to be tested, to obtain the second feature information. In some embodiments, because different pieces of second feature information are discrete, after the second feature information is obtained it needs to be normalized; for example, max-min normalization converts each piece of feature information into the range [0, 1], so that the decision tree to be trained can be trained on the feature information of the sampling sample data.
Step S202: predicting a probability that the class of the sample data is a labeled class based on the second feature information.
Here, according to the pieces of second feature information of the sampling sample data, the probability that a sample point belongs to a certain category under any feature is calculated, and the Gini coefficient of the sampling sample data is calculated from this probability; the Gini coefficient reflects the probability that a randomly selected sample in the set is classified into the wrong category. If one piece of second feature information is selected as the classification attribute and the probability that a sample belongs to the labeled class k under that feature is P_k, then the probability that the sample is misclassified, i.e., the Gini coefficient, is 1 - P_k. The higher the probability that the category of the sampling sample data is the labeled category, the smaller its Gini coefficient, i.e., the lower the probability of misclassification, which indicates higher purity of the sampling sample data.
Step S203: and associating the probability with an importance function to obtain the corrected probability.
Here, the importance function is the ratio of the number of samples in the sample data that were determined as test cases in the previous decision tree training to the total number of samples; it reflects the number of correctly classified samples. Associating the probability with the importance function means multiplying the Gini coefficient of the sampling sample data, obtained from the probability that its category is the labeled category, by the importance function to obtain the corrected probability. Associating the probability with the importance function makes correctly classified sample data receive more attention during classification, so the classification result is more accurate.
In one example, K samples are selected from the sample data to form the sampling sample data, and feature extraction is performed on the sampling sample data to obtain several pieces of second feature information: A, B and C. First, the probability p_k that the sampling sample data falls into each of the two classes under a feature is determined, and the Gini coefficient is obtained from these probabilities, as shown in formula (1):

Gini(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2    (1)

According to formula (1), the Gini coefficient of the given sample data D is shown in formula (2):

Gini(D) = 1 - \sum_{k=1}^{K} ( |C_k| / |D| )^2    (2)

At this time, for the sample data set D, if D is divided into two parts D_1 and D_2 according to whether the feature A takes a certain value a, the Gini coefficient of D under the condition of the feature A is shown in formula (3):

Gini(D, A) = (|D_1| / |D|) Gini(D_1) + (|D_2| / |D|) Gini(D_2)    (3)

Gini(D) represents the uncertainty of the data set D, and Gini(D, A) represents the uncertainty of the sampled sample D after the split A = a; a larger Gini coefficient indicates greater uncertainty.

After the Gini coefficient of the sampling sample data is obtained, the feature importance function is added according to formula (4) to obtain the corrected probability:

Gini_f(D, A) = f(x) · Gini(D, A)    (4)

where the importance function is f(x) = T / K, in which T equals the number of samples defined as recommended cases by the last random forest model in the historical data set X, and K equals the total number of samples in X; the total number of training data is |D|, and the number of samples of class k is |C_k|.
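Under the definitions above, formulas (2) to (4) can be sketched in Python as follows; the function names are assumptions.

```python
# Sketch of formulas (2)-(4): class-count Gini, split Gini, and the
# importance-weighted correction f(x) * Gini(D, A) with f(x) = T / K.
from collections import Counter

def gini(labels):
    """Formula (2): 1 - sum((|C_k| / |D|)^2)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(d1, d2):
    """Formula (3): size-weighted Gini of the two parts D1 and D2."""
    n = len(d1) + len(d2)
    return (len(d1) / n) * gini(d1) + (len(d2) / n) * gini(d2)

def corrected_gini(d1, d2, t_recommended, k_total):
    """Formula (4): importance function f = T / K applied to Gini(D, A)."""
    return (t_recommended / k_total) * gini_split(d1, d2)

# A split of 6 samples; 4 of 10 samples were recommended in the last round.
print(corrected_gini([1, 1, 0], [0, 0, 0], t_recommended=4, k_total=10))
```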
Step S204: and classifying the sampling sample data based on the corrected probability to obtain a trained decision tree.
Here, the corrected probability of the sampling sample data under each second feature is calculated, the feature with the minimum corrected probability is selected as the splitting attribute of the decision tree, and the sampling sample data is divided into two categories; within the two categories, the corrected probabilities under the remaining second feature information are calculated, and the feature with the minimum value is selected as the child node of the decision tree. All second features of the sampling sample data are traversed in this way, classifying the sampling sample data to obtain the trained decision tree.
In the embodiment of the application, the associated training feature information is obtained by feature extraction on the sampling sample data, and the probability that the category of the sampling sample data is the labeled category is corrected under the second feature information to obtain the corrected probability. The decision tree to be trained can then be trained on the corrected probability, so that training is biased toward correctly classified feature information and the classification result of the trained decision tree is more accurate.
In some realizable embodiments, feature extraction may be performed on sample data sampled in the process of training the decision tree by:
the method comprises the following steps: determining attribute information of the sample data and defect information generated when the sample data is executed.
Here, the attribute information includes the latest execution result of each historical test case in the sampling sample data, the execution time of each historical test case, and whether a prerequisite exists for the execution of each case. The defect information generated when the sampling sample data is executed includes: whether a defect was generated during the execution of each historical case, whether the latest execution generated a defect, the defect level of the latest execution, and the like. The process and results of testing the data to be tested with the historical test cases during historical iterations are analyzed to determine the attribute information of the sampling sample data and the defect information generated when it was executed.
Step two: and extracting attribute information and defect information of the sampling sample data to obtain the second characteristic information.
Here, the second characteristic information includes: the latest execution result of each historical test case, the execution time length of each historical test case, whether a prerequisite exists in the execution of each case, whether a defect is generated in the execution process of each historical case, whether a defect is generated in the latest execution, and the defect level generated in the latest execution.
In some embodiments, after obtaining the second feature information of the sample data, since the feature values of different feature information are discrete, normalization processing needs to be performed on the second feature information, and all the feature values of different feature information are converted into a range of [0,1], for example, a maximum and minimum normalization processing method is used.
In the embodiment of the application, when the characteristics of the sample data are extracted, the associated attribute information is added, the relevant condition of the defects is taken as the added attributes, and the characteristics of the sample data are extracted, so that a good effect is obtained in the training of the decision tree, and the hidden characteristic information of each historical test case of the sample data can be discovered.
In some realizable embodiments, before performing feature extraction on the sample data, the initial sample data needs to be further processed through the following steps to obtain the sample data:
the method comprises the following steps: and acquiring initial sampling data.
Here, the initial sample data is data obtained by sampling sample data. The sample data is a historical test case generated in the process of performing historical test iteration on the data to be tested. The initial sampling sample data is data formed by historical test cases.
Step two: and filtering noise sample data in the initial sample data to obtain the sample data.
Here, the historical test cases may include cases whose tests were invalid and cases that were never executed; the noise sample data consists of such invalid and unexecuted cases, and this noise must be filtered out of the initial sampling sample data to obtain the sampling sample data.
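A minimal sketch of this filtering step; the "valid" and "executed" flags on a historical case record are assumed field names.

```python
# Drop invalid cases and cases that were never executed (the noise data).
def filter_noise(initial_samples):
    return [case for case in initial_samples
            if case.get("valid", False) and case.get("executed", False)]

initial = [{"id": 1, "valid": True,  "executed": True},
           {"id": 2, "valid": False, "executed": True},    # invalid test
           {"id": 3, "valid": True,  "executed": False}]   # never executed
print(filter_noise(initial))    # keeps only case 1
```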
In the embodiment of the application, noise data in the initial sampling sample data is filtered, so that the data quality of the sampling sample data for training the decision tree can be improved, and a more effective trained decision tree can be obtained.
In some realizable embodiments, multiple groups of sampling sample data are obtained by resampling the sample data multiple times, decision trees are trained on each group to obtain multiple random forests, and the output results of the random forests are merged to obtain the classification network. Thus, a classification network can be obtained by:
the method comprises the following steps: obtaining the random forest based on at least one of the trained decision trees.
Here, the output results of at least one decision tree, namely the second test cases, are counted and combined to obtain a random forest model.
Step two: and obtaining the classification network based on at least one random forest.
Here, the N groups of resampled sampling sample data comprising historical test cases are respectively input into the trained random forest models to obtain the output results of the multiple models, i.e., the first test case set; the output results, i.e., the multiple first test cases in the first test case set, are counted and voted on to obtain the classification network.
In the embodiment of the application, random forest models are generated from the trained decision trees, and the classification network is then generated from the random forest models, so that the sample data can be described from different aspects, decision points in different directions can be found, and a better target test case can be obtained.
The embodiment of the application provides a test case recommendation method. Test case data generated during historical iterations is summarized, and a recommendation model is trained by machine learning. The case of each function point is treated as a whole; the associated attribute information from each case's historical execution is expanded and the obtained defect information is integrated to produce a feature description of each case. A random forest algorithm is used to model from different feature dimensions: through feature selection based on the Gini index, CART trees are constructed and a random forest is built on the training set; the decision trees are given weighted votes on the test set according to the final results, and test case data whose score exceeds a threshold is selected. Fig. 3 is a schematic view of another implementation flow of the test case recommendation method according to the embodiment of the present application, as shown in Fig. 3:
step S301: historical training data is input.
Here, the historical training data is historical test case data obtained by summarizing and summarizing test case data generated in a historical test iteration process of the data to be tested.
Step S302: the data is preprocessed.
Here, eliminated data in the historical data is removed (311), and the historical training data is preprocessed, for example by filtering failed use cases and handling partially missing (skipped) use cases, to obtain the training data.
Step S303: the data feature information is constructed.
Here, the feature information processing procedure (312) performs feature extraction on the historical test cases in each training record. The extracted features include: the latest execution result of each use case, the execution time of each use case, whether a prerequisite exists for its execution, whether a defect was generated during its execution, whether the latest execution generated a defect, and the defect level of the latest execution; this yields the second feature information. The decision tree is thus trained on the feature points of concern in testing work. Since the different feature values are discrete, following the classical machine learning procedure the features need to be normalized, i.e., converted into the range [0, 1] with max-min (linear) normalization, as shown in formula (5):

x' = (x - x_min) / (x_max - x_min)    (5)
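A one-function sketch of the max-min normalization in formula (5), applied to one feature column at a time:

```python
# Formula (5): map each feature value into [0, 1]; a constant column is
# mapped to 0 to avoid division by zero (a choice assumed here).
def min_max_normalize(values):
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalize([120, 45, 300, 45]))   # e.g. execution times
```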
step S304: different characteristic information is selected.
Here, the Gini index is used to select among different pieces of second feature information. The Gini index represents the impurity of the decision tree to be trained (impurity being, in the CART definition, the probability that a sample is wrongly divided: high impurity means a high probability of misclassification, and lower impurity means higher classification accuracy); the smaller the Gini index, the lower the impurity. K samples are selected from the sample data to form the sampling sample data set for training the model, and within it a sample D is divided into two parts according to the feature A: the two categories D1 and D2. The Gini index of sample D under feature A is shown in formula (3).
In some embodiments, the Gini indexes of the sampling sample data under its different pieces of second feature information may be calculated to select among them and train the decision tree.
Step S305: and distinguishing the importance of different characteristic information.
Here, an importance weighting function (313) over the different pieces of second feature information may be designed to distinguish their importance. The Gini index of the sampling sample data under any second feature information is transformed by adding the importance function of that feature information, yielding the importance weight function; the Gini index is thereby transformed, the importance of different second feature information is distinguished, and correctly classified sample data receives more attention during classification, so as to obtain correct test cases.
The modification of the original Gini index is shown in formula (4): the importance function f(x) = T / K is added to obtain the importance weight function, where T equals the number of samples defined as recommended cases by the last random forest model in the historical data set X, and K equals the total number of samples in X; the number of training data is |D|, and the number in class k is |C_k|. The purpose of the importance weight function is to pay more attention to the number of correctly classified samples during sample classification.
Step S306: a CART decision tree is constructed.
The CART decision tree is constructed as a minimal binary tree, generated by the following steps:
Step 1: let the data set of a node be Y (a subset of the training sample set) and let the selected feature A be the candidate feature. For each possible split point under feature A in the data set (a split point is a division of data set Y by a value of feature A), compute the importance weight function; select the split point with the minimum value as the split position of feature A, dividing data set Y into two data sets Y1 and Y2.
Step 2: following the calculation in step 1, compare the importance weight function values and split points of every feature over the data set, find the split point with the minimum importance weight function, and select it as the split scheme with the minimum error. Generate two child nodes from the current node's data set and distribute the training data into them: Y1 goes to the left child node by default, and Y2 to the right child node.
Step 3: iterate steps 1 and 2 until the second feature information of all the sampling data has been traversed, obtaining the CART decision tree.
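Steps 1 and 2 amount to an exhaustive search for the split with the minimum importance weight function; a self-contained sketch follows, in which the dataset layout (a list of (feature_vector, label) rows) and the function names are assumptions.

```python
# Sketch of CART split selection with the importance-weighted Gini.
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def importance_weighted_gini(d1, d2, importance):
    n = len(d1) + len(d2)
    split = (len(d1) / n) * gini(d1) + (len(d2) / n) * gini(d2)
    return importance * split                # formula (4): f(x) * Gini(D, A)

def best_split(rows, importance):
    """rows: list of (feature_vector, label). Returns (score, feature, value)."""
    best = None
    n_features = len(rows[0][0])
    for f in range(n_features):
        for value in sorted({feats[f] for feats, _ in rows}):
            d1 = [lbl for feats, lbl in rows if feats[f] <= value]
            d2 = [lbl for feats, lbl in rows if feats[f] > value]
            if not d1 or not d2:
                continue
            score = importance_weighted_gini(d1, d2, importance)
            if best is None or score < best[0]:
                best = (score, f, value)     # keep the minimum-Gini split
    return best

rows = [([0.2, 1.0], 1), ([0.8, 0.0], 0), ([0.7, 0.1], 0), ([0.1, 0.9], 1)]
print(best_split(rows, importance=0.4))      # -> (0.0, 0, 0.2)
```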
Step S307: weighted voting is performed on the different CART decision trees.
Here, after the construction of a CART decision tree is completed, different sampled data sets are generated from the same batch of sample data with the following strategy: N sampling sample data sets are drawn from the sample data D by resampling, and a classifier (a decision tree) is built for each of them; repeating k times yields k classifiers {h_1(X), h_2(X), …, h_k(X)}. Voting is then performed by averaging the output values of all decision trees:

H(x) = (1/k) \sum_{i=1}^{k} h_i(x)

where the output threshold of a use case is denoted θ. A random forest average greater than the threshold means the use case can be recommended and the result is 1; otherwise the result is 0 and the use case is not recommended.
Step S308: and generating a random forest.
Here, the sample data is resampled multiple times to obtain multiple groups of sampling sample data, each group is used to train at least one decision tree, the output results of the decision trees (the second test cases) are counted and combined, and a test case output threshold (314) is designed to obtain a random forest model: a random forest average greater than the threshold means the use case can be recommended and the result is 1; otherwise the result is 0 and the use case is not recommended.
Step S309: and generating a model.
Here, the model refers to the classification network composed of a plurality of random forest models. After one round of random forest construction is completed in step S308, recommendation based on historical sample data is performed; this is a process that merges, summarizes and recommends over the random forests of each historical recommendation round. After the corresponding random forest models have been generated by several historical training rounds, the historical random forest models are summarized as a whole, which ensures both that a use case is not excluded from subsequent recommendation merely because its last execution succeeded, and that the main points of concern, i.e., the parts of the historical data where problems easily arise, are covered. In the summarizing and recommending process, the use cases recommended by each random forest model are counted and aggregated, and the part whose occurrence frequency exceeds the threshold becomes the most recommended candidate target.
Step S310: and obtaining evaluation data of the multi-dimensional indexes.
Here, after the test data (315) comprising a plurality of test case data is input into the model formed by training, the target test case, i.e., the evaluation data of the multidimensional indexes, is obtained.
In the embodiment of the application, historical case data is used as sample data for feature extraction, a CART decision tree is constructed, a random forest model is then built, and a filter is formed to screen out the candidate test case set to be recommended. During feature extraction, associated attribute information is added and defect-related conditions are used as additional attributes, which performs well in random forest model training and can mine the hidden features of historical samples. Moreover, a single use case is directly taken as the unit of feature extraction, the trained feature information is obtained by the feature extraction procedure, and the selected use cases are mutually independent, so good test case recommendation results are easier to obtain. In constructing the CART decision tree, the calculation of the Gini index is modified by adding an importance function, so that the recommendation process is better biased toward correctly classified sample data and the recommendation is more practical. By structuring the classification network as random forests, multiple decision trees describe the sample data from different aspects, decision points in different directions can be found, and the recommended target test case can be better summarized.
Fig. 4 is a schematic diagram of the merged test case recommendation process of the random forest models according to an embodiment of the present application, as shown in Fig. 4: 41 to 4n represent the historical test case sets 1, 2, 3, 4, …, n obtained by resampling the sample data, from which the data models 411, 412, 413, 414, …, 41n are respectively generated. At 400, after test data comprising a plurality of test cases is input into each data model, the data models are combined into the classification network; the output results of the data models are voted on, the recommended test cases in the part exceeding each decision tree's threshold are recorded and sorted, the occurrence frequencies are accumulated, and the high-frequency data is selected as the recommended case data, i.e., the target test case.
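The merging and frequency accumulation of Fig. 4 can be sketched as follows; the frequency threshold and data layout are assumptions.

```python
# Each historical model contributes its recommended case ids; cases whose
# accumulated occurrence frequency clears the threshold become the target.
from collections import Counter

def merge_recommendations(model_outputs, freq_threshold):
    freq = Counter()
    for recommended in model_outputs:
        freq.update(recommended)              # accumulate occurrence frequency
    return [case for case, count in freq.items() if count >= freq_threshold]

models = [["case1", "case3"], ["case1"], ["case1", "case5"], ["case3"]]
print(merge_recommendations(models, freq_threshold=2))  # -> ['case1', 'case3']
```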
Based on the processing logic obtained in the above steps, when 20 random forest decision trees are used on one round of historical data, the overall recommendation success rate of the resulting model exceeds 95%; the recommendation accuracy of each random forest meets the target requirement, and good results are maintained under subsequent repeated merging.
An embodiment of the present application provides a test case recommendation device, and fig. 5 is a schematic structural diagram of the test case recommendation device provided in the embodiment of the present application, and as shown in fig. 5, the test case recommendation device 500 includes:
the first determining module 501 is configured to determine, with a classification network comprising a plurality of random forests, a first test case of the data to be tested in each random forest to obtain a first test case set;
a second determining module 502, configured to determine, in the first test case set, a target test case that meets a preset condition;
the testing module 503 is configured to test the data to be tested by using the target test case to obtain a testing result.
In the above apparatus, any one of the random forests includes at least one decision tree, and the first determining module 501 includes:
the first determining submodule is used for performing feature extraction on the data to be tested by adopting the decision tree in any random forest and determining first feature information corresponding to each decision tree in any random forest;
the second determining submodule is used for determining a test case of the to-be-tested data in each decision tree based on the first characteristic information to obtain a second test case set;
the third determining submodule is used for determining a test case with the weight meeting a preset weight value in the second test case set to obtain a first test case of any random forest;
and the fourth determining submodule is used for obtaining the first test case set based on the first test cases of the random forests.
In the above apparatus, the third determining sub-module includes:
the screening unit is used for screening the test cases with the weight meeting a first threshold value in the second test case set to obtain a third test case set;
a first determining unit, configured to determine a weight set corresponding to a same third test case in the third test case set;
and the second determining unit is used for determining the test case of which the mean value of the weight set meets a second threshold as the first test case of any random forest.
In the above apparatus, the second determining module includes:
the counting submodule is used for counting the frequency of the same first test case in the first test case set;
and the fifth determining submodule is used for determining the first test case with the highest frequency as the target test case.
In the above apparatus, the apparatus further comprises a training module for training the decision tree, the training module comprising:
the characteristic extraction submodule is used for extracting the characteristics of the sampling sample data by adopting a decision tree to be trained to obtain second characteristic information; wherein the second feature information comprises at least one feature information of the sample data; the sampling sample data is obtained by sampling the obtained sample data;
a prediction sub-module, configured to predict, based on the second feature information, a probability that the category of the sample data is a labeled category;
a sixth determining submodule, configured to associate the probability with an importance function to obtain a modified probability;
and the classification submodule is used for classifying the sampling sample data based on the corrected probability to obtain a trained decision tree.
In the above apparatus, the feature extraction sub-module includes:
a second determining unit for determining attribute information of the sample data and defect information generated when the sample data is executed;
and the extraction unit is used for extracting the attribute information and the defect information of the sampling sample data to obtain the second characteristic information.
In the above apparatus, the training module further comprises:
a third determining unit, configured to obtain the random forest based on at least one trained decision tree;
a fourth determining unit, configured to obtain the classification network based on at least one of the random forests.
Correspondingly, an embodiment of the present application provides a test case recommending terminal, and fig. 6 is a schematic diagram of a composition structure of the test case recommending terminal provided in the embodiment of the present application, as shown in fig. 6, the test case recommending terminal 600 at least includes: a controller 601 and a storage medium 602 configured to store executable instructions, wherein:
the controller 601 is configured to execute stored executable instructions for implementing the provided test case recommendation method.
Here, it should be noted that: the above description of the storage medium and device embodiments is similar to the description of the method embodiments above, with similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the storage medium and the apparatus of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application. The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of another like element in a process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that all or part of the steps for implementing the method embodiments may be completed by hardware under the instruction of a program; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments; and the aforementioned storage medium includes: various media that can store program code, such as a removable memory device, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
Alternatively, the integrated unit described above may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present application or portions thereof that contribute to the related art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a removable storage device, a ROM, a magnetic or optical disk, or other various media that can store program code.
The above description is only for the embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for recommending test cases is characterized by comprising the following steps:
determining a first test case of data to be tested in each random forest by adopting a classification network comprising a plurality of random forests to obtain a first test case set;
determining a target test case meeting a preset condition in the first test case set;
and testing the data to be tested by adopting the target test case to obtain a test result.
2. The method of claim 1, wherein any of the random forests comprises at least one decision tree, and wherein the determining, by adopting a classification network comprising a plurality of random forests, a first test case of the data to be tested in each of the random forests to obtain a first test case set comprises:
in any random forest, performing feature extraction on the data to be tested by adopting a decision tree in the random forest to determine first feature information corresponding to each decision tree;
determining a test case of the data to be tested in each decision tree based on the first characteristic information to obtain a second test case set;
determining, in the second test case set, a test case whose weight meets a preset weight value to obtain a first test case of any random forest;
and obtaining the first test case set based on the first test cases of the plurality of random forests.
3. The method according to claim 2, wherein the determining, in the second test case set, a test case whose weight meets a preset weight value to obtain the first test case of any random forest comprises:
screening test cases with weights meeting a first threshold value in the second test case set to obtain a third test case set;
determining a weight set corresponding to the same third test case in the third test case set;
and determining a test case whose weight-set mean meets a second threshold value as the first test case of any random forest.
4. The method according to claim 1, wherein the determining, in the first test case set, a target test case that meets a preset condition comprises:
counting the frequency of the same first test case in the first test case set;
and determining the first test case with the highest frequency as the target test case.
5. The method of claim 2, wherein the method for training the decision tree comprises:
performing feature extraction on the sampled sample data by adopting a decision tree to be trained to obtain second feature information; wherein the second feature information comprises at least one piece of feature information of the sampled sample data; and the sampled sample data is obtained by sampling the acquired sample data;
predicting, based on the second feature information, a probability that the category of the sampled sample data is a labeled category;
associating the probability with an importance function to obtain a corrected probability;
and classifying the sampled sample data based on the corrected probability to obtain a trained decision tree.
6. The method according to claim 5, wherein the performing feature extraction on the sampled sample data by adopting the decision tree to be trained to obtain second feature information comprises:
determining attribute information of the sampled sample data and defect information generated when the sampled sample data is executed;
and extracting the attribute information and the defect information of the sampled sample data to obtain the second feature information.
7. The method of claim 5, further comprising:
obtaining the random forest based on at least one trained decision tree;
and obtaining the classification network based on at least one random forest.
8. A test case recommendation apparatus, the apparatus comprising:
the first determining module is used for determining a first test case of data to be tested in each random forest by adopting a classification network comprising a plurality of random forests to obtain a first test case set;
the second determination module is used for determining a target test case meeting preset conditions in the first test case set;
and the test module is used for testing the data to be tested by adopting the target test case to obtain a test result.
9. A test case recommendation terminal, characterized in that the terminal at least comprises: a controller and a storage medium configured to store executable instructions, wherein:
the controller is configured to execute the stored executable instructions so as to perform the test case recommendation method provided in any one of claims 1 to 7.
10. A computer-readable storage medium having computer-executable instructions stored thereon, the computer-executable instructions configured to perform the test case recommendation method provided by any one of claims 1 to 7.
CN202110943582.7A 2021-08-17 2021-08-17 Test case recommendation method and device, terminal and storage medium Pending CN115904920A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110943582.7A 2021-08-17 2021-08-17 Test case recommendation method and device, terminal and storage medium

Publications (1)

Publication Number Publication Date
CN115904920A (en) 2023-04-04

Family

ID=86469716

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110943582.7A Test case recommendation method and device, terminal and storage medium 2021-08-17 2021-08-17

Country Status (1)

Country Link
CN (1) CN115904920A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117093501A (en) * 2023-09-25 2023-11-21 哈尔滨航天恒星数据系统科技有限公司 Test case recommendation method based on pre-training model, electronic equipment and storage medium
CN117093501B (en) * 2023-09-25 2024-03-12 哈尔滨航天恒星数据系统科技有限公司 Test case recommendation method based on pre-training model, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109117380B (en) Software quality evaluation method, device, equipment and readable storage medium
US7606784B2 (en) Uncertainty management in a decision-making system
CN111367961A (en) Time sequence data event prediction method and system based on graph convolution neural network and application thereof
CN110866819A (en) Automatic credit scoring card generation method based on meta-learning
CN114722746B (en) Chip aided design method, device and equipment and readable medium
CN109635010B (en) User characteristic and characteristic factor extraction and query method and system
Utari et al. Implementation of data mining for drop-out prediction using random forest method
CN110334208B (en) LKJ fault prediction diagnosis method and system based on Bayesian belief network
CN107633444A (en) Commending system noise filtering methods based on comentropy and fuzzy C-means clustering
CN111338950A (en) Software defect feature selection method based on spectral clustering
CN110851654A (en) Industrial equipment fault detection and classification method based on tensor data dimension reduction
CN113590396A (en) Method and system for diagnosing defect of primary device, electronic device and storage medium
CN111695824A (en) Risk tail end client analysis method, device, equipment and computer storage medium
US11914507B2 (en) Software test apparatus and software test method
CN113452018A (en) Method for identifying standby shortage risk scene of power system
CN116257759A (en) Structured data intelligent classification grading system of deep neural network model
CN109255389B (en) Equipment evaluation method, device, equipment and readable storage medium
CN115904920A (en) Test case recommendation method and device, terminal and storage medium
Oliveira-Santos et al. Combining classifiers with decision templates for automatic fault diagnosis of electrical submersible pumps
CN115437960A (en) Regression test case sequencing method, device, equipment and storage medium
CN112463964B (en) Text classification and model training method, device, equipment and storage medium
CN115691702A (en) Compound visual classification method and system
CN114820074A (en) Target user group prediction model construction method based on machine learning
CN117272126A (en) Chip fault analysis method and device
CN114547294A (en) Rumor detection method and system based on comprehensive information of propagation process

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination