
CN113342648A - Test result analysis method and device based on machine learning

Info

Publication number: CN113342648A
Application number: CN202110601501.5A
Authority: CN
Inventors: 李文婷; 黄琼; 李美娜; 贺克军
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2021-05-31
Filing date: 2021-05-31
Publication date: 2021-09-03

Abstract

The embodiments of the application provide a test result analysis method and device based on machine learning, which can also be used in the financial field. The method comprises the following steps: collecting a plurality of sample data from an original test result set and forming a test result training set by random sampling with replacement; classifying according to the different data types of the sample data in the test result training set, and then preprocessing the sample data to obtain a preprocessed test result training set; and performing random feature extraction on the sample data in the test result training set to construct a random forest model, and obtaining a target test result according to the random forest model. The method and the device can analyze test results accurately and conveniently, save the cost of checking transaction results, and do not require testers to know the program logic.

Description

Test result analysis method and device based on machine learning

Technical Field

The application relates to the field of machine learning and can also be used in the field of finance, in particular to a test result analysis method and device based on machine learning.

Background

With the transformation of the existing service architecture, more and more service scenarios are tested on off-host platforms, and the test result of every transaction has to be checked manually, which is time-consuming. When testers are inexperienced, checking the test results is even harder. Currently, the common practice is either to automatically compare the output value of the transaction result with the expected value, or to rewrite the program logic in order to check the result.

The inventors found that checking a transaction test result may involve many fields in many tables, which places extremely high demands on service testers and makes manual checking expensive. If only the output value of the transaction result is checked, and that output does not contain the key fields that influence the transaction result, the check cannot fully cover the transaction test result. At the same time, testers are required to be familiar with the code logic, which increases the test cost.

Disclosure of Invention

Aiming at the problems in the prior art, the application provides a test result analysis method and device based on machine learning, which can accurately and conveniently analyze a test result, save the cost of checking a transaction result, and simultaneously do not need testers to know program logic.

In order to solve at least one of the above problems, the present application provides the following technical solutions:

In a first aspect, the present application provides a method for analyzing test results based on machine learning, including:

collecting a plurality of sample data from an original test result set and forming a test result training set by random sampling with replacement;

classifying according to different data types of sample data in the test result training set, and then performing sample data preprocessing on the test result training set to obtain a test result training set subjected to the sample data preprocessing;

and performing random feature extraction on the sample data in the test result training set to obtain a random forest model, and obtaining a target test result according to the random forest model.

Further, performing sample data preprocessing on the test result training set to obtain the preprocessed test result training set includes:

initializing the sample data in the test result training set to obtain a sample data matrix;

and after Euclidean distance calculation is carried out on the sample data matrix, missing value filling processing is carried out on the sample data according to the weight value of the sample data corresponding to at least one minimum Euclidean distance in the Euclidean distance calculation result, and a test result training set after the missing value filling is obtained.

Further, after the random forest model is constructed and before a target test result is obtained according to the random forest model, the method further includes:

and sequentially adjusting the parameters of the random forest model according to a preset grid search algorithm and a set step length, and determining the optimal parameter with the highest precision in a set parameter range as the parameter of the random forest model.

Further, after determining the optimal parameter with the highest precision in the set parameter range as the parameter of the random forest model, the method further includes:

and determining the average accuracy value of each test result training set according to a preset cross validation algorithm, and determining the parameter combination of the test result training set corresponding to the maximum average accuracy value as the optimal parameter.

In a second aspect, the present application provides a test result analysis apparatus based on machine learning, including:

the test result training set constructing module is used for collecting a plurality of sample data from the original test result set and forming a test result training set by random sampling with replacement;

the test result training set preprocessing module is used for classifying the test result training set according to different data types of the sample data in the test result training set and then preprocessing the sample data of the test result training set to obtain a test result training set preprocessed by the sample data;

and the random forest analysis module is used for extracting the random characteristics of the sample data in the test result training set, constructing a random forest model and obtaining a target test result according to the random forest model.

Further, the test result training set preprocessing module comprises:

the sample data initialization unit is used for initializing the sample data in the test result training set to obtain a sample data matrix;

and the missing value filling unit is used for performing missing value filling processing on the sample data according to the weight value of the sample data corresponding to at least one minimum Euclidean distance in the Euclidean distance calculation result after performing Euclidean distance calculation on the sample data matrix to obtain a test result training set after the missing value filling.

Further, the random forest analysis module comprises:

and the grid search unit is used for sequentially adjusting the parameters of the random forest model according to a preset grid search algorithm and a set step length, and determining the optimal parameter with the highest precision in a set parameter range as the parameter of the random forest model.

Further, the random forest analysis module comprises:

and the cross validation unit is used for determining the average accuracy value of each test result training set according to a preset cross validation algorithm and determining the parameter combination of the test result training set corresponding to the maximum average accuracy value as the optimal parameter.

In a third aspect, the present application provides an electronic device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for analyzing test results based on machine learning when executing the program.

In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the machine learning-based test result analysis method.

According to the above technical solutions, the test result analysis method and device based on machine learning provided by the present application improve on the original random forest method, which improves the classification effect, enhances the generalization capability, and improves the prediction accuracy of the test result. The test result can therefore be analyzed accurately and conveniently, the cost of checking transaction results is saved, and testers do not need to know the program logic.

Drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a method for analyzing test results based on machine learning according to an embodiment of the present application;

FIG. 2 is a second flowchart of a test result analysis method based on machine learning according to an embodiment of the present application;

FIG. 3 is a block diagram of a test result analysis apparatus based on machine learning according to an embodiment of the present application;

FIG. 4 is a second block diagram of a test result analysis apparatus based on machine learning according to an embodiment of the present application;

FIG. 5 is a third block diagram of a test result analysis apparatus based on machine learning according to an embodiment of the present application;

FIG. 6 is a fourth block diagram of a test result analysis apparatus based on machine learning according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of an electronic device in an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

In the prior art, checking a transaction test result may involve many fields in many tables, which places extremely high demands on service testers and makes manual checking expensive. If only the output value of the transaction result is checked, and that output does not contain the key fields that influence the transaction result, the check cannot fully cover the transaction test result. The method and the device for analyzing test results based on machine learning improve on the original random forest method, which improves the classification effect, enhances the generalization capability, and improves the prediction accuracy of the test result, so that the test result can be analyzed accurately and conveniently, the cost of checking transaction results is saved, and testers do not need to know the program logic.

In order to accurately and conveniently analyze a test result, save the cost of checking a transaction result, and simultaneously, avoid the need for a tester to know program logic, the present application provides an embodiment of a test result analysis method based on machine learning, and referring to fig. 1, the test result analysis method based on machine learning specifically includes the following contents:

Step S101: a plurality of sample data are collected from an original test result set, and a test result training set is formed by random sampling with replacement.

Optionally, the present application may draw N samples from the original test result set by random sampling with replacement to form the test result training set.
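The sampling in step S101 can be illustrated with a minimal sketch. The record layout (transaction id, verdict) and the function name `bootstrap_sample` are assumptions made for illustration; the source does not specify the data format.

```python
import random

def bootstrap_sample(original_results, n_samples, seed=None):
    """Draw n_samples records from original_results with replacement."""
    rng = random.Random(seed)
    return [rng.choice(original_results) for _ in range(n_samples)]

# Hypothetical test-result records: (transaction id, verdict).
original = [("T001", "pass"), ("T002", "fail"), ("T003", "pass"), ("T004", "pass")]

# N equals the size of the original set here; the same record may appear more than once.
training_set = bootstrap_sample(original, n_samples=len(original), seed=0)
```

Because each draw is independent, some original records may be repeated in the training set while others are left out, which is what gives each tree of the forest a slightly different view of the data.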

Step S102: classifying according to the different data types of the sample data in the test result training set, and then preprocessing the sample data in the test result training set to obtain the preprocessed test result training set.

Optionally, the test data of the test scenario in the test scheme may be classified according to the data type of the test result of the test scenario and then preprocessed, so as to obtain a test result training set after the sample data preprocessing.

Step S103: performing random feature extraction on the sample data in the test result training set to construct a random forest model, and obtaining a target test result according to the random forest model.

Optionally, d features can be randomly extracted from the D features of each sample, where d < D (the number d of extracted features leaves room for optimization). The sampled features are used to build a random forest model. When data is input, each tree produces its own result, the results are put to a vote, and the class with the most votes is taken as the final result.
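The two ideas in step S103, drawing d of the D features at random and combining the trees by majority vote, can be sketched as follows. This is an illustration only: no trees are actually trained, the per-tree outputs are hypothetical, and the names `pick_feature_subset` and `majority_vote` are not from the source.

```python
import random
from collections import Counter

def pick_feature_subset(n_features, d, rng):
    """Randomly choose d distinct feature indices out of n_features (d < n_features)."""
    if not 0 < d < n_features:
        raise ValueError("d must satisfy 0 < d < n_features")
    return sorted(rng.sample(range(n_features), d))

def majority_vote(tree_predictions):
    """Return the label predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(42)
D = 8  # total number of features per sample (illustrative value)
d = 3  # number of features drawn for each tree, d < D
subsets = [pick_feature_subset(D, d, rng) for _ in range(5)]

# Hypothetical outputs of five trees for one input record.
votes = ["pass", "fail", "pass", "pass", "fail"]
final = majority_vote(votes)
```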

From the above description, the test result analysis method based on machine learning provided by the embodiment of the application can improve the classification effect, enhance the generalization ability and improve the prediction accuracy of the test result by improving the original random forest method, so that the test result can be accurately and conveniently analyzed, the cost of checking the transaction result is saved, and meanwhile, a tester does not need to know program logic.

In order to ensure the data integrity of the training set of test results, in an embodiment of the method for analyzing test results based on machine learning according to the present application, referring to fig. 2, the step S102 may further include the following steps:

Step S201: initializing the sample data in the test result training set to obtain a sample data matrix.

Step S202: performing Euclidean distance calculation on the sample data matrix, and then performing missing value filling on the sample data according to the weight values of the sample data corresponding to at least one minimum Euclidean distance in the calculation result, to obtain a test result training set with the missing values filled.

Optionally, as the result data collected by the database may have dirty data such as missing values or abnormal values, the missing values of the data may be processed by using a K-nearest neighbor algorithm.

Specifically, the data is first initialized to form a data matrix, and the Euclidean distance is then calculated for the data in the matrix according to formula 1-1:

$$D = \left(|x_1 - x_{i1}|^2 + |x_2 - x_{i2}|^2 + \dots + |x_n - x_{in}|^2\right)^{1/2} \tag{1-1}$$

The K data items with the smallest Euclidean distance are then selected, a weight value is calculated for each selected item, and the missing data is filled in according to these weights.
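The filling procedure can be sketched in a few lines of Python. The source does not reproduce its weight formula, so this sketch assumes inverse-distance weights; that choice, the function names, and the sample matrix are all illustrative assumptions. Distances are computed over the columns that are present in the incomplete row.

```python
import math

def distance(row, other):
    """Euclidean distance over the columns that are present (not None) in `row`."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(row, other) if x is not None))

def knn_fill(rows, k=2):
    """Fill None entries from the k nearest complete rows.

    Assumption: inverse-distance weighting; the source text omits its weight formula.
    """
    complete = [r for r in rows if None not in r]
    filled = []
    for row in rows:
        if None not in row:
            filled.append(list(row))
            continue
        nearest = sorted(complete, key=lambda c: distance(row, c))[:k]
        # The small constant avoids division by zero for identical rows.
        weights = [1.0 / (distance(row, c) + 1e-9) for c in nearest]
        total = sum(weights)
        filled.append([
            value if value is not None
            else sum(w * c[j] for w, c in zip(weights, nearest)) / total
            for j, value in enumerate(row)
        ])
    return filled

data = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 5.0],
    [9.0, 9.0, 9.0],
    [1.0, 2.0, None],  # missing value to be filled
]
result = knn_fill(data, k=2)
```

In this example the last row is equally close to the first two rows, so the missing entry is filled with the average of their third-column values.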

It is understood that the above procedure handles only discrete attribute features; if categorical features are encountered, they need to be discretized before the procedure is applied.

It will be appreciated that one-hot encoding represents categorical variables as binary vectors. The values of a feature are expanded into a Euclidean space, so that each value of a discrete feature corresponds to a point in that space. One-hot encoding allows the distance between features to be calculated more reasonably, and to a certain extent it also expands the feature set. In the present method, CORPERF, INSTYPE, FEEMODE and REPAYMENTSCHEME are categorical variables; after they are one-hot encoded, the number of features in the data set changes from a to b.
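A minimal sketch of one-hot encoding follows. The field names FEEMODE and INSTYPE come from the text; the AMOUNT field, all field values, and the helper `one_hot_encode` are made up for illustration.

```python
def one_hot_encode(records, categorical_columns):
    """Expand each categorical column into one binary column per observed value."""
    categories = {
        col: sorted({rec[col] for rec in records}) for col in categorical_columns
    }
    encoded = []
    for rec in records:
        # Keep the non-categorical columns unchanged.
        row = {k: v for k, v in rec.items() if k not in categorical_columns}
        for col, values in categories.items():
            for value in values:
                row[f"{col}={value}"] = 1 if rec[col] == value else 0
        encoded.append(row)
    return encoded

# Hypothetical records; the values are not from the source.
records = [
    {"AMOUNT": 100.0, "FEEMODE": "A", "INSTYPE": "X"},
    {"AMOUNT": 250.0, "FEEMODE": "B", "INSTYPE": "X"},
    {"AMOUNT": 80.0, "FEEMODE": "A", "INSTYPE": "Y"},
]
encoded = one_hot_encode(records, ["FEEMODE", "INSTYPE"])
```

In this toy data set the three original columns become five after encoding, which mirrors the change in feature count described above.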

In another possible embodiment, the present application may further normalize the data, that is, scale the data proportionally into a specific range. Different data items have different dimensions, which complicates the calculation, so the data needs to be normalized. The min-max method can be used to map the data to values between 0 and 1. In its standard form, the calculation formula is:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the feature.
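As an illustration, min-max scaling of a single feature column can be written as below. The function name and the sample values are assumptions, and mapping a constant column to 0 is a design choice of this sketch rather than something stated in the source.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: there is no spread to scale, so map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

amounts = [50.0, 100.0, 150.0, 250.0]  # hypothetical feature values
scaled = min_max_scale(amounts)
```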

In order to improve the accuracy of the random forest model, in an embodiment of the method for analyzing test results based on machine learning according to the present application, step S103 may further include the following:

Optionally, the present application may tune the parameters of the random forest (RF) algorithm using a grid search algorithm. The number of classification trees in the random forest, n_estimators, is searched over [10, 100, 200, 500, 1000]; the height of the classification trees, max_depth, is searched over [3, 4, 5, 6, 7, 8]; and the maximum number of leaf nodes, max_leaf_nodes, is searched over [11, 12, 13, 14, 15, 16]. Using the GridSearchCV grid search of scikit-learn in Python, the parameters are adjusted step by step, the parameter with the highest precision is found within the specified parameter range, and the parameters are evaluated by 10-fold cross-validation.
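The exhaustive search over this grid can be sketched without any external library. The parameter ranges are the ones listed above; `toy_score` is a hypothetical stand-in for cross-validated accuracy and does not train a model, so the combination it selects is arbitrary and says nothing about the real optimum.

```python
from itertools import product

# Parameter ranges as listed in the text.
PARAM_GRID = {
    "n_estimators": [10, 100, 200, 500, 1000],
    "max_depth": [3, 4, 5, 6, 7, 8],
    "max_leaf_nodes": [11, 12, 13, 14, 15, 16],
}

def grid_search(param_grid, score_fn):
    """Score every parameter combination and return the best one with its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    """Hypothetical scoring function with an arbitrary peak; not a real model."""
    return -(abs(params["n_estimators"] - 200) / 1000
             + abs(params["max_depth"] - 4)
             + abs(params["max_leaf_nodes"] - 12))

best_params, best_score = grid_search(PARAM_GRID, toy_score)
n_candidates = len(list(product(*PARAM_GRID.values())))  # 5 * 6 * 6 combinations
```

With scikit-learn installed, the same search would typically be delegated to `GridSearchCV` with `param_grid` set to this dictionary, `cv=10` and `scoring="accuracy"`, using a `RandomForestClassifier` as the estimator.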

In order to improve the accuracy of the random forest model, in an embodiment of the method for analyzing test results based on machine learning, the following may be specifically included:

Optionally, the present application may use 10-fold cross-validation on the training data set. The data set is divided evenly into 10 parts; in each round, 9 parts are used as the training set and the remaining part is used as the test set, for a total of 10 rounds of training and testing. The average of the selected scoring metric, such as accuracy, is calculated, and the parameter combination with the maximum score is then taken as the optimal parameter combination with the highest classification accuracy. After multiple rounds of optimization through the grid search algorithm, the optimal parameter set is determined to be [n_estimators: 100, max_depth: 5, max_leaf_nodes: 16], at which the accuracy of the algorithm is highest.
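The 10-fold procedure can be sketched as follows. The fold construction and the averaging follow the description above; the majority-class baseline stands in for the random forest and, like the synthetic labels, is purely an assumption for illustration.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(samples, labels, train_and_score, k=10):
    """Average the score over k rounds, each holding out one fold as the test set."""
    scores = []
    for held_out in k_fold_indices(len(samples), k):
        test_idx = set(held_out)
        train = [(samples[i], labels[i]) for i in range(len(samples)) if i not in test_idx]
        test = [(samples[i], labels[i]) for i in held_out]
        scores.append(train_and_score(train, test))
    return sum(scores) / k

def majority_class_accuracy(train, test):
    """Hypothetical baseline model: always predict the most common training label."""
    train_labels = [label for _, label in train]
    guess = max(set(train_labels), key=train_labels.count)
    return sum(1 for _, label in test if label == guess) / len(test)

samples = list(range(50))                # placeholder feature values
labels = ["pass"] * 40 + ["fail"] * 10   # synthetic verdicts
mean_accuracy = cross_validate(samples, labels, majority_class_accuracy, k=10)
```

In a parameter search, this averaged score would be computed once per parameter combination, and the combination with the highest average would be kept.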

In order to accurately and conveniently analyze a test result, save the cost of checking a transaction result, and simultaneously, avoid the need for a tester to know program logic, the present application provides an embodiment of a test result analysis apparatus based on machine learning, which is used for implementing all or part of the contents of the test result analysis method based on machine learning, and referring to fig. 3, the test result analysis apparatus based on machine learning specifically includes the following contents:

A test result training set constructing module 10, configured to collect a plurality of sample data from the original test result set and form a test result training set by random sampling with replacement.

And the test result training set preprocessing module 20 is configured to perform sample data preprocessing on the test result training set after classifying according to different data types of sample data in the test result training set, so as to obtain the test result training set after the sample data preprocessing.

And the random forest analysis module 30 is used for extracting random features of the sample data in the test result training set, constructing a random forest model and obtaining a target test result according to the random forest model.

From the above description, the test result analysis device based on machine learning provided by the embodiment of the application can improve the classification effect, enhance the generalization ability and improve the prediction accuracy of the test result through improvement on the original random forest method, so that the test result can be accurately and conveniently analyzed, the cost of checking the transaction result is saved, and meanwhile, a tester does not need to know program logic.

In order to ensure the data integrity of the training set of test results, in an embodiment of the test result analysis apparatus based on machine learning of the present application, referring to fig. 4, the training set preprocessing module 20 includes:

a sample data initializing unit 21, configured to initialize the sample data in the test result training set to obtain a sample data matrix.

And the missing value filling unit 22 is configured to perform missing value filling processing on the sample data according to a weighted value of the sample data corresponding to at least one minimum euclidean distance in the euclidean distance calculation result after performing euclidean distance calculation on the sample data matrix, so as to obtain a test result training set after missing value filling.

In order to improve the accuracy of the random forest model, in an embodiment of the test result analysis apparatus based on machine learning of the present application, referring to fig. 5, the random forest analysis module 30 includes:

A grid search unit 31, configured to sequentially adjust the parameters of the random forest model according to a preset grid search algorithm and a set step length, and determine the optimal parameter with the highest precision in a set parameter range as the parameter of the random forest model.

In order to improve the accuracy of the random forest model, in an embodiment of the test result analysis apparatus based on machine learning of the present application, referring to fig. 6, the random forest analysis module 30 includes:

and the cross validation unit 32 is configured to determine an average accuracy value of each test result training set according to a preset cross validation algorithm, and determine a parameter combination of the test result training set corresponding to the maximum average accuracy value as an optimal parameter.

In order to accurately and conveniently analyze the test result and save the cost of checking the transaction result and to avoid the need for the tester to know the program logic, the application provides an embodiment of an electronic device for implementing all or part of the contents in the test result analysis method based on machine learning, and the electronic device specifically includes the following contents:

a processor (processor), a memory (memory), a communication Interface (Communications Interface), and a bus; the processor, the memory and the communication interface complete mutual communication through the bus; the communication interface is used for realizing information transmission between the test result analysis device based on machine learning and relevant equipment such as a core service system, a user terminal, a relevant database and the like; the logic controller may be a desktop computer, a tablet computer, a mobile terminal, and the like, but the embodiment is not limited thereto. In this embodiment, the logic controller may refer to the embodiment of the method for analyzing test results based on machine learning and the embodiment of the apparatus for analyzing test results based on machine learning in the embodiments, which are incorporated herein, and repeated details are not repeated.

It is understood that the user terminal may include a smart phone, a tablet electronic device, a network set-top box, a portable computer, a desktop computer, a Personal Digital Assistant (PDA), an in-vehicle device, a smart wearable device, and the like. The smart wearable device may include smart glasses, a smart watch, a smart bracelet, and the like.

In practical applications, part of the test result analysis method based on machine learning may be performed on the electronic device side as described above, or all operations may be performed in the client device. The selection may be specifically performed according to the processing capability of the client device, the limitation of the user usage scenario, and the like. This is not a limitation of the present application. The client device may further include a processor if all operations are performed in the client device.

The client device may have a communication module (i.e., a communication unit), and may be communicatively connected to a remote server to implement data transmission with the server. The server may include a server on the task scheduling center side, and in other implementation scenarios, the server may also include a server on an intermediate platform, for example, a server on a third-party server platform that is communicatively linked to the task scheduling center server. The server may include a single computer device, or may include a server cluster formed by a plurality of servers, or a server structure of a distributed apparatus.

Fig. 7 is a schematic block diagram of a system configuration of an electronic device 9600 according to an embodiment of the present application. As shown in fig. 7, the electronic device 9600 can include a central processor 9100 and a memory 9140; the memory 9140 is coupled to the central processor 9100. Notably, this fig. 7 is exemplary; other types of structures may also be used in addition to or in place of the structure to implement telecommunications or other functions.

In one embodiment, the machine learning based test result analysis method functions may be integrated into the central processor 9100. The central processor 9100 may be configured to control as follows:

From the above description, it can be seen that the electronic device provided in the embodiment of the present application improves on the original random forest method, improves the classification effect, enhances the generalization ability, and also improves the prediction accuracy of the test result, so that the test result can be accurately and conveniently analyzed, the cost for checking the transaction result is saved, and meanwhile, the tester does not need to know the program logic.

In another embodiment, the machine learning based test result analysis apparatus may be configured separately from the central processor 9100, for example, the machine learning based test result analysis apparatus may be configured as a chip connected to the central processor 9100, and the machine learning based test result analysis method function may be realized by the control of the central processor.

As shown in fig. 7, the electronic device 9600 may further include: a communication module 9110, an input unit 9120, an audio processor 9130, a display 9160, and a power supply 9170. It is noted that the electronic device 9600 also does not necessarily include all of the components shown in fig. 7; further, the electronic device 9600 may further include components not shown in fig. 7, which may be referred to in the art.

As shown in fig. 7, a central processor 9100, sometimes referred to as a controller or operational control, can include a microprocessor or other processor device and/or logic device, which central processor 9100 receives input and controls the operation of the various components of the electronic device 9600.

The memory 9140 can be, for example, one or more of a buffer, a flash memory, a hard drive, a removable media, a volatile memory, a non-volatile memory, or other suitable device. The information relating to the failure may be stored, and a program for executing the information may be stored. And the central processing unit 9100 can execute the program stored in the memory 9140 to realize information storage or processing, or the like.

The input unit 9120 provides input to the central processor 9100. The input unit 9120 is, for example, a key or a touch input device. Power supply 9170 is used to provide power to electronic device 9600. The display 9160 is used for displaying display objects such as images and characters. The display may be, for example, an LCD display, but is not limited thereto.

The memory 9140 can be a solid state memory, e.g., Read Only Memory (ROM), Random Access Memory (RAM), a SIM card, or the like. There may also be a memory that holds information even when power is off, can be selectively erased, and is provided with more data, an example of which is sometimes called an EPROM or the like. The memory 9140 could also be some other type of device. Memory 9140 includes a buffer memory 9141 (sometimes referred to as a buffer). The memory 9140 may include an application/function storage portion 9142, the application/function storage portion 9142 being used for storing application programs and function programs or for executing a flow of operations of the electronic device 9600 by the central processor 9100.

The memory 9140 can also include a data store 9143, the data store 9143 being used to store data, such as contacts, digital data, pictures, sounds, and/or any other data used by an electronic device. The driver storage portion 9144 of the memory 9140 may include various drivers for the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging applications, contact book applications, etc.).

The communication module 9110 is a transmitter/receiver 9110 that transmits and receives signals via an antenna 9111. The communication module (transmitter/receiver) 9110 is coupled to the central processor 9100 to provide input signals and receive output signals, which may be the same as in the case of a conventional mobile communication terminal.

Based on different communication technologies, a plurality of communication modules 9110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 9110 is also coupled to a speaker 9131 and a microphone 9132 via an audio processor 9130 to provide audio output via the speaker 9131 and receive audio input from the microphone 9132, thereby implementing ordinary telecommunications functions. The audio processor 9130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 9130 is also coupled to the central processor 9100, thereby enabling recording locally through the microphone 9132 and enabling locally stored sounds to be played through the speaker 9131.

An embodiment of the present application further provides a computer-readable storage medium capable of implementing all steps in the machine learning based test result analysis method in which the execution subject in the above embodiment is the server or the client, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the computer program implements all steps of the machine learning based test result analysis method in which the execution subject in the above embodiment is the server or the client, for example, when the processor executes the computer program, the processor implements the following steps:

From the above description, it can be seen that the computer-readable storage medium provided in the embodiment of the present application improves the original random forest method, improves the classification effect, enhances the generalization ability, and simultaneously improves the prediction accuracy of the test result, so that the test result can be accurately and conveniently analyzed, the cost for checking the transaction result is saved, and meanwhile, the tester does not need to know the program logic.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Specific embodiments are used herein to explain the principles and implementations of the invention, and the description of the embodiments is intended only to aid understanding of the method and its core idea. Meanwhile, a person skilled in the art may, in accordance with the idea of the invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims

1. A method for analyzing test results based on machine learning, the method comprising:

2. The machine-learning-based test result analysis method according to claim 1, wherein performing sample data preprocessing on the test result training set to obtain the preprocessed test result training set comprises:

3. The machine-learning-based test result analysis method of claim 1, wherein, after constructing the random forest model and before obtaining the target test result according to the random forest model, the method further comprises:

4. The machine-learning-based test result analysis method according to claim 3, wherein, after determining the optimal parameter with the highest accuracy within the set parameter range as the parameter of the random forest model, the method further comprises:

5. A test result analysis device based on machine learning, comprising:

6. The machine-learning-based test result analysis device of claim 5, wherein the test result training set preprocessing module comprises:

7. The machine-learning-based test result analysis device of claim 5, wherein the random forest analysis module comprises:

8. The machine-learning-based test result analysis device of claim 5, wherein the random forest analysis module comprises:

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the machine learning based test result analysis method of any one of claims 1 to 4 are implemented when the program is executed by the processor.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the machine learning based test result analysis method of any one of claims 1 to 4.