CN113537506B

CN113537506B - Test method, device, equipment and medium for machine learning effect

Info

Publication number: CN113537506B
Application number: CN202010322893.7A
Authority: CN
Inventors: 何晴; 赵晓平; 张文博
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-04-22
Filing date: 2020-04-22
Publication date: 2023-08-29
Anticipated expiration: 2040-04-22
Also published as: CN113537506A

Abstract

The application discloses a method, apparatus, device, and medium for testing a machine learning effect, and relates to machine learning testing technology. The specific implementation is as follows: acquiring an online feature set obtained by extracting features from request data in an online system, and an online estimated value set obtained by estimating the online feature set with a machine learning model; acquiring a first offline feature set obtained by extracting, in an offline system, features from a sample set obtained in an application stage, and a first offline evaluation value set obtained by evaluating the first offline feature set with the machine learning model; acquiring first intersection data and second intersection data from the request data and the sample set, respectively; and comparing the first intersection data with the second intersection data, together with their online feature set and first offline feature set and their online estimated value set and first offline evaluation value set, and determining a test result according to the comparison result. Embodiments of the application can effectively test whether a model behaves inconsistently between the online system and the offline system.

Description

Test method, device, equipment and medium for machine learning effect

Technical Field

The present application relates to the field of machine learning, and in particular, to a machine learning test technique, and more particularly, to a method, apparatus, device, and medium for testing a machine learning effect.

Background

With the rapid development of artificial intelligence, machine learning technology is attracting more and more attention in the internet field, and has been applied in various product fields, such as a business search recommendation system, in which multiple types of supervised machine learning technology are applied to click rate estimation, risk estimation, and the like.

The iterative process of a supervised classification learning model comprises four steps: training, evaluation, estimation, and application. In teaching and research settings such as schools and laboratories, these steps are usually carried out on a single machine. In industrial settings, however, driven by large data volumes and highly concurrent business traffic, the iterative process is split across two architectures, online (mainly used for estimation and application) and offline (mainly used for training and evaluation), and the online and offline systems differ greatly.

Specifically, because the online and offline systems have different characteristics, hidden risks such as differing upgrade modes, configuration modes, and deployment modes act together, so that the model behaves inconsistently under the two systems. This greatly reduces the model's effect in business use and opens a gap between that effect and the effect expected at the investigation stage. As a result, in industrial practice, machine learning techniques commonly suffer a serious loss of effect once they go online.

Disclosure of Invention

The embodiment of the application provides a method, apparatus, device, and medium for testing a machine learning effect, which are used for testing and evaluating whether a model's effect is inconsistent between the online system and the offline system.

In a first aspect, an embodiment of the present application provides a method for testing a machine learning effect, including:

acquiring an online feature set obtained by extracting features of the acquired request data in an online system, and estimating the online feature set by using a machine learning model to obtain an online pre-estimation value set;

acquiring a first offline feature set obtained by extracting features from a sample set obtained in an application stage in an offline system, and a first offline evaluation value set obtained by evaluating the first offline feature set by using the machine learning model;

acquiring first intersection data and second intersection data from the request data and the sample set, respectively, wherein the first intersection data and the second intersection data have the same data in at least one data dimension;

and respectively comparing the first intersection data with the second intersection data, the on-line feature set of the first intersection data and the first off-line feature set of the second intersection data, the on-line estimated value set of the first intersection data and the first off-line estimated value set of the second intersection data, and determining a test result according to the comparison result.

In a second aspect, an embodiment of the present application further provides a test apparatus for a machine learning effect, including:

the online pre-estimation value set determining module is used for obtaining an online feature set obtained by extracting features of the obtained request data in an online system and an online pre-estimation value set obtained by estimating the online feature set by utilizing a machine learning model;

the first offline evaluation value set determining module is used for acquiring a first offline feature set obtained by extracting features, in an offline system, from a sample set obtained in an application stage, and a first offline evaluation value set obtained by evaluating the first offline feature set by using the machine learning model;

an intersection data acquisition module for acquiring first intersection data and second intersection data from the request data and the sample set, respectively, wherein the first intersection data and the second intersection data have the same data in at least one data dimension;

the problem determining module is used for comparing the first intersection data with the second intersection data, the on-line feature set of the first intersection data and the first off-line feature set of the second intersection data, the on-line pre-evaluation value set of the first intersection data and the first off-line evaluation value set of the second intersection data respectively, and determining a test result according to a comparison result.

In a third aspect, an embodiment of the present application further provides an electronic device, including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for testing machine learning effects of any embodiment of the present application.

In a fourth aspect, an embodiment of the present application further provides a non-transitory computer readable storage medium storing computer instructions for causing the computer to execute the test method for a machine learning effect according to any embodiment of the present application.

According to the technical solution of the embodiment of the application, feature extraction and estimation are performed on the acquired request data in the online system, and feature extraction and evaluation are performed on the sample set in the offline system. First intersection data and second intersection data, which have the same data in at least one data dimension, are then acquired from the request data and the sample set, respectively, and compared at three stages: the data itself, the extracted features, and the estimated/evaluated values. The test result obtained from this comparison makes it possible to test and evaluate whether the model's effect is inconsistent between the online system and the offline system, which provides a strong guarantee for effective application of the model.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the following description, and additional effects of the alternative described above will be apparent from the following description of the specific embodiments.

Drawings

The drawings are included to provide a better understanding of the present application and are not to be construed as limiting the application. Wherein:

FIG. 1 is a flow chart of a method of testing machine learning effects according to a first embodiment of the present application;

FIG. 2 is a flow chart of a method of testing for machine learning effects according to a second embodiment of the present application;

FIG. 3 is a schematic structural view of a test apparatus for machine learning effect according to a third embodiment of the present application;

FIG. 4 is a block diagram of an electronic device for implementing a method of testing for machine learning effects according to an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Fig. 1 is a flowchart of a testing method for machine learning effects according to a first embodiment of the present application, which is applicable to a case where testing and evaluation are performed on whether there is a problem of inconsistent use effects in an online system and an offline system of a model, for example, a supervised classification learning model. The method may be performed by a test device for machine learning effects, which is implemented in software and/or hardware, preferably configured in an electronic device, such as a server or the like. As shown in fig. 1, the method specifically includes the following steps:

S101, acquiring an online feature set obtained by extracting features of acquired request data in an online system, and estimating the online feature set by using a machine learning model to obtain an online estimated value set.

S102, acquiring a first offline feature set obtained by extracting features, in an offline system, from a sample set obtained in an application stage, and a first offline evaluation value set obtained by evaluating the first offline feature set by using the machine learning model.

Generally, the iterative process of a model, for example a supervised classification learning model, is divided into four steps: training, evaluation, estimation, and application. Training and evaluation belong to the offline stage and are mainly used to generate and evaluate the model; both are completed on the offline system. Estimation and application belong to the online stage; they are mainly used to have the model estimate request data acquired online in real time and to apply the estimation results to a particular application scenario, and both are completed on the online system.

Specifically, in the training stage, features are extracted from the training data set T₁ to obtain the feature set F₁, and a machine learning model M₁ is obtained by training on F₁. In the evaluation stage, features are extracted from the evaluation data set E₁ to obtain the feature set F_E1, and the model M₁ produced in the training stage is applied to generate the predicted value set Q_E1. In the estimation stage, features are extracted from the estimation data set P₁ to obtain the feature set F_P1, and the model M₁ produced in the training stage is applied to generate the estimated value set Q_P1. In the application stage, the estimated value set Q_P1 obtained in the estimation stage is applied to an application scenario (such as a business search recommendation system), and a sample set T₂ is obtained from application feedback; for example, Q_P1 undergoes a series of complex processing steps such as sorting, the samples to be recommended are screened out, and these samples are pushed to the user. Thereafter, T₂ continues to be used as training samples: the process cycles back to the training stage based on T₂, and iteration continues to produce an updated model.

As described above, because the online and offline systems differ, the same machine learning model may perform inconsistently in the two systems; for example, the effect may fall short of expectations once the model is used online, which impairs the model's usefulness. It is therefore necessary to test and evaluate whether the model's effect is inconsistent between the online and offline systems, to discover inconsistencies promptly, to locate their cause and the link where they occur, and to further evaluate their degree of influence. This guides the relevant personnel to improve the model promptly and reduces the loss of effect after the model is applied online in industrial practice.

The embodiment of the application verifies the data of each stage of the online and offline systems against each other. The offline system can reuse a real model training environment, and the online system can be implemented in two ways: either a tester builds a reproduction of it, or the online experimental environment is reused and sampled data is logged. Specifically, in the online system, feature extraction is first performed on the acquired request data to obtain an online feature set, and the online feature set is estimated by using a machine learning model to obtain an online estimated value set. Then, in the offline system, feature extraction is performed on a sample set obtained in the application stage to obtain a first offline feature set, and the first offline feature set is evaluated by using the machine learning model to obtain a first offline evaluation value set. Intersecting data is then acquired from the request data and the sample set, and three stages are compared on the basis of that data: the data itself, the corresponding features, and the estimated/evaluated value sets. Whether the model's effect is inconsistent between the online system and the offline system is determined according to the comparison result.

In S101, the request data is the data requested online in real time during the estimation stage. Because the volume of such request data is very large, during testing the request data can be obtained from online requests by traffic replication and then estimated in the online system.

S103, acquiring first intersection data and second intersection data from the request data and the sample set respectively, wherein the first intersection data and the second intersection data have the same data in at least one data dimension.

The dimension in which the data are the same may be, for example, a search ID, a recommended advertisement ID, or a cookie ID; the embodiment of the present application does not limit this.
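As an illustration of S103, the following minimal sketch selects the first and second intersection data by matching on a single shared dimension. The field name `search_id`, the record layout, and the function name `intersect_by_key` are assumptions made for this example only and are not specified by the application.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def intersect_by_key(
    request_data: List[Record],
    sample_set: List[Record],
    key: str = "search_id",  # hypothetical dimension; could be an ad ID or cookie ID
) -> Tuple[List[Record], List[Record]]:
    """Return (first_intersection, second_intersection), aligned by the shared key.

    If a key value occurs more than once on one side, the last record wins.
    """
    requests_by_key = {r[key]: r for r in request_data if key in r}
    samples_by_key = {s[key]: s for s in sample_set if key in s}
    shared = sorted(requests_by_key.keys() & samples_by_key.keys())
    first = [requests_by_key[k] for k in shared]
    second = [samples_by_key[k] for k in shared]
    return first, second


# Toy data: only s2 and s3 appear in both the request data and the sample set.
request_data = [
    {"search_id": "s1", "query": "shoes"},
    {"search_id": "s2", "query": "phones"},
    {"search_id": "s3", "query": "books"},
]
sample_set = [
    {"search_id": "s2", "query": "phones", "clicked": 1},
    {"search_id": "s3", "query": "books", "clicked": 0},
    {"search_id": "s4", "query": "toys", "clicked": 0},
]
first, second = intersect_by_key(request_data, sample_set)
print([r["search_id"] for r in first])
```

Aligning both lists on the same sorted keys keeps the later per-record comparisons positional, which is one possible design choice among several.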

S104, respectively comparing the first intersection data with the second intersection data, the on-line feature set of the first intersection data and the first off-line feature set of the second intersection data, the on-line pre-evaluation value set of the first intersection data and the first off-line evaluation value set of the second intersection data, and determining a test result according to a comparison result.

If at least one of the following three pairs is inconsistent, the model's effect is inconsistent between the online system and the offline system: the first intersection data and the second intersection data; the online feature set of the first intersection data and the first offline feature set of the second intersection data; and the online estimated value set of the first intersection data and the first offline evaluation value set of the second intersection data. If all three pairs are consistent, no such inconsistency exists.
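The three-way comparison of S104 can be sketched as follows. This is only an illustrative sketch: the function names, the dictionary keys, and the use of a numeric tolerance `tol` for floating-point values are assumptions, since the application does not state how equality is decided.

```python
import math
from typing import Dict, List, Sequence


def values_match(a: Sequence[float], b: Sequence[float], tol: float = 1e-6) -> bool:
    """True when both sequences have equal length and element-wise close values."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=0.0, abs_tol=tol) for x, y in zip(a, b)
    )


def compare_stages(
    first_data: List[dict],
    second_data: List[dict],
    online_features: List[Sequence[float]],
    offline_features: List[Sequence[float]],
    online_values: Sequence[float],
    offline_values: Sequence[float],
    tol: float = 1e-6,  # assumed tolerance for floating-point comparison
) -> Dict[str, bool]:
    """Compare the data, feature, and value stages; True means the pair matches."""
    features_same = len(online_features) == len(offline_features) and all(
        values_match(a, b, tol) for a, b in zip(online_features, offline_features)
    )
    return {
        "data": list(first_data) == list(second_data),
        "features": features_same,
        "values": values_match(online_values, offline_values, tol),
    }


stages = compare_stages(
    first_data=[{"search_id": "s2", "query": "phones"}],
    second_data=[{"search_id": "s2", "query": "phones"}],
    online_features=[[1.0, 0.5]],
    offline_features=[[1.0, 0.7]],  # simulated divergence in feature extraction
    online_values=[0.62],
    offline_values=[0.71],
)
# Any mismatching pair indicates an online/offline inconsistency.
inconsistent = not all(stages.values())
print(stages, "inconsistent:", inconsistent)
```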

It should be noted that, in the prior art, AUC is generally used to evaluate the model effect, but this is suitable only for the investigation scenario: it rests on the assumption that the processing logic of the online and offline systems is completely consistent, which is overly idealized. By verifying the data of each stage of the online and offline systems against each other, the embodiment of the application can effectively confirm whether the model's effect is inconsistent between the online system and the offline system even when the two systems themselves are inconsistent. This reduces loss of effect after a product goes online, provides a strong guarantee for effective application of the product, and promotes rapid iteration of new technology.

After determining whether the inconsistent problem exists, further, the embodiment of the application can also determine the reason and the link of the problem occurrence, thereby guiding the subsequent targeted adjustment of the model.

Specifically, the determining the test result according to the comparison result further includes the following three cases:

(1) If the first intersection data and the second intersection data are the same, the online feature set of the first intersection data is the same as the first offline feature set of the second intersection data, and the online estimated value set of the first intersection data is different from the first offline evaluation value set of the second intersection data, determining that the machine learning model has a problem of estimation logic mismatch between the evaluation stage and the estimation stage;

The model first performs feature extraction on the data and then performs estimation. Therefore, when the first intersection data and the second intersection data are the same and the online feature set is the same as the first offline feature set, and only the online estimated value set and the first offline evaluation value set differ, the problem lies in the final estimation step. That is, the estimation logic of the machine learning model is mismatched between the evaluation stage and the estimation stage, which causes the estimated value set obtained by the online system to differ from the evaluation value set obtained by the offline system.

(2) If the first intersection data and the second intersection data are the same, the on-line feature set of the first intersection data is different from the first off-line feature set of the second intersection data, and the on-line pre-evaluation value set of the first intersection data is different from the first off-line evaluation value set of the second intersection data, determining that the feature extraction logic is inconsistent in the on-line system and the off-line system;

wherein the first intersection data and the second intersection data are identical, but their respective on-line feature sets and first off-line feature sets are different, and the on-line pre-evaluation value set and the first off-line evaluation value set are different, meaning that an inconsistency has occurred since feature extraction, resulting in the resulting pre-evaluation value set and evaluation value set also being different, and therefore it can be determined that there is an inconsistency in the feature extraction logic in the on-line system and the off-line system.

(3) If the first intersection data and the second intersection data are different, the on-line feature set of the first intersection data and the first off-line feature set of the second intersection data are different, and the on-line pre-evaluation value set of the first intersection data and the first off-line evaluation value set of the second intersection data are different, determining that a sample difference exists between the first intersection data and the second intersection data;

In the third case, the corresponding comparisons differ at every stage, which indicates a sample difference between the first intersection data and the second intersection data. The model can be retrained to address the sample difference, so as to update and improve the model.
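The three cases above amount to a small decision table over the three comparison results. The sketch below encodes that table; the enum and function names are illustrative assumptions, and combinations that the three cases do not cover are reported as unclassified rather than guessed.

```python
from enum import Enum


class Diagnosis(Enum):
    CONSISTENT = "no inconsistency detected"
    ESTIMATION_LOGIC_MISMATCH = "estimation logic mismatch"
    FEATURE_EXTRACTION_MISMATCH = "feature extraction logic inconsistent"
    SAMPLE_DIFFERENCE = "sample difference between intersection data"
    UNCLASSIFIED = "combination not covered by the three cases"


def diagnose(data_same: bool, features_same: bool, values_same: bool) -> Diagnosis:
    """Map the data/feature/value comparison results to the cases in the text."""
    if data_same and features_same and values_same:
        return Diagnosis.CONSISTENT
    if data_same and features_same and not values_same:
        return Diagnosis.ESTIMATION_LOGIC_MISMATCH  # case (1)
    if data_same and not features_same and not values_same:
        return Diagnosis.FEATURE_EXTRACTION_MISMATCH  # case (2)
    if not data_same and not features_same and not values_same:
        return Diagnosis.SAMPLE_DIFFERENCE  # case (3)
    return Diagnosis.UNCLASSIFIED


print(diagnose(True, False, False).value)
```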

According to the technical solution of this embodiment, feature extraction and estimation are performed on the acquired request data in the online system, and feature extraction and evaluation are performed on the sample set in the offline system. First intersection data and second intersection data, which have the same data in at least one data dimension, are then acquired from the request data and the sample set, respectively, and compared at three stages: the data itself, the extracted features, and the estimated/evaluated values. This makes it possible to test and evaluate whether the model's effect is inconsistent between the online system and the offline system, which provides a strong guarantee for effective application of the model.

Fig. 2 is a flow chart of a testing method for machine learning effect according to a second embodiment of the present application, which is further optimized based on the above embodiment. As shown in fig. 2, the method specifically includes the following steps:

S201, acquiring an online feature set obtained by extracting features of the acquired request data in an online system, and estimating the online feature set by using a machine learning model to obtain an online estimated value set.

S202, acquiring a first offline feature set obtained by extracting features, in an offline system, from a sample set obtained in an application stage, and a first offline evaluation value set obtained by evaluating the first offline feature set by using the machine learning model.

S203, acquiring first intersection data and second intersection data from the request data and the sample set respectively, wherein the first intersection data and the second intersection data have the same data in at least one data dimension.

S204, the first intersection data and the second intersection data, the on-line feature set of the first intersection data and the first off-line feature set of the second intersection data, the on-line pre-evaluation value set of the first intersection data and the first off-line evaluation value set of the second intersection data are compared respectively, and a test result is determined according to the comparison result.

S205, obtaining a second offline evaluation value set obtained by extracting the characteristics of the first intersection data in the offline system and evaluating the characteristic extraction result by using the machine learning model.

S206, acquiring a third offline evaluation value set obtained by evaluating the online characteristic set of the first intersection data in the offline system by using the machine learning model.

S207, calculating area AUC under ROC curves corresponding to the first offline evaluation value set, the second offline evaluation value set, the third offline evaluation value set and the online pre-evaluation value set of the first intersection data respectively, and determining the influence degree of different problems by comparing the attenuation amplitudes of the corresponding AUCs.

In the embodiment of the invention, through S201-S204, whether inconsistent problems exist can be determined, meanwhile, a link of occurrence of the problems can be judged according to a comparison result, and further, the influence degree of different problems can be evaluated through S205-S207.

Specifically, to determine the degree of influence of a problem, the final-form data of each stage must first be determined, namely: the first offline evaluation value set of the second intersection data, the second offline evaluation value set of the first intersection data, the third offline evaluation value set obtained directly from the online feature set of the first intersection data, and the online estimated value set of the first intersection data. Note that the second and third offline evaluation value sets differ as follows: the second offline evaluation value set is obtained by extracting features from the first intersection data in the offline system and evaluating the extraction result with the machine learning model, whereas the third offline evaluation value set is obtained by evaluating, in the offline system, the online feature set of the first intersection data that was already extracted in the online system.
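To make the relationship between the four value sets concrete, the following sketch derives them from the two intersection data sets. The callables `online_extract`, `offline_extract`, `online_predict`, and `offline_predict` are hypothetical stand-ins for the feature extraction and model scoring of each system; the toy functions below simulate a feature extraction discrepancy and are not taken from the application.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
Features = List[float]


def build_value_sets(
    first_intersection: List[Record],
    second_intersection: List[Record],
    online_extract: Callable[[Record], Features],
    offline_extract: Callable[[Record], Features],
    online_predict: Callable[[Features], float],
    offline_predict: Callable[[Features], float],
) -> Dict[str, List[float]]:
    """Derive the online value set and the three offline evaluation value sets."""
    online_features = [online_extract(r) for r in first_intersection]
    return {
        # Online estimated value set of the first intersection data (S201).
        "online": [online_predict(f) for f in online_features],
        # First offline set: offline extraction + evaluation of the sample-set side (S202).
        "offline_1": [offline_predict(offline_extract(r)) for r in second_intersection],
        # Second offline set: offline extraction + evaluation of the request side (S205).
        "offline_2": [offline_predict(offline_extract(r)) for r in first_intersection],
        # Third offline set: offline evaluation of the already-extracted online features (S206).
        "offline_3": [offline_predict(f) for f in online_features],
    }


# Toy setup: the offline extractor doubles "y", simulating inconsistent extraction logic.
first_intersection = [{"x": 1.0, "y": 2.0}]
second_intersection = [{"x": 1.0, "y": 2.0}]
value_sets = build_value_sets(
    first_intersection,
    second_intersection,
    online_extract=lambda r: [r["x"], r["y"]],
    offline_extract=lambda r: [r["x"], r["y"] * 2.0],
    online_predict=lambda f: sum(f),
    offline_predict=lambda f: sum(f),
)
print(value_sets)
```

In this toy setup the third offline set matches the online set while the second does not, which isolates the discrepancy to feature extraction.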

Then, the area under the ROC curve (AUC) is calculated for each of the first offline evaluation value set, the second offline evaluation value set, the third offline evaluation value set, and the online estimated value set, and the degree of influence of the different problems is determined by comparing the attenuation of the corresponding AUCs; for example, the larger the AUC attenuation, the greater the influence of the corresponding problem. Evaluating the degree of influence reveals more deeply the nature of a problem and its effect on the product, which guides the relevant personnel to further improve the model, reduces the loss of effect after the model is applied online in industry, provides a strong guarantee for effective application of the algorithm product, and promotes rapid iteration of new technology.

Specifically, for example, if the AUC attenuation of the online estimated value set is larger than that of the other three sets, it can be determined that the problem in computing the estimated values has a large influence; if the AUC attenuation of the second offline evaluation value set is large, it can be determined that the sample difference problem has a large influence; and if the AUC attenuation of the third offline evaluation value set is large, it can be determined that the feature extraction problem has a large influence.
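A minimal sketch of the AUC comparison in S207 follows. Several points are assumptions made for illustration: the binary labels (for example, click feedback) are taken to be available for the intersection records, attenuation is measured as a drop from a reference `baseline_auc` (the application does not state the reference point), and AUC is computed with a simple pairwise formulation rather than any particular library.

```python
from typing import Dict, List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_attenuation(
    baseline_auc: float,  # assumed reference, e.g. the AUC seen at investigation time
    labels: Sequence[int],
    value_sets: Dict[str, List[float]],
) -> Dict[str, float]:
    """Drop in AUC of each value set relative to the baseline."""
    return {name: baseline_auc - auc(labels, scores) for name, scores in value_sets.items()}


labels = [1, 0, 1, 0]  # hypothetical feedback labels for four intersection records
value_sets = {
    "offline_1": [0.9, 0.1, 0.8, 0.2],
    "online": [0.6, 0.7, 0.8, 0.2],
}
attenuation = auc_attenuation(1.0, labels, value_sets)
# The value set with the largest drop points at the most influential problem.
worst = max(attenuation, key=attenuation.get)
print(attenuation, worst)
```

The pairwise formulation is quadratic in the number of records, which is adequate for a small sampled intersection; a rank-based computation would be preferable for large sets.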

In addition, in the embodiment of the present invention, it may also be determined whether an inconsistent problem occurs and a link where the problem occurs by another method, including the following steps:

if the third offline evaluation value set is different from the online pre-evaluation value set of the first intersection data, determining that the machine learning model has a problem of mismatch of estimated logic in an evaluation stage and an estimation stage;

if the third offline evaluation value set is the same as the online pre-evaluation value set of the first intersection data and the second offline evaluation value set is different from the online pre-evaluation value set of the first intersection data, determining that the feature extraction logic is inconsistent in the online system and the offline system;

and if the third offline evaluation value set is the same as the online pre-evaluation value set of the first intersection data, the second offline evaluation value set is the same as the online pre-evaluation value set of the first intersection data, and the online pre-evaluation value set of the first intersection data is different from the first offline evaluation value set, determining that a sample difference exists between the first intersection data and the second intersection data.
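The alternative procedure above checks the value sets in a fixed order, so it can be sketched as a short cascade. The function names, the returned strings, and the tolerance `tol` are assumptions for illustration; the application only specifies the order of the comparisons.

```python
import math
from typing import Sequence


def same(a: Sequence[float], b: Sequence[float], tol: float = 1e-6) -> bool:
    """Element-wise equality within an assumed absolute tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=0.0, abs_tol=tol) for x, y in zip(a, b)
    )


def diagnose_from_value_sets(
    online: Sequence[float],
    offline_1: Sequence[float],
    offline_2: Sequence[float],
    offline_3: Sequence[float],
) -> str:
    """Locate the problem by comparing each offline set with the online set."""
    if not same(offline_3, online):
        # Same features, different values: the scoring logic differs.
        return "estimation logic mismatch"
    if not same(offline_2, online):
        # Same raw data, different values only after offline extraction.
        return "feature extraction logic inconsistent"
    if not same(offline_1, online):
        # Only the sample-set side differs: the underlying samples differ.
        return "sample difference"
    return "consistent"


result = diagnose_from_value_sets(
    online=[3.0], offline_1=[5.0], offline_2=[5.0], offline_3=[3.0]
)
print(result)
```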

According to the technical solution of this embodiment, feature extraction and estimation are performed on the acquired request data in the online system, and feature extraction and evaluation are performed on the sample set in the offline system. First intersection data and second intersection data, which have the same data in at least one data dimension, are then acquired from the request data and the sample set, respectively, and compared at three stages: the data itself, the extracted features, and the estimated/evaluated values. This makes it possible to test and evaluate whether the model's effect is inconsistent between the online system and the offline system. In addition, the test not only identifies the link where a problem occurs, but also evaluates the degree of influence of the problem by comparing the final-form data of each stage, which further guarantees effective application of the model.

Fig. 3 is a schematic structural diagram of a test apparatus for machine learning effect according to a third embodiment of the present application, which is applicable to a case where whether or not there is a problem of inconsistent use effect in an on-line system and an off-line system of a model is tested and evaluated, for example, for a supervised classification learning model. The device can realize the testing method for the machine learning effect according to any embodiment of the application. As shown in fig. 3, the apparatus 300 specifically includes:

An online pre-estimation value set determining module 301, configured to obtain an online feature set obtained by extracting features of the obtained request data in an online system, and an online pre-estimation value set obtained by estimating the online feature set by using a machine learning model;

the first offline evaluation value set determining module 302 is configured to obtain a first offline feature set obtained by extracting features, in an offline system, from a sample set obtained in an application stage, and a first offline evaluation value set obtained by evaluating the first offline feature set by using the machine learning model;

an intersection data obtaining module 303, configured to obtain first intersection data and second intersection data from the request data and the sample set, where the first intersection data and the second intersection data have the same data in at least one data dimension;

a problem determining module 304, configured to compare, respectively, the first intersection data with the second intersection data, the online feature set of the first intersection data with the first offline feature set of the second intersection data, and the online estimated value set of the first intersection data with the first offline evaluation value set of the second intersection data, and to determine a test result according to the comparison results.
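The role of the intersection data obtaining module 303 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that request records and sample records are dictionaries and that the shared data dimension is a key named `request_id`; the function name `intersect_on` and the record layout are hypothetical and are not taken from the application.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def intersect_on(requests: List[Record],
                 samples: List[Record],
                 key: str) -> Tuple[List[Record], List[Record]]:
    """Return (first_intersection, second_intersection).

    Both lists contain only records whose `key` value appears in both
    inputs, ordered by that key so that row i of one list corresponds
    to row i of the other. Assumes `key` values are unique per input.
    """
    req_by_key = {r[key]: r for r in requests}
    smp_by_key = {s[key]: s for s in samples}
    shared = sorted(req_by_key.keys() & smp_by_key.keys())
    first = [req_by_key[k] for k in shared]    # taken from the request data
    second = [smp_by_key[k] for k in shared]   # taken from the sample set
    return first, second


requests = [{"request_id": 1, "query": "a"}, {"request_id": 2, "query": "b"}]
samples = [{"request_id": 2, "query": "b"}, {"request_id": 3, "query": "c"}]
first, second = intersect_on(requests, samples, "request_id")
print(first == second)  # the data-stage comparison for the shared records
```

Keeping the two lists aligned row by row makes the later feature-stage and value-stage comparisons a simple element-wise check.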

Optionally, the problem determining module includes a first problem determining unit, configured to determine that the machine learning model has an estimation logic mismatch between the training and evaluation stage and the estimation stage if the first intersection data and the second intersection data are the same, the online feature set of the first intersection data and the first offline feature set of the second intersection data are the same, and the online estimated value set of the first intersection data and the first offline evaluation value set of the second intersection data are different.

Optionally, the problem determining module includes a second problem determining unit, configured to determine that the feature extraction logic is inconsistent between the online system and the offline system if the first intersection data and the second intersection data are the same, the online feature set of the first intersection data and the first offline feature set of the second intersection data are different, and the online estimated value set of the first intersection data and the first offline evaluation value set of the second intersection data are different.

Optionally, the problem determining module includes a third problem determining unit, configured to determine that a sample difference exists between the first intersection data and the second intersection data if the first intersection data and the second intersection data are different, the online feature set of the first intersection data and the first offline feature set of the second intersection data are different, and the online estimated value set of the first intersection data and the first offline evaluation value set of the second intersection data are different.
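The three determining units amount to a small decision table over three boolean comparison results. The following Python sketch restates that table under stated assumptions: the function name `classify_problem` and the returned labels are hypothetical, and combinations that the three units do not cover are reported as undetermined rather than guessed.

```python
def classify_problem(data_same: bool,
                     features_same: bool,
                     values_same: bool) -> str:
    """Map the three stage comparisons to a test result.

    data_same:     first vs. second intersection data
    features_same: online feature set vs. first offline feature set
    values_same:   online estimated values vs. first offline evaluation values
    """
    if data_same and features_same and values_same:
        return "consistent"
    if data_same and features_same and not values_same:
        return "estimation_logic_mismatch"          # first determining unit
    if data_same and not features_same and not values_same:
        return "feature_extraction_inconsistency"   # second determining unit
    if not data_same and not features_same and not values_same:
        return "sample_difference"                  # third determining unit
    return "undetermined"  # combination not covered by the three units


print(classify_problem(True, False, False))
```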

Optionally, the apparatus further includes:

the second offline evaluation value set determining module is used for obtaining a second offline evaluation value set obtained by performing feature extraction on the first intersection data in the offline system and evaluating a result of the feature extraction by using the machine learning model;

a third offline evaluation value set determining module, configured to obtain a third offline evaluation value set obtained by evaluating, in the offline system, the online feature set of the first intersection data using the machine learning model;

a problem influence degree determining module, configured to calculate the area under the ROC curve (AUC) corresponding to each of the first offline evaluation value set, the second offline evaluation value set, the third offline evaluation value set, and the online estimated value set of the first intersection data, and to determine the degree of influence of the different problems by comparing the attenuation amplitudes of the respective AUCs.

Optionally, the problem influence degree determining module is specifically configured to:

if the AUC attenuation amplitude of the online estimated value set of the first intersection data is large, determine that the problem of calculating the estimated value set has a large influence; or

if the AUC attenuation amplitude of the second offline evaluation value set is large, determine that the sample difference problem has a large influence; or

if the AUC attenuation amplitude of the third offline evaluation value set is large, determine that the feature extraction problem has a large influence.
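The AUC comparison can be sketched in Python as follows. The text does not state the baseline against which each attenuation is measured, so this sketch assumes one plausible reading: the four value sets form a chain (first offline, second offline, third offline, online), and each attenuation is the drop in AUC relative to the preceding set. It also assumes binary labels aligned row by row with every value set. The function names `auc` and `attenuation_by_stage` are hypothetical.

```python
from typing import Dict, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def attenuation_by_stage(labels: Sequence[int],
                         first_offline: Sequence[float],
                         second_offline: Sequence[float],
                         third_offline: Sequence[float],
                         online: Sequence[float]) -> Dict[str, float]:
    """AUC drop attributed to each problem (chained-baseline assumption)."""
    a1 = auc(labels, first_offline)
    a2 = auc(labels, second_offline)
    a3 = auc(labels, third_offline)
    a4 = auc(labels, online)
    return {
        "sample_difference": a1 - a2,    # attenuation of the second offline set
        "feature_extraction": a2 - a3,   # attenuation of the third offline set
        "estimation_logic": a3 - a4,     # attenuation of the online set
    }


labels = [1, 1, 0, 0]
drops = attenuation_by_stage(
    labels,
    first_offline=[0.9, 0.8, 0.2, 0.1],
    second_offline=[0.9, 0.8, 0.2, 0.1],
    third_offline=[0.9, 0.3, 0.4, 0.1],
    online=[0.9, 0.3, 0.4, 0.1],
)
print(max(drops, key=drops.get))
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a sketch; a rank-based computation would be preferable for large intersection sets.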

Optionally, the apparatus further includes:

an estimation logic mismatch problem determining module, configured to determine that the machine learning model has an estimation logic mismatch between the training stage and the estimation stage if the third offline evaluation value set and the online estimated value set of the first intersection data are different.

Optionally, the apparatus further includes:

a feature extraction logic inconsistency problem determining module, configured to determine that the feature extraction logic is inconsistent between the online system and the offline system if the third offline evaluation value set is the same as the online estimated value set of the first intersection data and the second offline evaluation value set is different from the online estimated value set of the first intersection data.

Optionally, the apparatus further includes:

a sample difference problem determining module, configured to determine that a sample difference exists between the first intersection data and the second intersection data if the third offline evaluation value set is the same as the online estimated value set of the first intersection data, the second offline evaluation value set is the same as the online estimated value set of the first intersection data, and the online estimated value set of the first intersection data is different from the first offline evaluation value set.
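Taken together, the three optional modules above form a cascade over the four value sets. A minimal Python sketch follows, assuming the value sets are equal-length lists of floats aligned row by row. The tolerance parameter `tol` is an assumption added for floating-point comparison; the text itself only speaks of sets being the same or different. The function names are hypothetical.

```python
from typing import Sequence


def same_values(a: Sequence[float], b: Sequence[float],
                tol: float = 1e-9) -> bool:
    """True if both sequences agree element-wise within `tol` (assumed)."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))


def diagnose(first_offline: Sequence[float],
             second_offline: Sequence[float],
             third_offline: Sequence[float],
             online: Sequence[float]) -> str:
    """Apply the three checks in the order the modules are described."""
    if not same_values(third_offline, online):
        # Same online features, different values: estimation logic differs.
        return "estimation_logic_mismatch"
    if not same_values(second_offline, online):
        # Same request data, offline features give different values.
        return "feature_extraction_inconsistency"
    if not same_values(online, first_offline):
        # Everything downstream agrees, so the samples themselves differ.
        return "sample_difference"
    return "consistent"


print(diagnose([0.2, 0.7], [0.3, 0.7], [0.3, 0.7], [0.3, 0.7]))
```

Checking the third offline set first isolates the estimation logic, because that set and the online set are computed from identical online features.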

The test apparatus 300 for machine learning effect provided by the embodiment of the application can execute the test method for machine learning effect provided by any embodiment of the application, and has the functional modules and beneficial effects corresponding to the executed method. For details not described in this embodiment, refer to the description of any method embodiment of the application.

According to an embodiment of the present application, the present application also provides an electronic device and a readable storage medium.

Fig. 4 is a block diagram of an electronic device for implementing the test method for machine learning effect according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.

As shown in Fig. 4, the electronic device includes one or more processors 401, a memory 402, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information for a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Likewise, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). Fig. 4 takes one processor 401 as an example.

Memory 402 is a non-transitory computer readable storage medium provided by the present application. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the testing method for machine learning effects provided by the application. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to execute the test method for machine learning effect provided by the present application.

As a non-transitory computer readable storage medium, the memory 402 may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as the program instructions/modules corresponding to the test method for machine learning effect in the embodiments of the present application (e.g., the online estimated value set determining module 301, the first offline evaluation value set determining module 302, the intersection data obtaining module 303, and the problem determining module 304 shown in Fig. 3). By running the non-transitory software programs, instructions, and modules stored in the memory 402, the processor 401 executes the various functional applications and data processing of the server, that is, implements the test method for machine learning effect in the above method embodiments.

The memory 402 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created through use of the electronic device implementing the test method for machine learning effect of an embodiment of the present application, and the like. In addition, the memory 402 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 402 may optionally include memory located remotely from the processor 401, and such remote memory may be connected via a network to the electronic device implementing the test method for machine learning effect of embodiments of the present application. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device for implementing the test method for machine learning effect according to the embodiment of the application may further include an input device 403 and an output device 404. The processor 401, the memory 402, the input device 403, and the output device 404 may be connected by a bus or in other manners; connection by a bus is taken as an example in Fig. 4.

The input device 403 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device implementing the test method for machine learning effect of embodiments of the present application. Examples of the input device include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer stick, one or more mouse buttons, a track ball, and a joystick. The output device 404 may include a display device, auxiliary lighting devices (e.g., LEDs), haptic feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), the internet, and blockchain networks.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

According to the technical scheme of the embodiment of the application, feature extraction and estimation are performed on the acquired request data in the online system, and feature extraction and evaluation are performed on the sample set in the offline system. First intersection data and second intersection data, which have the same data in at least one data dimension, are then taken from the request data and the sample set respectively and compared at three stages: the data, the extracted features, and the estimated values versus the evaluation values. This realizes testing and evaluation of whether the model behaves inconsistently between the online system and the offline system, and further safeguards the effective application of the model.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of the flows shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application are achieved; no limitation is imposed herein.

The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.

Claims

1. A method for testing machine learning effects, comprising:

respectively comparing the first intersection data with the second intersection data, the online feature set of the first intersection data with the first offline feature set of the second intersection data, and the online estimated value set of the first intersection data with the first offline evaluation value set of the second intersection data, and determining a test result according to the comparison results;

wherein, the determining the test result according to the comparison result includes:

if at least one of the following groups is inconsistent: the first intersection data and the second intersection data; the online feature set of the first intersection data and the first offline feature set of the second intersection data; the online estimated value set of the first intersection data and the first offline evaluation value set of the second intersection data, determining that the machine learning model has a problem of inconsistent use effect between the online system and the offline system; and if the online feature set of the first intersection data and the first offline feature set of the second intersection data are consistent, determining that the problem of inconsistent use effect of the machine learning model does not exist.

2. The method of claim 1, wherein determining the test result based on the comparison result comprises:

if the first intersection data and the second intersection data are the same, the on-line feature set of the first intersection data is the same as the first off-line feature set of the second intersection data, and the on-line pre-evaluation value set of the first intersection data is different from the first off-line evaluation value set of the second intersection data, determining that the machine learning model has a problem of estimation logic mismatch in an evaluation stage and an estimation stage.

3. The method of claim 1, wherein determining the test result based on the comparison result comprises:

if the first intersection data and the second intersection data are the same, the on-line feature set of the first intersection data and the first off-line feature set of the second intersection data are different, and the on-line pre-evaluation value set of the first intersection data and the first off-line evaluation value set of the second intersection data are different, determining that the feature extraction logic is inconsistent in the on-line system and the off-line system.

4. The method of claim 1, wherein determining the test result based on the comparison result comprises:

if the first intersection data and the second intersection data are different, the online feature set of the first intersection data and the first offline feature set of the second intersection data are different, and the online estimated value set of the first intersection data and the first offline evaluation value set of the second intersection data are different, determining that a sample difference exists between the first intersection data and the second intersection data.

5. The method according to claim 1, wherein the method further comprises:

acquiring a second offline evaluation value set obtained by performing feature extraction on the first intersection data in the offline system and evaluating a result of the feature extraction by using the machine learning model;

acquiring a third offline evaluation value set obtained by evaluating an online feature set of the first intersection data in the offline system by using the machine learning model;

and respectively calculating the area under the ROC curve (AUC) corresponding to each of the first offline evaluation value set, the second offline evaluation value set, the third offline evaluation value set, and the online estimated value set of the first intersection data, and determining the degree of influence of different problems by comparing the attenuation amplitudes of the respective AUCs.

6. The method of claim 5, wherein said determining the extent of impact of different problems by comparing the magnitude of the decay of the respective AUCs comprises:

7. The method of claim 5, wherein the method further comprises:

if the third offline evaluation value set is different from the online estimated value set of the first intersection data, determining that the machine learning model has an estimation logic mismatch between an evaluation stage and an estimation stage.

8. The method of claim 5, wherein the method further comprises:

and if the third offline evaluation value set is the same as the online pre-evaluation value set of the first intersection data and the second offline evaluation value set is different from the online pre-evaluation value set of the first intersection data, determining that the feature extraction logic is inconsistent in the online system and the offline system.

9. The method of claim 5, wherein the method further comprises:

10. A test device for machine learning effects, comprising:

the problem determining module is used for respectively comparing the first intersection data with the second intersection data, the on-line feature set of the first intersection data with the first off-line feature set of the second intersection data, the on-line pre-evaluation value set of the first intersection data and the first off-line evaluation value set of the second intersection data, and determining a test result according to a comparison result;

the problem determining module determines a test result according to the comparison result, specifically:

11. The apparatus of claim 10, wherein the problem determining module comprises a first problem determining unit, configured to determine that the machine learning model has an estimation logic mismatch between an evaluation stage and an estimation stage if the first intersection data and the second intersection data are the same, the online feature set of the first intersection data and the first offline feature set of the second intersection data are the same, and the online estimated value set of the first intersection data and the first offline evaluation value set of the second intersection data are different.

12. The apparatus of claim 10, wherein the problem determination module comprises a second problem determination unit to determine that there is an inconsistency in the feature extraction logic in the on-line system and the off-line system if the first intersection data and the second intersection data are the same, the on-line feature set of the first intersection data and the first off-line feature set of the second intersection data are not the same, and the on-line pre-evaluation value set of the first intersection data and the first off-line evaluation value set of the second intersection data are not the same.

13. The apparatus of claim 10, wherein the problem determination module comprises a third problem determination unit to determine that a sample difference exists between the first intersection data and the second intersection data if the first intersection data and the second intersection data are not identical, the on-line feature set of the first intersection data and the first off-line feature set of the second intersection data are not identical, and the on-line pre-evaluation value set of the first intersection data and the first off-line evaluation value set of the second intersection data are not identical.

14. The apparatus of claim 10, wherein the apparatus further comprises:

15. The apparatus of claim 14, wherein the problem impact level determination module is specifically configured to:

16. The apparatus of claim 14, wherein the apparatus further comprises:

an estimation logic mismatch problem determining module, configured to determine that the machine learning model has an estimation logic mismatch between an evaluation stage and an estimation stage if the third offline evaluation value set and the online estimated value set of the first intersection data are different.

17. The apparatus of claim 14, wherein the apparatus further comprises:

18. The apparatus of claim 14, wherein the apparatus further comprises:

19. An electronic device, comprising:

at least one processor; and

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of testing for machine learning effects of any one of claims 1-9.

20. A non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method for testing machine learning effects of any one of claims 1-9.