CN111340102B - Method and apparatus for evaluating model interpretation tools - Google Patents

Publication number
CN111340102B
Authority
CN
China
Legal status: Active
Application number
CN202010112949.6A
Other languages
Chinese (zh)
Other versions
CN111340102A (en)
Inventor
方军鹏
唐才智
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority: CN202010112949.6A
Publication of CN111340102A
Application granted
Publication of CN111340102B
Legal status: Active

Classifications

    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06F: Electric digital data processing
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/217: Validation; Performance evaluation; Active pattern learning techniques
    • G06N: Computing arrangements based on specific computational models
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods


Abstract

Embodiments of this specification provide a method and apparatus for evaluating a model interpretation tool. The method is performed based on a first model and a plurality of first training samples acquired in advance for the first model, and comprises the following steps: selecting n features from the plurality of features as n selected features; replacing the feature values of the features other than the n selected features in each first training sample with other values to obtain a plurality of second training samples; training the first model using the plurality of second training samples to obtain a first model having a first parameter set; obtaining, by a model interpretation tool, an importance ranking of the plurality of features based on the plurality of second training samples and the first parameter set; and determining the recall of the top n features of the importance ranking relative to the n selected features, for use in evaluating the model interpretation tool.

Description

Method and apparatus for evaluating model interpretation tools
Technical Field
Embodiments of this specification relate to the technical field of machine learning, and in particular to a method and apparatus for evaluating a model interpretation tool.
Background
Machine learning is currently used in a wide variety of fields, such as retail, technology, healthcare, and science. A machine learning model essentially fits a complex function to the relationship between data and a target. This makes it very different from simple rules, which explicitly define the relationship between data and targets: the machine learning model is a black box with only inputs and outputs, whose internal mechanism is not understood. In some areas, particularly in finance (such as insurance and banking), data scientists therefore often end up having to use more traditional and simpler machine learning models (linear models or decision tree models). However, although such simple models provide some interpretability, they are not good enough for complex tasks, and are necessarily inferior to more complex deep models in terms of accuracy and other aspects of model performance.
For example, a user of Huabei (Ant Credit Pay) can pay for this month's purchases in the next month, a function similar to a credit card. This means there is a risk of users cashing out, and cash-out users have a higher probability of overdue payment than normal users, causing losses to the company. To reduce this risk, such cash-out transactions must be intercepted, or approval of small loans must be denied, and the user should be given a reasonable explanation. However, because financial scenarios are sensitive, the interpretability requirements for the interception model used are necessarily high. The traditional approach is to use simple models such as linear models or tree models. Although these simple models can meet the interpretability requirement, their accuracy cannot meet business requirements in situations that are complex in practice; for example, an accuracy that is too low would intercept a large number of normal transactions and mistakenly harm normal users, a loss that is also unbearable.
In view of the above problems, a variety of model-agnostic tools for interpreting models have been proposed, so as to reasonably explain the black box models actually applied in business scenarios; because the model itself is not changed, its performance is not affected. Current methods for measuring model interpretation tools include a priori measurement methods, evaluation methods for interpretation tools of image classification models, evaluation methods for interpretation tools of text classification models, and the like. However, there is still no method applicable to multiple model interpretation tools simultaneously.
Therefore, a more efficient approach for evaluating model interpretation tools is needed.
Disclosure of Invention
The embodiments of the present specification aim to provide a more efficient solution for evaluating model interpretation tools to overcome the deficiencies in the prior art.
To achieve the above object, one aspect of the present specification provides a method for evaluating a model interpretation tool, the method being performed based on a first model and a plurality of first training samples acquired in advance for the first model, wherein each of the first training samples includes feature values of a plurality of features of a business object, the method comprising:
selecting n features from the plurality of features as n selected features;
replacing feature values of features in each of the first training samples except the n selected features with other values to obtain a plurality of second training samples;
training the first model using the plurality of second training samples to obtain a first model having a first set of parameters;
obtaining, by a model interpretation tool, an importance ranking of the plurality of features based on the plurality of second training samples and the first parameter set;
determining recall of the top n features of the importance ranking relative to the n selected features for evaluation of the model interpretation tool.
In one embodiment, the first model is a non-self-explanatory model.
In one embodiment, selecting n features from the plurality of features as the n selected features includes randomly selecting n features from the plurality of features as the n selected features.
In one embodiment, replacing the feature values of the features other than the n selected features in each of the first training samples with other values comprises replacing the feature values of the features other than the n selected features in each of the first training samples with other values determined randomly.
In one embodiment, the method is performed a plurality of times to obtain a plurality of recall rates, wherein the n selected features have a different combination of features in each execution of the method than the respective sets of n selected features corresponding to the other respective executions, the method further comprising obtaining an average recall rate based on the plurality of recall rates for evaluating the model interpretation tool.
In one embodiment, the business object is one or more of the following objects in the network platform: user, merchant, commodity, transaction.
In one embodiment, the business object is a platform user, each training sample includes a risk value of the user as a label value, and the first model is used to be trained as a risk control model based on the plurality of first training samples.
Another aspect of the present specification provides an apparatus for evaluating a model interpretation tool, the apparatus being deployed based on a first model and a plurality of first training samples acquired in advance for the first model, wherein each of the first training samples includes feature values of a plurality of features of a business object, the apparatus comprising:
a selecting unit configured to select n features from the plurality of features as n selected features;
a replacing unit configured to replace feature values of features other than the n selected features in each of the first training samples with other values to obtain a plurality of second training samples;
a training unit configured to train the first model using the plurality of second training samples to obtain a first model having a first parameter set;
a ranking unit configured to obtain an importance ranking of the plurality of features by a model interpretation tool based on the plurality of second training samples and the first parameter set;
a determining unit configured to determine recall of the top n features of the importance ranking relative to the n selected features for evaluation of the model interpretation tool.
In one embodiment, the selecting unit is further configured to randomly select n features from the plurality of features as the n selected features.
In one embodiment, the replacement unit is further configured to replace feature values of features other than the n selected features in each of the first training samples with other values determined at random.
In an embodiment, the apparatus is deployed a plurality of times to obtain a plurality of recall rates, wherein in each deployment of the apparatus the n selected features have a different combination of features than the respective sets of n selected features corresponding to the other respective deployments, wherein the apparatus further comprises an averaging unit configured to obtain an average recall rate based on the plurality of recall rates for evaluating the model interpretation tool.
Another aspect of the present specification provides a computer readable storage medium having a computer program stored thereon, which, when executed in a computer, causes the computer to perform any one of the above methods.
Another aspect of the present specification provides a computing device comprising a memory having stored therein executable code, and a processor that, when executing the executable code, implements any of the methods described above.
According to this scheme for evaluating a model interpretation tool, the black box model is trained on a data set with replaced features, and the proportion of true (selected) features among the top n features given by the model interpretation tool is counted. To avoid overfitting the evaluation to the interpretation of one particular feature combination, a certain number of feature combinations are traversed and the interpretation results are averaged, so that a final, relatively objective evaluation index is obtained.
Drawings
The embodiments of the present specification may be made more clear by describing the embodiments with reference to the attached drawings:
FIG. 1 shows a schematic diagram of a system 100 for evaluating a model interpretation tool in accordance with an embodiment of the present description;
FIG. 2 illustrates a flow diagram of a method of evaluating a model interpretation tool in accordance with an embodiment of the present description;
FIG. 3 schematically illustrates a plurality of parallel executions of steps S204-S212 of the method of FIG. 2;
FIG. 4 illustrates an apparatus 400 for evaluating a model interpretation tool in accordance with an embodiment of the present description.
Detailed Description
The embodiments of the present specification will be described below with reference to the accompanying drawings.
FIG. 1 shows a schematic diagram of a system 100 for evaluating a model interpretation tool in accordance with an embodiment of the present description. As shown in FIG. 1, the system 100 includes a sample processing unit 11, a black box model 12, a model interpretation tool 13, and a calculation unit 14. The black box model 12 is a non-self-interpreting model that is expected to be interpreted by the model interpretation tool 13, such as one of various complex neural network models; because of its complex structure of multiple layers and many neurons, a neural network model cannot explain the importance of sample features through its parameters or network structure. The black box model 12 may be trained on a plurality of training samples associated with a particular business to become a business processing model, such as a risk control model. For example, the specific service may be classifying users in a network platform, for example into low-risk and high-risk users, or high-consumption and low-consumption users. A training sample then includes the feature values of various features of a user, such as gender, age, monthly transaction amount, and credit amount, together with a label value for the user, for example indicating whether the user is high risk: in a Huabei (Ant Credit Pay) scenario, whether the user is a high cash-out risk; in a transaction scenario, whether the user is a high fraud risk; and so on. It is to be understood that, although a platform user is taken as the example here, in the embodiments of the present disclosure the training sample may correspond to one or more of the following objects in the network platform: users, merchants, commodities, transactions, and so on.
For example, if the black box model is a commodity recommendation model, the training sample may include features of two objects in the platform, namely a user and a commodity, and the label value of the training sample corresponds to whether the user purchases the commodity. In this scenario, the model interpretation tool can likewise be evaluated by the system shown in FIG. 1.
In the embodiment of the present specification, after a plurality of training samples are acquired, they are first processed by the sample processing unit 11. Specifically, the feature values of n selected features of each sample are retained, and the feature values of the other features of each sample are replaced with arbitrary values, for example random values, to obtain a new training sample set. By performing this process multiple times, selecting different n selected features each time, a plurality of training sample sets can be obtained; these are schematically illustrated as sample set 1, sample set 2, and sample set 3, and it is understood that the scheme is not limited to obtaining only 3 different sample sets.
By training the black box model 12 with the above 3 training sample sets, 3 black box models with different parameter sets can be obtained. By interpreting each trained black box model 12 using the model interpretation tool 13, multiple importance rankings are obtained, schematically shown in the figure as rank 1, rank 2, and rank 3, corresponding to the respective parameter sets and sample sets; these are sent to the calculation unit 14. The model interpretation tool 13 is, for example, LIME (Local Interpretable Model-agnostic Explanations), SHAP (SHapley Additive exPlanations, based on Shapley values), or the like.
The calculation unit 14 may calculate the recall of the first n features in each ranking with respect to the corresponding n selected features, and take the average of these recalls as the final evaluation score of the model interpretation tool 13. The evaluation score thus does not depend on one specific set of n features but covers multiple differently combined sets of n features, giving it a certain universality and objectivity. The higher this final evaluation score, the more accurately the model interpretation tool 13 interprets the model, so an appropriate interpretation tool can be selected for interpreting the black box model 12 based on the evaluation scores of the various interpretation tools.
The procedure of the above evaluation model interpretation tool will be described in detail below.
FIG. 2 shows a flowchart of a method of evaluating a model interpretation tool, according to an embodiment of the present description, comprising:
step S202, obtaining a plurality of first training samples, wherein each first training sample comprises characteristic values of a plurality of characteristics of a business object and a label value of the business object;
step S204, selecting n selected characteristics from the characteristics;
step S206, replacing the characteristic value of the characteristic except the n selected characteristics in each first training sample with other values to obtain a plurality of second training samples;
step S208, training a first model by using the plurality of second training samples to obtain a first model with a first parameter set;
step S210, based on the plurality of second training samples and the first parameter group, obtaining importance ranking of the plurality of features through a model interpretation tool;
step S212, determining recall of the top n features of the importance ranking relative to the n selected features for evaluation of the model interpretation tool.
First, in step S202, a plurality of first training samples are obtained, where each of the first training samples includes feature values of a plurality of features of a business object and a label value of the business object.
The plurality of first training samples correspond respectively to a plurality of users in the network platform. For example, each first training sample includes the feature values of a plurality of features of the corresponding user, such as gender, age, monthly transaction amount, monthly loan amount, monthly income, and annual tax payment amount. In addition, each training sample further includes a label value for its corresponding user; the label value is, for example, 0 or 1, where 0 represents a low-risk user and 1 represents a high-risk user. It is to be understood that a sample label value of 0 or 1 means the first model is a binary classification model; however, in the embodiments of the present specification the first model is not limited to binary classification, and may instead be a multi-class model (i.e., the label may take multiple values), a regression model, and so on.
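As an illustration of the sample structure just described, a first training sample might be represented as follows; the concrete feature names and values are hypothetical, not taken from the patent.

```python
# Hypothetical first training sample for a platform user: feature values
# plus a binary risk label (0 = low-risk user, 1 = high-risk user).
first_training_sample = {
    "features": {
        "gender": 1,                 # illustrative categorical encoding
        "age": 34,
        "monthly_txn_amount": 5200.0,
        "monthly_loan_amount": 800.0,
        "monthly_income": 9000.0,
        "annual_tax_paid": 3100.0,
    },
    "label": 0,
}

print(first_training_sample["label"])  # 0
```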
In step S204, n selected features are selected from the plurality of features.
For example, if the plurality of features is 20 features in total, n may be set to 10. It will be appreciated that the value of n may be set relative to the plurality of features according to the accuracy requirements and the number of significant features: if the requirement for accuracy is high, n may be set small, e.g. to 5; if the first 15 of the plurality of features are determined to be significant, n may be set to 15; and so on. The 10 features may be selected from the 20 features in various ways; for example, 10 features may be selected randomly, or according to a predetermined rule, such as the first 10 features in sequence, or the 10 features in odd-numbered positions. In one embodiment, the number N of combinations of 10 features selected from the 20 features may be determined first, i.e.

N = C(20, 10) = 20! / (10! × 10!) = 184756

so that one combination can be randomly determined from the N combinations as the 10 selected features, or one combination can be determined from the N combinations according to a predetermined rule.
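The combination count can be verified directly with Python's standard library (a quick sketch, independent of the patent's method):

```python
import math

# Number of ways to choose n = 10 selected features out of 20 features.
N = math.comb(20, 10)
print(N)  # 184756
```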
In step S206, the feature values of the features other than the n selected features in each of the first training samples are replaced with arbitrary values to obtain a plurality of second training samples.
After the n selected features are determined, suppose the 20 features are f1 through f20 and the n selected features are f1 through f10, i.e., features f11 through f20 are the unselected features. Then, for each first training sample, the feature values of features f11 through f20 in that sample are each replaced with other values. The other value may be randomly determined or predetermined. Specifically, for a first training sample X1 in which the initial value of feature f11 is x11, a value x'11 different from the initial value x11 may be randomly acquired to replace x11 in sample X1. By similarly replacing the initial values of features f12 through f20 in sample X1, a new sample X'1 (a second training sample) corresponding to sample X1 (the first training sample) can be obtained. By performing the above processing on each of the plurality of first training samples, a plurality of second training samples are obtained.
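The replacement step can be sketched as follows; the feature layout, the random replacement range, and the helper name `replace_unselected` are illustrative assumptions rather than the patent's implementation.

```python
import random

def replace_unselected(sample, selected, rng):
    """Build a second training sample: keep the values of the selected
    features and replace every other feature value with a random value
    different from the original."""
    new_sample = {}
    for name, value in sample.items():
        if name in selected:
            new_sample[name] = value
        else:
            replacement = value
            while replacement == value:          # ensure the value changes
                replacement = rng.uniform(0.0, 1.0)
            new_sample[name] = replacement
    return new_sample

rng = random.Random(0)
features = [f"f{i}" for i in range(1, 21)]       # f1 .. f20
selected = set(features[:10])                    # f1 .. f10 are kept
x1 = {name: 0.5 for name in features}            # toy first training sample
x1_prime = replace_unselected(x1, selected, rng)

assert all(x1_prime[f] == x1[f] for f in selected)
assert all(x1_prime[f] != x1[f] for f in features if f not in selected)
```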
In step S208, the first model is trained using the plurality of second training samples to obtain a first model having a first parameter set.
As described above, the first model is, for example, a black box model, such as various neural network models, such as a CNN model, a DNN model, a reinforcement learning model, and so on. It is to be understood that, in the embodiments of the present specification, the first model is not limited to being a black box model, but may also be a self-explanatory model, such as a logistic regression model, a linear regression model, a support vector machine model, a tree model, a bayesian model, a KNN model, a neural network model with a defined network structure, and the like. The first model may be trained based on the plurality of second training samples by various optimization methods, such as a gradient descent method, a back propagation method, and the like, which are not limited herein. After the first model is trained using the plurality of second training samples, the plurality of parameters of the first model are changed, thereby obtaining a first model having a first parameter set.
In step S210, based on the plurality of second training samples and the first parameter set, an importance ranking of the plurality of features is obtained by a model interpretation tool.
As described above, the model interpretation tool may be any existing model interpretation tool, such as LIME or SHAP. Taking LIME as an example, it perturbs a sample (say, sample 1) among the second training samples to obtain a plurality of perturbed samples adjacent to sample 1. By inputting these perturbed samples into the trained first model, model prediction values are obtained based on the first parameter set; a linear function is then fitted to the perturbed-sample data, and the local importance of each feature in the vicinity of sample 1 is determined from that linear function. By performing this process on each second training sample and averaging the per-sample feature importances, an overall importance ranking of the features can be obtained. The local or global importance ranking of the features f1 through f20 described above can thus be determined by LIME.
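The LIME-style procedure just described (perturb a sample, query the black box, fit a local linear surrogate, read importances from the surrogate weights) can be sketched as follows. The perturbation scale, sample count, and toy black-box function are illustrative assumptions, not the actual LIME implementation.

```python
import numpy as np

def local_importance(predict_fn, x, n_perturb=500, scale=0.1, seed=0):
    """Perturb x, query the black-box model, fit a linear surrogate by
    least squares, and return per-feature local importance as the
    absolute surrogate weights."""
    rng = np.random.default_rng(seed)
    X = x + scale * rng.standard_normal((n_perturb, x.size))
    y = predict_fn(X)
    A = np.hstack([X, np.ones((n_perturb, 1))])  # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.abs(coef[:-1])

# Toy black box in which only features 0 and 2 matter.
black_box = lambda X: 3.0 * X[:, 0] - 2.0 * X[:, 2]
importance = local_importance(black_box, np.zeros(4))
ranking = np.argsort(-importance)  # most important features first
print(set(ranking[:2].tolist()))   # {0, 2}
```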
In step S212, recall of the top n features of the importance ranking relative to the n selected features is determined for evaluation of the model interpretation tool.
Recall, a measure commonly used in document retrieval, is the fraction of all relevant documents that are actually retrieved from a document collection; here it denotes the fraction of the n selected features that appear among the top n features of the importance ranking determined by the model interpretation tool.
For example, with n taken to be 10, in this step the proportion of selected features among the top 10 of the importance ranking, relative to all selected features, is determined and used as an evaluation score for the model interpretation tool. For example, as described above, the 10 selected features are f1 through f10, and the top 10 features in the importance ranking are: f2, f5, f11, f6, f7, f8, f9, f15, f10, f16. It can thus be determined that 7 of the top 10 features in the importance ranking are selected features, so the recall of the top 10 features relative to the selected features is 7/10 = 0.7, which can be used as the evaluation score for the model interpretation tool.
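The recall computation in this example can be written directly; the feature names follow the text above, and the helper name `recall_at_n` is an illustrative choice.

```python
def recall_at_n(top_n, selected):
    """Fraction of the selected features that appear among the top-n
    features of the importance ranking."""
    return len(set(top_n) & set(selected)) / len(selected)

selected = [f"f{i}" for i in range(1, 11)]  # the selected features f1 .. f10
top_10 = ["f2", "f5", "f11", "f6", "f7", "f8", "f9", "f15", "f10", "f16"]
print(recall_at_n(top_10, selected))  # 0.7
```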
In one embodiment, steps S204-S212 of the method of FIG. 2 are performed multiple times to obtain a plurality of recall ratios, and the plurality of recall ratios are averaged as an evaluation score for the model interpretation tool. FIG. 3 schematically shows a plurality of parallel executions of steps S204-S212 of the method shown in FIG. 2. As shown in FIG. 3, after acquiring a plurality of first training samples in step S202, step S204 is performed m times (schematically shown as 3 times) in parallel to select m sets of n selected features of different combinations from the plurality of features. Each execution of step S204 may select the n features randomly, or according to a predetermined rule. For example, after determining the number N of combinations of 10 features out of 20 (i.e., N = C(20, 10) = 184756, as described above), with the N combinations arranged in sequence, a rule may be predetermined that the m groups of features are m of the N combinations spaced apart from each other: e.g., the 1st of the N combinations is the 1st group of n selected features, the 1001st combination is the 2nd group, the 2001st combination is the 3rd group, and so on. By determining the m sets of n selected features in this manner, they can cover more of the plurality of features, making the evaluation of the model interpretation tool more accurate. Although m is schematically shown as 3, in practice m may be, for example, 20% or 30% of N, so as to cover more features of the model.
Then, step S206 is executed in parallel to obtain 3 different training sample sets, i.e., sample set 1 through sample set 3 in FIG. 3. Step S208 is then performed in parallel to obtain 3 parameter sets of the black box model. Next, step S210 is executed in parallel: through the model interpretation tool, 3 corresponding importance rankings (rank 1 through rank 3) are obtained based on each training sample set and the corresponding model parameters, and 3 corresponding recall ratios (recall 1 through recall 3) are obtained by executing step S212 in parallel. Finally, in step S214, the 3 recall ratios are averaged to serve as an evaluation score for the model interpretation tool.
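Putting the pieces together, the repeat-and-average loop of steps S204-S214 can be sketched as below. Here `run_once` is a hypothetical callback standing in for one select/replace/train/rank pass (it returns the importance ranking), and the "perfect" interpreter used to exercise it is a toy, not part of the patent.

```python
import random

def evaluate_interpreter(all_features, n, m, run_once, seed=0):
    """For m different random combinations of n selected features, run one
    select/replace/train/rank pass (run_once returns the importance ranking,
    most important feature first) and average the resulting recalls."""
    rng = random.Random(seed)
    recalls = []
    for _ in range(m):
        selected = set(rng.sample(all_features, n))
        ranking = run_once(selected)
        top_n = set(ranking[:n])
        recalls.append(len(top_n & selected) / n)
    return sum(recalls) / m

# Toy "perfect" interpretation tool: it ranks the selected features first,
# so every pass yields recall 1.0 and the average score is 1.0.
features = [f"f{i}" for i in range(1, 21)]
perfect = lambda selected: sorted(features, key=lambda f: f not in selected)
print(evaluate_interpreter(features, n=10, m=3, run_once=perfect))  # 1.0
```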
It is to be understood that, although 3 recall ratios respectively corresponding to 3 sets of n selected features are obtained in a parallel execution manner in fig. 3, the embodiment of the present specification is not limited thereto, and for example, 3 recall ratios may be obtained successively in a loop execution manner.
By averaging the obtained recall ratios, the resulting average recall, used as the evaluation score, reflects the accuracy of the model interpretation tool with greater universality and objectivity: the average recall is computed over many feature combinations rather than one specific combination. After obtaining evaluation scores for the respective model interpretation tools by this method, an appropriate model interpretation tool can be determined based on those scores for interpreting the black box model, so that the importance of each feature in the black box model can be better explained by the selected, better interpretation tool.
Fig. 4 shows an apparatus 400 for evaluating a model interpretation tool according to an embodiment of the present specification, the apparatus being deployed based on a first model and a plurality of first training samples pre-acquired for the first model, wherein each of the first training samples includes feature values of a plurality of features of a business object, the apparatus including:
a selecting unit 41 configured to select n features from the plurality of features as n selected features;
a replacing unit 42 configured to replace feature values of features other than the n selected features in each of the first training samples with other values to obtain a plurality of second training samples;
a training unit 43 configured to train the first model using the plurality of second training samples to obtain a first model having a first parameter set;
a ranking unit 44 configured to obtain an importance ranking of the plurality of features by a model interpretation tool based on the plurality of second training samples and the first parameter set;
a determining unit 45 configured to determine the recall of the top n features of the importance ranking relative to the n selected features, for use in evaluating the model interpretation tool.
In one embodiment, the selecting unit 41 is further configured to randomly select n features from the plurality of features as the n selected features.
In one embodiment, the replacing unit 42 is further configured to replace feature values of features other than the n selected features in each of the first training samples with other values determined randomly.
In one embodiment, the apparatus is deployed a plurality of times to obtain a plurality of recall ratios, wherein in each deployment of the apparatus the n selected features form a feature combination different from those of the n selected features in the other deployments, and wherein the apparatus further comprises an averaging unit 46 configured to obtain an average recall ratio based on the plurality of recall ratios for evaluating the model interpretation tool.
Another aspect of the present specification provides a computer readable storage medium having a computer program stored thereon, which, when executed in a computer, causes the computer to perform any one of the above methods.
Another aspect of the present specification provides a computing device comprising a memory having stored therein executable code, and a processor that, when executing the executable code, implements any of the methods described above.
According to the scheme for evaluating a model interpretation tool in the embodiments of the present specification, the black box model is trained on a data set in which non-selected feature values have been replaced, and the recall ratio of the top n features given by the model interpretation tool relative to the n selected features is counted, i.e., the proportion of true features among the top n features is determined; the higher the recall ratio, the more reasonable and reliable the interpretation given by the model interpretation tool. In addition, to avoid overfitting the evaluation to a particular feature combination, a number of feature combinations are traversed and the results are averaged, yielding a final, relatively objective evaluation index. For example, where the black box model is a risk control model for platform users, the scheme is applicable to big-data scenarios with many features of many users; by selecting a better model interpretation tool through the scheme, a more reliable interpretation can be given for the prediction results of the risk control model, thereby facilitating judgment of a user's risk level.
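The overall scheme can be exercised end to end with a stand-in interpretation tool. In this sketch (hypothetical names throughout; a simple correlation ranking stands in for a real interpretation tool, which the specification does not prescribe), the label depends on every original feature, so after masking only the selected features still explain it, and a sound tool should recover them among its top n:

```python
import random

def pearson(xs, ys):
    # Sample Pearson correlation between two equal-length sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def correlation_ranking(samples, labels):
    # Stand-in "interpretation tool": rank features by |correlation| with label.
    num_features = len(samples[0])
    scores = [abs(pearson([s[i] for s in samples], labels))
              for i in range(num_features)]
    return sorted(range(num_features), key=lambda i: -scores[i])

rng = random.Random(7)
num_features, n = 6, 3
first = [[rng.random() for _ in range(num_features)] for _ in range(200)]
labels = [sum(s) for s in first]  # label depends on every original feature

selected = set(rng.sample(range(num_features), n))
masked = [[v if i in selected else rng.random() for i, v in enumerate(s)]
          for s in first]
# After masking, only the selected features still explain the label,
# so a sound tool should place them in its top n.
top_n = correlation_ranking(masked, labels)[:n]
recall = len(set(top_n) & selected) / n
print(recall)
```

Repeating this with several random choices of `selected` and averaging the resulting recall values gives the evaluation score described above.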
The embodiments in the present specification are described in a progressive manner; for the same or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiment is substantially similar to the method embodiment, its description is kept brief, and reference may be made to the corresponding parts of the method embodiment where relevant.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
It will be further appreciated by those of ordinary skill in the art that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of the two. To illustrate clearly the interchangeability of hardware and software, the components and steps of the examples have been described above in general terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application. The software modules may reside in random access memory (RAM), read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (16)

1. A method of evaluating a model interpretation tool, the method being performed based on a first model and a plurality of first training samples previously acquired for the first model, wherein each of the first training samples comprises feature values of a plurality of features of a business object, the method comprising:
selecting n features from the plurality of features as n selected features;
replacing, in each of the first training samples, feature values of features other than the n selected features with other values to obtain a plurality of second training samples;
training the first model using the plurality of second training samples to obtain a first model having a first set of parameters;
obtaining, by a model interpretation tool, an importance ranking of the plurality of features based on the plurality of second training samples and the first parameter set;
determining recall of the top n features of the importance ranking relative to the n selected features for evaluation of the model interpretation tool.
2. The method of claim 1, wherein the first model is a non-self-explanatory model.
3. The method of claim 1, wherein selecting n features from the plurality of features as n selected features comprises randomly selecting n features from the plurality of features as the n selected features.
4. The method of claim 1, wherein replacing the feature values of the features other than the n selected features in each of the first training samples with other values comprises replacing the feature values of the features other than the n selected features in each of the first training samples with other values that are randomly determined.
5. The method of claim 1, wherein the method is performed a plurality of times to obtain a plurality of recall ratios, wherein in each execution of the method the n selected features form a feature combination different from those of the n selected features in the other executions, the method further comprising, after the plurality of recall ratios is obtained, obtaining an average recall ratio based on the plurality of recall ratios for evaluating the model interpretation tool.
6. The method of claim 1, wherein the business object is one or more of the following objects in a network platform: user, merchant, commodity, transaction.
7. The method of claim 6, wherein the business object is a platform user, each training sample includes a risk value of the user as a label value, and the first model is to be trained as a risk control model based on the plurality of first training samples.
8. An apparatus for evaluating a model interpretation tool, the apparatus being deployed based on a first model and a plurality of first training samples pre-acquired for the first model, wherein each of the first training samples comprises feature values of a plurality of features of a business object, the apparatus comprising:
a selecting unit configured to select n features from the plurality of features as n selected features;
a replacing unit configured to replace feature values of features other than the n selected features in each of the first training samples with other values to obtain a plurality of second training samples;
a training unit configured to train the first model using the plurality of second training samples to obtain a first model having a first parameter set;
a ranking unit configured to obtain an importance ranking of the plurality of features by a model interpretation tool based on the plurality of second training samples and the first parameter set;
a determining unit configured to determine recall of the top n features of the importance ranking relative to the n selected features for evaluation of the model interpretation tool.
9. The apparatus of claim 8, wherein the first model is a non-self-explanatory model.
10. The apparatus of claim 8, wherein the selecting unit is further configured to randomly select n features from the plurality of features as the n selected features.
11. The apparatus according to claim 8, wherein the replacement unit is further configured to replace feature values of features other than the n selected features in each of the first training samples with other values determined randomly.
12. The apparatus of claim 8, wherein the apparatus is deployed a plurality of times to obtain a plurality of recall ratios, wherein in each deployment of the apparatus the n selected features form a feature combination different from those of the n selected features in the other deployments, the apparatus further comprising an averaging unit configured to, after the plurality of recall ratios is obtained, obtain an average recall ratio based on the plurality of recall ratios for evaluating the model interpretation tool.
13. The apparatus of claim 8, wherein the business object is one or more of the following objects in a network platform: user, merchant, commodity, transaction.
14. The apparatus of claim 13, wherein the business object is a platform user, each training sample includes a risk value of the user as a label value, the first model is to be trained as a risk control model based on the plurality of first training samples.
15. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-7.
16. A computing device comprising a memory having executable code stored therein and a processor that, when executing the executable code, implements the method of any of claims 1-7.
CN202010112949.6A 2020-02-24 2020-02-24 Method and apparatus for evaluating model interpretation tools Active CN111340102B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010112949.6A CN111340102B (en) 2020-02-24 2020-02-24 Method and apparatus for evaluating model interpretation tools

Publications (2)

Publication Number Publication Date
CN111340102A CN111340102A (en) 2020-06-26
CN111340102B (en) 2022-03-01

Family

ID=71185573

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010112949.6A Active CN111340102B (en) 2020-02-24 2020-02-24 Method and apparatus for evaluating model interpretation tools

Country Status (1)

Country Link
CN (1) CN111340102B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111967581B (en) * 2020-08-06 2023-10-31 平安科技(深圳)有限公司 Method, device, computer equipment and storage medium for interpreting grouping model
CN115953248B * 2023-03-01 2023-05-16 Alipay (Hangzhou) Information Technology Co., Ltd. Wind control method, device, equipment and medium based on Shapley additive explanation

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102109219B1 (en) * 2012-01-27 2020-05-11 Touchtype Ltd. User Data Input Prediction
CN104504583B (en) * 2014-12-22 2018-06-26 广州品唯软件有限公司 The evaluation method of grader
CN107133628A (en) * 2016-02-26 2017-09-05 阿里巴巴集团控股有限公司 A kind of method and device for setting up data identification model
CN110020010A (en) * 2017-10-10 2019-07-16 阿里巴巴集团控股有限公司 Data processing method, device and electronic equipment
US11645541B2 (en) * 2017-11-17 2023-05-09 Adobe Inc. Machine learning model interpretation
CN110472802B (en) * 2018-05-09 2023-12-01 创新先进技术有限公司 Data characteristic evaluation method, device and equipment
CN109034534A (en) * 2018-06-21 2018-12-18 阿里巴巴集团控股有限公司 A kind of model score means of interpretation, device and equipment
CN109583470A (en) * 2018-10-17 2019-04-05 阿里巴巴集团控股有限公司 A kind of explanation feature of abnormality detection determines method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant