CN112154459A - Model interpretation - Google Patents

Model interpretation

Info

Publication number
CN112154459A
CN112154459A
Authority
CN
China
Prior art keywords
model
linear
machine learning
feature
proxy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201980026981.8A
Other languages
Chinese (zh)
Inventor
M. Chen
N. Gill
P. Hall
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
H2oAi Inc
Original Assignee
H2oAi Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 15/959,040 (US11922283B2)
Priority claimed from US 15/959,030 (US11386342B2)
Application filed by H2oAi Inc filed Critical H2oAi Inc
Publication of CN112154459A

Classifications

    • G: PHYSICS; G06: COMPUTING; CALCULATING OR COUNTING; G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00: Computing arrangements using knowledge-based models
    • G06N 5/04: Inference or reasoning models
    • G06N 5/045: Explanation of inference; Explainable artificial intelligence [XAI]; Interpretable artificial intelligence
    • G06N 20/00: Machine learning
    • G06N 20/20: Ensemble learning
    • G06N 5/01: Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G06N 7/00: Computing arrangements based on specific mathematical models
    • G06N 7/01: Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Input data associated with a machine learning model is classified into a plurality of clusters. A plurality of linear proxy models is generated, each corresponding to one of the plurality of clusters. Each linear proxy model is configured to output a corresponding prediction based on input data associated with its corresponding cluster. Prediction data associated with the machine learning model and prediction data associated with the plurality of linear proxy models are output.

Description

Model interpretation
Background
Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed. A machine learning model may be trained to implement a complex function that makes one or more predictions based on a set of inputs. The set of inputs comprises a plurality of entries. Each entry is associated with one or more features having corresponding feature values. Once trained, the machine learning model behaves like a black box: it receives a set of inputs, applies the set of inputs to the complex function, and outputs one or more predictions.
Drawings
Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.
FIG. 1 is a block diagram illustrating an embodiment of a system for machine learning model interpretation.
FIG. 2A is an example of a diagram illustrating an embodiment of input data.
FIG. 2B is an example of a diagram illustrating an embodiment of input data ranked based on prediction labels.
FIG. 3 is a diagram illustrating an embodiment of the output of a linear proxy model.
FIG. 4A is a flow diagram illustrating an embodiment of a process for providing a linear proxy model.
FIG. 4B is a flow diagram illustrating an embodiment of a process for providing predictions.
FIG. 5 is a diagram illustrating an embodiment of a non-linear proxy model.
FIG. 6 is a flow diagram illustrating an embodiment of a process for providing a non-linear proxy model.
FIG. 7 is a diagram illustrating an embodiment of a non-linear proxy model.
FIG. 8 is a flow diagram illustrating an embodiment of a process for providing a proxy non-linear model.
FIG. 9 is a diagram illustrating an embodiment of a non-linear proxy model.
FIG. 10 is a flow diagram illustrating an embodiment of a process for providing a non-linear model.
FIG. 11 is a diagram illustrating an embodiment of a dashboard.
FIG. 12 is a flow diagram illustrating an embodiment of a process for debugging a machine learning model.
Detailed Description
The invention can be implemented in numerous ways, including as a process; a device; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term "processor" refers to one or more devices, circuits, and/or processing cores configured to process data (such as computer program instructions).
A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
A machine learning model interpretation technique is disclosed. A machine learning model is configured to provide one or more predictions based on a set of inputs; however, it is not clear how the machine learning model arrives at its decisions. The machine learning model is typically proprietary software of a company, and a user must obtain a license to use the software. As a result, the user is forced to purchase a license from the company.
The machine learning model may be limited by the type of information output to the user. The machine learning model may output the prediction, but may not provide one or more reasons for the machine learning model to make the prediction. For example, the machine learning model may not output an identification of one or more input features that affect the prediction of the machine learning model.
The machine learning model may be approximated as a linear proxy model and/or one or more non-linear proxy models. Proxy modeling is a data mining and engineering technique in which a simpler model is used to explain another, usually more complex, model or phenomenon. A proxy model may reduce the number of computations and the time required for a computer to output a prediction. The reduction in computations and time frees up computer resources, which allows the computer to perform other tasks and/or make other predictions. The linear proxy model may be a K-LIME proxy model. The non-linear proxy model may be a decision tree proxy model, a feature importance proxy model, and/or a partial dependence proxy model. A proxy model may not only provide predictions similar to those made by the machine learning model, but may also provide one or more reasons describing how the proxy model arrived at its decisions.
Because of the complexity of machine learning models, no single proxy model can by itself be trusted to accurately approximate a machine learning model. However, a combination of a linear proxy model and one or more non-linear proxy models may provide confidence in the approximation. In some cases, the output of the linear proxy model may closely match the output of the machine learning model, but conflict with the output of at least one non-linear proxy model. In other cases, the output of the linear proxy model may conflict with the output of the machine learning model, while the outputs of the one or more non-linear proxy models closely match the output of the machine learning model. In still other cases, neither the output of the linear proxy model nor the outputs of the one or more non-linear proxy models closely match the output of the machine learning model. In these cases, the linear proxy model and the one or more non-linear proxy models cannot be used to interpret the machine learning model. As a result, it may be necessary to modify at least one of the linear proxy model, the one or more non-linear proxy models, or even the machine learning model itself.
However, where the output of the linear proxy model closely matches the output of the machine learning model, and the output of the non-linear proxy model closely matches the output of the machine learning model, the combination of the linear proxy model and the one or more non-linear proxy models may be trusted to accurately interpret the machine learning model of interest.
This is an improvement to the field of machine learning because a combination of a linear proxy model and one or more non-linear proxy models can be used to accurately approximate a machine learning model that implements a complex function to make one or more predictions based on a set of inputs. The combination of a linear proxy model and one or more non-linear proxy models may reduce the number of computations and the amount of time required for a computer to make a prediction, compared to the computations and time required by a computer implementing the machine learning model. The combination of a linear proxy model and one or more non-linear proxy models provides transparency into the machine learning model. The linear proxy model and the one or more non-linear proxy models also allow the underlying machine learning model itself to be debugged.
FIG. 1 is a block diagram illustrating an embodiment of a system for machine learning model interpretation. In the illustrated example, the system 100 includes a complex model server 102, a network 105, a proxy model server 112, and a client device 122.
The complex model server 102 includes: machine learning model 104, training data 106, model prediction data 107, and actual outcome data 108. The machine learning model 104, the training data 106, the model prediction data 107, and the actual outcome data 108 may be stored in a memory and/or storage device (not shown) of the complex model server 102. The complex model server 102 may include one or more processors, one or more memories (e.g., random access memory), and one or more storage devices (e.g., read-only memory).
The machine learning model 104 is configured to implement one or more machine learning algorithms (e.g., decision trees, naive Bayes classification, least squares regression, logistic regression, support vector machines, neural networks, deep learning, etc.). The machine learning model 104 may be trained using training data, such as training data 106. Once trained, the machine learning model 104 is configured to output prediction labels (such as model prediction data 107) based on input entries that include one or more features and corresponding feature values.
The training data 106 includes a plurality of entries. Each entry is associated with one or more features having corresponding feature values.
The model prediction data 107 includes predictions made by the machine learning model 104. The model prediction data 107 may include a probability that a particular outcome will occur, as predicted by the machine learning model. The model prediction data 107 may include a prediction label (e.g., a predicted value) for a particular prediction.
The actual outcome data 108 includes real-world outcome data. For example, the machine learning model 104 may be trained to predict the probability of a particular outcome given input data that includes a plurality of entries. The actual outcome data 108 includes the real-world outcomes for entries associated with a plurality of features and corresponding feature values.
The network 105 may be a local area network, a wide area network, a wired network, a wireless network, the internet, an intranet, or any other suitable communication network.
The proxy model server 112 includes a linear proxy model 114, one or more proxy non-linear models 115, training data 116, model prediction data 117, and actual outcome data 118. The linear proxy model 114, the one or more proxy non-linear models 115, the training data 116, the model prediction data 117, and the actual outcome data 118 may be stored in a memory and/or storage device (not shown) of the proxy model server 112.
The proxy model server 112 is configured to implement one or more proxy models. Proxy modeling is a data mining and engineering technique in which a simpler model is used to explain another, usually more complex, model or phenomenon. The proxy model server 112 may receive the training data 106, the model prediction data 107, and the actual outcome data 108 from the complex model server 102 via the network 105 and store them as training data 116, model prediction data 117, and actual outcome data 118, respectively. Using the training data 116, the model prediction data 117, and the actual outcome data 118, the proxy model server 112 may train one or more proxy models to make one or more predictions. The one or more proxy models are proxies of the machine learning model 104. Given a learned function $g$ (e.g., machine learning model 104) and a set of predictions (e.g., model prediction data 107), $g(X) = \hat{Y}$, a proxy model $h$ can be trained such that

$$X, \hat{Y} \xrightarrow{\;\mathcal{A}_{surrogate}\;} h,$$

so that $h(X) \approx g(X)$. The proxy model $h$ may be a linear model or a non-linear model.
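As a rough illustration of this surrogate-training setup (a sketch under assumed scikit-learn tooling, not the patented implementation; the dataset and all variable names are hypothetical), a complex model g can be fit, its predictions collected, and a simple proxy h trained against those predictions:

```python
# Illustrative sketch: train a proxy h to approximate a complex model g.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

g = GradientBoostingClassifier(random_state=0).fit(X, y)  # complex model g
y_hat = g.predict_proba(X)[:, 1]                          # g(X) = Y-hat

# Train the proxy h on (X, Y-hat) so that h(X) approximates g(X).
h = Ridge(alpha=1.0).fit(X, y_hat)
print(f"fidelity R^2 = {r2_score(y_hat, h.predict(X)):.3f}")
```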
Linear model
The linear proxy model 114 may be a K-LIME proxy model. With K-LIME, local generalized linear model (GLM) proxies are used to explain the predictions of the complex response function, and local regions are defined by K clusters or user-defined segments rather than by simulated, perturbed observation samples.
For each cluster, a local GLM $h_{GLM,k}$ is trained. The input data may be classified into the plurality of clusters using a clustering technique, such as k-means clustering. $K$ may be selected such that predictions from all the local GLM models maximize $R^2$. This can be summarized mathematically as:

$$\operatorname*{argmax}_{K}\; R^2\big(\hat{Y},\, h_{GLM,k}(X)\big)$$
K-LIME may also train a global proxy GLM $h_{global}$ on the entire input training dataset, such as training data 106, and the global model predictions $g(X)$, such as model prediction data 107. In some embodiments, if a given $k$-th cluster has fewer than a threshold number of members (e.g., 20), then $h_{global}$ is used as the linear proxy instead of $h_{GLM,k}$. In some embodiments, the intercepts, coefficients, $R^2$ values, accuracy, and predictions from all the proxy K-LIME models (including the global proxy) can be used to debug and improve $g$.
One or more reason codes and corresponding reason code values may be generated from K-LIME. A reason code corresponds to an input feature, and the reason code value provides the feature's approximate local linear contribution. Reason codes are a powerful tool for accountability and fairness because they provide an explanation for each $h_{GLM,k}$ prediction, enabling a user to understand the approximate magnitude and direction of an input feature's local contribution toward the prediction. In K-LIME, reason code values can be calculated by determining each coefficient-feature product. Reason codes may also be written into automatically generated explanations.

For $h_{GLM,k}$ and observation $x^{(i)}$:

$$h_{GLM,k}\big(x^{(i)}\big) = \beta_0^{(k)} + \sum_{j=1}^{P} \beta_j^{(k)} x_j^{(i)}$$

By decomposing the K-LIME prediction into the individual coefficient-feature products $\beta_j^{(k)} x_j^{(i)}$, the local linear contribution of each feature may be determined. This coefficient-feature product is known as a reason code value, and a reason code is created for each feature $X_j$.
K-LIME provides interpretability on several scales: (1) the coefficients of the global GLM proxy provide information about global, average trends; (2) the coefficients of the GLM proxy within a segment show average trends in that local region; and (3) when evaluating observations within a particular segment, K-LIME provides reason codes on a per-observation basis. K-LIME can increase transparency by revealing input features and their linear trends. K-LIME can enhance accountability by creating an explanation for each observation in a dataset. K-LIME can bolster trust and fairness when the important features and their linear trends around a particular record conform to domain knowledge and reasonable expectations.
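A minimal sketch of the K-LIME idea follows, assuming scikit-learn and continuing the X and y_hat variables from the surrogate sketch above; the clustering, per-cluster GLMs, and coefficient-feature reason codes are illustrative choices, not the patented implementation:

```python
# Minimal K-LIME-style sketch: k-means segments + one local GLM per cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

K = 4  # hypothetical cluster count
kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X)
local_glms = {}
for k in range(K):
    mask = kmeans.labels_ == k
    # Each local GLM is trained on the cluster's rows against g's predictions.
    local_glms[k] = Ridge(alpha=1.0).fit(X[mask], y_hat[mask])

def explain(x_i):
    """Return the local GLM prediction and per-feature reason-code values."""
    k = int(kmeans.predict(x_i.reshape(1, -1))[0])
    glm = local_glms[k]
    reason_codes = glm.coef_ * x_i              # coefficient-feature products
    prediction = glm.intercept_ + reason_codes.sum()
    return prediction, reason_codes

pred, codes = explain(X[0])
```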
Non-linear model
The one or more proxy non-linear models 115 may include a feature importance model, a decision tree model, a partial dependence plot, and/or any other non-linear model.
A feature importance model measures the impact of the features of a set of inputs on a model's predictions. A feature may have a global feature importance and a local feature importance. Global feature importance measures the overall impact of an input feature on model predictions while taking non-linearity and interactions into account. A global feature importance value gives an indication of the magnitude of a feature's contribution to the model's predictions for all observations. Local feature importance describes how the combination of the learned model rules or parameters and an individual observation's attributes affect the model's prediction for that observation, while taking non-linearity and interactions into account.
The feature importance model may include a random forest proxy model $h_{RF}$, which comprises $B$ decision trees $h_{tree,b}$. The random forest proxy model is a global interpretability measure. For example, $h_{RF}$ can be expressed as:

$$h_{RF}\big(x^{(i)}\big) = \frac{1}{B} \sum_{b=1}^{B} h_{tree,b}\big(x^{(i)};\, \Theta_b\big)$$

where $\Theta_b$ is the set of splitting rules for each tree $h_{tree,b}$. At each split in each tree $h_{tree,b}$, the improvement in the split criterion is the importance measure attributed to the splitting feature. The importance measure is accumulated over all trees separately for each feature. The aggregated feature importance values may be scaled between 0 and 1, such that the most important feature has an importance value of 1.
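The following sketch approximates this global measure with scikit-learn's impurity-based random forest importances (accumulated split-criterion improvements), rescaled so the top feature scores 1; it continues the X and y_hat variables from the sketches above:

```python
# Illustrative global feature importance from a random forest proxy.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

h_rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y_hat)

# Split-criterion improvements accumulated over all trees, per feature,
# rescaled so the most important feature has importance 1.
global_importance = h_rf.feature_importances_
global_importance = global_importance / global_importance.max()
print(np.argsort(-global_importance)[:5])  # five globally most important features
```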
The feature importance model may include leave-one-covariate-out (LOCO). LOCO feature importance is a local interpretability measure. LOCO provides a mechanism for computing importance values for any model $g$ on a per-observation basis $x^{(i)}$, by subtracting the model's prediction for the observation with the feature of interest $X_j$ removed, $g\big(x^{(i)}_{(-j)}\big)$, from the model's prediction for the original observation, $g\big(x^{(i)}\big)$:

$$\mathrm{LOCO}_j\big(x^{(i)}\big) = g\big(x^{(i)}\big) - g\big(x^{(i)}_{(-j)}\big)$$

LOCO is a model-agnostic idea, and $g\big(x^{(i)}_{(-j)}\big)$ may be calculated in various ways. In some embodiments, $g\big(x^{(i)}_{(-j)}\big)$ is calculated using a model-specific technique, in which the contribution of $X_j$ to $g\big(x^{(i)}\big)$ is approximated by using the random forest proxy model $h_{RF}$. The predicted contribution of any rule involving $X_j$ (including, for each tree $h_{tree,b}$, the rules for $X_j$) is subtracted from the original prediction $h_{tree,b}\big(x^{(i)};\, \Theta_b\big)$. For random forests:

$$g\big(x^{(i)}_{(-j)}\big) \approx h_{RF}\big(x^{(i)}_{(-j)}\big) = \frac{1}{B} \sum_{b=1}^{B} h_{tree,b}\big(x^{(i)};\, \Theta_{b,(-j)}\big)$$

where $\Theta_{b,(-j)}$ is the set of splitting rules for each tree $h_{tree,b}$ with all rules involving $X_j$ removed. In some embodiments, the LOCO feature importance values are scaled between 0 and 1, for direct comparison with random forest feature importance, such that the most important feature for the observation $x^{(i)}$ has an importance value of 1.
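A model-agnostic LOCO-style sketch follows; since most fitted models cannot predict with a column truly missing, the "removed" feature is approximated here by substituting its training mean, which is one possible choice and not necessarily the patent's rule-removal technique. It continues h_rf and X from the sketch above:

```python
# Model-agnostic LOCO-style local importance sketch.
import numpy as np

def loco_importance(model, X_train, x_i):
    """Original prediction minus prediction with feature j 'removed'
    (approximated here by replacing it with its training mean)."""
    base = model.predict(x_i.reshape(1, -1))[0]
    means = X_train.mean(axis=0)
    out = np.empty(x_i.shape[0])
    for j in range(x_i.shape[0]):
        x_drop = x_i.copy()
        x_drop[j] = means[j]                   # stand-in for leaving X_j out
        out[j] = base - model.predict(x_drop.reshape(1, -1))[0]
    return out

local_imp = loco_importance(h_rf, X, X[0])
local_imp = np.abs(local_imp) / np.abs(local_imp).max()  # scale to [0, 1]
```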
Random forest feature importance increases transparency by reporting and ranking influential input features. LOCO feature importance enhances accountability by creating an explanation for each model prediction. Both global and local feature importance enhance trust and fairness when the reported values conform to domain knowledge and reasonable expectations.
A decision tree proxy model $h_{tree}$ can be generated to approximate the learned function $g$ (e.g., machine learning model 104). $h_{tree}$ is used to increase the transparency of $g$ by displaying an approximate flow chart of $g$'s decision-making process. $h_{tree}$ also shows the likely most important features of $g$ and the most important interactions in $g$. $h_{tree}$ can be used to visualize, validate, and debug $g$ by comparing the displayed decision process, important features, and important interactions with known standards, domain knowledge, and reasonable expectations.
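A shallow surrogate tree of this kind can be sketched with scikit-learn and rendered as a text flow chart (export_text standing in for the patent's tree diagram); this continues X and y_hat from the sketches above:

```python
# Shallow decision tree proxy fitted to the complex model's predictions,
# displayed as a text flow chart of the approximate decision process.
from sklearn.tree import DecisionTreeRegressor, export_text

h_tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y_hat)
print(export_text(h_tree, feature_names=[f"F{j}" for j in range(X.shape[1])]))
```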
A partial dependence plot may show how the machine-learned response function changes based on the value of an input feature of interest, while accounting for non-linearity and averaging out the effects of all other input features.
For a $P$-dimensional feature space, consider a single feature $X_j \in \mathcal{P}$ and its complement $X_{(-j)}$ (i.e., $X_j \cup X_{(-j)} = \mathcal{P}$). The partial dependence of a function $g$ on $X_j$ is the marginal expectation:

$$PD(X_j, g) = \mathbf{E}_{X_{(-j)}}\big[\, g(X_j, X_{(-j)}) \,\big]$$

Recall that the expectation is marginal: it sums over the values of $X_{(-j)}$. One-dimensional partial dependence can be expressed as:

$$PD(X_j, g) = \frac{1}{N} \sum_{i=1}^{N} g\big(X_j,\, x^{(i)}_{(-j)}\big)$$

The partial dependence of a given feature $X_j$ is the average of the response function $g$ with the given feature set to $X_j = x_j$ and with all other feature vectors of the complement, $x^{(i)}_{(-j)}$, used as they exist in the dataset. A partial dependence plot shows the partial dependence as a function of specific values of the feature subset $X_j$. Partial dependence plots enable increased transparency in $g$ and enable validation and debugging of $g$ by comparing a feature's average predictions across its domain to known standards and reasonable expectations.
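The one-dimensional partial dependence above can be computed directly from its definition, as in this sketch (continuing h_rf and X from the sketches above; the grid over feature 0 is an arbitrary example):

```python
# One-dimensional partial dependence computed from its definition:
# fix X_j = x_j for every row, then average the model's predictions.
import numpy as np

def partial_dependence(model, X_data, j, grid):
    pd_values = []
    for x_j in grid:
        X_mod = X_data.copy()
        X_mod[:, j] = x_j                 # set the feature to x_j for all entries
        pd_values.append(model.predict(X_mod).mean())
    return np.asarray(pd_values)

grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 20)
pd_curve = partial_dependence(h_rf, X, j=0, grid=grid)
```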
In some embodiments, the partial dependence plots include individual conditional expectation (ICE) plots. ICE is the disaggregated partial dependence of the $N$ individual responses $g\big(X_j,\, x^{(i)}_{(-j)}\big)$, $i \in \{1, \ldots, N\}$, for a single feature $X_j$, instead of the average response across all observations in the training set. An ICE plot for a single observation $x^{(i)}$ is created by plotting $g\big(X_j = x_{j,q},\, x^{(i)}_{(-j)}\big)$ versus $X_j = x_{j,q}$, for $q \in \{1, 2, \ldots\}$, while fixing $x^{(i)}_{(-j)}$. ICE plots allow the prediction for an individual observation of data $x^{(i)}$ to be examined to determine whether the individual observation lies outside one standard deviation of the average model behavior represented by partial dependence. ICE plots also allow the prediction for an individual observation of data $x^{(i)}$ to be compared to average model behavior, known standards, domain knowledge, and/or reasonable expectations, to determine whether the model's handling of that specific observation is valid.
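An ICE curve is the same sweep without the averaging, computed for one observation; this sketch continues h_rf, X, grid, and pd_curve from the partial dependence sketch:

```python
# ICE: sweep X_j over the grid for one observation x_i,
# holding its other feature values x_(-j) fixed.
import numpy as np

def ice_curve(model, x_i, j, grid):
    rows = np.tile(x_i, (len(grid), 1))
    rows[:, j] = grid                     # vary X_j only
    return model.predict(rows)

ice = ice_curve(h_rf, X[0], j=0, grid=grid)
deviation = ice - pd_curve                # compare to the averaged behavior
```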
The training data 116 includes data used to train the linear proxy model 114 and/or the one or more non-linear proxy models 115. The training data 116 may include at least a portion of the training data 106. The training data 116 includes a plurality of entries. Each entry is associated with one or more features having corresponding feature values and an associated actual outcome.
The model prediction data 117 includes predictions made by the machine learning model 104, predictions made by the linear proxy model 114, and predictions made by the one or more non-linear proxy models 115. The model prediction data 117 may include the prediction labels predicted by the machine learning model 104 (e.g., a probability of a particular outcome, a predicted value ± an offset value, etc.), the prediction labels predicted by the linear proxy model 114, and the prediction labels predicted by the one or more non-linear proxy models 115.
The actual outcome data 118 includes real-world outcome data. For example, the machine learning model 104, the linear proxy model 114, and the one or more non-linear proxy models 115 may be trained to predict the probability of a particular outcome given a set of inputs. The actual outcome data 118 includes the real-world outcome for that set of inputs (e.g., whether or not the particular outcome occurred).
The client device 122 may be a computer, a laptop, a mobile device, a tablet device, etc. The client device 122 includes an application 124 associated with the proxy model server 112. The application 124 is configured to display, via the graphical user interface 126, one or more graphs depicting at least one of the linear proxy model 114 and the one or more non-linear proxy models 115.
In some embodiments, the graphical user interface 126 is configured to receive a selection of a point (e.g., an observation) shown in the linear proxy model graph. In response to the selection, the application 124 is configured to dynamically update one or more non-linear proxy models associated with the linear proxy model and to dynamically update the display of the one or more non-linear proxy models. The application 124 is also configured to provide an indication of the received selection to the proxy model server 112. In response to the indication, the linear proxy model may be configured to provide one or more reason codes and corresponding reason code values to the application 124. In response to the indication, a non-linear proxy model may be configured to provide one or more important features for the selected point. In response to the indication, a non-linear proxy model may be configured to highlight the decision tree path associated with the selected point.
FIG. 2A is an example of a diagram illustrating an embodiment of input data. The input data includes training data, validation data, model prediction data, and actual outcome data. In the illustrated example, the input data 200 may be implemented by a system such as the complex model server 102 or the proxy model server 112.
In the example shown, input data 200 includes entries A1, A2, …, An. Each entry includes one or more features having corresponding feature values. For example, entry A1 includes features F1, F2, …, Fn having corresponding feature values X1, Y1, …, Z1. Entry A2 includes features F1, F2, …, Fn having corresponding feature values X2, Y2, …, Z2. Entry An includes features F1, F2, …, Fn having corresponding feature values Xn, Yn, …, Zn. In some embodiments, a feature value may correspond to the actual value of a feature (e.g., temperature = 98°). In other embodiments, a feature value may correspond to one of a range of values (e.g., a value of "2" indicates a temperature range of 20-40°). In other embodiments, a feature value may correspond to one of several possible non-numeric values (e.g., "0" = male, "1" = female). In other embodiments, a feature value may be a string.
Models such as machine learning model 104, linear proxy model 114, or proxy non-linear model(s) 115 may make predictions based on an entry, the features associated with the entry, and the corresponding feature values. For example, a model may output a prediction label P1 for A1 based on features F1, F2, …, Fn and their corresponding feature values X1, Y1, …, Z1. A model may output predictions P1, P2, …, Pn for entries A1, A2, …, An, respectively. A prediction label may be a probability of a particular outcome, a predicted value plus an offset range, a predicted value plus a confidence level, etc.
Input data 200 may include actual outcome data, e.g., whether a particular outcome occurred, the actual value of the output variable, etc. A value of 1 may indicate that the particular outcome occurred. A value of 0 may indicate that the particular outcome did not occur. In other embodiments, a value of 1 indicates that the particular outcome did not occur and a value of 0 indicates that the particular outcome occurred.
In some embodiments, a model, such as machine learning model 104, linear proxy model 114, or proxy non-linear model(s) 115, may predict that a particular outcome will occur (e.g., a prediction greater than or equal to a prediction threshold) when the particular outcome actually occurs (e.g., a value of 1). In some embodiments, the model may predict that a particular outcome will occur (e.g., a prediction greater than or equal to the prediction threshold) when the particular outcome does not actually occur (e.g., a value of 0). In some embodiments, the model may predict that a particular outcome will not occur (e.g., a prediction less than the prediction threshold) when the particular outcome actually occurs (e.g., a value of 1). In some embodiments, the model may predict that a particular outcome will not occur (e.g., a prediction less than the prediction threshold) when the particular outcome does not actually occur (e.g., a value of 0).
FIG. 2B is an example of a diagram illustrating an embodiment of input data ranked based on prediction labels. In the illustrated example, the ranked input data 250 may be implemented by a system such as the complex model server 102 or the proxy model server 112.
In the example shown, input data 250 includes entries A1, A20, …, A2. The entries of input data 250 are the same as the entries of input data 200, but are ranked based on their prediction labels. A prediction label may be a probability of a particular outcome. In some embodiments, the entries are ranked from the lowest prediction label to the highest prediction label. In some embodiments, the entries are ranked from the highest prediction label to the lowest prediction label.
FIG. 3 is a diagram illustrating an embodiment of the output of a linear proxy model. The linear model graph 300 may be implemented by a system such as the proxy model server 112. The linear model graph 300 may represent the output of a linear model, such as the linear proxy model 114. The linear proxy model 114 is a proxy of a model that implements a more complex function, such as the machine learning model 104.
The linear model graph 300 plots the prediction labels associated with a set of entries against the ranked predictions. The y-axis of the linear model graph 300 indicates the score produced by a model, such as the machine learning model 104 or the linear proxy model 114. The x-axis of the linear model graph 300 indicates the prediction rank associated with a set of inputs. The set of entries is ranked based on prediction label and plotted sequentially. For example, FIG. 2B depicts a set of entries ranked based on their corresponding prediction labels. The entries included in input data 250 would be plotted in the following order: A1, A20, …, and A2.
The linear model graph 300 includes a line 301, representing the prediction labels associated with the set of entries as determined by a machine learning model, such as machine learning model 104. For example, line 301 may be a plot of the predictions P1, P20, …, P2 of input data 250. The prediction values associated with line 301 may be determined by a machine learning algorithm (e.g., decision trees, naive Bayes classification, least squares regression, logistic regression, support vector machines, neural networks, deep learning, etc.).
The linear model graph 300 includes a series of observation points, e.g., white dots 302 and 305, representing the prediction labels associated with the set of entries as determined by a linear model, such as linear proxy model 114. In some embodiments, the observation points are associated with a global proxy model. An observation point may represent the global proxy model's prediction label for a particular entry. In other embodiments, the observation points are associated with local linear models.
The prediction label associated with each observation point may be determined by a K-LIME model. The linear proxy model 114 may include a plurality of local linear models. A set of entries may be classified into one or more clusters using one or more techniques (e.g., k-means clustering). Each cluster represents a subset of entries that are similar to each other. An entry may be associated with a cluster based on the distance between the entry and the cluster centroid. An entry is associated with a cluster if the distance between the entry and the cluster centroid is less than or equal to a threshold distance. An entry is associated with a different cluster if the entry is more than the threshold distance from the cluster centroid. A local linear model may be generated for each cluster. The entries associated with a particular cluster may be used to train that cluster's local linear model. For example, for a set of entries classified into 11 clusters, each of the 11 clusters may have a corresponding local linear model. Each local linear model is configured to make predictions for the subset of entries included in its cluster. A local linear model makes predictions based on one or more features of an entry and the corresponding feature values. For example, assume that white dot 302 is part of a first cluster and white dot 305 is part of a second cluster. A first local linear model may be configured to generate the prediction for white dot 302 based on one or more features of white dot 302 and corresponding feature values, and a second local linear model may be configured to generate the prediction for white dot 305 based on one or more features of white dot 305 and corresponding feature values.
In some embodiments, an entry (e.g., production data) is added to a cluster by determining the cluster centroid that is closest to the entry. Entries and cluster centroids have particular locations in the feature space. For example, an entry includes a plurality of features and corresponding feature values. An entry's position in the feature space can be represented as a vector, e.g., {X1, Y1, …, Z1}. The closest cluster may be determined by computing the distance between the entry's position in the feature space and each cluster centroid's position in the feature space. The closest cluster corresponds to a local linear model with one or more associated model parameters. After the closest cluster centroid is determined, a prediction label for the entry may be determined by inputting the feature values associated with the entry's features into the local linear model corresponding to the closest cluster centroid.
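Scoring a new entry this way reduces to a nearest-centroid lookup followed by the corresponding local GLM, as in this sketch (continuing kmeans and local_glms from the K-LIME sketch above; x_new is a hypothetical production entry):

```python
# Nearest-centroid scoring for a production entry.
import numpy as np

def predict_production(x_new, kmeans, local_glms):
    dists = np.linalg.norm(kmeans.cluster_centers_ - x_new, axis=1)
    k = int(np.argmin(dists))               # closest cluster centroid
    return local_glms[k].predict(x_new.reshape(1, -1))[0]

x_new = X[0] + 0.1                          # stand-in for a production entry
print(predict_production(x_new, kmeans, local_glms))
```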
The linear model graph 300 includes a set of actual outcome data, e.g., black dots 303 and 304. Black dot 303 indicates that the particular outcome actually occurred for an entry having a set of features and corresponding feature values. Black dot 304 indicates that the particular outcome did not occur for an entry having a set of features and corresponding feature values.
In some embodiments, the machine learning model predictions correlate with the actual outcome data. For example, point 308 on line 301 indicates that a particular outcome is likely to occur (prediction label ≈ 0.75), and black dot 307 indicates that the particular outcome actually occurred.
In some embodiments, the machine learning model predictions do not correlate with the actual outcome data. For example, point 309 on line 301 indicates that a particular outcome is unlikely to occur (prediction label ≈ 0.2), and black dot 306 indicates that the particular outcome actually occurred.
Each observation point (i.e., white dot) has a corresponding black dot. For example, white dot 302 has a corresponding black dot 306. In some embodiments, the global proxy model's predictions correlate with the actual outcome data. In some embodiments, the global proxy model's predictions do not correlate with the actual outcome data. In some embodiments, the local linear model predictions correlate with the actual outcome data. In some embodiments, the local linear model predictions do not correlate with the actual outcome data.
Each observation point may be selected. In response to a selection, one or more reason codes and corresponding reason code values may be displayed. A reason code corresponds to a feature. A reason code value corresponds to the amount (e.g., weight) that the feature contributes to the local model's prediction label for the observation point (input point). The linear proxy model may determine the reason codes and corresponding reason code values for a particular observation point. The sum of the reason code values may be equal to the prediction label. In some embodiments, instead of displaying all the reason codes and corresponding reason code values for a particular observation point, the top reason codes (e.g., the top 5 reason codes), i.e., the most influential features, are displayed. For example, white dot 302 has a prediction label of approximately 0.3. The top reason codes "F1", "F18", "F3", "F50", "F34" and their corresponding reason code values may be displayed. In other embodiments, selecting an observation point may cause all of the reason codes and corresponding reason code values for the selected observation point to be displayed.
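Displaying the top reason codes amounts to sorting the coefficient-feature products by absolute contribution, as in this sketch (continuing the explain helper from the K-LIME sketch above; the top-5 cutoff mirrors the example):

```python
# Top-5 reason codes for a selected observation point.
import numpy as np

pred, codes = explain(X[0])                  # from the K-LIME sketch above
for j in np.argsort(-np.abs(codes))[:5]:     # five most influential features
    print(f"F{j}: {codes[j]:+.3f}")
print(f"prediction = intercept + sum(reason codes) = {pred:.3f}")
```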
FIG. 4A is a flow diagram illustrating an embodiment of a process for providing a linear proxy model. In the example shown, process 400 may be implemented by a system such as proxy model server 112.
At 402, data associated with a machine learning model is received. The data may include training data used to train the machine learning model. The data may include the machine learning model's prediction data associated with the entries of the training data. The data may include actual outcome data associated with entries having one or more features with corresponding feature values, i.e., whether a particular outcome actually occurred.
At 404, the data associated with the machine learning model is classified into a plurality of clusters. The data may be classified into the plurality of clusters using one or more techniques (e.g., k-means clustering). Each cluster represents a subset of entries that are similar to each other. A cluster includes a plurality of entries. Each entry includes one or more features having corresponding feature values. Each entry has a corresponding position in the feature space, e.g., (F1, F2, …, Fn). In some embodiments, a cluster is determined based on the one or more entries within a threshold distance of a point in the feature space (e.g., a cluster centroid).
At 406, a model is created. In some embodiments, a global proxy model is created based on the input data. In other embodiments, a separate linear model is created for each cluster. Each linear model is configured to output a prediction label. For example, a linear model may determine a prediction P1 indicating the probability that a particular outcome will occur given entry A1, which includes features F1, F2, …, Fn having corresponding feature values X1, Y1, …, Z1.
At 408, the entries are ranked based on model predictions. In some embodiments, the entries are ranked based on predictions made by a machine learning model (such as machine learning model 104). In other embodiments, the entries are ranked based on predictions made by a linear proxy model (such as linear proxy model 114). In some embodiments, the entries are ranked from the lowest prediction label to the highest prediction label. In some embodiments, the entries are ranked from the highest prediction label to the lowest prediction label.
At 410, a linear model graph, such as linear model graph 300, is provided. In some embodiments, the linear model graph is provided from the proxy model server to a client device via a network. The client device may display the linear model graph via an application running on the client device.
At 412, a selection of an observation point included in the linear model graph is received. For example, the client device may receive a selection of a point, such as white dot 302, via the GUI. One or more non-linear model graphs may be updated based on the selected point.
At 414, one or more reason codes are provided. The one or more reason codes identify the set of features that contributed most to an entry's prediction label. For example, a set of reason codes may be provided to indicate why white dot 302 has a prediction label of 0.3. Each reason code has a corresponding reason code value that indicates its contribution to the prediction label. The cumulative contribution of the reason codes is equal to the prediction label.
FIG. 4B is a flow diagram illustrating an embodiment of a process for providing predictions. Process 450 may be implemented by a system such as proxy model server 112.
At 452, production data is received. The production data includes one or more entries. Each entry is associated with one or more features having corresponding feature values. The one or more entries of the production data do not include corresponding prediction labels.
At 454, the closest cluster is determined for each entry of the production data. An entry of the production data includes a plurality of feature values. The feature values correspond to a position in the feature space. The cluster centroids of the clusters have corresponding positions in the feature space. The closest centroid is determined for each entry of the production data. The closest centroid can be determined by computing the distance between the entry's position in the feature space and each cluster centroid's position in the feature space.
At 456, a linear proxy model is determined for each entry of the production data. Each cluster has a corresponding linear proxy model.
At 458, one or more entries of production data are applied to the corresponding linear proxy model. For example, a first entry of production data may be applied to a first linear proxy model corresponding to a first cluster, and a second entry of production data may be applied to a second linear proxy model corresponding to a second cluster.
At 460, a prediction label and one or more reason codes are output. Each linear proxy model outputs a corresponding prediction label. The prediction label may be a probability of a particular outcome, a predicted value ± an offset value, etc. The reason codes provide an explanation of why the prediction label has a certain value.
FIG. 5 is a diagram illustrating an embodiment of a non-linear proxy model. Non-linear model graph 500 may be implemented by a system such as the proxy model server 112. The non-linear model graph 500 may represent the output of a non-linear proxy model, such as one of the non-linear proxy models 115. The non-linear proxy model 115 is a proxy of a model that implements a more complex function, such as the machine learning model 104.
The non-linear model graph 500 illustrates the feature importance of one or more features. Feature importance measures the impact of a feature on a model's predictions. The non-linear model graph 500 includes the global feature importance and the local feature importance for each particular feature. In some embodiments, the features are ordered in descending order from the globally most important feature to the globally least important feature.
Global feature importance measures the overall impact of a feature on model predictions while taking non-linearity and interactions into account. The global feature importance value gives an indication of the magnitude of the feature's contribution to the model's predictions for all observations. For example, the global importance value may indicate the importance of a feature to the global proxy model, i.e., the importance of the feature across all entries. In some embodiments, the global feature importance value is equal to the number of times the feature is selected to split a decision tree in a decision tree ensemble (e.g., a global decision tree proxy model). In some embodiments, the global feature importance value is scaled to a number between 0 and 1, such that the most important feature has an importance value of 1. In some embodiments, the global feature importance value is weighted based on the position of the feature in the decision trees. For example, a feature selected for splitting at the top of a decision tree may be weighted higher than a feature selected for splitting at the bottom of the decision tree. In some embodiments, the weight is a value between 0 and 1. A weight of approximately 1 indicates that the feature is selected at or near the top of a decision tree. A weight of approximately 0 indicates that the feature is selected at or near the bottom of a decision tree, or is not selected at all. In some embodiments, the weight is a value greater than 1.
Local feature importance describes how the combination of the learned model rules or parameters and an individual observation's attributes affect the model's prediction for that observation, while taking non-linearity and interactions into account. For example, the local feature importance may indicate the importance of a feature associated with an entry (e.g., an observation point) for the global proxy model, i.e., the importance of the feature to that particular entry. The local feature importance value may be determined by computing a LOCO value for the feature. An entry includes a plurality of features. A first prediction is computed using the plurality of features, and a second prediction is computed using the plurality of features minus one of the plurality of features. The second prediction is subtracted from the first prediction to determine the importance of that feature. A LOCO value is computed for each of the plurality of features.
As seen in FIG. 5, features "F1", "F18", "F3", "F50", "F34", and "F8" are depicted as the most important features for the prediction. In some embodiments, the most important features are the most important features of the global proxy model. In other embodiments, the most important features are the most important features for the selected observation point. A global importance value and a local importance value are shown for each feature. For example, global importance values 502a, 504a, 506a, 508a, 510a, and 512a are shown for features "F1", "F18", "F3", "F50", "F34", and "F8", respectively. Local importance values 502b, 504b, 506b, 508b, 510b, and 512b are shown for features "F1", "F18", "F3", "F50", "F34", and "F8", respectively.
In some embodiments, the global importance value of a feature is compared with the local importance value of the feature. For example, the global importance value of a feature is correlated with the local importance value of the feature if the difference between the two values is less than or equal to a threshold. If the difference between the two values is greater than the threshold, the global importance value of the feature is not correlated with the local importance value of the feature. An entry associated with a prediction may be flagged if the global importance value of a feature and the local importance value of the feature are not correlated. In some embodiments, the feature importance model is investigated to determine why the model output such values. If a threshold number of entries are flagged, the non-linear model may be determined to be inaccurate and adjusted. For example, the global importance value 504a of feature "F18" is not correlated with the local importance value 504b. This indicates that the non-linear model associated with the non-linear model graph 500 may need to be adjusted, or that the feature importance model should be investigated. In some embodiments, the listed features may indicate that a single feature dominates the prediction label associated with a prediction (e.g., the feature's importance value is greater than a dominance threshold). For example, the importance value associated with feature F1 may be 0.98 (out of 1.00). This may indicate data leakage associated with the prediction, and indicate that the model may need to be adjusted or that the feature importance model should be investigated. In response to such an indication, the model may be adjusted or investigated.
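Such a consistency check can be sketched as follows (continuing global_importance and local_imp from the feature importance sketches above; the 0.5 divergence threshold and 0.9 dominance share are arbitrary example values, not values from the patent):

```python
# Flag features whose global and local importance diverge, and check
# whether a single feature dominates (possible data leakage).
import numpy as np

THRESHOLD = 0.5
diverging = np.flatnonzero(np.abs(global_importance - local_imp) > THRESHOLD)
if diverging.size:
    print("flag for review:", [f"F{j}" for j in diverging])

share = global_importance / global_importance.sum()
if share.max() > 0.9:                      # one feature carries nearly all importance
    print("possible data leakage: a single feature dominates")
```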
FIG. 6 is a flow diagram illustrating an embodiment of a process for providing a non-linear proxy model. In the example shown, process 600 may be implemented by a system such as proxy model server 112.
At 602, a global importance value for a feature is determined. The global feature importance value may be equal to a number of times a feature is selected in the set of decision trees to split the decision trees in the set of decision trees. In some embodiments, the global feature importance value is scaled to a number between 0 and 1, such that the importance value of the most important feature is 1. In some embodiments, the global feature importance value is weighted based on the location of the feature in the decision tree. For example, a feature selected at the top of the decision tree for splitting may be weighted higher than another feature selected at the bottom of the decision tree for splitting.
At 604, a local importance value for the feature is determined. The local feature importance value may be determined by calculating a LOCO value for the feature. The entry includes a plurality of features. A first prediction is calculated using the plurality of features and a second prediction is calculated using the plurality of features minus one of the plurality of features. The second prediction is subtracted from the first prediction to determine the importance of the feature.
At 606, the one or more most important features are ranked. In some embodiments, the one or more important features are ranked based on their global importance values. In other embodiments, the one or more important features are ranked based on their local importance values. A top number of features (e.g., top 5) or a top percentage of features (e.g., top 10%) may be determined to be the one or more most important features.
At 608, a visualization of a comparison between the determined global importance values and the determined local importance values for a plurality of features is provided. In some embodiments, the comparison is provided for the one or more most important features.
FIG. 7 is a diagram illustrating an embodiment of a non-linear proxy model. Non-linear model graph 700 may be implemented by a system such as the proxy model server 112. The non-linear model graph 700 may represent the output of a non-linear proxy model, such as one of the non-linear proxy models 115. The non-linear proxy model 115 is a proxy of a model that implements a more complex function, such as the machine learning model 104.
The non-linear model graph 700 illustrates a decision tree proxy model. A complex decision tree ensemble model may include hundreds of trees of varying degrees of complexity (e.g., thousands of levels). The decision tree proxy model is an approximation of the complex decision tree ensemble model (e.g., a global decision tree proxy model) and includes a shallow decision tree (e.g., three levels).
The non-linear model graph 700 may indicate the most common decision path of the decision tree proxy model. The most common decision path may be drawn thicker than the other decision paths. For example, the path between "F1", "F18", and "F2" is thicker than the other decision paths. This indicates that the path between "F1", "F18", and "F2" is the most common decision path of the non-linear model graph 700. The non-linear model graph 700 may also indicate the least common decision path of the decision tree proxy model. The least common decision path may be drawn thinner than the other decision paths. For example, the path between "F18" and "F50" is thinner than the other decision paths. This indicates that the path between "F18" and "F50" is the least common decision path of the non-linear model graph 700. The width of a path in the decision tree proxy model may indicate how frequently the decision tree proxy model uses that path.
The non-linear model graph 700 may include prediction labels associated with the different paths of the decision tree proxy model. For example, a prediction label of "0.136" is output for entries having features F1, F18, and F2.
In some embodiments, when an observation (e.g., a white dot in FIG. 3) is selected on a linear model graph (such as linear model graph 300), the non-linear model graph 700 may be updated to show the path of that observation through the decision tree proxy model.
FIG. 8 is a flow diagram illustrating an embodiment of a process for providing a proxy non-linear model. In the example shown, process 800 may be implemented by a system such as proxy model server 112.
At 802, a decision tree proxy model is generated. A complex decision tree model may include hundreds of trees of varying degrees of complexity (e.g., thousands of levels). The decision tree proxy model is an approximation of a complex decision tree model and includes a shallow decision tree (e.g., three levels).
At 804, an indication of a selection of an observation point in a linear proxy model graph is received. The linear proxy model graph may plot the prediction labels of the linear proxy model and the machine learning model against the ranked predictions. The observation point corresponds to one of the predictions made by the linear proxy model.
At 806, the decision tree proxy model is updated based on the selected observation points. The decision tree proxy model may be updated to show the path of the selected observation point through the decision tree proxy model.
FIG. 9 is a diagram illustrating an embodiment of a non-linear proxy model. Non-linear model graph 900 may be implemented by a system such as the proxy model server 112. The non-linear model graph 900 may represent the output of a non-linear proxy model, such as one of the non-linear proxy models 115. The non-linear proxy model 115 is a proxy of a model that implements a more complex function, such as the machine learning model 104.
The non-linear model graph 900 illustrates a partial dependence plot. The partial dependence plot determines the partial dependence of the prediction on a feature. The partial dependence plot is computed by modifying the feature value associated with a feature to the same value for all entries and determining the prediction labels given the modified feature values. In some embodiments, an average prediction label is determined for each of the different feature values. For example, non-linear model graph 900 illustrates white dots for feature values ranging from "-2" to "8". A white dot depicts the average prediction label for the entries having that feature value. For example, white dot 904 indicates that, for all entries with a feature value of "2", the average prediction label for the particular feature is "0.6".
The non-linear model graph 900 illustrates the range of prediction labels (e.g., one standard deviation) for all entries having the same feature value. For example, range 902 indicates that when the feature value of a particular feature is "1," the model will typically output a prediction label between 0.1 and 0.4.
The non-linear model graph 900 illustrates the prediction label of a single entry when the feature value is set to a particular value. For example, black dot 904 indicates that when the feature value is set to "1" for a particular feature and a particular entry, the prediction label is 0.2.
FIG. 10 is a flow diagram illustrating an embodiment of a process for providing a non-linear proxy model. In the example shown, process 1000 may be implemented by a system such as proxy model server 112.
At 1002, an indication is received to modify a feature value associated with a feature to a particular value for all entries, and the feature value is modified to the particular value for all entries. An entry includes one or more features having corresponding feature values. The input data may indicate that the feature value of a particular feature varies across entries; the input data is modified so that the feature value of the particular feature is the same for all entries.
At 1004, an average prediction label for the entries having the same feature value is determined. The prediction labels of all entries whose particular feature has the same feature value are computed and averaged. At 1006, a range (e.g., one standard deviation) of the prediction labels for the entries having that feature value is determined.
At 1008, a prediction label for a single entry having the particular feature value is determined. The single entry may correspond to a selected observation point in the linear proxy model graph.
In some embodiments, steps 1002-1008 are repeated for all possible values of the particular feature. For example, the feature depicted in FIG. 9 may take feature values from "-2" to "8"; steps 1002-1008 may be repeated for each feature value "-2", "-1", ..., "8".
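A minimal sketch of process 1000 under scikit-learn conventions; the model, the integer value grid, and the selected entry index are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(2)
X = rng.uniform(-2, 8, size=(500, 4))  # toy entries
y = np.sin(X[:, 0]) + X[:, 1]          # toy targets
model = GradientBoostingRegressor().fit(X, y)

feature, selected_entry = 0, 42
for value in np.arange(-2, 9):      # repeat steps 1002-1008 for each value
    X_mod = X.copy()
    X_mod[:, feature] = value       # 1002: same feature value for all entries
    preds = model.predict(X_mod)
    avg = preds.mean()              # 1004: average prediction label (white dot)
    band = preds.std()              # 1006: one-standard-deviation range
    single = preds[selected_entry]  # 1008: label for the selected observation
    print(f"value={value:+.0f}  avg={avg:.3f}  +/-{band:.3f}  selected={single:.3f}")
```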
FIG. 11 is a diagram illustrating an embodiment of a dashboard. In the example shown, dashboard 1100 may be implemented by a system such as proxy model server 112. Dashboard 1100 may be provided to a client system, such as client 122. Dashboard 1100 may include a linear model graph, one or more non-linear model graphs, and/or graphs based on the original machine learning model.
In the example shown, dashboard 1100 includes a K-LIME linear model graph, a feature importance graph, a proxy model decision tree, and a partial dependency graph. In some embodiments, a user selection of an observation, such as white dot 1102, is received. In response to the selection, the feature importance graph, the proxy model decision tree, and the partial dependency graph may be updated.
For example, the feature importance graph may be updated to depict the most important features, which may be the most important features associated with the global proxy model and/or the most important features associated with the selected observation point. The proxy model decision tree may be updated to highlight the path the observation takes through the proxy decision tree to reach its prediction label. The partial dependency graph may be updated to depict how the prediction label of the observation point changes when the feature value of a particular feature is modified to a particular value.
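The linked-view update might be wired up as in the following sketch; the view objects and every method on them are hypothetical, named here only for illustration:

```python
def on_observation_selected(entry_id, views, proxies, X):
    """Hypothetical handler: refresh each interpretation view for the
    entry the user selected on the K-LIME linear model graph."""
    entry = X[entry_id].reshape(1, -1)
    # Show local feature importance for the selected observation
    # (local_importance is an assumed method).
    views["importance"].show(proxies["importance"].local_importance(entry))
    # Highlight the path the observation takes through the surrogate tree
    # (decision_path follows the scikit-learn convention).
    views["tree"].highlight(proxies["tree"].decision_path(entry).indices)
    # Re-center the partial dependency graph on the selected entry
    # (mark_entry is an assumed method).
    views["pdp"].mark_entry(entry_id)
```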
FIG. 12 is a flow diagram illustrating an embodiment of a process for debugging a machine learning model. In the example shown, process 1200 may be implemented by a system such as proxy model server 112.
At 1202, a linear model map is provided. The linear model graph may depict predictions of a linear proxy model.
At 1204, a selection of an observation point included in the linear proxy model graph is received. The linear proxy model graph may plot the prediction labels of the linear proxy model and the machine learning model against the ranked predictions. The observation point is one of the predictions made by the linear proxy model.
At 1206, one or more non-linear proxy models are updated based on the selected observation point. For example, the feature importance graph may be updated to depict the most important features. The proxy model decision tree may be updated to reflect the path the observation takes through the proxy decision tree to reach its prediction label. The partial dependency graph may be updated to depict how the prediction label of the observation point changes when the feature value of a particular feature is modified to a particular value.
At 1208, it is determined whether the output of the linear proxy model correlates with the output of the one or more non-linear proxy models. For example, the outputs do not correlate if the output of the linear proxy model indicates that feature "F1" is one of the most important features affecting the prediction labels of the linear proxy model, while the output of a non-linear proxy model indicates that feature "F1" is not one of the most important features affecting its predictions.
In response to determining that the output of the linear proxy model is consistent with the output of the non-linear proxy model, process 1200 proceeds to 1210. In response to determining that the outputs are not consistent, process 1200 proceeds to 1212.
At 1210, at least one of the linear proxy model and/or the non-linear proxy model is determined to be accurate, because the interpretations they produce are considered accurate. For example, feature importances, decision tree proxy model outputs, and/or partial dependency graphs that remain stable over time, or that remain stable when the training data is intentionally perturbed, may be matched against human domain expertise to debug the models. If the interpretations match human domain expertise, more confidence may be attached to the models. These techniques can be used to visualize, validate, and debug a machine learning model by comparing its displayed decision process, important features, and important interactions with known standards, domain knowledge, and reasonable expectations.
At 1212, the linear and/or non-linear proxy models are retrained. In some embodiments, the linear and/or non-linear proxy models are retrained if a threshold number of entries are flagged. An entry may be flagged if the prediction label associated with the linear proxy model does not correlate with the prediction label associated with the non-linear proxy model.
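One way steps 1208 and 1212 might be operationalized is sketched below; the per-entry tolerance, the retrain threshold, and the use of Pearson correlation are illustrative assumptions:

```python
import numpy as np

def flag_disagreements(linear_preds, nonlinear_preds,
                       flag_tol=0.2, retrain_threshold=50):
    """Flag entries where the two proxies disagree and suggest retraining
    when the number of flagged entries crosses a threshold (both the
    tolerance and the threshold are assumed values)."""
    linear_preds = np.asarray(linear_preds)
    nonlinear_preds = np.asarray(nonlinear_preds)
    # Overall agreement between the two proxy models' predictions.
    corr = np.corrcoef(linear_preds, nonlinear_preds)[0, 1]
    # Per-entry disagreement beyond the tolerance.
    flagged = np.flatnonzero(np.abs(linear_preds - nonlinear_preds) > flag_tol)
    return corr, flagged, len(flagged) >= retrain_threshold

corr, flagged, retrain = flag_disagreements(np.random.rand(200),
                                            np.random.rand(200))
print(f"correlation={corr:.2f}, flagged={len(flagged)}, retrain={retrain}")
```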
Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.

Claims (40)

1. A method, comprising:
classifying input data associated with a machine learning model into a plurality of clusters;
generating a plurality of linear proxy models, wherein one of the plurality of linear proxy models corresponds to one of the plurality of clusters, wherein the linear proxy models are configured to output corresponding predictions based on input data associated with the corresponding cluster; and
outputting prediction data associated with the machine learning model and prediction data associated with the plurality of linear proxy models.
2. The method of claim 1, further comprising: receiving the input data associated with the machine learning model.
3. The method of claim 1, wherein the input data associated with the machine learning model comprises one or more entries, wherein the one or more entries are ordered into training data and validation data, wherein each entry of the one or more entries is associated with one or more features having a corresponding feature value, a corresponding predictive label, and a corresponding actual result.
4. The method of claim 1, wherein the input data associated with the machine learning model is classified into a plurality of clusters using a k-means clustering technique.
5. The method of claim 1, further comprising: ranking the prediction data associated with the plurality of linear proxy models.
6. The method of claim 5, wherein outputting the prediction data associated with the machine learning model and the prediction data associated with the plurality of linear proxy models comprises: plotting the ranked prediction data associated with the plurality of linear proxy models against the corresponding prediction labels.
7. The method of claim 1, further comprising: receiving a selection of data points in the prediction data associated with the plurality of linear proxy models.
8. The method of claim 7, further comprising: in response to receiving the selection of data points in the prediction data associated with the plurality of linear proxy models, providing one or more reason codes associated with predicted values associated with the data points.
9. The method of claim 8, wherein the one or more reason codes associated with the predicted values associated with the data points indicate a highest threshold number of reasons that the corresponding linear proxy model made predictions associated with the selected data points.
10. The method of claim 8, wherein one or more reason codes have corresponding contribution values.
11. The method of claim 10, wherein a sum of the contribution values associated with the one or more reason codes is equal to the predicted value associated with the data point.
12. The method of claim 8, wherein the one or more reason codes correspond to one or more characteristics associated with the input data.
13. The method of claim 1, further comprising: generating a global proxy model of the machine learning model based at least in part on the input data associated with the machine learning model.
14. The method of claim 1, further comprising:
receiving production data, wherein the production data comprises at least one entry;
determining a cluster of a plurality of clusters for at least one entry based at least in part on a centroid associated with the cluster;
determining a linear proxy model corresponding to the determined cluster; and
using the determined linear proxy model, outputting prediction data associated with the at least one entry.
15. The method of claim 1, wherein the input data associated with the machine learning model comprises one or more entries, the method further comprising: ordering the one or more entries of the input data into one or more groups, wherein a group corresponds to one of the plurality of clusters, wherein an entry is associated with a group based at least in part on a distance between a feature value associated with the entry and a cluster centroid associated with the group.
16. The method of claim 15, wherein a linear proxy model of the plurality of linear proxy models is trained using one or more entries associated with one of the one or more groups.
17. The method of claim 1, wherein a plurality of linear proxy models are trained to predict actual values associated with a machine learning model.
18. A system, comprising:
a processor configured to:
classifying input data associated with a machine learning model into a plurality of clusters;
generating a plurality of linear proxy models, wherein one of the plurality of linear proxy models corresponds to one of the plurality of clusters, wherein the linear proxy models are configured to output corresponding predictions based on input data associated with the corresponding cluster; and
outputting prediction data associated with a machine learning model and prediction data associated with a plurality of linear proxy models; and
a memory coupled to the processor and configured to provide instructions to the processor.
19. The system of claim 18, wherein the processor is further configured to receive input data associated with a machine learning model.
20. A computer program product, the computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions for:
classifying input data associated with a machine learning model into a plurality of clusters;
generating a plurality of linear proxy models, wherein one of the plurality of linear proxy models corresponds to one of the plurality of clusters, wherein the linear proxy models are configured to output corresponding predictions based on input data associated with the corresponding cluster; and
outputting prediction data associated with the machine learning model and prediction data associated with the plurality of linear proxy models.
21. A method, comprising:
receiving an indication of a selection of an entry associated with a machine learning model; and
dynamically updating one or more interpretation views associated with one or more machine learning models based on the selected entry.
22. The method of claim 21, wherein the one or more machine learning models comprise one or more non-linear models.
23. The method of claim 22, wherein one of the one or more non-linear proxy models comprises a feature importance model.
24. The method of claim 23, wherein a feature importance model is configured to output one or more features, wherein the one or more features have corresponding global feature importance values and corresponding local feature importance values.
25. The method of claim 24, wherein the corresponding global feature importance value associated with a feature is based at least in part on a number of times the feature is used in a random forest model.
26. The method of claim 25, wherein the corresponding global feature importance value associated with the feature is based at least in part on a level of the random forest model at which the feature is used to split the random forest model.
27. The method of claim 24, wherein the corresponding local feature importance value is calculated using a leave-one-covariate-out (LOCO) technique.
28. The method of claim 24, further comprising:
comparing a corresponding global feature importance value associated with a feature to a corresponding local feature importance value associated with the feature; and
determining whether a difference between a corresponding global feature importance value and a corresponding local feature importance value associated with the feature is greater than or equal to a threshold.
29. The method of claim 28, further comprising: in response to determining that the difference between the corresponding global feature importance value and the corresponding local feature importance value associated with the feature is greater than or equal to the threshold, investigating the feature importance model.
30. The method of claim 28, further comprising: in response to determining that the difference between the corresponding global feature importance value and the corresponding local feature importance value associated with the feature is less than the threshold, forgoing investigation of the feature importance model.
31. The method of claim 22, wherein one of the one or more non-linear proxy models comprises a decision tree proxy model.
32. The method of claim 31, wherein the plurality of branches associated with the decision tree proxy model are based on input data associated with a machine learning model, wherein the input data associated with the machine learning model comprises a plurality of entries, wherein each entry has one or more features and one or more corresponding feature values.
33. The method of claim 31, wherein dynamically updating one or more interpretation views associated with one or more machine learning models comprises: highlighting a path of the decision tree proxy model, wherein the highlighted path is specific to the selected entry.
34. The method of claim 31, wherein a width of a path of the decision tree proxy model indicates a frequency with which the path is used by the decision tree proxy model.
35. The method of claim 32, wherein one of the one or more non-linear proxy models comprises a partial dependency graph.
36. The method of claim 35, wherein a partial dependency graph indicates a dependency of a prediction tag of the partial dependency graph on a feature having a particular value.
37. The method of claim 35, wherein the partial dependency graph indicates an average prediction label based on all entries associated with the partial dependency graph having a corresponding feature with the same particular value.
38. The method of claim 31, wherein the one or more interpretation views associated with the one or more machine learning models comprise: a view associated with the feature importance proxy model, a view associated with the decision tree proxy model, and a view associated with the partial dependency graph.
39. A system, comprising:
a processor configured to:
receiving an indication of a selection of an entry associated with a machine learning model; and
dynamically updating one or more interpretation views associated with one or more machine learning models based on the selected entry; and
a memory coupled to the processor and configured to provide instructions to the processor.
40. A computer program product, the computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions for:
receiving an indication of a selection of an entry associated with a machine learning model; and
dynamically updating one or more interpretation views associated with one or more machine learning models based on the selected entry.
CN201980026981.8A 2018-04-20 2019-04-08 Model interpretation Pending CN112154459A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US15/959,040 US11922283B2 (en) 2018-04-20 2018-04-20 Model interpretation
US15/959040 2018-04-20
US15/959030 2018-04-20
US15/959,030 US11386342B2 (en) 2018-04-20 2018-04-20 Model interpretation
PCT/US2019/026331 WO2019204072A1 (en) 2018-04-20 2019-04-08 Model interpretation

Publications (1)

Publication Number Publication Date
CN112154459A (en)

Family

ID=68239941

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980026981.8A Pending CN112154459A (en) 2018-04-20 2019-04-08 Model interpretation

Country Status (4)

Country Link
EP (1) EP3782079A4 (en)
CN (1) CN112154459A (en)
SG (1) SG11202009599SA (en)
WO (1) WO2019204072A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461862B (en) * 2020-03-27 2023-06-30 支付宝(杭州)信息技术有限公司 Method and device for determining target characteristics for service data
CN113517035B (en) * 2020-04-10 2024-02-02 中国石油天然气股份有限公司 Method and device for researching structure-activity relationship of surfactant

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7590642B2 (en) * 2002-05-10 2009-09-15 Oracle International Corp. Enhanced K-means clustering
US20140222349A1 (en) * 2013-01-16 2014-08-07 Assurerx Health, Inc. System and Methods for Pharmacogenomic Classification
US10452992B2 (en) * 2014-06-30 2019-10-22 Amazon Technologies, Inc. Interactive interfaces for machine learning model evaluations
WO2018017467A1 (en) * 2016-07-18 2018-01-25 NantOmics, Inc. Distributed machine learning systems, apparatus, and methods

Also Published As

Publication number Publication date
EP3782079A4 (en) 2022-01-19
WO2019204072A1 (en) 2019-10-24
EP3782079A1 (en) 2021-02-24
SG11202009599SA (en) 2020-10-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination