WO2021072556A1 - Systems and methods for machine learning interpretability - Google Patents

Systems and methods for machine learning interpretability

Info

Publication number
WO2021072556A1
Authority
WO
WIPO (PCT)
Prior art keywords
training data
shap
prediction
machine learning
values
Prior art date
Application number
PCT/CA2020/051400
Other languages
French (fr)
Inventor
Behrouz Haji SOLEIMANI
Andrea PAGOTTO
Seyednaser NOURASHRAFEDDIN
Chantal BISSON-KROL
Original Assignee
Kinaxis Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kinaxis Inc. filed Critical Kinaxis Inc.
Priority to EP20877942.1A priority Critical patent/EP4046087A4/en
Priority to JP2022522739A priority patent/JP2022552980A/en
Priority to CA3155102A priority patent/CA3155102A1/en
Publication of WO2021072556A1 publication Critical patent/WO2021072556A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • G06N5/045Explanation of inference; Explainable artificial intelligence [XAI]; Interpretable artificial intelligence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/01Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Definitions

  • the present disclosure addresses the problem of visually demonstrating example-based machine learning interpretability explanations of a time series forecast from a black box machine learning model.
  • This method solves the problem stated above, since it makes clear, from a plot of the time-series data, which point or points in the training data explain the forecasted value of a chosen prediction.
  • the method can involve using SHapley Additive exPlanations (SHAP), which is a unified approach to explain the output of a machine learning model.
  • SHAP may be used by the model to compute feature importances per-instance.
  • a method comprising: training, by a processor, a regression machine learning model using training data; predicting, by the processor, a prediction based on the trained model; receiving, by a machine learning interpretability module, the training data, the trained model and the prediction; and comparing, by the machine learning interpretability module, characteristics of the training data and the prediction.
  • comparing characteristics comprises visualization of the training data, the prediction and the characteristics of the training data and the prediction.
  • comparing characteristics comprises: determining, by the machine learning interpretability module, a heuristic function value of each training data point; wherein: the prediction comprises a plurality of predicted data points; and the heuristic function incorporates: SHAP values of each training data point; SHAP values of the predicted data points; features values of the training data points; and features values of the predicted data points.
  • the heuristic function can comprise a combination of a SHAP distance and a features distance, wherein: the SHAP distance is a Euclidean distance between a SHAP vector of a training data point and a SHAP vector of a predicted data point; the features distance is a Euclidean distance between a features vector of a training data point and a features vector of a predicted data point; the SHAP vector is an ordered sequence of SHAP values of a data point; and the features vector is an ordered sequence of features values of a data point.
  • comparing characteristics comprises: determining, by the machine learning interpretability module, SHAP values of one or more points of the prediction; determining, by the machine learning interpretability module, SHAP values of one or more points of the training data; and determining, by the machine learning interpretability module, for each of the one or more points of the prediction, a difference between the SHAP values of the prediction point and the SHAP values of each of the one or more points of the training data.
  • the difference can be a Euclidean distance between a SHAP vector of the prediction point and a SHAP vector of each of the one or more points of the training data.
  • comparing characteristics comprises: removing, by the machine learning interpretability module, a training data point from the training data to form an amended training data set; retraining, by the machine learning interpretability module, the trained model on the amended training data set; predicting, by the machine learning interpretability module, based on the amended training data set to provide an amended prediction; comparing, by the machine learning interpretability module, a difference between the prediction and the amended prediction; assigning, by the machine learning interpretability module, a measure of influence to the removed training data point, based on the difference.
  • a system comprising: a processor; and a memory storing instructions that, when executed by the processor, configure the system to: train, by a processor, a regression machine learning model using training data; predict, by the processor, a prediction based on the trained model; receive, by a machine learning interpretability module, the training data, the trained model and the prediction; and compare, by the machine learning interpretability module, characteristics of the training data and the prediction.
  • system is further configured to provide a visualization of the training data, the prediction and the characteristics of the training data and the prediction.
  • the system is further configured to: determine, by the machine learning interpretability module, a heuristic function value of each training data point; wherein: the prediction comprises a plurality of predicted data points; and the heuristic function incorporates: SHAP values of each training data point; SHAP values of the predicted data points; features values of the training data points; and features values of the predicted data points.
  • the heuristic function can comprise a combination of a SHAP distance and a features distance, wherein: the SHAP distance is a Euclidean distance between a SHAP vector of a training data point and a SHAP vector of a predicted data point; the features distance is a Euclidean distance between a features vector of a training data point and a features vector of a predicted data point; the SHAP vector is an ordered sequence of SHAP values of a data point; and the features vector is an ordered sequence of features values of a data point.
  • the system is further configured to: determine, by the machine learning interpretability module, SHAP values of one or more points of the prediction; determine, by the machine learning interpretability module, SHAP values of one or more points of the training data; and determine, by the machine learning interpretability module, for each of the one or more points of the prediction, a difference between the SHAP values of the prediction point and the SHAP values of each of the one or more points of the training data.
  • the difference can be a Euclidean distance between a SHAP vector of the prediction point and a SHAP vector of each of the one or more points of the training data.
  • the system is further configured to: remove, by the machine learning interpretability module, a training data point from the training data to form an amended training data set; retrain, by the machine learning interpretability module, the trained model on the amended training data set; predict, by the machine learning interpretability module, based on the amended training data set to provide an amended prediction; compare, by the machine learning interpretability module, a difference between the prediction and the amended prediction; assign, by the machine learning interpretability module, a measure of influence to the removed training data point, based on the difference.
  • a non-transitory computer-readable storage medium including instructions that when executed by a computer, cause the computer to: train, by a processor, a regression machine learning model using training data; predict, by the processor, a prediction based on the trained model; receive, by a machine learning interpretability module, the training data, the trained model and the prediction; and compare, by the machine learning interpretability module, characteristics of the training data and the prediction.
  • the heuristic function can comprise a combination of a SHAP distance and a features distance, wherein: the SHAP distance is a Euclidean distance between a SHAP vector of a training data point and a SHAP vector of a predicted data point; the features distance is a Euclidean distance between a features vector of a training data point and a features vector of a predicted data point; the SHAP vector is an ordered sequence of SHAP values of a data point; and the features vector is an ordered sequence of features values of a data point.
  • the difference can be a Euclidean distance between a SHAP vector of the prediction point and a SHAP vector of each of the one or more points of the training data.
  • FIG. 1 illustrates a flowchart in accordance with one embodiment.
  • FIG. 2 illustrates a machine learning interpretability module flowchart in accordance with one embodiment.
  • FIG. 3A illustrates a heuristic function example in accordance with one embodiment.
  • FIG. 3B illustrates a further aspect of the heuristic function example shown in FIG. 3A.
  • FIG. 3C illustrates a further aspect of the heuristic function example shown in FIG. 3A.
  • FIG. 4 illustrates an example in accordance with one embodiment.
  • FIG. 5 illustrates an example in accordance with one embodiment.
  • FIG. 6 illustrates a flowchart in accordance with one embodiment.
  • FIG. 7 illustrates an example in accordance with one embodiment.
  • FIG. 8 illustrates a system in accordance with one embodiment.
  • FIG. 1 illustrates flowcharts 100 in accordance with one embodiment.
  • the flowcharts 100 comprise two phases: a first phase 102 and a second phase 104.
  • training data 106 is used by a machine learning algorithm 108 to provide a trained model 110.
  • the machine learning algorithm 108 uses the trained model 110 to provide predictions 112 (or a prediction) of future data.
  • the training data 106, the trained model 110, and the predictions 112 are then input to a machine learning interpretability module 114 to provide an explanation output 116.
  • the explanation output 116 can be output visually, which may also include a graphical user interface 118, so as to allow a user to interact with the explanation output 116.
  • FIG. 2 illustrates an MLI module flowchart 200 in accordance with one embodiment. That is, FIG. 2 illustrates an embodiment of a machine learning interpretability module 114.
  • the machine learning interpretability module 114 can operate in the following two stages.
  • the first stage can comprise computation of: historic SHAP values 202 based on training data 106 and trained model 110; and future SHAP values 204 based on trained model 110 and predictions 112.
  • historic SHAP values 202 and future SHAP values 204 are computed, they are used in a second stage: computation of a similarity measure 206 between historic SHAP values 202 and future SHAP values 204.
  • Similarity measure 206 can then be output as an explanation output 116 for a user.
  • Explanation output 116 can be visual, and may include a graphical user interface 118 so as to allow the user to interact with the results.
  • a heuristic function can be used in calculation of similarity measure 206, by including a combination of both the difference between historic SHAP values 202 and future SHAP values 204, and the difference between historic and future features values.
  • each point (whether historical or forecast) is accorded a feature vector and a SHAP vector.
  • a feature vector is just an ordered sequence of numerical values assigned to a given feature of the data point.
  • a SHAP vector is just an ordered sequence of numerical values assigned to a given SHAP characteristic of the data point.
  • a similarity measure can refer to a similarity between a forecast data point and a training data point, as measured by the distance between the vector associated with each point.
  • a measure of feature similarity can be obtained by calculating the distance between the feature vector of the training data point and the feature vector of the forecast point.
  • a measure of SHAP similarity can be obtained by calculating the distance between the SHAP vector of the training data point and the SHAP vector of the forecast point.
  • a heuristic function can be a combination of the feature distance and the SHAP distance.
  • each training data point can have the following features: year, month, week of year, day of week, season, etc.
  • a numerical value can be assigned to a season (e.g. '0' for winter; '1' for summer; or '0' for winter; '1' for spring; '2' for summer; and '3' for fall).
  • Feature vectors provide no information about the attribute or value at the data point. For example, for a lead-time series, the feature vector provides no information about lead-time of any given data point - it only provides information about the features of that data point.
  • a feature vector of 'PF' is obtained based on the features of 'PF'.
  • Each training data point 'Hᵢ' also has its own feature vector.
  • the features similarity between each training data point 'Hᵢ' and the forecast point 'PF' can be calculated by standard techniques for calculating Euclidean distances between vectors.
  • a SHAP vector of 'PF' is calculated.
  • the SHAP vector of each training data point 'Hᵢ' is also computed. Contrary to the features vector, the SHAP vector includes information about the attribute or value associated with the data point.
  • the SHAP vector includes information about the lead time for the data point in question.
  • the SHAP similarity between each training data point and the forecast point 'PF' can be calculated by standard techniques for calculating Euclidean distances between vectors.
  • a simple heuristic function, HF, that includes both the features distance and the SHAP distance can be formulated as: HF = a*(SHAP distance) + (1 - a)*(features distance) (EQ. 1).
  • 'a' can be adjusted between 0 and 1.
  • FIG. 3A, FIG. 3B and FIG. 3C illustrate a heuristic function example 300 in accordance with one embodiment.
  • the historical lead time data 318 is shown from roughly September 1, 2016 to roughly November 30, 2017, while the forecast lead times 320 are shown between roughly December 1, 2016 to roughly November 30, 2018.
  • each of FIG. 3A, FIG. 3B and FIG. 3C illustrates a SHAP scale 322, which varies from a minimum value of ‘0’ (as shown in FIG. 3A) to a maximum value of ‘100’ (as shown in FIG. 3C).
  • FIG. 3A, FIG. 3B and FIG. 3C each illustrate a forecast point scale 328 which designates various points on the forecast lead times 320.
  • the forecast point scale 328 is set to '151', which corresponds to the forecast point 308.
  • each figure illustrates a gradient key (gradient key 310 in FIG. 3A; gradient key 312 in FIG. 3B; and gradient key 314 in FIG. 3C), in which the darker the shade of a training data point according to the gradient key, the greater its impact or weight on the forecast point 308. While the drawings are shown in gray-scale, it is understood that the graphical display will be in colour.
  • the resulting features similarity plot 302 shows that the darkest points in the historical lead time data 318 occur between training data points in the March 1, 2017-July 1, 2017 range, for forecast point 308 (which is near May 15, 2018). That is, these points with the darkest gradient indicate that the greatest similarities occur between training data points in the March 1, 2017-July 1, 2017 range, for forecast point 308. This is not surprising, since these are training data points that have similar dates (i.e. features) to forecast point 308.
  • the lead time has no bearing on the features similarity.
  • the resulting half features, half SHAP plot 304 indicates that the greatest similarities occur between training data points in the April 15, 2017- June 15, 2017 range, for forecast point 308 (which is near May 15, 2018), as inferred by the points with the darkest gradients. Note how the similarity range has narrowed to April 15, 2017-June 15, 2017 in FIG. 3B (which has half features, half SHAP similarities), from a range of March 1, 2017-July 1, 2017 shown in FIG. 3A (which has only features similarities).
  • the resulting SHAP similarity plot 306 indicates that the greatest SHAP similarities occur between training data point of around May 1, 2017 for forecast point 308 (which is near May 15, 2018). Note how the similarity range in FIG. 3C has narrowed successively from the features similarity plot 302 shown in FIG. 3A and the half features, half SHAP plot 304 shown in FIG. 3B.
  • FIG. 3C also illustrates SHAP values 316 of forecast point 308, which indicate that the most important feature in the historical lead time data 318 for forecast point 308 is when the day of the week is equal to 1, which lowers the forecast lead time to 7.6 days (as opposed to other days of the week).
  • Looking at the training data, based on SHAP similarities, the one training data point around May 1, 2017 has a similar lead time as that of forecast point 308. Looking at this point in the history can provide some explanation about why this predicted point (i.e. forecast point 308) was given a lower predicted lead time than a forecast point beside it.
  • for a forecast point next to forecast point 308, the day of week has a value different from ‘1’, which, according to SHAP values 316, has minimal effect on the forecast. Therefore, any point adjacent to forecast point 308 will not show a decrease in lead-time to the extent shown by forecast point 308.
  • the next most important feature in the historical lead time data 318 for forecast point 308, is when the month is equal to 5 (that is, the month of May).
  • FIG. 4 illustrates an example 400 in accordance with one embodiment.
  • Forecast point 404 is one day after forecast point 308.
  • for forecast point 308, the greatest impact in lowering the forecast lead time to 7.6 days is when the day of the week is 1, as shown in SHAP values 316.
  • for forecast point 404, the forecast lead time jumps to 22, as shown by SHAP values 406. Furthermore, the day of the week has no impact in lowering the projected lead time. In contrast to forecast point 308, the week of the year set to 19 has the highest impact for forecast point 404. While the drawings are shown in gray-scale, it is understood that the graphical display will be in colour.
  • FIG. 5 illustrates an example 500 in accordance with one embodiment.
  • Graph 502 illustrates an example of lead time v. date, showing both historical data 504 and prediction 506.
  • prediction point 508, shown by the arrow at around July 5, is highlighted.
  • the features are: year, month of the year, week of the year, day of the year and season (e.g. '0' for winter; '1' for summer).
  • the SHAP values 510 of prediction point 508 indicate that the prediction point 508 has a forecasted lead time of 1.00 (output value).
  • the week of the year value of 28 has the greatest impact on the forecast, while the year (2018) is next in impact.
  • the day of the week is next, in terms of impact on the forecast; if the day of the week is other than 5, the resulting forecast of lead time will be higher.
  • Season (with value '1') has minimal impact on prediction point 508.
  • the impact of each training data point on prediction point 508 is shown by the gradient key 512 of a heuristic function that includes a combination of historical SHAP vector distances and features vector distances, as described above.
  • a sliding scale value of 50 (out of 100) (shown by SHAP scale 322) has been used in the evaluation of the heuristic function, which means that features vector distances and historical SHAP vector distances are combined equally in the evaluation of the heuristic function.
  • FIG. 6 illustrates a flowchart 600 in accordance with one embodiment.
  • Flowchart 600 illustrates another embodiment of machine learning interpretability, in which an influence of a training data point (on a forecast) is provided. Influence is not measured by a SHAP characteristic, but instead by how removal of that training data point affects the forecast.
  • training data is used to train a machine learning model.
  • the model is used to make a prediction at block 606.
  • each training data point is removed individually (at block 608) to form a modified or new training data set at block 610; the model is retrained at block 612 on the new data set, and a new prediction is made at block 614.
  • results of the prediction (made at block 614) are compared with the results of the prediction made with the full training data set (made at block 606). The comparison may be made in any number of ways known in the art.
  • the removed point is then returned to the training data set at block 618, along with a measure of the influence of the removed data point. Embodiments of the measure of influence are described below.
  • the measure of influence can be provided to a user in any suitable manner known in the art. In some embodiments, the measure of influence of each training data point is shown visually in graphical form. In some embodiments, the measure of influence of each training data point is shown visually in tabular form.
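
The leave-one-out procedure of flowchart 600 can be sketched as follows. This is a minimal illustration rather than the disclosure's implementation: it assumes a scikit-learn-style regressor, NumPy arrays for the training data, and uses the mean absolute difference between the full-data forecast and each amended forecast as the measure of influence (the disclosure leaves the comparison method open).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor  # any regressor with fit/predict works

def influence_of_training_points(X_train, y_train, X_future):
    """For each training point: remove it, retrain, re-predict, and score how much
    the amended forecast differs from the full-data forecast (blocks 608-618)."""
    full_model = GradientBoostingRegressor().fit(X_train, y_train)
    full_forecast = full_model.predict(X_future)              # block 606: prediction on full data

    influences = np.zeros(len(X_train))
    for i in range(len(X_train)):
        X_loo = np.delete(X_train, i, axis=0)                 # blocks 608/610: amended training set
        y_loo = np.delete(y_train, i)
        amended_model = GradientBoostingRegressor().fit(X_loo, y_loo)   # block 612: retrain
        amended_forecast = amended_model.predict(X_future)              # block 614: new prediction
        # block 616: compare forecasts; mean absolute difference is an assumed choice
        influences[i] = np.abs(full_forecast - amended_forecast).mean()
    return influences                                         # block 618: influence per removed point
```

The resulting influence values can then be rendered as a colour gradient over the historical points, as in the example of FIG. 7.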
  • FIG. 7 illustrates an example 700 in accordance with one embodiment of machine learning interpretability.
  • Flowchart 600 was used to obtain illustrative example 700.
  • Historical data 702 (shown by filled circles) of lead times, from about September 1, 2016 to about January 7, 2018, was used to train a machine learning model, leading to a full data forecast 704.
  • the historical data point 712 (around March 25, 2017) is removed from the training data set.
  • the revised prediction (based on the removal of historical data point 712) is shown as amended data forecast 706, which is, for the most part, lower than full data forecast 704 throughout the forecast range of about January 8, 2018 to about January 8, 2019.
  • the difference between full data forecast 704 and amended data forecast 706 can be evaluated by known means in the art, and the difference is accorded a difference value for historical data point 712.
  • a user can glean further information from the colour gradient of historical data 702, by looking for patterns of high-influence data points, or low-influence data points. This can be achieved via a graphical user interface through which the user can select different data points along the historical data 702, and see how the resulting amended data forecast 706 changes relative to the full data forecast 704.
  • FIG. 8 illustrates a system 800 in accordance with one embodiment of machine learning interpretability.
  • System server 802 comprises a machine learning algorithm, a machine learning interpretability module, and other modules and/or algorithms, including access to a library of SHAP algorithms.
  • Machine learning storage 812 can include training data used for training a machine learning algorithm.
  • System 800 includes a system server 802, machine learning storage 812, client data source 822 and one or more devices 814, 816 and 818.
  • System server 802 can include a memory 808, a disk 804, a processor 806 and a network interface 820. While one processor 806 is shown, the system server 802 can comprise one or more processors.
  • memory 808 can be volatile memory, compared with disk 804 which can be non-volatile memory.
  • system server 802 can communicate with machine learning storage 812, client data source 822 and one or more external devices 814, 816 and 818 via network 810. While machine learning storage 812 is illustrated as separate from system server 802, machine learning storage 812 can also be integrated into system server 802, either as a separate component within system server 802 or as part of at least one of memory 808 and disk 804.
  • System 800 can also include additional features and/or functionality.
  • system 800 can also include additional storage (removable and/or non-removable) including, but not limited to, magnetic or optical disks or tape.
  • additional storage is illustrated in FIG. 8 by memory 808 and disk 804.
  • Storage media can include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data.
  • Memory 808 and disk 804 are examples of non-transitory computer-readable storage media.
  • Non-transitory computer-readable media also includes, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory and/or other memory technology, Compact Disc Read-Only Memory (CD-ROM), digital versatile discs (DVD), and/or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, and/or any other medium which can be used to store the desired information and which can be accessed by system 800. Any such non-transitory computer-readable storage media can be part of system 800.
  • Communication between system server 802, machine learning storage 812 and one or more external devices 814, 816 and 818 via network 810 can be over various network types.
  • the processor 806 may be disposed in communication with network 810 via a network interface 820.
  • the network interface 820 may communicate with the network 810.
  • the network interface 820 may employ connection protocols including, without limitation, direct connect, Ethernet (e.g., twisted pair 10/40/400 Base T), transmission control protocol/internet protocol (TCP/IP), token ring, IEEE 802.11a/b/g/n/x, etc.
  • Non-limiting example network types can include Fibre Channel, small computer system interface (SCSI), Bluetooth, Ethernet, Wi-Fi, Infrared Data Association (IrDA), Local area networks (LAN), Wireless Local area networks (WLAN), wide area networks (WAN) such as the Internet, serial, and universal serial bus (USB).
  • communication between various components of system 800 may take place over hard-wired, cellular, Wi-Fi or Bluetooth networked components or the like.
  • one or more electronic devices of system 800 may include cloud-based features, such as cloud-based memory storage.
  • Machine learning storage 812 may implement an "in-memory" database, in which volatile (e.g., non-disk-based) storage (e.g., Random Access Memory) is used both for cache memory and for storing the full database during operation, and persistent storage (e.g., one or more fixed disks) is used for offline persistency and maintenance of database snapshots.
  • volatile storage may be used as cache memory for storing recently-used data, while persistent storage stores the full database.
  • Machine learning storage 812 may store metadata regarding the structure, relationships and meaning of data. This information may include data defining the schema of database tables stored within the data. A database table schema may specify the name of the database table, columns of the database table, the data type associated with each column, and other information associated with the database table. Machine learning storage 812 may also or alternatively support multi-tenancy by providing multiple logical database systems which are programmatically isolated from one another. Moreover, the data may be indexed and/or selectively replicated in an index to allow fast searching and retrieval thereof. In addition, machine learning storage 812 can store a number of machine learning models that are accessed by the system server 802. A number of ML models can be used.
  • gradient-boosted trees can be used.
  • one or more clustering algorithms can be used. Non-limiting examples include hierarchical clustering, k-means, mixture models, density-based spatial clustering of applications with noise and ordering points to identify the clustering structure.
  • one or more anomaly detection algorithms can be used.
  • Non-limiting examples include local outlier factor.
  • neural networks can be used.
  • Client data source 822 may provide a variety of raw data from a user, including, but not limited to: point of sales data that indicates the sales record of all of the client's products at every location; the inventory history of all of the client's products at every location; promotional campaign details for all products at all locations, and events that are important/relevant for sales of a client's product at every location.
  • the system server 802 may communicate with one or more devices 814, 816 and 818.
  • These devices 814, 816 and 818 may include, without limitation, personal computer(s), server(s), various mobile devices such as cellular telephones, smartphones (e.g., Apple iPhone, Blackberry, Android-based phones, etc.), tablet computers, eBook readers (Amazon Kindle, Nook, etc.), laptop computers, notebooks, gaming consoles (Microsoft Xbox, Nintendo DS, Sony PlayStation, etc.), or the like.
  • system server 802 can retrieve data from machine learning storage 812 and client data source 822.
  • the retrieved data can be saved in memory 808 or disk 804.
  • system server 802 can also comprise a web server, and can format resources into a format suitable to be displayed on a web browser.
  • a user can amend the results, which are re-sent to machine learning storage 812, for further execution.
  • the results can be amended by either interaction with one or more data files, which are then sent to machine learning storage 812; or through a user interface at the one or more devices 814, 816 and 818.
  • a user can amend the results using a graphical user interface.
  • Any of the methods, modules, algorithms, implementations, or procedures described herein can include machine-readable instructions for execution by: (a) a processor, (b) a controller, and/or (c) any other suitable processing device.
  • Any algorithm, software, or method disclosed herein can be embodied in software stored on a non-transitory tangible medium such as, for example, a flash memory, a CD-ROM, a floppy disk, a hard drive, a digital versatile disk (DVD), or other memory devices, but persons of ordinary skill in the art will readily appreciate that the entire algorithm and/or parts thereof could alternatively be executed by a device other than a controller and/or embodied in firmware or dedicated hardware in a well-known manner (e.g., it may be implemented by an application specific integrated circuit (ASIC), a programmable logic device (PLD), a field programmable logic device (FPLD), discrete logic, etc.).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Methods and systems that provide machine learning interpretability. SHAP values of historical and predicted data, along with features of both, are used to provide a measure of the impact of training data points on a prediction. Removal of an individual training data point from a training data set, followed by comparing the resulting prediction with that obtained by the full training data set, also provides a measure of influence of individual training data points on forecasts.

Description

SYSTEMS AND METHODS FOR MACHINE LEARNING INTERPRETABILITY
BACKGROUND
[0001] While machine learning provides a powerful predictive tool, a user is often left wondering how training data (which is used to train a machine learning model) is related to a forecast provided by the trained model. This phenomenon is often referred to as a “black box” machine learning model. One method that provides a user with an interpretation of machine learning prediction results based on tabular data uses a chart. There are also some interpretability methods specific to images or textual data. However, there are no methods that are applicable for a time-series forecast.
BRIEF SUMMARY
[0002] The present disclosure addresses the problem of visually demonstrating example-based machine learning interpretability explanations of a time series forecast from a black box machine learning model. Disclosed are methods and systems that relate a similarity measure between a chosen predicted point in a forecast and the training data used for training the model, shown with a visualization suitable for interpreting time-series data. This method solves the problem stated above, since it makes clear, from a plot of the time-series data, which point or points in the training data explain the forecasted value of a chosen prediction. The method can involve using SHapley Additive exPlanations (SHAP), which is a unified approach to explain the output of a machine learning model. SHAP may be used by the model to compute feature importances per-instance. These feature importances, and feature values, are used as vectors to compute a similarity between training data and prediction. This method shows not only how the model has weighted the importance of features for explanation of a particular instance, but also can explain why, based on related examples from the past. [0003] In one aspect, a method comprising: training, by a processor, a regression machine learning model using training data; predicting, by the processor, a prediction based on the trained model; receiving, by a machine learning interpretability module, the training data, the trained model and the prediction; and comparing, by the machine learning interpretability module, characteristics of the training data and the prediction. [0004] In some embodiments of the method, comparing characteristics comprises visualization of the training data, the prediction and the characteristics of the training data and the prediction.
[0005] In some embodiments of the method, comparing characteristics comprises: determining, by the machine learning interpretability module, a heuristic function value of each training data point; wherein: the prediction comprises a plurality of predicted data points; and the heuristic function incorporates: SHAP values of each training data point; SHAP values of the predicted data points; features values of the training data points; and features values of the predicted data points. The heuristic function can comprise a combination of a SHAP distance and a features distance, wherein: the SHAP distance is a Euclidean distance between a SHAP vector of a training data point and a SHAP vector of a predicted data point; the features distance is a Euclidean distance between a features vector of a training data point and a features vector of a predicted data point; the SHAP vector is an ordered sequence of SHAP values of a data point; and the features vector is an ordered sequence of features values of a data point.
[0006] In some embodiments of the method, comparing characteristics comprises: determining, by the machine learning interpretability module, SHAP values of one or more points of the prediction; determining, by the machine learning interpretability module, SHAP values of one or more points of the training data; and determining, by the machine learning interpretability module, for each of the one or more points of the prediction, a difference between the SHAP values of the prediction point and the SHAP values of each of the one or more points of the training data. The difference can be a Euclidean distance between a SHAP vector of the prediction point and a SHAP vector of each of the one or more points of the training data.
[0007] In some embodiments of the method, comparing characteristics comprises: removing, by the machine learning interpretability module, a training data point from the training data to form an amended training data set; retraining, by the machine learning interpretability module, the trained model on the amended training data set; predicting, by the machine learning interpretability module, based on the amended training data set to provide an amended prediction; comparing, by the machine learning interpretability module, a difference between the prediction and the amended prediction; assigning, by the machine learning interpretability module, a measure of influence to the removed training data point, based on the difference. [0008] In another aspect, a system comprising: a processor; and a memory storing instructions that, when executed by the processor, configure the system to: train, by a processor, a regression machine learning model using training data; predict, by the processor, a prediction based on the trained model; receive, by a machine learning interpretability module, the training data, the trained model and the prediction; and compare, by the machine learning interpretability module, characteristics of the training data and the prediction.
[0009] In some embodiments, the system is further configured to provide a visualization of the training data, the prediction and the characteristics of the training data and the prediction.
[0010] In some embodiments, the system is further configured to: determine, by the machine learning interpretability module, a heuristic function value of each training data point; wherein: the prediction comprises a plurality of predicted data points; and the heuristic function incorporates: SHAP values of each training data point; SHAP values of the predicted data points; features values of the training data points; and features values of the predicted data points. The heuristic function can comprise a combination of a SHAP distance and a features distance, wherein: the SHAP distance is a Euclidean distance between a SHAP vector of a training data point and a SHAP vector of a predicted data point; the features distance is a Euclidean distance between a features vector of a training data point and a features vector of a predicted data point; the SHAP vector is an ordered sequence of SHAP values of a data point; and the features vector is an ordered sequence of features values of a data point.
[0011] In some embodiments, the system is further configured to: determine, by the machine learning interpretability module, SHAP values of one or more points of the prediction; determine, by the machine learning interpretability module, SHAP values of one or more points of the training data; and determine, by the machine learning interpretability module, for each of the one or more points of the prediction, a difference between the SHAP values of the prediction point and the SHAP values of each of the one or more points of the training data. The difference can be a Euclidean distance between a SHAP vector of the prediction point and a SHAP vector of each of the one or more points of the training data.
[0012] In some embodiments, the system is further configured to: remove, by the machine learning interpretability module, a training data point from the training data to form an amended training data set; retrain, by the machine learning interpretability module, the trained model on the amended training data set; predict, by the machine learning interpretability module, based on the amended training data set to provide an amended prediction; compare, by the machine learning interpretability module, a difference between the prediction and the amended prediction; assign, by the machine learning interpretability module, a measure of influence to the removed training data point, based on the difference.
[0013] In yet another aspect, a non-transitory computer-readable storage medium, the computer-readable storage medium including instructions that when executed by a computer, cause the computer to: train, by a processor, a regression machine learning model using training data; predict, by the processor, a prediction based on the trained model; receive, by a machine learning interpretability module, the training data, the trained model and the prediction; and compare, by the machine learning interpretability module, characteristics of the training data and the prediction.
[0014] In some embodiments of the non-transitory computer-readable storage medium, the instructions that when executed by a computer, further cause the computer to provide visualization of the training data, the prediction and the characteristics of the training data and the prediction.
[0015] In some embodiments of the non-transitory computer-readable storage medium, the instructions that when executed by a computer, further cause the computer to: determine, by the machine learning interpretability module, a heuristic function value of each training data point; wherein: the prediction comprises a plurality of predicted data points; and the heuristic function incorporates: SHAP values of each training data point; SHAP values of the predicted data points; features values of the training data points; and features values of the predicted data points. The heuristic function can comprise a combination of a SHAP distance and a features distance, wherein: the SHAP distance is a Euclidean distance between a SHAP vector of a training data point and a SHAP vector of a predicted data point; the features distance is a Euclidean distance between a features vector of a training data point and a features vector of a predicted data point; the SHAP vector is an ordered sequence of SHAP values of a data point; and the features vector is an ordered sequence of features values of a data point.
[0016] In some embodiments of the non-transitory computer-readable storage medium, the instructions that when executed by a computer, further cause the computer to: determine, by the machine learning interpretability module, SHAP values of one or more points of the prediction; determine, by the machine learning interpretability module, SHAP values of one or more points of the training data; and determine, by the machine learning interpretability module, for each of the one or more points of the prediction, a difference between the SHAP values of the prediction point and the SHAP values of each of the one or more points of the training data. The difference can be a Euclidean distance between a SHAP vector of the prediction point and a SHAP vector of each of the one or more points of the training data.
[0017] In some embodiments of the non-transitory computer-readable storage medium, the instructions that when executed by a computer, further cause the computer to: remove, by the machine learning interpretability module, a training data point from the training data to form an amended training data set; retrain, by the machine learning interpretability module, the trained model on the amended training data set; predict, by the machine learning interpretability module, based on the amended training data set to provide an amended prediction; compare, by the machine learning interpretability module, a difference between the prediction and the amended prediction; assign, by the machine learning interpretability module, a measure of influence to the removed training data point, based on the difference.
[0018] The details of one or more embodiments of the subject matter of this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.
[0019] Like reference numbers and designations in the various drawings indicate like elements.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
[0020] To easily identify the discussion of any particular element or act, the most significant digit or digits in a reference number refer to the figure number in which that element is first introduced.
[0021] FIG. 1 illustrates a flowchart in accordance with one embodiment.
[0022] FIG. 2 illustrates a machine learning interpretability module flowchart in accordance with one embodiment. [0023] FIG. 3A illustrates a heuristic function example in accordance with one embodiment.
[0024] FIG. 3B illustrates a further aspect of the heuristic function example shown in FIG. 3 A.
[0025] FIG. 3C illustrates a further aspect of the heuristic function example shown in FIG. 3 A.
[0026] FIG. 4 illustrates an example in accordance with one embodiment.
[0027] FIG. 5 illustrates an example in accordance with one embodiment.
[0028] FIG. 6 illustrates a flowchart in accordance with one embodiment.
[0029] FIG. 7 illustrates an example in accordance with one embodiment.
[0030] FIG. 8 illustrates a system in accordance with one embodiment.
DETAILED DESCRIPTION
[0031] In the present disclosure, any embodiment or implementation of the present subject matter described herein serves as an example, instance or illustration, and is not necessarily to be construed as preferred or advantageous over other embodiments.
[0032] While the disclosure is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will be described in detail below. It should be understood, however, that it is not intended to limit the disclosure to the particular forms disclosed, but on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and the scope of the disclosure.
[0033] The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a setup, device or method that comprises a list of components or steps does not include only those components or steps but may include other components or steps not expressly listed or inherent to such setup or device or method. In other words, one or more elements in a system or apparatus proceeded by "comprises . . . a" does not, without more constraints, preclude the existence of other elements or additional elements in the system or apparatus.
[0034] In the following detailed description of the embodiments of the disclosure, reference is made to the accompanying drawings that form a part hereof, and in which are shown by way of illustration specific embodiments in which the disclosure may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosure, and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the present disclosure. The following description is, therefore, not to be taken in a limiting sense.
[0035] FIG. 1 illustrates flowcharts 100 in accordance with one embodiment.
[0036] The flowcharts 100 comprise two phases: a first phase 102 and a second phase 104.
[0037] In first phase 102, training data 106 is used by a machine learning algorithm 108 to provide a trained model 110. The machine learning algorithm 108 uses the trained model 110 to provide predictions 112 (or a prediction) of future data.
[0038] In the second phase 104, the training data 106, the trained model 110, and the predictions 112 are then input to a machine learning interpretability module 114 to provide an explanation output 116. The explanation output 116 can be output visually, which may also include a graphical user interface 118, so as to allow a user to interact with the explanation output 116.
[0039] FIG. 2 illustrates an MLI module flowchart 200 in accordance with one embodiment. That is, FIG. 2 illustrates an embodiment of a machine learning interpretability module 114.
[0040] The machine learning interpretability module 114 can operate in the following two stages. The first stage can comprise computation of: historic SHAP values 202 based on training data 106 and trained model 110; and future SHAP values 204 based on trained model 110 and predictions 112.
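As a rough sketch of this first stage, the per-instance SHAP values can be computed with the open-source shap package. TreeExplainer is assumed here because the disclosure's examples use tree-based regressors, and the variable names (trained_model, X_train, X_future) are illustrative only.

```python
import shap  # open-source SHAP library

# trained_model, X_train (features of training data 106) and X_future (features of
# predictions 112) are assumed to come from the first phase 102.
explainer = shap.TreeExplainer(trained_model)

historic_shap_values = explainer.shap_values(X_train)   # historic SHAP values 202: one SHAP vector per training point
future_shap_values = explainer.shap_values(X_future)    # future SHAP values 204: one SHAP vector per forecast point
```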
[0041] Once historic SHAP values 202 and future SHAP values 204 are computed, they are used in a second stage: computation of a similarity measure 206 between historic SHAP values 202 and future SHAP values 204.
[0042] Similarity measure 206 can then be output as an explanation output 116 for a user. Explanation output 116 can be visual, and may include a graphical user interface 118 so as to allow the user to interact with the results.
[0043] In some embodiments, a heuristic function can be used in calculation of similarity measure 206, by including a combination of both the difference between historic SHAP values 202 and future SHAP values 204, and the difference between historic and future features values.
[0044] In some embodiments, each point (whether historical or forecast) is accorded a feature vector and a SHAP vector. A feature vector is just an ordered sequence of numerical values assigned to a given feature of the data point. Similarly, a SHAP vector is just an ordered sequence of numerical values assigned to a given SHAP characteristic of the data point.
[0045] In some embodiments, a similarity measure can refer to a similarity between a forecast data point and a training data point, as measured by the distance between the vector associated with each point. For example, a measure of feature similarity can be obtained by calculating the distance between the feature vector of the training data point and the feature vector of the forecast point. Similarly, a measure of SHAP similarity can be obtained by calculating the distance between the SHAP vector of the training data point and the SHAP vector of the forecast point.
[0046] In some embodiments, a heuristic function can be a combination of the feature distance and the SHAP distance.
[0047] Example of a heuristic function
[0048] In a time series, each training data point can have the following features: year, month, week of year, day of week, season, etc. For seasons, a numerical value can be assigned to a season (e.g. '0' for winter; '1' for summer; or '0' for winter; '1' for spring; '2' for summer; and '3' for fall). Feature vectors provide no information about the attribute or value at the data point. For example, for a lead-time series, the feature vector provides no information about lead-time of any given data point - it only provides information about the features of that data point.
[0049] For a given forecast point, 'PF', a feature vector of 'PF' is obtained based on the features of 'PF'. Each training data point 'Hᵢ' also has its own feature vector. The features similarity between each training data point 'Hᵢ' and the forecast point 'PF' can be calculated by standard techniques for calculating Euclidean distances between vectors. [0050] Similarly, for forecast point, 'PF', a SHAP vector of 'PF' is calculated. The SHAP vector of each training data point 'Hᵢ' is also computed. Contrary to the features vector, the SHAP vector includes information about the attribute or value associated with the data point. For example, where lead times are forecasted, the SHAP vector includes information about the lead time for the data point in question. The SHAP similarity between each training data point and the forecast point 'PF' can be calculated by standard techniques for calculating Euclidean distances between vectors.
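A minimal sketch of these two similarity computations follows, reusing the SHAP vectors from the sketch above. The calendar encoding, the helper names (feature_vector, euclidean), the inputs training_dates, forecast_date and forecast_index, and the use of numpy.linalg.norm for the Euclidean distance are assumptions for illustration, not the disclosure's implementation.

```python
import numpy as np
import pandas as pd

def feature_vector(date):
    """Ordered sequence of feature values (year, month, week of year, day of week, season)."""
    ts = pd.Timestamp(date)
    season = (ts.month % 12) // 3          # assumed encoding: 0 winter, 1 spring, 2 summer, 3 fall
    return np.array([ts.year, ts.month, ts.isocalendar()[1], ts.dayofweek, season], dtype=float)

def euclidean(u, v):
    return float(np.linalg.norm(u - v))

# Features similarity between each training data point Hi and the forecast point PF.
pf_feature_vector = feature_vector(forecast_date)          # forecast_date: date of 'PF' (assumed given)
features_distances = np.array(
    [euclidean(feature_vector(d), pf_feature_vector) for d in training_dates])

# SHAP similarity between each training data point Hi and the forecast point PF.
pf_shap_vector = future_shap_values[forecast_index]        # SHAP vector of 'PF'
shap_distances = np.array(
    [euclidean(h, pf_shap_vector) for h in historic_shap_values])
```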
[0051] A simple heuristic function, HF, that includes both the features distance and the SHAP distance can be formulated as follows:
[0052] HF = a*(SHAP distance) + (1 - a)*(features distance) (EQ. 1).
[0053] The value of 'a' can be adjusted between 0 and 1. When a = 0, the heuristic function only provides features similarity. When a = 1, the heuristic function only provides SHAP similarity.
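EQ. 1 translates directly into code, using the distances from the previous sketch. The min-max normalisation of the two distances onto a common scale is an added assumption so that 'a' blends them comparably; it is not stated in EQ. 1.

```python
def heuristic_function(shap_distances, features_distances, a=0.5):
    """HF = a*(SHAP distance) + (1 - a)*(features distance), per EQ. 1.
    Distances are min-max normalised first (an assumed preprocessing step)."""
    s = (shap_distances - shap_distances.min()) / (shap_distances.max() - shap_distances.min())
    f = (features_distances - features_distances.min()) / (features_distances.max() - features_distances.min())
    return a * s + (1 - a) * f

hf_features_only = heuristic_function(shap_distances, features_distances, a=0.0)  # FIG. 3A
hf_half_half = heuristic_function(shap_distances, features_distances, a=0.5)      # FIG. 3B
hf_shap_only = heuristic_function(shap_distances, features_distances, a=1.0)      # FIG. 3C
```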
[0054] FIG. 3A, FIG. 3B and FIG. 3C illustrate a heuristic function example 300 in accordance with one embodiment. In each of these figures, the historical lead time data 318 is shown from roughly September 1, 2016 to roughly November 30, 2017, while the forecast lead times 320 are shown between roughly December 1, 2016 to roughly November 30, 2018.
[0055] Furthermore, each of FIG. 3A, FIG. 3B and FIG. 3C illustrates a SHAP scale 322, which varies from a minimum value of ‘0’ (as shown in FIG. 3A) to a maximum value of ‘100’ (as shown in FIG. 3C). The value of the SHAP scale 322 is equal to the value of ‘a’ x 100, where ‘a’ is defined in Equation 1. That is, if a =1, the SHAP scale value is 100; if ‘a’ = 0.5, then the SHAP scale value is equal to 50, and so on. That is, the SHAP scale value represents a sliding value of the SHAP distance in the heuristic function defined in EQ. 1 above.
[0056] In addition, each of FIG. 3A, FIG. 3B and FIG. 3C illustrates a forecast point scale 328 which designates various points on the forecast lead times 320. In the figures, the forecast point scale 328 is set to '151', which corresponds to the forecast point 308.
[0057] SHAP and features similarities are shown for training data points relative to forecast point 308 in each of FIG. 3A, FIG. 3B and FIG. 3C. Furthermore, each figure illustrates a gradient key (gradient key 310 in FIG. 3A; gradient key 312 in FIG. 3B; and gradient key 314 in FIG. 3C), in which the darker the shade of the training data point according to the gradient key, the greater its impact or weight on the forecast point 308. While the drawings are shown in gray-scale, it is understood that the graphical display will be in colour.
[0058] FIG. 3A illustrates the case where the SHAP scale 322 value is equal to zero. That is, 'a' = 0 in EQ (1), which means that the heuristic function represents only features similarity plot 302. The resulting features similarity plot 302 shows that the darkest points in the historical lead time data 318 occur between training data points in the March 1, 2017-July 1, 2017 range, for forecast point 308 (which is near May 15, 2018). That is, these points with the darkest gradient indicate that the greatest similarities occur between training data points in the March 1, 2017-July 1, 2017 range, for forecast point 308. This is not surprising, since these are training data points that have similar dates (i.e. features) to forecast point 308. The lead time has no bearing on the features similarity.
[0059] FIG. 3B illustrates the case where the SHAP scale 322 value is equal to 50. That is, 'a' = 0.5 in EQ. (1), which means that the heuristic function represents a half features, half SHAP plot 304. The resulting half features, half SHAP plot 304 indicates that the greatest similarities occur between training data points in the April 15, 2017- June 15, 2017 range, for forecast point 308 (which is near May 15, 2018), as inferred by the points with the darkest gradients. Note how the similarity range has narrowed to April 15, 2017-June 15, 2017 in FIG. 3B (which has half features, half SHAP similarities), from a range of March 1, 2017-July 1, 2017 shown in FIG. 3A (which has only features similarities).
[0060] FIG. 3C illustrates the case where the SHAP scale 322 value is equal to 100. That is, 'a' = 1.0 in EQ. (1), which means that the heuristic function represents SHAP similarity only, yielding the SHAP similarity plot 306. The SHAP similarity plot 306 indicates that the greatest SHAP similarity occurs for a training data point around May 1, 2017, for forecast point 308 (which is near May 15, 2018). Note how the similarity range in FIG. 3C has narrowed further relative to the features similarity plot 302 shown in FIG. 3A and the half features, half SHAP plot 304 shown in FIG. 3B.
[0061] FIG. 3C also illustrates SHAP values 316 of forecast point 308, which indicate that the most important feature in the historical lead time data 318 for forecast point 308 is when the day of the week is equal to 1, which lowers the forecast lead time to 7.6 days (as opposed to other days of the week). Looking at the training data, based on SHAP similarities, the one training data point around May 1, 2017 has a lead time similar to that of forecast point 308. Looking at this point in the history can provide some explanation of why this predicted point (i.e. forecast point 308) was given a lower predicted lead time than a forecast point beside it. For a forecast point next to forecast point 308, the day of the week has a value different from '1', which, according to SHAP values 316, has minimal effect on the forecast. Therefore, any point adjacent to forecast point 308 will not show a decrease in lead time to the extent shown by forecast point 308.
[0062] The next most important feature in the historical lead time data 318 for forecast point 308, is when the month is equal to 5 (that is, the month of May).
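Per-instance SHAP values such as SHAP values 316 can be computed with standard tooling. The following is a minimal sketch only; the gradient-boosted tree regressor, the shap Python package, the illustrative feature names and the toy lead-time values are all assumptions made for this sketch and are not part of the disclosed method.

import numpy as np
import pandas as pd
import shap  # SHapley Additive exPlanations library
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical calendar features and historical lead times (days), for illustration only
X_train = pd.DataFrame({
    "year":         [2016, 2016, 2017, 2017],
    "month":        [9, 12, 5, 11],
    "week_of_year": [36, 50, 18, 46],
    "day_of_week":  [3, 5, 1, 2],
})
y_train = np.array([21.0, 18.0, 7.5, 22.0])

model = GradientBoostingRegressor().fit(X_train, y_train)

# SHAP values per training data point and for one forecast point
explainer = shap.TreeExplainer(model)
train_shap = explainer.shap_values(X_train)            # shape: (n_train, n_features)
forecast_point = pd.DataFrame({"year": [2018], "month": [5],
                               "week_of_year": [20], "day_of_week": [1]})
forecast_shap = explainer.shap_values(forecast_point)  # shape: (1, n_features)

The signs and magnitudes in forecast_shap play the role of SHAP values 316: a negative attribution for day_of_week pushes the predicted lead time down, which is how the "day of the week is equal to 1" explanation above would appear.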
[0063] FIG. 4 illustrates an example 400 in accordance with one embodiment.
[0064] In FIG. 4, the differences in historical and future SHAP values are shown for two adjacent forecast points: forecast point 308 and forecast point 404. SHAP similarity plot 306 and SHAP values 316 are identical to the corresponding illustration shown in FIG. 3C.
[0065] Forecast point 404 is one day after forecast point 308.
[0066] For forecast point 308, the greatest impact in lowering the forecast lead time to 7.6 days is when the day of the week is equal to 1, as shown in SHAP values 316. For forecast point 404, the forecast lead time jumps to 22, as shown by SHAP values 406. Furthermore, the day of the week has no impact in lowering the projected lead time. In contrast to forecast point 308, the week of the year set to 19 has the highest impact for forecast point 404. While the drawings are shown on a gray-scale, it is understood that the graphical display will be in colour.
[0067] FIG. 5 illustrates an example 500 in accordance with one embodiment.
[0068] Graph 502 illustrates an example of lead time v. date, showing both historical data 504 and prediction 506. In FIG. 5, prediction point 508 (shown by the arrow, at around July 5) is highlighted. In example 500, the features are: year, month of the year, week of the year, day of the year and season (e.g. '0' for winter; '1' for summer).
[0069] The SHAP values 510 of prediction point 508 indicate that the prediction point 508 has a forecasted lead time of 1.00 (output value). The week of the year value of 28 has the greatest impact on the forecast, while the year (2018) is next in impact. The day of the week is next, in terms of impact on the forecast; if the day of the week is other than 5, the resulting forecast of lead time will be higher. Season (with value '1') has minimal impact on prediction point 508.
[0070] The impact of each training data point on prediction point 508 is shown by the gradient key 512 of a heuristic function that includes a combination of historical SHAP vector distances and features vector distances, as described above. In FIG. 5, the SHAP scale 322 value is 50, which corresponds to 'a' = 0.5 in EQ. (1). While the drawings are shown on a gray-scale, it is understood that the graphical display will be in colour.
[0071] In FIG. 5, a sliding scale value of 50 (out of 100) (shown by SHAP scale 322) has been used in the evaluation of the heuristic function, which means that features vector distances and historical SHAP vector distances are combined equally in the evaluation of the heuristic function.
[0072] FIG. 6 illustrates a flowchart 600 in accordance with one embodiment.
[0073] Flowchart 600 illustrates another embodiment of machine learning interpretability, in which an influence of a training data point (on a forecast) is provided. Influence is not measured by a SHAP characteristic, but rather by how removal of that training data point affects the forecast.
[0074] At block 604, training data is used to train a machine learning model. The model is used to make a prediction at block 606. In order to obtain a measure of the influence of each training data point on the prediction, each training data point is removed individually (at block 608) to form a modified or new training data set at block 610; the model is retrained at block 612 on the new data set, and a new prediction is made at block 614. At block 616, results of the prediction (made at block 614) are compared with the results of the prediction made with the full training data set (made at block 606). The comparison may be made in any number of ways known in the art. The removed point is then returned to the training data set at block 618, along with a measure of the influence of the removed data point. Embodiments of the measure of influence are described below.
[0075] If this is not the last data point that has been sampled for removal (decision block 620), then a new training data point is removed at block 622, and the procedure is repeated by using the new training data set at block 610.
[0076] If, on the other hand, there are no more data points to sample for removal, then the method ends at block 624, providing a measure of influence for each training data point.
[0077] If removal of a particular training data point does not result in a change in the resulting amended data forecast, then that particular training data point has no influence on the prediction. The greater the change in the amended data forecast from the full data forecast, the greater the influence of the particular training data point on the forecast.
[0078] The measure of influence can be provided to a user in any suitable manner known in the art. In some embodiments, the measure of influence of each training data point is shown visually in graphical form. In some embodiments, the measure of influence of each training data point is shown visually in tabular form.
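A minimal sketch of the loop of flowchart 600 is given below. It assumes a scikit-learn style regressor and uses the mean absolute difference between the full data forecast and the amended data forecast as the comparison measure; the flowchart itself leaves the comparison method open, so this choice is an assumption of the sketch.

import numpy as np
from sklearn.base import clone

def influence_of_each_training_point(model, X_train, y_train, X_future):
    # Full-data forecast (blocks 604-606)
    full_model = clone(model).fit(X_train, y_train)
    full_forecast = full_model.predict(X_future)

    influences = np.zeros(len(X_train))
    for i in range(len(X_train)):
        # Remove one training data point to form a new training data set (blocks 608-610)
        keep = np.arange(len(X_train)) != i
        # Retrain on the amended data set and re-predict (blocks 612-614)
        amended_model = clone(model).fit(X_train[keep], y_train[keep])
        amended_forecast = amended_model.predict(X_future)
        # Compare the amended forecast with the full data forecast (block 616);
        # mean absolute difference is one possible comparison, assumed here
        influences[i] = np.mean(np.abs(amended_forecast - full_forecast))
    return influences

An influence of zero means that removing the point leaves the forecast unchanged; larger values indicate a larger influence of that training data point on the forecast.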
[0079] FIG. 7 illustrates an example 700 in accordance with one embodiment of machine learning interpretability. Flowchart 600 was used to obtain illustrative example 700.
[0080] Historical data 702 (shown by filled circles) of lead times, from about September 1, 2016 to about January 7, 2018, was used to train a machine learning model, leading to a full data forecast 704.
[0081] In FIG. 7, the historical data point 712 (around March 25, 2017) is removed from the training data set. The revised prediction (based on the removal of historical data point 712) is shown as amended data forecast 706, which is, for the most part, lower than full data forecast 704 throughout the forecast range of about January 8, 2018 to about January 8, 2019. The difference between full data forecast 704 and amended data forecast 706 can be evaluated by known means in the art, and the difference is accorded a difference value for historical data point 712.
[0082] In FIG. 7, all of the remaining training data points (i.e. historical data 702 excluding historical data point 712) have undergone the procedure described above for historical data point 712, and have already been accorded a difference value. This is indicated by the shading of the various points of historical data 702. While the drawings are shown on a gray-scale, it is understood that the graphical display will be in colour.
[0083] In FIG. 7, a gradient key 714 is used as a measure to indicate that the lighter the shade of a training data point, the lower its influence on the forecast. As an example, data point 710, which is almost white according to gradient key 714, has minimal influence on the forecast. On the other hand, the grouping 708 of data points (around August 1, 2017) is dark, which, according to gradient key 714, indicates a large influence on the forecast.
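For illustration, shading of this kind can be produced with any plotting library; a minimal sketch using matplotlib is given below, in which the variable names and the use of a grey colour map are assumptions of the sketch rather than features of the disclosed interface.

import matplotlib.pyplot as plt

def plot_influence(dates, lead_times, influences, forecast_dates, full_forecast):
    # Shade each historical lead-time point by its measure of influence (cf. FIG. 7)
    fig, ax = plt.subplots()
    points = ax.scatter(dates, lead_times, c=influences, cmap="Greys", edgecolors="k")
    ax.plot(forecast_dates, full_forecast, label="full data forecast")
    # The colour bar plays the role of gradient key 714: darker points carry more influence
    fig.colorbar(points, ax=ax, label="measure of influence")
    ax.set_xlabel("date")
    ax.set_ylabel("lead time (days)")
    ax.legend()
    plt.show()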
[0084] If removal of a particular training data point does not result in a change in the resulting amended data forecast, then that particular training data point has no influence on the prediction. The greater the change in the amended data forecast from the full data forecast, the greater the influence of the particular training data point on the forecast.
[0085] A user can glean further information from the colour gradient of historical data 702, by looking for patterns of high-influence data points, or low-influence data points. This can be achieved via a graphical user interface through which the user can select different data points along the historical data 702, and see how the resulting amended data forecast 706 changes relative to the full data forecast 704.
[0086] FIG. 8 illustrates a system 800 in accordance with one embodiment of machine learning interpretability.
[0087] System server 802 comprises a machine learning algorithm, a machine learning interpretability module, and other modules and/or algorithms, including access to a library of SHAP algorithms. Machine learning storage 812 can include training data used for training a machine learning algorithm.
[0088] System 800 includes a system server 802, machine learning storage 812, client data source 822 and one or more devices 814, 816 and 818. System server 802 can include a memory 808, a disk 804, a processor 806 and a network interface 820. While one processor 806 is shown, the system server 802 can comprise one or more processors. In some embodiments, memory 808 can be volatile memory, compared with disk 804 which can be non-volatile memory. In some embodiments, system server 802 can communicate with machine learning storage 812, client data source 822 and one or more external devices 814, 816 and 818 via network 810. While machine learning storage 812 is illustrated as separate from system server 802, machine learning storage 812 can also be integrated into system server 802, either as a separate component within system server 802 or as part of at least one of memory 808 and disk 804.
[0089] System 800 can also include additional features and/or functionality. For example, system 800 can also include additional storage (removable and/or non-removable) including, but not limited to, magnetic or optical disks or tape. Such additional storage is illustrated in FIG. 8 by memory 808 and disk 804. Storage media can include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Memory 808 and disk 804 are examples of non-transitory computer-readable storage media. Non-transitory computer-readable media also includes, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory and/or other memory technology, Compact Disc Read-Only Memory (CD-ROM), digital versatile discs (DVD), and/or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, and/or any other medium which can be used to store the desired information and which can be accessed by system 800. Any such non-transitory computer-readable storage media can be part of system 800.
[0090] Communication between system server 802, machine learning storage 812 and one or more external devices 814, 816 and 818 via network 810 can be over various network types. In some embodiments, the processor 806 may be disposed in communication with network 810 via a network interface 820. The network interface 820 may communicate with the network 810. The network interface 820 may employ connection protocols including, without limitation, direct connect, Ethernet (e.g., twisted pair 10/40/400 Base T), transmission control protocol/internet protocol (TCP/IP), token ring, IEEE 802.11a/b/g/n/x, etc. Non-limiting example network types can include Fibre Channel, small computer system interface (SCSI), Bluetooth, Ethernet, Wi-Fi, Infrared Data Association (IrDA), Local area networks (LAN), Wireless Local area networks (WLAN), wide area networks (WAN) such as the Internet, serial, and universal serial bus (USB). Generally, communication between various components of system 800 may take place over hard-wired, cellular, Wi-Fi or Bluetooth networked components or the like. In some embodiments, one or more electronic devices of system 800 may include cloud-based features, such as cloud-based memory storage.
[0091] Machine learning storage 812 may implement an "in-memory" database, in which volatile (e.g., non-disk-based) storage (e.g., Random Access Memory) is used both for cache memory and for storing the full database during operation, and persistent storage (e.g., one or more fixed disks) is used for offline persistency and maintenance of database snapshots. Alternatively, volatile storage may be used as cache memory for storing recently-used data, while persistent storage stores the full database.
[0092] Machine learning storage 812 may store metadata regarding the structure, relationships and meaning of data. This information may include data defining the schema of database tables stored within the data. A database table schema may specify the name of the database table, columns of the database table, the data type associated with each column, and other information associated with the database table. Machine learning storage 812 may also or alternatively support multi-tenancy by providing multiple logical database systems which are programmatically isolated from one another. Moreover, the data may be indexed and/or selectively replicated in an index to allow fast searching and retrieval thereof. In addition, machine learning storage 812 can store a number of machine learning models that are accessed by the system server 802. A number of ML models can be used.
[0093] In some embodiments where machine learning is used, gradient-boosted trees, ensembles of trees and support vector regression can be used. In some embodiments of machine learning, one or more clustering algorithms can be used. Non-limiting examples include hierarchical clustering, k-means, mixture models, density-based spatial clustering of applications with noise and ordering points to identify the clustering structure.
[0094] In some embodiments of machine learning, one or more anomaly detection algorithms can be used. Non-limiting examples include local outlier factor.
[0095] In some embodiments of machine learning, neural networks can be used.
[0096] Client data source 822 may provide a variety of raw data from a user, including, but not limited to: point of sales data that indicates the sales record of all of the client's products at every location; the inventory history of all of the client's products at every location; promotional campaign details for all products at all locations, and events that are important/relevant for sales of a client's product at every location.
[0097] Using the network interface 820 and the network 810, the system server 802 may communicate with one or more devices 814, 816 and 818. These devices 814, 816 and 818 may include, without limitation, personal computer(s), server(s), various mobile devices such as cellular telephones, smartphones (e.g., Apple iPhone, Blackberry, Android-based phones, etc.), tablet computers, eBook readers (Amazon Kindle, Nook, etc.), laptop computers, notebooks, gaming consoles (Microsoft Xbox, Nintendo DS, Sony PlayStation, etc.), or the like.
[0098] Using network 810, system server 802 can retrieve data from machine learning storage 812 and client data source 822. The retrieved data can be saved in memory 808 or disk 804. In some embodiments, system server 802 also comprises a web server, and can format resources into a format suitable to be displayed on a web browser.
[0099] Once a preliminary machine learning result is provided to any of the one or more devices, a user can amend the results, which are re-sent to machine learning storage 812, for further execution. The results can be amended by either interaction with one or more data files, which are then sent to machine learning storage 812; or through a user interface at the one or more devices 814, 816 and 818. For example, in device 816, a user can amend the results using a graphical user interface.
[0100] Although the algorithms described above including those with reference to the foregoing flow charts have been described separately, it should be understood that any two or more of the algorithms disclosed herein can be combined in any combination. Any of the methods, modules, algorithms, implementations, or procedures described herein can include machine-readable instructions for execution by: (a) a processor, (b) a controller, and/or (c) any other suitable processing device. Any algorithm, software, or method disclosed herein can be embodied in software stored on a non-transitory tangible medium such as, for example, a flash memory, a CD-ROM, a floppy disk, a hard drive, a digital versatile disk (DVD), or other memory devices, but persons of ordinary skill in the art will readily appreciate that the entire algorithm and/or parts thereof could alternatively be executed by a device other than a controller and/or embodied in firmware or dedicated hardware in a well-known manner (e.g., it may be implemented by an application specific integrated circuit (ASIC), a programmable logic device (PLD), a field programmable logic device (FPLD), discrete logic, etc.). Further, although specific algorithms are described with reference to flowcharts depicted herein, persons of ordinary skill in the art will readily appreciate that many other methods of implementing the example machine readable instructions may alternatively be used. For example, the order of execution of the blocks may be changed, and/or some of the blocks described may be changed, eliminated, or combined.
[0101] It should be noted that the algorithms illustrated and discussed herein are described as having various modules which perform particular functions and interact with one another. It should be understood that these modules are merely segregated based on their function for the sake of description and represent computer hardware and/or executable software code which is stored on a computer-readable medium for execution on appropriate computing hardware. The various functions of the different modules and units can be combined or segregated as hardware and/or software stored on a non-transitory computer-readable medium as above as modules in any manner and can be used separately or in combination.
[0102] Particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.

Claims

CLAIMS What is claimed is:
1. A method comprising: training, by a processor, a regression machine learning model using training data; predicting, by the processor, a prediction based on the trained model; receiving, by a machine learning interpretability module, the training data, the trained model and the prediction; and comparing, by the machine learning interpretability module, characteristics of the training data and the prediction.
2. The method of claim 1, wherein comparing characteristics comprises visualization of the training data, the prediction and the characteristics of the training data and the prediction.
3. The method of claim 1 or claim 2, wherein comparing characteristics comprises: determining, by the machine learning interpretability module, a heuristic function value of each training data point; wherein: the prediction comprises a plurality of predicted data points; and the heuristic function incorporates: SHAP values of each training data point; SHAP values of the predicted data points; features values of the training data points; and features values of the predicted data points.
4. The method of claim 3, wherein the heuristic function comprises a combination of a SHAP distance and a features distance, wherein: the SHAP distance is a Euclidean distance between a SHAP vector of a training data point and a SHAP vector of a predicted data point; the features distance is a Euclidean distance between a features vector of a training data point and a features vector of a predicted data point; the SHAP vector is an ordered sequence of SHAP values of a data point; and the features vector is an ordered sequence of features values of a data point.
5. The method of claim 1 or claim 2, wherein comparing characteristics comprises: determining, by the machine learning interpretability module, SHAP values of one or more points of the prediction; determining, by the machine learning interpretability module, SHAP values of one or more points of the training data; and determining, by the machine learning interpretability module, for each of the one or more points of the prediction, a difference between the SHAP values of the prediction point and the SHAP values of each of the one or more points of the training data.
6. The method of claim 5, wherein the difference is a Euclidean distance between a SHAP vector of the prediction point and a SHAP vector of each of the one or more points of the training data.
7. The method of claim 1 or claim 2, wherein comparing characteristics comprises: removing, by the machine learning interpretability module, a training data point from the training data to form an amended training data set; retraining, by the machine learning interpretability module, the trained model on the amended training data set; predicting, by the machine learning interpretability module, based on the amended training data set to provide an amended prediction; comparing, by the machine learning interpretability module, a difference between the prediction and the amended prediction; assigning, by the machine learning interpretability module, a measure of influence to the removed training data point, based on the difference.
8. A system comprising: a processor; and a memory storing instructions that, when executed by the processor, configure the system to: train, by a processor, a regression machine learning model using training data; predict, by the processor, a prediction based on the trained model; receive, by a machine learning interpretability module, the training data, the trained model and the prediction; and compare, by the machine learning interpretability module, characteristics of the training data and the prediction.
9. The system of claim 8, further configured to provide a visualization of the training data, the prediction and the characteristics of the training data and the prediction.
10. The system of claim 8 or claim 9, further configured to: determine, by the machine learning interpretability module, a heuristic function value of each training data point; wherein: the prediction comprises a plurality of predicted data points; and the heuristic function incorporates: SHAP values of each training data point; SHAP values of the predicted data points; features values of the training data points; and features values of the predicted data points.
11. The system of claim 10, wherein the heuristic function comprises a combination of a SHAP distance and a features distance, wherein: the SHAP distance is a Euclidean distance between a SHAP vector of a training data point and a SHAP vector of a predicted data point; the features distance is a Euclidean distance between a features vector of a training data point and a features vector of a predicted data point; the SHAP vector is an ordered sequence of SHAP values of a data point; and the features vector is an ordered sequence of features values of a data point.
12. The system of claim 8 or claim 9, further configured to: determine, by the machine learning interpretability module, SHAP values of one or more points of the prediction; determine, by the machine learning interpretability module, SHAP values of one or more points of the training data; and determine, by the machine learning interpretability module, for each of the one or more points of the prediction, a difference between the SHAP values of the prediction point and the SHAP values of each of the one or more points of the training data.
13. The system of claim 12, wherein the difference is a Euclidean distance between a SHAP vector of the prediction point and a SHAP vector of each of the one or more points of the training data.
14. The system of claim 8 or claim 9, further configured to: remove, by the machine learning interpretability module, a training data point from the training data to form an amended training data set; retrain, by the machine learning interpretability module, the trained model on the amended training data set; predict, by the machine learning interpretability module, based on the amended training data set to provide an amended prediction; compare, by the machine learning interpretability module, a difference between the prediction and the amended prediction; assign, by the machine learning interpretability module, a measure of influence to the removed training data point, based on the difference.
15. A non-transitory computer-readable storage medium, the computer-readable storage medium including instructions that when executed by a computer, cause the computer to: train, by a processor, a regression machine learning model using training data; predict, by the processor, a prediction based on the trained model; receive, by a machine learning interpretability module, the training data, the trained model and the prediction; and compare, by the machine learning interpretability module, characteristics of the training data and the prediction.
16. The computer-readable storage medium of claim 15, wherein instructions that when executed by a computer, further cause the computer to provide visualization of the training data, the prediction and the characteristics of the training data and the prediction.
17. The computer-readable storage medium of claim 15 or claim 16, wherein instructions that when executed by a computer, further cause the computer to: determine, by the machine learning interpretability module, a heuristic function value of each training data point; wherein: the prediction comprises a plurality of predicted data points; and the heuristic function incorporates: SHAP values of each training data point; SHAP values of the predicted data points; features values of the training data points; and features values of the predicted data points.
18. The computer-readable storage medium of claim 17, wherein the heuristic function comprises a combination of a SHAP distance and a features distance, wherein: the SHAP distance is a Euclidean distance between a SHAP vector of a training data point and a SHAP vector of a predicted data point; the features distance is a Euclidean distance between a features vector of a training data point and a features vector of a predicted data point; the SHAP vector is an ordered sequence of SHAP values of a data point; and the features vector is an ordered sequence of features values of a data point.
19. The computer-readable storage medium of claim 15 or claim 16, wherein instructions that when executed by a computer, further cause the computer to: determine, by the machine learning interpretability module, SHAP values of one or more points of the prediction; determine, by the machine learning interpretability module, SHAP values of one or more points of the training data; and determine, by the machine learning interpretability module, for each of the one or more points of the prediction, a difference between the SHAP values of the prediction point and the SHAP values of each of the one or more points of the training data.
20. The computer-readable storage medium of claim 19, wherein the difference is a Euclidean distance between a SHAP vector of the prediction point and a SHAP vector of each of the one or more points of the training data.
21. The computer-readable storage medium of claim 15 or claim 16, wherein instructions that when executed by a computer, further cause the computer to: remove, by the machine learning interpretability module, a training data point from the training data to form an amended training data set; retrain, by the machine learning interpretability module, the trained model on the amended training data set; predict, by the machine learning interpretability module, based on the amended training data set to provide an amended prediction; compare, by the machine learning interpretability module, a difference between the prediction and the amended prediction; assign, by the machine learning interpretability module, a measure of influence to the removed training data point, based on the difference.
PCT/CA2020/051400 2019-10-19 2020-10-19 Systems and methods for machine learning interpretability WO2021072556A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP20877942.1A EP4046087A4 (en) 2019-10-19 2020-10-19 Systems and methods for machine learning interpretability
JP2022522739A JP2022552980A (en) 2019-10-19 2020-10-19 Systems and methods for machine learning interpretability
CA3155102A CA3155102A1 (en) 2019-10-19 2020-10-19 Systems and methods for machine learning interpretability

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962923508P 2019-10-19 2019-10-19
US62/923,508 2019-10-19

Publications (1)

Publication Number Publication Date
WO2021072556A1 true WO2021072556A1 (en) 2021-04-22

Family

ID=75492090

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2020/051400 WO2021072556A1 (en) 2019-10-19 2020-10-19 Systems and methods for machine learning interpretability

Country Status (5)

Country Link
US (1) US20210117863A1 (en)
EP (1) EP4046087A4 (en)
JP (1) JP2022552980A (en)
CA (1) CA3155102A1 (en)
WO (1) WO2021072556A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023102667A1 (en) * 2021-12-09 2023-06-15 Kinaxis Inc. Iterative data-driven configuration of optimization methods and systems

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11388040B2 (en) 2018-10-31 2022-07-12 EXFO Solutions SAS Automatic root cause diagnosis in networks
US11645293B2 (en) 2018-12-11 2023-05-09 EXFO Solutions SAS Anomaly detection in big data time series analysis
US11727284B2 (en) * 2019-12-12 2023-08-15 Business Objects Software Ltd Interpretation of machine learning results using feature analysis
US12052134B2 (en) 2021-02-02 2024-07-30 Exfo Inc. Identification of clusters of elements causing network performance degradation or outage
EP4120653A1 (en) * 2021-07-15 2023-01-18 EXFO Inc. Communication network performance and fault analysis using learning models with model interpretation
CN113723618B (en) * 2021-08-27 2022-11-08 南京星环智能科技有限公司 SHAP optimization method, equipment and medium
CN116205310B (en) * 2023-02-14 2023-08-15 中国水利水电科学研究院 Soil water content influence factor sensitive interval judging method based on interpretable integrated learning model
CN117094123B (en) * 2023-07-12 2024-06-11 广东省科学院生态环境与土壤研究所 Soil carbon fixation driving force identification method, device and medium based on interpretable model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060167654A1 (en) * 2003-01-07 2006-07-27 Alon Keinan Identification of effective elements in complex systems
US20170046460A1 (en) * 2015-04-14 2017-02-16 Ptc Inc. Scoring a population of examples using a model
US20170249547A1 (en) * 2016-02-26 2017-08-31 The Board Of Trustees Of The Leland Stanford Junior University Systems and Methods for Holistic Extraction of Features from Neural Networks
US20170330109A1 (en) * 2016-05-16 2017-11-16 Purepredictive, Inc. Predictive drift detection and correction
US20190156216A1 (en) * 2017-11-17 2019-05-23 Adobe Inc. Machine learning model interpretation

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10510022B1 (en) * 2018-12-03 2019-12-17 Sas Institute Inc. Machine learning model feature contribution analytic system
US11531915B2 (en) * 2019-03-20 2022-12-20 Oracle International Corporation Method for generating rulesets using tree-based models for black-box machine learning explainability
US11120218B2 (en) * 2019-06-13 2021-09-14 International Business Machines Corporation Matching bias and relevancy in reviews with artificial intelligence
US11568212B2 (en) * 2019-08-06 2023-01-31 Disney Enterprises, Inc. Techniques for understanding how trained neural networks operate

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060167654A1 (en) * 2003-01-07 2006-07-27 Alon Keinan Identification of effective elements in complex systems
US20170046460A1 (en) * 2015-04-14 2017-02-16 Ptc Inc. Scoring a population of examples using a model
US20170249547A1 (en) * 2016-02-26 2017-08-31 The Board Of Trustees Of The Leland Stanford Junior University Systems and Methods for Holistic Extraction of Features from Neural Networks
US20170330109A1 (en) * 2016-05-16 2017-11-16 Purepredictive, Inc. Predictive drift detection and correction
US20190156216A1 (en) * 2017-11-17 2019-05-23 Adobe Inc. Machine learning model interpretation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LUNDBERG SCOTT, LEE SU-IN: "A Unified Approach to Interpreting Model Predictions", ARXIV.ORG, 25 November 2017 (2017-11-25), pages 1 - 10, XP081403747 *
See also references of EP4046087A4 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023102667A1 (en) * 2021-12-09 2023-06-15 Kinaxis Inc. Iterative data-driven configuration of optimization methods and systems

Also Published As

Publication number Publication date
US20210117863A1 (en) 2021-04-22
JP2022552980A (en) 2022-12-21
EP4046087A1 (en) 2022-08-24
CA3155102A1 (en) 2021-04-22
EP4046087A4 (en) 2024-02-07

Similar Documents

Publication Publication Date Title
US20210117863A1 (en) Systems and methods for machine learning interpretability
US9239986B2 (en) Assessing accuracy of trained predictive models
US11036684B2 (en) Columnar database compression
US20120284212A1 (en) Predictive Analytical Modeling Accuracy Assessment
US20220292308A1 (en) Systems and methods for time series modeling
US11995520B2 (en) Efficiently determining local machine learning model feature contributions
CN104077303B (en) Method and apparatus for data to be presented
US20240152498A1 (en) Data storage using vectors of vectors
US20210110298A1 (en) Interactive machine learning
CN110069676A (en) Keyword recommendation method and device
JP2020144493A (en) Learning model generation support device and learning model generation support method
WO2021072537A1 (en) Interactive machine learning
JP2020098388A (en) Demand prediction method, demand prediction program, and demand prediction device
US9223871B2 (en) System and method for automatic wrapper induction using target strings
WO2020056286A1 (en) System and method for predicting average inventory with new items
CN107562533A (en) A kind of data loading processing method and device
Meena et al. Product recommendation system using distance measure of product image features
JP2017151731A (en) Demand amount prediction program, demand amount prediction method, and information processor
US20180039677A1 (en) Data searching apparatus
US11886514B2 (en) Machine learning segmentation methods and systems
US20240211835A1 (en) Automatic and Dynamic Adaptation of Hierarchical Reconciliation for Time Series Forecasting
EP4155970A1 (en) System and method for data management
US20240193462A1 (en) Category classification system for feature contribution scores
JP6972641B2 (en) Information processing equipment and information processing programs
CN115186143A (en) Cross-modal retrieval method and device based on low-rank learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20877942

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
ENP Entry into the national phase

Ref document number: 2022522739

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 3155102

Country of ref document: CA

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2020877942

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2020877942

Country of ref document: EP

Effective date: 20220519