IL295124A

IL295124A - Computerized analysis of a trained machine learning model

Info

Publication number: IL295124A
Application number: IL295124A
Authority: IL
Original assignee: Citrusx Ltd
Priority date: 2022-07-27
Filing date: 2022-07-27
Publication date: 2024-02-01
Also published as: WO2024023819A1; EP4562562A4; EP4562562A1; JP2025524174A; CA3263333A1

Description

COMPUTERIZED ANALYSIS OF A TRAINED MACHINE LEARNING MODEL

TECHNICAL FIELD

The presently disclosed subject matter relates to the field of machine learning models.

BACKGROUND

In various technical fields (e.g., biological or medical field, business field, physics field, statistics field, etc.), a machine learning model is trained to model a phenomenon. Generally, the machine learning model is trained to predict, based on an input vector informative of one or more features, an output vector informative of one or more features. Once the machine learning model has been trained, a complex model is obtained, which generally involves a plurality of weights and layers. A technical challenge resides in the fact that it is cumbersome to understand whether the machine learning model has been adequately trained, and how to improve this training. In addition, the behavior of the machine learning model, after its training, is difficult to understand. There is therefore a need to provide new systems and methods which enable a computerized analysis of a trained machine learning model.
GENERAL DESCRIPTION

In accordance with certain aspects of the presently disclosed subject matter, there is provided a method comprising, by a processor and memory circuitry (PMC): obtaining, for a machine learning model, a set of data points, informative of a set of input vectors and of a set of predicted vectors, wherein each input vector of the set of input vectors has been used to train the machine learning model, wherein each given data point of the set of data points is associated with a given input vector of the set of input vectors and a given predicted vector of the set of predicted vectors, wherein the given predicted vector has been generated by the machine learning model using the given input vector, using the set of data points to generate a database informative of the machine learning model, wherein the database is informative of a plurality of nodes comprising terminal nodes, wherein each given terminal node is associated with a plurality of data points of the set of data points, one or more coefficients defining a function, wherein the function fits a relationship between a plurality of input vectors of the plurality of data points of the given terminal node, and a plurality of predicted vectors of the plurality of data points of the given terminal node, with a quality meeting an accuracy criterion, wherein the database is usable to generate data informative of the machine learning model. In addition to the above features, the method according to this aspect of the presently disclosed subject matter can optionally comprise one or more of features (i) to (xliii) below, in any technically possible combination or permutation: i. the method comprises using the database for determining data informative of a quality of the training of the machine learning model; ii. the method comprises using the database for determining a certainty of the machine learning model in its prediction; iii.
the method comprises using the database for determining a certainty of a prediction generated by the machine learning model depending on a range of values of input vectors fed to the machine learning model; iv. the method comprises using the database for providing a recommendation of whether the machine learning model has to be retrained; v. the method comprises using the database for providing a recommendation on a range of values of input vectors for which the machine learning model has to be retrained; vi. the method comprises using the database for determining one or more input variables of the input vectors which most impact a prediction generated by the machine learning model; vii. the method comprises using the database for determining a change in one or more values of a given input vector to obtain a prediction, by the machine learning model, which matches a desired predicted vector; viii. the method comprises using the database for estimating, for a given input vector, a prediction which would have been generated by the machine learning model using this given input vector; ix. the method comprises using the database for determining data informative of a bias in a prediction of the machine learning model; x. each input vector comprises one or more values for one or more input variables and each predicted vector comprises one or more values for one or more output variables, wherein, for a given terminal node, the function defines a relationship between the input variables and the output variables, determined using the plurality of input vectors of the given terminal node and the plurality of predicted vectors of the given terminal node; xi. for a function of at least one terminal node, each given input variable is associated with data Dstat_sign informative of a statistical significance of the input variable; xii. for a function of at least one terminal node, each input variable is associated with a coefficient in the function; xiii.
the method comprises, for one or more of the input variables, using a magnitude of a coefficient associated with the input variable to determine data informative of an impact of the given input variable on a prediction of the machine learning model; xiv. each terminal node is associated with a first set of boundaries defining boundaries of a space in which all of the plurality of input vectors of the data points of the terminal node are located, wherein the method comprises, for one or more of the input variables, using a magnitude of a coefficient associated with the input variable to determine data informative of an impact of the given input variable on a prediction of the machine learning model in a range of input values located within the first set of boundaries; xv. the method comprises using coefficients associated with the input variables to determine whether the machine learning model relies more on a limited subset of one or more input variables than on one or more input variables which are not part of the subset to generate its prediction; xvi. the function is determined using a regression analysis; xvii. the function is a linear function; xviii. at least one terminal node is further associated with one or more data points, each of these data points including an input vector which has not been used to train the machine learning model and a predicted vector generated by the machine learning model using the input vector; xix. each given terminal node is associated with a first set of boundaries defining boundaries of a space in which all of the plurality of input vectors of all data points associated with the given terminal node are located; xx.
the plurality of nodes includes non-terminal nodes and terminal nodes, wherein at least one non-terminal node is linked to one or more child nodes, each child node being either a non-terminal node or a terminal node, said at least one non-terminal node being associated with a first set of boundaries defining boundaries of a space in which all of the plurality of input vectors of all data points of said one or more child nodes are located; xxi. any data point including an input vector of the set of input vectors and a corresponding predicted vector of the set of predicted vectors is associated with a single terminal node of the database; xxii. the plurality of nodes is arranged in hierarchical levels indexed by i, with i from 1 to N, wherein each node of the level of index j is linked to a parent node of the level of index j-1, with j from 2 to N, wherein each node is associated with a first set of boundaries defining boundaries of a space including input vectors of all data points associated with this node, wherein, for each given node linked to a parent node, a space defined by the first set of boundaries of the given node is included within the space defined by the first set of boundaries of the parent node; xxiii.
generating the database comprises: (1) obtaining a given node associated with a plurality of data points of the set of data points, (2) determining coefficients defining a function which fits a relationship between the plurality of input vectors and the plurality of predicted vectors of the data points of the given node, (3) determining whether the function fits the relationship between the plurality of input vectors and the plurality of predicted vectors of the data points of the given node with a quality meeting an accuracy criterion, (4) for a function for which said quality does not meet the accuracy criterion: generating N child nodes linked to the given node, with N≥1, each given child node being associated with a given fraction of the data points of the parent node, and repeating (1) to (3) for each given child node, wherein said repeating of (1) to (3) for each given child node comprises using the given child node as the given node in (1) to (3); xxiv. upon determination at (3) that the function fits the relationship between the plurality of input vectors and the plurality of predicted vectors of the data points of the given node with a quality meeting the accuracy criterion, the method comprises storing the given node as a terminal node in the database; xxv. for a given child node being associated with a given number M1 of data points which is below a threshold, the method comprises obtaining one or more additional input vectors which have not been used to train the machine learning model, using the one or more additional input vectors to generate, by the machine learning model, additional predicted vectors, and associating data points including the additional input vectors and the additional predicted vectors with the given child node; xxvi. the given number M 1 of data points only include input vectors which have been used to train the machine learning model; xxvii. 
the given child node is associated with a first set of boundaries defining boundaries of a space including the M1 input vectors of the data points associated with the given child node, wherein the one or more additional input vectors are selected within the first set of boundaries; xxviii. each input vector comprises one or more values for one or more input variables, each terminal node of the plurality of terminal nodes of the database is associated with a first set of boundaries defining boundaries of a space in which all values of the plurality of input vectors of the data points of said terminal node are located, wherein the method comprises: obtaining a data point comprising a first vector comprising one or more values for the one or more input variables, and using the database to determine a given terminal node for which the first vector is located within the space defined by the first set of boundaries of said given terminal node; xxix.
each input vector comprises one or more values for one or more input variables, each predicted vector comprises one or more values for one or more output variables, each terminal node of the plurality of terminal nodes of the database is associated with a first set of boundaries defining boundaries of a space in which all values of the plurality of input vectors of all data points of said terminal node are located, wherein the method comprises: obtaining a first vector comprising one or more values for the one or more input variables, using the database to determine a given terminal node for which the first vector is located within the space defined by the first set of boundaries of said given terminal node, and using the function associated with said given terminal node and the first vector to determine a second vector comprising one or more values for the one or more output variables; xxx. each input vector is informative of a given number of input variables, wherein the method comprises, for at least one terminal node, obtaining a number N1 of data points associated with the terminal node which include an input vector which has been used to train the machine learning model, and using N1 and the given number of input variables to determine data Dcertainty informative of the certainty of the machine learning model associated with the terminal node; xxxi.
each terminal node of the plurality of terminal nodes of the database is associated with a first set of boundaries defining boundaries of a space in which all values of the plurality of input vectors of all data points of said terminal node are located, wherein the method comprises determining a lack of certainty of the machine learning model within the first set of boundaries; xxxii. the method comprises obtaining, for a first machine learning model, a first set of data points, informative of a first set of input vectors and of a first set of predicted vectors, wherein each input vector of the first set of input vectors has been used to train the machine learning model, wherein each given data point of the first set of data points is associated with a given input vector of the first set of input vectors, and a given predicted vector of the first set of predicted vectors, wherein the given predicted vector has been generated by the machine learning model using the given input vector, using the first set of data points to generate a first database informative of the first machine learning model, wherein the first database is informative of a plurality of nodes comprising terminal nodes, wherein, for at least part of the terminal nodes, each given terminal node is associated with a plurality of data points of the first set of data points, one or more coefficients defining a function, wherein the function fits a relationship between a plurality of input vectors of the plurality of data points of the given terminal node, and a plurality of predicted vectors of the plurality of data points of the given terminal node, with a quality meeting an accuracy criterion, obtaining, for a second machine learning model, a second set of data points, informative of a second set of input vectors and of a second set of predicted
vectors, wherein each input vector of the second set of input vectors has been used to train the machine learning model, wherein each given data point of the second set of data points is associated with a given input vector of the second set of input vectors, and a given predicted vector of the second set of predicted vectors, wherein the given predicted vector has been generated by the machine learning model using the given input vector, using the second set of data points to generate a second database informative of the second machine learning model, wherein the second database is informative of a plurality of nodes comprising terminal nodes, wherein, for at least part of the terminal nodes, each given terminal node is associated with a plurality of data points of the second set of data points, one or more coefficients defining a function, wherein the function fits a relationship between a plurality of input vectors of the plurality of data points of the given terminal node, and a plurality of predicted vectors of the plurality of data points of the given terminal node, with a quality meeting an accuracy criterion, using the first database and the second database to compare data informative of the first machine learning model with data informative of the second machine learning model; xxxiii. at least some of the second input vectors used to train the second machine learning model are different from the first input vectors used to train the first machine learning model; xxxiv. the second machine learning model corresponds to the first machine learning model after a retraining using second input vectors different from the first input vectors; xxxv. an architecture of the second machine learning model differs from an architecture of the first machine learning model; xxxvi. 
the method comprises determining, for a given input vector, a first terminal node of the first database for which a first set of boundaries of the first terminal node contains the given input vector, determining, for the given input vector, a second terminal node of the second database for which a first set of boundaries of the second terminal node contains the given input vector, and comparing data associated with the first terminal node with data associated with the second terminal node; xxxvii. the method comprises comparing data informative of a certainty of the first machine learning model associated with the first terminal node with data informative of a certainty of the second machine learning model associated with the second terminal node; xxxviii. the method comprises comparing a goodness-of-fit measure associated with the function of the first terminal node with a goodness-of-fit measure associated with the function of the second terminal node; xxxix. the method comprises comparing a level of the first terminal node within the first database with a level of the second terminal node within the second database; xl. the method comprises comparing, for at least one input variable, data Dstat_sign informative of a statistical significance of the input variable in the first terminal node with data Dstat_sign informative of a statistical significance of the input variable in the second terminal node; xli. the method comprises comparing, for at least one input variable, a magnitude of a coefficient associated with this input variable for the first terminal node with a magnitude of a coefficient associated with this input variable for the second terminal node; xlii.
the method comprises determining, for a given input vector, a first terminal node of the first database for which a first set of boundaries of the first terminal node contains the given input vector, wherein, when the second database does not include any terminal node which has a first set of boundaries defining a space including the given input vector, outputting alerting data; and xliii. each input vector comprises one or more values for one or more input variables, each predicted vector comprises one or more values for one or more output variables, wherein the method comprises obtaining a first input vector comprising one or more values for the one or more input variables, wherein the machine learning model is operative to generate a first predicted vector by using the first input vector, wherein the first predicted vector comprises one or more values for the one or more output variables, obtaining a desired predicted vector which comprises one or more desired values for the one or more output variables, wherein the desired values are different from the values of the first predicted vector, using the database to determine a modification of the one or more input values of the first input vector, to obtain a modified first input vector, wherein the machine learning model generates, based on the modified first input vector, an output vector matching the desired predicted vector according to a matching criterion. In accordance with certain aspects of the presently disclosed subject matter, there is provided a system comprising a processor and memory circuitry (PMC) configured to perform operations as described with reference to the method above. In accordance with certain aspects of the presently disclosed subject matter, there is provided a non-transitory storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform operations as described with reference to the method above. 
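The node-generation loop of feature (xxiii), with the storing step of feature (xxiv), can be illustrated with a minimal sketch. It is not the applicant's implementation: the names `fit_linear` and `build_node`, the use of R² against a threshold `r2_min` as the accuracy criterion, the split into two child nodes at the median of the widest input variable, and the single output variable are all assumptions made for illustration.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit of y ~ X @ w + b. Returns (coefficients, R^2)."""
    A = np.hstack([X, np.ones((len(X), 1))])      # last column is the intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    ss_res = float(residual @ residual)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 if ss_tot == 0.0 else 1.0 - ss_res / ss_tot
    return coef, r2

def build_node(X, y, r2_min=0.95, min_points=8, depth=0, max_depth=10):
    """Steps (1) to (4): fit, test the criterion, split and recurse if it fails.

    X holds input vectors (one row each); y holds the values the trained
    model predicted for them (assumption: one output variable).
    """
    coef, r2 = fit_linear(X, y)                    # step (2)
    node = {
        "bounds": (X.min(axis=0), X.max(axis=0)),  # box enclosing the node's inputs
        "coef": coef,
        "r2": r2,
        "n_points": len(X),
        "meets_criterion": r2 >= r2_min,           # step (3), assumed criterion
        "children": [],
    }
    # Terminal node: criterion met, or the sketch's own stopping guards apply.
    if node["meets_criterion"] or len(X) < 2 * min_points or depth >= max_depth:
        return node
    # Step (4): assumed split rule, two children at the median of the widest variable.
    k = int(np.argmax(X.max(axis=0) - X.min(axis=0)))
    left = X[:, k] <= np.median(X[:, k])
    if left.all() or not left.any():
        return node                                # cannot split further
    node["children"] = [
        build_node(X[m], y[m], r2_min, min_points, depth + 1, max_depth)
        for m in (left, ~left)
    ]
    return node
```

In this sketch a node that stops because it holds too few points is kept as a terminal node with `meets_criterion` set to false; feature (xxv) instead describes enriching such a node with additional input vectors and the model's predictions for them.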
In accordance with certain aspects of the presently disclosed subject matter, there is provided a database informative of a machine learning model, wherein the machine learning model is associated with a set of data points, informative of a set of input vectors and of a set of predicted vectors, wherein each input vector of the set of input vectors has been used to train the machine learning model, wherein each given data point of the set of data points is associated with a given input vector of the set of input vectors, a given predicted vector of the set of predicted vectors, wherein the given predicted vector has been generated by the machine learning model using the given input vector, wherein the database is informative of a plurality of nodes comprising terminal nodes, wherein each given terminal node is associated with a plurality of data points of the set of data points, one or more coefficients defining a function, wherein the function fits a relationship between a plurality of input vectors of the plurality of data points of the given terminal node, and a plurality of predicted vectors of the plurality of data points of the given terminal node, with a quality meeting an accuracy criterion, wherein the database is usable to generate data informative of the machine learning model. In addition to the above features, the database according to this aspect of the presently disclosed subject matter can optionally comprise one or more of features (xliv) to (lxi) below, in any technically possible combination or permutation: xliv. at least one terminal node is further associated with one or more data points, each of these data points including an input vector which have not been used to train the machine learning model and a predicted vector generated by the machine learning model using the input vector; xlv. 
each given terminal node is associated with a first set of boundaries defining boundaries of a space in which all of the plurality of input vectors of all data points associated with the given terminal node are located; xlvi. for a function of at least one terminal node, each given input variable is associated with data Dstat_sign informative of a statistical significance of the input variable; xlvii. the plurality of nodes includes non-terminal nodes and terminal nodes, wherein at least one non-terminal node is linked to one or more child nodes, each child node being either a non-terminal node or a terminal node, said at least one non-terminal node being associated with a first set of boundaries defining boundaries of a space in which all of the plurality of input vectors of all data points of said one or more child nodes are located; xlviii. any data point including an input vector of the set of input vectors and a corresponding predicted vector of the set of predicted vectors is associated with a single terminal node of the database; xlix. the plurality of nodes is arranged in hierarchical levels indexed by i, with i from 1 to N, wherein each node of the level of index j is linked to a parent node of the level of index j-1, with j from 2 to N, wherein each node is associated with a first set of boundaries defining boundaries of a space including input vectors of all data points associated with this node, wherein, for each given node linked to a parent node, a space defined by the first set of boundaries of the given node is included within the space defined by the first set of boundaries of the parent node; l.
each input vector is informative of a given number of input variables, wherein the database includes, for at least one terminal node, data Dcertainty informative of the certainty of the machine learning model associated with the terminal node, wherein Dcertainty is obtained using a number N1 of data points associated with the terminal node which include an input vector which has been used to train the machine learning model, and the given number of input variables; li. the database is suitable for determining data informative of a quality of the training of the machine learning model; lii. the database is suitable for determining a certainty of the machine learning model in its prediction; liii. the database is suitable for determining a certainty of a prediction generated by the machine learning model depending on a range of values of input vectors fed to the machine learning model; liv. the database is suitable for providing a recommendation of whether the machine learning model has to be retrained; lv. the database is suitable for providing a recommendation on a range of values of input vectors for which the machine learning model has to be retrained; lvi. the database is suitable for determining one or more input variables of the input vectors which most impact a prediction generated by the machine learning model; lvii. the database is suitable for determining a change in one or more values of a given input vector to obtain a prediction, by the machine learning model, which matches a desired predicted vector; lviii. the database is suitable for estimating, for a given input vector, a prediction which would have been generated by the machine learning model using this given input vector; lix. the database is suitable for determining data informative of a bias in a prediction of the machine learning model; lx. the database is generated using one or more of the methods described above; and lxi. the database is used according to one or more of the methods described above.
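The lookup described in features (xxviii) and (xxix), in which a vector is located within the boundaries of a terminal node and the function of that node is then evaluated, might look like the following sketch. The dictionary layout (`bounds`, `coef`, `children`), the function names and the hand-built two-leaf database are hypothetical and only serve the illustration; a single output variable and a linear function are assumed.

```python
import numpy as np

def contains(bounds, x):
    """True if x lies inside the axis-aligned box (lower, upper)."""
    lower, upper = bounds
    return bool(np.all(x >= lower) and np.all(x <= upper))

def find_terminal_node(node, x):
    """Descend the hierarchy and return the terminal node whose box contains x."""
    if not contains(node["bounds"], x):
        return None
    if not node["children"]:
        return node
    for child in node["children"]:
        hit = find_terminal_node(child, x)
        if hit is not None:
            return hit
    return None  # inside the parent box but in a gap between child boxes

def estimate_prediction(root, x):
    """Evaluate the local linear function of the matching terminal node."""
    leaf = find_terminal_node(root, x)
    if leaf is None:
        return None  # no terminal node covers x; a caller could raise an alert
    w, b = leaf["coef"][:-1], leaf["coef"][-1]
    return float(w @ x + b)

# Hand-built example database: two terminal nodes approximating y = |x|.
def leaf(lo, hi, slope):
    return {"bounds": (np.array([lo]), np.array([hi])),
            "coef": np.array([slope, 0.0]), "children": []}

root = {"bounds": (np.array([-1.0]), np.array([1.0])),
        "coef": np.array([0.0, 0.5]),
        "children": [leaf(-1.0, 0.0, -1.0), leaf(0.0, 1.0, 1.0)]}
```

Returning `None` for a vector that no terminal node covers leaves the decision to the caller; feature (xlii) describes outputting alerting data in a comparable situation between two databases.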
According to some embodiments, the proposed solution provides a breakthrough in the field of computer-related technology, and in particular, in the field of computer-implemented machine learning models (supervised learning). According to some embodiments, the proposed solution makes it possible to automatically understand whether a machine learning model has been adequately trained. According to some embodiments, the proposed solution makes it possible to automatically understand in which areas of the input values the machine learning model is less accurate (or more accurate) in its prediction. According to some embodiments, the proposed solution makes it possible to automatically assess the certainty of the machine learning model in its prediction. According to some embodiments, the proposed solution can indicate for which input vectors the machine learning model tends to underperform, or to be less accurate. According to some embodiments, the proposed solution makes it possible to automatically point out which data should be used to retrain the machine learning model, thereby improving efficiency of the training and performance of the trained machine learning model. According to some embodiments, the proposed solution makes it possible to build a database which can be used to determine, based on an input vector, predicted values, in a quicker way than the machine learning model itself. In some embodiments, the prediction generated using the database is more accurate than the prediction of the machine learning model itself. According to some embodiments, the proposed solution enables a user to understand which modification should be applied to an input vector to obtain a desired predicted vector by the machine learning model. According to some embodiments, the proposed solution indicates an optimal modification (e.g., one which requires the smallest changes in the input vector) to be applied to the input vector in order to obtain the desired predicted vector.

According to some embodiments, the proposed solution generates a database modelling the behavior of the machine learning model, wherein the database is more flexible and simpler to query than the machine learning model. According to some embodiments, the proposed solution can compare the performance of two machine learning models, and can compare the performance depending on the input values fed to the two machine learning models. According to some embodiments, the proposed solution can compare the performance of a machine learning model before a retraining of the machine learning model and after a retraining of the machine learning model. According to some embodiments, the proposed solution can indicate which features of the input vector have more impact on the output generated by the machine learning model than other features of the input vector. According to some embodiments, the proposed solution can indicate whether there is a bias in the prediction of the machine learning model.

Claims

1. A method comprising, by a processor and memory circuitry (PMC):
- obtaining, for a machine learning model, a set of data points, informative of a set of input vectors and of a set of predicted vectors, wherein each input vector of the set of input vectors has been used to train the machine learning model, wherein each given data point of the set of data points is associated with:
  o a given input vector of the set of input vectors,
  o a given predicted vector of the set of predicted vectors, wherein the given predicted vector has been generated by the machine learning model using the given input vector,
- using the set of data points to generate a database informative of the machine learning model, wherein the database is informative of a plurality of nodes comprising terminal nodes, wherein each given terminal node is associated with:
  o a plurality of data points of the set of data points,
  o one or more coefficients defining a function, wherein the function fits a relationship between:
    - a plurality of input vectors of the plurality of data points of the given terminal node, and
    - a plurality of predicted vectors of the plurality of data points of the given terminal node,
    with a quality meeting an accuracy criterion,
wherein the database is usable to generate data informative of the machine learning model.

2. The method of claim 1, comprising using the database for at least one of: (i) determining data informative of a quality of the training of the machine learning model; or (ii) determining a certainty of the machine learning model in its prediction; or (iii) determining a certainty of a prediction generated by the machine learning model depending on a range of values of input vectors fed to the machine learning model; or (iv) providing a recommendation of whether the machine learning model has to be retrained; or (v) providing a recommendation on a range of values of input vectors for which the machine learning model has to be retrained; or (vi) determining one or more input variables of the input vectors which most impact a prediction generated by the machine learning model; or (vii) determining a change in one or more values of a given input vector to obtain a prediction, by the machine learning model, which matches a desired predicted vector; or (viii) for a given input vector, estimating a prediction which would have been generated by the machine learning model using this given input vector; or (ix) determining data informative of a bias in a prediction of the machine learning model.
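Use (vii) of claim 2, determining a change in an input vector so that the prediction matches a desired value, has a closed form when the function of the terminal node is linear. The sketch below assumes a single output variable, a local function y = w·x + b, and the smallest Euclidean change as the selection rule; the function name `minimal_change` is hypothetical, and the description mentions the smallest change only as an example of an optimal modification.

```python
import numpy as np

def minimal_change(w, b, x, y_desired):
    """Smallest L2 modification of x such that w @ x_new + b == y_desired.

    For a linear function the solution moves x along w:
        x_new = x + w * (y_desired - (w @ x + b)) / (w @ w)
    """
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    norm2 = float(w @ w)
    if norm2 == 0.0:
        raise ValueError("the local function does not depend on the input variables")
    y_now = float(w @ x + b)
    return x + w * (y_desired - y_now) / norm2

def inside(bounds, x):
    """Check that the modified vector stays within the terminal node's box."""
    lower, upper = bounds
    return bool(np.all(x >= lower) and np.all(x <= upper))
```

The local function is only fitted within the boundaries of its terminal node, so a modified vector that leaves those boundaries (checked here with `inside`) would need to be re-evaluated against the node that contains it, and in any case against the machine learning model itself.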

3. The method of claim 1 or of claim 2, wherein each input vector comprises one or more values for one or more input variables and each predicted vector comprises one or more values for one or more output variables, wherein, for a given terminal node, the function defines a relationship between the input variables and the output variables, determined using the plurality of input vectors of the given terminal node and the plurality of predicted vectors of the given terminal node.

4. The method of claim 3, wherein, for a function of at least one terminal node, each given input variable is associated with data Dstat_sign informative of a statistical significance of the input variable.

5. The method of claim 3 or of claim 4, wherein, for a function of at least one terminal node, each input variable is associated with a coefficient in the function.

6. The method of any one of claims 3 to 5, comprising, for one or more of the input variables, using a magnitude of a coefficient associated with the input variable to determine data informative of an impact of the given input variable on a prediction of the machine learning model.

7. The method of any one of claims 3 to 6, wherein each terminal node is associated with a first set of boundaries B_terminal_node_input defining boundaries of a space in which all of the plurality of input vectors of the data points of the terminal node are located, wherein the method comprises, for one or more of the input variables, using a magnitude of a coefficient associated with the input variable to determine data informative of an impact of the given input variable on a prediction of the machine learning model in a range of input values located within the first set of boundaries.

8. The method of any one of claims 3 to 7, comprising using coefficients associated with the input variables to determine whether the machine learning model relies more on a limited subset of one or more input variables than one or more input variables which are not part of the subset to generate its prediction.
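As an illustration of claims 6 and 8, the sketch below converts the coefficients of one terminal node's function into normalized impact shares and flags reliance on a small subset of input variables. The function names, the `top_k` and `share_threshold` parameters, and the example coefficients are hypothetical. The sketch also assumes that the input variables are on comparable scales, which the claims do not state.

```python
def impact_shares(coefficients):
    """Map each input variable to its share of the total absolute coefficient."""
    total = sum(abs(c) for c in coefficients.values())
    if total == 0:
        return {name: 0.0 for name in coefficients}
    return {name: abs(c) / total for name, c in coefficients.items()}


def relies_on_subset(coefficients, top_k=1, share_threshold=0.5):
    """Return (flag, names). The flag is True when the top_k variables carry
    at least share_threshold of the total absolute coefficient (assumed rule)."""
    ranked = sorted(impact_shares(coefficients).items(),
                    key=lambda item: item[1], reverse=True)
    top = ranked[:top_k]
    flag = sum(share for _, share in top) >= share_threshold
    return flag, [name for name, _ in top]


# Hypothetical coefficients of one terminal node's linear function.
node_coefficients = {"age": 0.5, "income": -2.0, "tenure": 1.5}
shares = impact_shares(node_coefficients)
flag, dominant = relies_on_subset(node_coefficients)
```

Because the coefficients belong to a single terminal node, the resulting ranking only describes the model's behaviour inside that node's region of the input space.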

9. The method of any one of claims 1 to 8, wherein at least one of (i) or (ii) is met: (i) the function is determined using a regression analysis, or (ii) the function is a linear function.
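For the case in which both options of claim 9 hold (a linear function obtained by regression), the fit and one possible accuracy criterion can be written out as follows. The notation is an assumption made for this illustration: K input variables, M data points in the node, model predictions \hat{y}_n with mean \bar{y}, and a threshold \tau. The claims do not prescribe the coefficient of determination; it is used here only as a common goodness-of-fit measure.

```latex
\[
\begin{aligned}
f(\mathbf{x}) &= \beta_0 + \sum_{k=1}^{K} \beta_k x_k, \\
\hat{\boldsymbol{\beta}} &= \arg\min_{\boldsymbol{\beta}} \sum_{n=1}^{M} \bigl(\hat{y}_n - f(\mathbf{x}_n)\bigr)^2, \\
R^2 &= 1 - \frac{\sum_{n=1}^{M} \bigl(\hat{y}_n - f(\mathbf{x}_n)\bigr)^2}{\sum_{n=1}^{M} \bigl(\hat{y}_n - \bar{y}\bigr)^2} \;\ge\; \tau .
\end{aligned}
\]
```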

10. The method of any one of claims 1 to 9, wherein at least one terminal node is further associated with one or more data points, each of these data points including an input vector which has not been used to train the machine learning model and a predicted vector generated by the machine learning model using the input vector.

11. The method of any one of claims 1 to 10, wherein each given terminal node is associated with a first set of boundaries B_terminal_node_input defining boundaries of a space in which all of the plurality of input vectors of all data points associated with the given terminal node are located.

12. The method of any one of claims 1 to 11, wherein the plurality of nodes includes non-terminal nodes and terminal nodes, wherein at least one non-terminal node is linked to one or more child nodes, each child node being either a non-terminal node or a terminal node, said at least one non-terminal node being associated with a first set of boundaries B_node_input defining boundaries of a space in which all of the plurality of input vectors of all data points of said one or more child nodes are located.

13. The method of any one of claims 1 to 12, wherein any data point including an input vector of the set of input vectors and a corresponding predicted vector of the set of predicted vectors is associated with a single terminal node of the database.

14. The method of any one of claims 1 to 13, wherein the plurality of nodes is arranged in hierarchical levels i, with i from 1 to N, wherein each node of level j is linked to a parent node of level j-1, with j from 2 to N, wherein: each node is associated with a first set of boundaries B_node_input defining boundaries of a space including input vectors of all data points associated with this node, wherein, for each given node linked to a parent node, a space defined by the first set of boundaries of the given node is included within the space defined by the first set of boundaries of the parent node.

15. The method of any one of claims 1 to 14, wherein generating the database comprises: (1) obtaining a given node associated with a plurality of data points of the set of data points, (2) determining coefficients defining a function which fits a relationship between the plurality of input vectors and the plurality of predicted vectors of the data points of the given node, (3) determining whether the function fits the relationship between the plurality of input vectors and the plurality of predicted vectors of the data points of the given node with a quality meeting an accuracy criterion, (4) for a function for which said quality does not meet the accuracy criterion: generating N child nodes linked to the given node, with N≥1, each given child node being associated with a given fraction of the data points of the parent node, and repeating (1) to (3) for each given child node, wherein said repeating of (1) to (3) for each given child node comprises using the given child node as the given node in (1) to (3).

16. The method of claim 15, wherein, upon determination at (3) that the function fits the relationship between the plurality of input vectors and the plurality of predicted vectors of the data points of the given node with a quality meeting the accuracy criterion, the method comprises storing the given node as a terminal node in the database.
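Steps (1) to (4) of claim 15 and the storing step of claim 16 can be sketched as a recursive procedure. The following is a minimal illustration under several assumptions that are not in the claims: inputs are scalar, the function is a least-squares line, the accuracy criterion is a threshold `min_r2` on the coefficient of determination, each split produces two children at the median input value, and nodes with too few points are kept as terminal nodes. All names are hypothetical.

```python
from dataclasses import dataclass, field
from statistics import median


@dataclass
class Node:
    points: list                  # (x, predicted_y) pairs; scalar x for brevity
    low: float                    # boundaries of the node's input space
    high: float
    coefficients: tuple = ()      # (slope, intercept) of the fitted line
    r2: float = 0.0               # goodness of fit of that line
    children: list = field(default_factory=list)


def fit_line(points):
    """Least-squares line through the points, with its R^2."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    r2 = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return slope, intercept, r2


def build(points, min_r2=0.95, min_points=3):
    xs = [x for x, _ in points]
    slope, intercept, r2 = fit_line(points)               # steps (1) and (2)
    node = Node(points, min(xs), max(xs), (slope, intercept), r2)
    if r2 >= min_r2 or len(points) < 2 * min_points:      # step (3)
        return node                                       # kept as a terminal node
    cut = median(xs)                                      # step (4): assumed split rule
    left = [p for p in points if p[0] <= cut]
    right = [p for p in points if p[0] > cut]
    if left and right:
        node.children = [build(left, min_r2, min_points),
                         build(right, min_r2, min_points)]
    return node


def terminal_nodes(node):
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in terminal_nodes(child)]


# Predictions of a hypothetical model that behaves like |x|.
data = [(float(x), float(abs(x))) for x in range(-5, 6)]
root = build(data)
leaves = terminal_nodes(root)
```

On this toy data a single line cannot fit the V-shaped predictions, so the root is split once and each half is fitted well by its own line.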

17. The method of claim 15 or of claim 16, wherein, for a given child node being associated with a given number M1 of data points which is below a threshold: obtaining one or more additional input vectors which have not been used to train the machine learning model, using the one or more additional input vectors to generate, by the machine learning model, additional predicted vectors, and associating data points including the additional input vectors and the additional predicted vectors with the given child node.

18. The method of claim 17, wherein the given number M1 of data points only include input vectors which have been used to train the machine learning model.

19. The method of claim 17 or of claim 18, wherein the given child node is associated with a first set of boundaries B_node_input defining boundaries of a space including the M1 input vectors of the data points associated with the given child node, wherein the one or more additional input vectors are selected within the first set of boundaries B_node_input.
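Claims 17 to 19 describe topping up a sparsely populated child node with additional data points whose input vectors lie within the node's boundaries. The sketch below illustrates one way this could be done. Uniform random sampling inside an axis-aligned bounding box, topping up exactly to the threshold, and the `used_in_training` flag are assumptions of this example, and `toy_model` stands in for the trained machine learning model.

```python
import random


def augment_sparse_node(points, model, threshold, rng=None):
    """points: list of (input_vector, predicted_vector, used_in_training).

    If the node holds fewer than `threshold` points, sample extra input
    vectors inside the node's bounding box and label them with the model.
    """
    rng = rng or random.Random(0)
    if len(points) >= threshold:
        return list(points)
    dims = range(len(points[0][0]))
    lows = [min(p[0][d] for p in points) for d in dims]
    highs = [max(p[0][d] for p in points) for d in dims]
    augmented = list(points)
    while len(augmented) < threshold:
        x = tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
        augmented.append((x, model(x), False))   # not a training vector
    return augmented


def toy_model(x):
    """Hypothetical stand-in for the trained model."""
    return (2.0 * x[0] - x[1],)


node_points = [((0.0, 1.0), toy_model((0.0, 1.0)), True),
               ((2.0, 3.0), toy_model((2.0, 3.0)), True)]
augmented = augment_sparse_node(node_points, toy_model, threshold=6)
```

Keeping the training flag on each point makes it possible to later count only the points that come from the training set.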

20. The method of any one of claims 1 to 19, wherein: each input vector comprises one or more values for one or more input variables, each terminal node of the plurality of terminal nodes of the database is associated with a first set of boundaries B_terminal_node_input defining boundaries of a space in which all values of the plurality of input vectors of the data points of said terminal node are located, wherein the method comprises: obtaining a data point comprising a first vector comprising one or more values for the one or more input variables, and using the database to determine a given terminal node for which the first vector is located within the space defined by the first set of boundaries B_terminal_node_input of said given terminal node.

21. The method of any one of claims 1 to 20, wherein: each input vector comprises one or more values for one or more input variables, each predicted vector comprises one or more values for one or more output variables, each terminal node of the plurality of terminal nodes of the database is associated with: a first set of boundaries B_terminal_node_input defining boundaries of a space in which all values of the plurality of input vectors of all data points of said terminal node are located, wherein the method comprises: obtaining a first vector comprising one or more values for the one or more input variables, using the database to determine a given terminal node for which the first vector is located within the space defined by the first set of boundaries B_terminal_node_input of said given terminal node, and using the function associated with said given terminal node and the first vector to determine a second vector comprising one or more values for the one or more output variables.
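The lookup of claim 20 and the estimation of claim 21 can be illustrated with a small sketch. It assumes axis-aligned boundaries stored as per-variable lower and upper limits, a linear function with a single output variable, and a flat list of terminal nodes. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TerminalNode:
    lows: tuple       # lower boundary for each input variable
    highs: tuple      # upper boundary for each input variable
    weights: tuple    # one coefficient per input variable
    bias: float       # intercept of the node's linear function

    def contains(self, x):
        return all(lo <= v <= hi
                   for lo, v, hi in zip(self.lows, x, self.highs))

    def predict(self, x):
        return self.bias + sum(w * v for w, v in zip(self.weights, x))


def find_terminal_node(nodes, x):
    """Return the first terminal node whose boundaries contain x, else None."""
    for node in nodes:
        if node.contains(x):
            return node
    return None


def estimate_prediction(nodes, x):
    """Estimate the model's output for x using the local function only."""
    node = find_terminal_node(nodes, x)
    return None if node is None else node.predict(x)


# Two hypothetical terminal nodes over two input variables.
nodes = [
    TerminalNode(lows=(0.0, 0.0), highs=(1.0, 1.0), weights=(2.0, 0.0), bias=1.0),
    TerminalNode(lows=(1.0, 0.0), highs=(2.0, 1.0), weights=(-1.0, 0.5), bias=4.0),
]
estimate = estimate_prediction(nodes, (0.5, 0.5))
```

A return value of `None` signals that the vector lies outside every terminal node, in which case no local estimate is available from the database.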

22. The method of any one of claims 1 to 21, wherein each input vector is informative of a number of input variables, wherein the method comprises, for at least one terminal node, obtaining a number N1 of data points associated with the terminal node which include an input vector which has been used to train the machine learning model, and using N1 and the number of input variables to determine data Dcertainty informative of the certainty of the machine learning model associated with the terminal node.

23. The method of claim 22, wherein each terminal node of the plurality of terminal nodes of the database is associated with a first set of boundaries B_terminal_node_input defining boundaries of a space in which all values of the plurality of input vectors of all data points of said terminal node are located, wherein the method comprises determining a lack of certainty of the machine learning model within the first set of boundaries B_terminal_node_input.

24. The method of any one of claims 1 to 23, comprising:
- obtaining, for a first machine learning model, a first set of data points, informative of a first set of input vectors and of a first set of predicted vectors, wherein each input vector of the first set of input vectors has been used to train the first machine learning model, wherein each given data point of the first set of data points is associated with:
  o a given input vector of the first set of input vectors, and
  o a given predicted vector of the first set of predicted vectors, wherein the given predicted vector has been generated by the first machine learning model using the given input vector,
- using the first set of data points to generate a first database informative of the first machine learning model, wherein the first database is informative of a plurality of nodes comprising terminal nodes, wherein, for at least part of the terminal nodes, each given terminal node is associated with:
  ▪ a plurality of data points of the first set of data points,
  ▪ one or more coefficients defining a function, wherein the function fits a relationship between:
    • a plurality of input vectors of the plurality of data points of the given terminal node, and
    • a plurality of predicted vectors of the plurality of data points of the given terminal node,
    with a quality meeting an accuracy criterion,
- obtaining, for a second machine learning model, a second set of data points, informative of a second set of input vectors and of a second set of predicted vectors, wherein each input vector of the second set of input vectors has been used to train the second machine learning model, wherein each given data point of the second set of data points is associated with:
  o a given input vector of the second set of input vectors, and
  o a given predicted vector of the second set of predicted vectors, wherein the given predicted vector has been generated by the second machine learning model using the given input vector,
- using the second set of data points to generate a second database informative of the second machine learning model, wherein the second database is informative of a plurality of nodes comprising terminal nodes, wherein, for at least part of the terminal nodes, each given terminal node is associated with:
  ▪ a plurality of data points of the second set of data points,
  ▪ one or more coefficients defining a function, wherein the function fits a relationship between:
    • a plurality of input vectors of the plurality of data points of the given terminal node, and
    • a plurality of predicted vectors of the plurality of data points of the given terminal node,
    with a quality meeting an accuracy criterion,
- using the first database and the second database to compare data informative of the first machine learning model with data informative of the second machine learning model.

25. The method of claim 24, wherein at least one of (i) or (ii) or (iii) is met: (i) at least some of the second input vectors used to train the second machine learning model are different from the first input vectors used to train the first machine learning model; (ii) the second machine learning model corresponds to the first machine learning model after a retraining using second input vectors different from the first input vectors; (iii) an architecture of the second machine learning model differs from an architecture of the first machine learning model.

26. The method of claim 24 or of claim 25, comprising: determining, for a given input vector, a first terminal node of the first database for which a first set of boundaries of the first terminal node contains the given input vector, determining, for the given input vector, a second terminal node of the second database for which a first set of boundaries of the second terminal node contains the given input vector, and comparing data associated with the first terminal node with data associated with the second terminal node.

27. The method of claim 26, comprising comparing data informative of a certainty of the first machine learning model associated with the first terminal node with data informative of a certainty of the second machine learning model associated with the second terminal node.

28. The method of claim 26 or of claim 27, comprising comparing a goodness-of-fit measure associated with the function of the first terminal node with a goodness-of-fit measure associated with the function of the second terminal node.

29. The method of any one of claims 26 to 28, comprising comparing a level of the first terminal node within the first database with a level of the second terminal node within the second database.

30. The method of any one of claims 26 to 29, comprising at least one of (i) or (ii): (i) comparing, for at least one input variable, data Dstat_sign informative of a statistical significance of the input variable in the first terminal node with data Dstat_sign informative of a statistical significance of the input variable in the second terminal node, or (ii) comparing, for at least one input variable, a magnitude of a coefficient associated with this input variable for the first terminal node with a magnitude of a coefficient associated with this input variable for the second terminal node.

31. The method of any one of claims 24 to 30, comprising: determining, for a given input vector, a first terminal node of the first database for which a first set of boundaries of the first terminal node contains the given input vector, wherein, when the second database does not include any terminal node which has a first set of boundaries defining a space including the given input vector, outputting alerting data.
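The comparisons of claims 26 to 31 can be sketched as follows for two databases built from two models. Each terminal node is represented here as a plain dictionary with hypothetical keys (`bounds`, `level`, `r2`, `coefficients`). The use of R^2 as the goodness-of-fit measure and the form of the alert are assumptions of this example.

```python
def locate(database, x):
    """Return the terminal node whose boundaries contain x, or None."""
    for node in database:
        if all(lo <= v <= hi for (lo, hi), v in zip(node["bounds"], x)):
            return node
    return None


def compare_models(first_db, second_db, x):
    """Compare the terminal nodes that cover x in each database."""
    first = locate(first_db, x)
    second = locate(second_db, x)
    if first is None or second is None:
        missing = "first" if first is None else "second"
        return {"alert": f"no terminal node of the {missing} database covers {x}"}
    return {
        "alert": None,
        "r2_delta": second["r2"] - first["r2"],
        "level_delta": second["level"] - first["level"],
        "coefficient_magnitude_delta": [
            abs(b) - abs(a)
            for a, b in zip(first["coefficients"], second["coefficients"])
        ],
    }


# Hypothetical single-variable databases for two models.
first_db = [{"bounds": [(0.0, 1.0)], "level": 2, "r2": 0.75, "coefficients": [2.0]}]
second_db = [{"bounds": [(0.0, 0.5)], "level": 3, "r2": 0.5, "coefficients": [-3.0]}]

report = compare_models(first_db, second_db, (0.25,))
alert = compare_models(first_db, second_db, (0.75,))
```

A positive `level_delta` means the second database needed a deeper node to cover the same input vector, which may indicate less regular behaviour of the second model in that region.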

32. The method of any one of claims 1 to 31, wherein: each input vector comprises one or more values for one or more input variables, each predicted vector comprises one or more values for one or more output variables, wherein the method comprises: obtaining a first input vector comprising one or more values for the one or more input variables, wherein the machine learning model is operative to generate a first predicted vector by using the first input vector, wherein the first predicted vector comprises one or more values for the one or more output variables, obtaining a desired predicted vector which comprises one or more desired values for the one or more output variables, wherein the desired values are different from the values of the first predicted vector, using the database to determine a modification of the one or more input values of the first input vector, to obtain a modified first input vector, wherein the machine learning model generates, based on the modified first input vector, an output vector matching the desired predicted vector according to a matching criterion.
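One way to read claim 32 is as a search for a modified input vector guided by the local function of the terminal node that covers the first input vector. The sketch below assumes a linear local function with a single output variable and picks the smallest change in the Euclidean sense, which the claim does not require. The matching criterion is assumed to be an absolute tolerance, and `toy_model` is a hypothetical stand-in for the trained model.

```python
def minimal_change(weights, bias, x, desired):
    """Smallest (L2) change to x such that bias + weights . x equals desired.

    Returns the modified vector, or None when all weights are zero and the
    local function cannot be moved by changing the inputs.
    """
    current = bias + sum(w * v for w, v in zip(weights, x))
    norm_sq = sum(w * w for w in weights)
    if norm_sq == 0:
        return None
    step = (desired - current) / norm_sq
    return tuple(v + step * w for w, v in zip(weights, x))


def matches(model, x, desired, tolerance=1e-6):
    """Assumed matching criterion: absolute error within a tolerance."""
    return abs(model(x) - desired) <= tolerance


def toy_model(x):
    """Hypothetical model that happens to be linear in this region."""
    return 3.0 * x[0] + 4.0 * x[1]


# Local function of the covering terminal node: y = 0 + 3*x0 + 4*x1.
modified = minimal_change((3.0, 4.0), 0.0, (1.0, 1.0), desired=32.0)
ok = matches(toy_model, modified, 32.0)
```

In practice the modified vector should be fed back to the actual model, as `matches` does, and checked against the boundaries of the terminal node, since the local function is only fitted inside them.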

33. A database informative of a machine learning model, wherein the machine learning model is associated with: a set of data points, informative of a set of input vectors and of a set of predicted vectors, wherein each input vector of the set of input vectors has been used to train the machine learning model, wherein each given data point of the set of data points is associated with:
  o a given input vector of the set of input vectors,
  o a given predicted vector of the set of predicted vectors, wherein the given predicted vector has been generated by the machine learning model using the given input vector,
wherein the database is informative of a plurality of nodes comprising terminal nodes, wherein each given terminal node is associated with:
  ▪ a plurality of data points of the set of data points,
  ▪ one or more coefficients defining a function, wherein the function fits a relationship between:
    • a plurality of input vectors of the plurality of data points of the given terminal node, and
    • a plurality of predicted vectors of the plurality of data points of the given terminal node,
    with a quality meeting an accuracy criterion,
wherein the database is usable to generate data informative of the machine learning model.

34. The database of claim 33, wherein at least one terminal node is further associated with one or more data points, each of these data points including an input vector which has not been used to train the machine learning model and a predicted vector generated by the machine learning model using the input vector.

35. The database of claim 33 or of claim 34, wherein each given terminal node is associated with a first set of boundaries B_terminal_node_input defining boundaries of a space in which all of the plurality of input vectors of all data points associated with the given terminal node are located.

36. The database of any one of claims 33 to 35, wherein, for a function of at least one terminal node, each given input variable is associated with data Dstat_sign informative of a statistical significance of the input variable.

37. The database of any one of claims 33 to 36, wherein the plurality of nodes includes non-terminal nodes and terminal nodes, wherein at least one non-terminal node is linked to one or more child nodes, each child node being either a non-terminal node or a terminal node, said at least one non-terminal node being associated with a first set of boundaries B_node_input defining boundaries of a space in which all of the plurality of input vectors of all data points of said one or more child nodes are located.

38. The database of any one of claims 33 to 37, wherein any data point including an input vector of the set of input vectors and a corresponding predicted vector of the set of predicted vectors is associated with a single terminal node of the database.

39. The database of any one of claims 33 to 38, wherein the plurality of nodes is arranged in hierarchical levels i, with i from 1 to N, wherein each node of level j is linked to a parent node of level j-1, with j from 2 to N, wherein: each node is associated with a first set of boundaries B_node_input defining boundaries of a space including input vectors of all data points associated with this node, wherein, for each given node linked to a parent node, a space defined by the first set of boundaries of the given node is included within the space defined by the first set of boundaries of the parent node.

40. The database of any one of claims 33 to 39, wherein each input vector is informative of a number of input variables, wherein the database includes, for at least one terminal node, data Dcertainty informative of the certainty of the machine learning model associated with the terminal node, wherein Dcertainty is obtained using a number N1 of data points associated with the terminal node which include an input vector which has been used to train the machine learning model, and the number of input variables.

41. The database of any one of claims 33 to 40, wherein the database is suitable for at least one of: (i) determining data informative of a quality of the training of the machine learning model; or (ii) determining a certainty of the machine learning model in its prediction; or (iii) determining a certainty of a prediction generated by the machine learning model depending on a range of values of input vectors fed to the machine learning model; or (iv) providing a recommendation of whether the machine learning model has to be retrained; or (v) providing a recommendation on a range of values of input vectors for which the machine learning model has to be retrained; or (vi) determining one or more input variables of the input vectors which most impact a prediction generated by the machine learning model; or (vii) determining a change in one or more values of a given input vector to obtain a prediction, by the machine learning model, which matches a desired predicted vector; or (viii) for a given input vector, estimating a prediction which would have been generated by the machine learning model using this given input vector; or (ix) determining data informative of a bias in a prediction of the machine learning model.

42. A system comprising a processor and memory circuitry (PMC) configured to:
- obtain, for a machine learning model, a set of data points, informative of a set of input vectors and of a set of predicted vectors, wherein each input vector of the set of input vectors has been used to train the machine learning model, wherein each given data point of the set of data points is associated with:
  o a given input vector of the set of input vectors,
  o a given predicted vector of the set of predicted vectors, wherein the given predicted vector has been generated by the machine learning model using the given input vector,
- use the set of data points to generate a database informative of the machine learning model, wherein the database is informative of a plurality of nodes comprising terminal nodes, wherein each given terminal node is associated with:
  ▪ a plurality of data points of the set of data points,
  ▪ one or more coefficients defining a function, wherein the function fits a relationship between:
    • a plurality of input vectors of the plurality of data points of the given terminal node, and
    • a plurality of predicted vectors of the plurality of data points of the given terminal node,
    with a quality meeting an accuracy criterion,
wherein the database is usable to generate data informative of the machine learning model.

43. The system of claim 42, configured to use the database for at least one of: (i) determining data informative of a quality of the training of the machine learning model; or (ii) determining a certainty of the machine learning model in its prediction; or (iii) determining a certainty of a prediction generated by the machine learning model depending on a range of values of input vectors fed to the machine learning model; or (iv) providing a recommendation of whether the machine learning model has to be retrained; or (v) providing a recommendation on a range of values of input vectors for which the machine learning model has to be retrained; or (vi) determining one or more input variables of the input vectors which most impact a prediction generated by the machine learning model; or (vii) determining a change in one or more values of a given input vector to obtain a prediction, by the machine learning model, which matches a desired predicted vector; or (viii) for a given input vector, estimating a prediction which would have been generated by the machine learning model using this given input vector; or (ix) determining data informative of a bias in a prediction of the machine learning model.

44. The system of claim 42 or of claim 43, wherein each terminal node is associated with a first set of boundaries B_terminal_node_input defining boundaries of a space in which all of the plurality of input vectors of the data points of the terminal node are located.

45. The system of any one of claims 42 to 44, wherein each input vector comprises one or more values for one or more input variables, wherein, for a function of at least one terminal node, each given input variable is associated with data Dstat_sign informative of a statistical significance of the input variable.

46. The system of claim 45, configured to, for one or more input variables of the input vectors, use a magnitude of a coefficient associated with the input variable to determine data informative of an impact of the given input variable on a prediction of the machine learning model.

47. The system of any one of claims 42 to 46, wherein: each input vector comprises one or more values for one or more input variables, each terminal node of the plurality of terminal nodes of the database is associated with a first set of boundaries B_terminal_node_input defining boundaries of a space in which all values of the plurality of input vectors of the data points of said terminal node are located, wherein the system is configured to: obtain a data point comprising a first vector comprising one or more values for the one or more input variables, and use the database to determine a given terminal node for which the first vector is located within the space defined by the first set of boundaries B_terminal_node_input of said given terminal node.

48. The system of any one of claims 42 to 47, wherein: each input vector comprises one or more values for one or more input variables, each predicted vector comprises one or more values for one or more output variables, each terminal node of the plurality of terminal nodes of the database is associated with: a first set of boundaries B_terminal_node_input defining boundaries of a space in which all values of the plurality of input vectors of all data points of said terminal node are located, wherein the system is configured to: obtain a first vector comprising one or more values for the one or more input variables, use the database to determine a given terminal node for which the first vector is located within the space defined by the first set of boundaries B_terminal_node_input of said given terminal node, and use the function associated with said given terminal node and the first vector to determine a second vector comprising one or more values for the one or more output variables.

49. The system of any one of claims 42 to 48, wherein each input vector is informative of a number of input variables, wherein the system is configured to, for at least one terminal node, obtain a number N1 of data points associated with the terminal node which include an input vector which has been used to train the machine learning model, and use N1 and the number of input variables to determine data Dcertainty informative of the certainty of the machine learning model associated with the terminal node.

50. A non-transitory storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform:
- obtaining, for a machine learning model, a set of data points, informative of a set of input vectors and of a set of predicted vectors, wherein each input vector of the set of input vectors has been used to train the machine learning model, wherein each given data point of the set of data points is associated with:
  o a given input vector of the set of input vectors,
  o a given predicted vector of the set of predicted vectors, wherein the given predicted vector has been generated by the machine learning model using the given input vector,
- using the set of data points to generate a database informative of the machine learning model, wherein the database is informative of a plurality of nodes comprising terminal nodes, wherein each given terminal node is associated with:
  ▪ a plurality of data points of the set of data points,
  ▪ one or more coefficients defining a function, wherein the function fits a relationship between:
    • a plurality of input vectors of the plurality of data points of the given terminal node, and
    • a plurality of predicted vectors of the plurality of data points of the given terminal node,
    with a quality meeting an accuracy criterion,
wherein the database is usable to generate data informative of the machine learning model.