CN111753995A - Local interpretable method based on gradient lifting tree - Google Patents
- Publication number
- CN111753995A (application CN202010580912.6A)
- Authority
- CN
- China
- Prior art keywords
- model
- gradient lifting
- importance
- feature
- tree model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 42
- 230000009467 reduction Effects 0.000 claims abstract description 11
- 238000012549 training Methods 0.000 claims description 36
- 238000003066 decision tree Methods 0.000 claims description 16
- 238000013140 knowledge distillation Methods 0.000 claims description 13
- 238000012163 sequencing technique Methods 0.000 claims description 10
- 238000004364 calculation method Methods 0.000 claims description 9
- 239000012535 impurity Substances 0.000 claims description 9
- 238000013527 convolutional neural network Methods 0.000 claims description 4
- 230000007246 mechanism Effects 0.000 claims description 4
- 238000010606 normalization Methods 0.000 claims description 3
- 238000003058 natural language processing Methods 0.000 abstract description 9
- 238000010187 selection method Methods 0.000 abstract description 2
- 238000012545 processing Methods 0.000 description 7
- 238000010801 machine learning Methods 0.000 description 4
- 238000013459 approach Methods 0.000 description 3
- 238000013528 artificial neural network Methods 0.000 description 3
- 238000010586 diagram Methods 0.000 description 3
- 230000008569 process Effects 0.000 description 3
- 238000012360 testing method Methods 0.000 description 3
- 239000000654 additive Substances 0.000 description 2
- 230000000996 additive effect Effects 0.000 description 2
- 238000013473 artificial intelligence Methods 0.000 description 2
- 230000010354 integration Effects 0.000 description 2
- 230000004913 activation Effects 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 230000002860 competitive effect Effects 0.000 description 1
- 230000006835 compression Effects 0.000 description 1
- 238000007906 compression Methods 0.000 description 1
- 230000009849 deactivation Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000006870 function Effects 0.000 description 1
- 230000002779 inactivation Effects 0.000 description 1
- 238000007477 logistic regression Methods 0.000 description 1
- 230000005012 migration Effects 0.000 description 1
- 238000013508 migration Methods 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 238000013105 post hoc analysis Methods 0.000 description 1
- 238000007670 refining Methods 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a local interpretable method based on a gradient lifting tree. A complex model is distilled by knowledge distillation to obtain a gradient lifting tree model; the traditional mean decrease in impurity (MDI) importance calculation is improved to obtain the weighted average of each gradient lifting tree's contribution to node information gain, and ranking these weighted averages yields an importance ordering of the input features, thereby interpreting the complex model. The invention is a universal interpretable method capable of extracting interpretations from datasets in various fields, such as natural language processing datasets, image datasets and tabular datasets. Meanwhile, the method can use a submodular selection method to lift the local interpretations into a global interpretation of the model.
Description
Technical Field
The invention relates to the field of artificial intelligence, in particular to a local interpretable method based on a gradient lifting tree, which is applied to extracting interpretations from various artificial intelligence models.
Background
As machine learning models are increasingly used in critical areas such as automotive, healthcare, financial markets, and legal systems, understanding the predictions made by machine learning algorithms becomes critical. Many complex models (e.g., deep neural networks and ensemble learning) are fine-tuned to optimize prediction accuracy, which makes their predictions difficult to interpret. Interpretable machine learning addresses this problem from two directions. The first approach builds inherently interpretable models based on decision trees, rule sets, GAMs (generalized additive models), logistic regression, etc., but these models often come at the cost of reduced prediction accuracy. The other approach provides a global understanding of the entire model or a local interpretation of a single prediction. Some interpretation methods are model-agnostic and can be applied to any classifier or regressor, while others are designed for a particular model. The form of the interpretation varies from feature importance to decision sets or rules.
Interpretable machine learning has recently attracted an increasing number of researchers. With the revival of deep learning, understanding complex neural networks becomes increasingly difficult: deep neural networks typically contain a large number of hidden layers and parameters, as well as learned features that reside on those hidden layers. Meanwhile, the GBM (gradient boosting machine) is a powerful ensemble learning algorithm with proven competitive performance on many tasks, such as online advertising. Boosting is a supervised learning method that iteratively fits and combines multiple weak learners (typically decision trees) to enhance the predictive performance of the model. Gradient boosting extends boosting to any differentiable loss function and can be used for regression and classification problems. In practice, GBM works well in many application domains and is supported by many publicly available implementations. Winning solutions of learning competitions such as Kaggle are frequently tree-based gradient boosting methods, especially LightGBM, XGBoost, etc. One of the most popular base learners of GBM is the fixed-size CART (classification and regression tree), which yields GBDT (gradient boosted decision trees, also called gradient tree boosting). In the present invention we focus on interpreting a single prediction of a tree-based GBM in terms of feature importance. For tree-based ensemble methods, while a single decision tree is relatively easy to understand, the final additive model becomes much less transparent after ensembling.
Recent advances in model-agnostic interpretation methods can be used to interpret ensemble methods. A model-agnostic interpretation method treats the target model as a black box, enabling the interpretation of any classifier or regressor. Existing model-agnostic work typically performs post-hoc analysis on a given black-box model that has been fitted to the data. One common approach is to learn another model that approximates the predictions of the original model and is relatively easy to interpret. Earlier work approximated the original predictions globally, while more recently methods such as LIME and Anchor have been proposed that obtain a locally interpretable model for a single sample. Most model-agnostic methods perturb the input instance according to some perturbation distribution, by which the features most likely to be important for the prediction can be identified. For complex models, it is often difficult to globally interpret the behavior of the model using simple interpretable sets or rules formed from selected significant features. Likewise, an interpretation of the entire model may not perfectly explain a single prediction. Therefore, in this case, it is preferable to use a local interpretation method with a concise interpretable description. To further evaluate the entire model, one may choose to apply the interpretation generated from a subset of inputs to unknown instances.
Disclosure of Invention
The invention aims to provide a local interpretable method based on a gradient lifting tree, which improves the interpretability of a model by improving the feature importance calculation method of an ensemble model, and uses knowledge distillation to interpret the original complex model.
The specific technical scheme for realizing the purpose of the invention is as follows:
A local interpretable method based on a gradient lifting tree, characterized in that the method comprises the following specific steps:
Step 1: performing parameter training on the initial complex model by using a training data set, and extracting input features;
Step 2: carrying out knowledge distillation on the trained model to obtain the soft label output of the input features;
Step 3: training a gradient lifting tree model by using the input features obtained in step 1 and the output soft labels obtained in step 2 to obtain a trained gradient lifting tree model;
Step 4: extracting feature importance from the trained gradient lifting tree model, ranking the feature importance, and selecting the features with higher feature importance as the explanation of the initial complex model.
In step 1, the training data set is a natural language data set, an image data set or a table data set; the initial model is, respectively, an attention-based long short-term memory network, a convolutional neural network or a multilayer perceptron. The parameter training is carried out as follows: the natural language data set uses the attention-based long short-term memory network; the image data set uses the convolutional neural network; the tabular data set uses the multilayer perceptron.
In step 2, knowledge distillation is carried out to obtain the soft label output of the input features; the soft label output formula is:

$$\mathrm{Label}_{soft}=\frac{\exp(z_i/T)}{\sum_{j}\exp(z_j/T)}$$

where $\mathrm{Label}_{soft}$ is the soft label output, $z_i$ is the final output (logit) of the initial model for the $i$-th class, $T$ is the temperature parameter, $i$ indexes the predicted class, and $j$ ranges over all classes of the prediction task.
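For illustration, a minimal Python sketch of the soft-label computation described by the formula above (the function name and example logits are illustrative, not part of the patent):

```python
import numpy as np

def soft_labels(logits, T=2.0):
    """Temperature-scaled softmax: Label_soft[i] = exp(z_i / T) / sum_j exp(z_j / T)."""
    z = np.asarray(logits, dtype=np.float64) / T
    z = z - z.max(axis=-1, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# logits z_i of the initial model for one sample with three classes
print(soft_labels([2.0, 1.0, 0.1], T=2.0))
```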
The trained gradient lifting tree model obtained in the step 3 comprises M weak classifiers, each weak classifier is a decision tree model, and M is a parameter of the gradient lifting tree model.
Step 4, extracting feature importance from the trained gradient lifting tree model, sorting the feature importance, and selecting the features with higher feature importance as the explanation of the initial complex model, which specifically comprises:
The calculation formula of the feature importance is:

$$\mathbb{E}\big[\mathrm{Imp}(P)\big]=\frac{1}{K}\sum_{k=1}^{K}\mathrm{Imp}(P_k),\qquad \mathrm{Imp}(P_k)=\sum_{m=1}^{M}\gamma_m h_m(x)\,\widetilde{\mathrm{Gain}}_{T_m(x)}(P_k)$$

where $\mathbb{E}[\mathrm{Imp}(P)]$ denotes the importance expectation of a feature $P$ that is composed of $K$ data, $P_k$ being the $k$-th data of the feature, and $\mathrm{Imp}(P_k)$ is the feature importance of the $k$-th data of the feature. Each weight $\gamma_m h_m(x)$ in $\mathrm{Imp}(P_k)$ is the contribution degree of the $m$-th weak learner in the trained gradient lifting tree model to the whole model, and $\widetilde{\mathrm{Gain}}_{T_m(x)}(P_k)$ is defined as the normalized impurity reduction rate of the $m$-th weak learner when the input is $P_k$, i.e., when the weak learner predicts feature $P_k$, the ratio of the impurity reduction obtained from node splits on $P_k$ to the total impurity reduction along the path. The impurity reduction obtained by splitting node $n$ on feature $P_k$ in the decision tree model is $\mathrm{Gain}(P_k,n)=i(n)-p_L\,i(n_L)-p_R\,i(n_R)$, where $i(n)$ is the impurity of the node being split, and $p_L$ and $p_R$ are the proportions of samples that reach the left child $n_L$ and the right child $n_R$ after the split, respectively. In the trained gradient lifting tree model, $T_m$ denotes the $m$-th weak learner, i.e., the $m$-th decision tree model, and $T_m(x)$ denotes the path taken by decision tree $T_m$ when predicting the input sample $x$, where sample $x$ contains multiple features $P$. The higher the importance expectation of a feature $P$, the more important that feature is for the model decision. All the obtained $\mathbb{E}[\mathrm{Imp}(P)]$ values are sorted from large to small and serve as the explanation extracted from the gradient lifting tree model, which is taken as the explanation of the initial complex model.
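The sketch below shows how the per-data importance $\mathrm{Imp}(P_k)$ could be computed for a single sample with a scikit-learn GradientBoostingRegressor. It is an illustrative reading of the formula, not the patent's reference implementation: the weight $\gamma_m h_m(x)$ is approximated here by the absolute value of the learning-rate-scaled prediction of the $m$-th tree on $x$, and averaging the resulting values over the $K$ data belonging to one feature $P$ would then give $\mathbb{E}[\mathrm{Imp}(P)]$.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def local_feature_importance(gbdt, x):
    """Path-based local importance Imp(P_k) for one sample x (1-D array).

    Assumption: gamma_m * h_m(x) is approximated by |learning_rate * tree_m.predict(x)|."""
    n_features = x.shape[0]
    imp = np.zeros(n_features)
    x2d = x.reshape(1, -1)
    for stage in gbdt.estimators_:                     # one weak learner per boosting stage
        tree_reg = stage[0]
        t = tree_reg.tree_
        weight = abs(gbdt.learning_rate * tree_reg.predict(x2d)[0])
        node, path_gain, feat_gain = 0, 0.0, np.zeros(n_features)
        while t.children_left[node] != -1:             # walk the decision path of x to a leaf
            left, right = t.children_left[node], t.children_right[node]
            n = t.weighted_n_node_samples
            # Gain(P_k, n) = i(n) - p_L * i(n_L) - p_R * i(n_R)
            gain = (t.impurity[node]
                    - n[left] / n[node] * t.impurity[left]
                    - n[right] / n[node] * t.impurity[right])
            path_gain += gain
            feat_gain[t.feature[node]] += gain
            node = left if x[t.feature[node]] <= t.threshold[node] else right
        if path_gain > 0:
            imp += weight * feat_gain / path_gain      # normalized impurity reduction rate
    return imp

# illustrative usage on synthetic data
X = np.random.rand(200, 5)
y = 2 * X[:, 0] + X[:, 3]
gbdt = GradientBoostingRegressor(n_estimators=50, random_state=0).fit(X, y)
print(local_feature_importance(gbdt, X[0]))
```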
The invention is a universal interpretable method capable of extracting interpretations from datasets in various fields, such as natural language processing datasets, image datasets and tabular datasets. Meanwhile, the method can use a submodular selection method to lift the local interpretations into a global interpretation of the model.
Drawings
FIG. 1 is a detailed flowchart of example 1 of the present invention;
FIG. 2 is a diagram of an initial model framework for image processing according to embodiment 2 of the present invention;
FIG. 3 is a diagram of an initial model framework of natural language processing according to embodiment 1 of the present invention;
FIG. 4 is a table task initial model framework diagram according to embodiment 3 of the present invention;
FIG. 5 is a flow chart of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to be limiting.
The invention provides a local interpretation algorithm based on a gradient lifting tree model, in which relative importance is calculated from the nodes traversed while predicting a sample, and the importance ranking of the input features, obtained by sorting these values, gives the local interpretable result. The invention is a universal interpretable method capable of extracting interpretations from datasets in various fields, such as natural language processing datasets, image datasets and table datasets.
The process of the invention, as shown in fig. 5, comprises the steps of training the initial complex model, extracting input features and output soft labels, training the gradient lifting tree model, extracting feature importance, and ranking the feature importance to generate the interpretation.
Firstly, the original sample data is divided into a training set and a test set; secondly, the original model is trained on the training set, and its soft label output is extracted using knowledge distillation; then, a gradient lifting tree model is trained on the training-set inputs and the soft label outputs; finally, for a single sample from the test set, the feature importance is calculated with the feature calculation method of the invention, and the interpretation of that sample is obtained by ranking.
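As a sketch of this pipeline under stated assumptions (the breast-cancer dataset and the MLP merely stand in for the "original complex model"; they are not the datasets or models used by the invention), the steps can be chained as follows:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import GradientBoostingRegressor

# 1) divide the original sample data into a training set and a test set
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# 2) train the original (complex) model on the training set
mlp = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0)
mlp.fit(X_tr, y_tr)

# 3) knowledge distillation: take the class-1 probability as the soft label (temperature T = 1)
soft_tr = mlp.predict_proba(X_tr)[:, 1]
soft_te = mlp.predict_proba(X_te)[:, 1]

# 4) train a gradient lifting tree model on the training-set inputs and the soft labels
gbdt = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X_tr, soft_tr)
print("distillation fidelity (R^2 on test soft labels):", gbdt.score(X_te, soft_te))

# 5) a single test sample can then be explained by ranking its per-feature importances,
#    e.g. with the local_feature_importance sketch shown in the Disclosure section above
```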
The invention provides a formula for calculating the feature importance, which is as follows:

$$\mathbb{E}\big[\mathrm{Imp}(P)\big]=\frac{1}{K}\sum_{k=1}^{K}\mathrm{Imp}(P_k),\qquad \mathrm{Imp}(P_k)=\sum_{m=1}^{M}\gamma_m h_m(x)\,\widetilde{\mathrm{Gain}}_{T_m(x)}(P_k)$$

where $\mathbb{E}[\mathrm{Imp}(P)]$ denotes the importance expectation of a feature $P$ that is composed of $K$ data, $P_k$ being the $k$-th data of the feature, and $\mathrm{Imp}(P_k)$ is the feature importance of the $k$-th data of the feature. Each weight $\gamma_m h_m(x)$ in $\mathrm{Imp}(P_k)$ is the contribution degree of the $m$-th weak learner in the trained gradient lifting tree model to the whole model, and $\widetilde{\mathrm{Gain}}_{T_m(x)}(P_k)$ is the normalized impurity reduction rate of the $m$-th weak learner when the input is $P_k$, i.e., when the weak learner predicts sample $x$, the ratio of the impurity reduction obtained from node splits on $P_k$ to the total impurity reduction along the path. The impurity reduction obtained by splitting node $n$ on feature $P_k$ in the decision tree model is $\mathrm{Gain}(P_k,n)=i(n)-p_L\,i(n_L)-p_R\,i(n_R)$, where $i(n)$ is the impurity of the node being split, and $p_L$ and $p_R$ are the proportions of samples that reach the left child $n_L$ and the right child $n_R$ after the split, respectively. In the trained gradient lifting tree model, $T_m$ denotes the $m$-th weak learner, i.e., the $m$-th decision tree model, and $T_m(x)$ denotes the path taken by decision tree $T_m$ when predicting the input sample $x$ (the sample $x$ contains multiple features $P$). The higher the importance expectation of a feature $P$, the more important that feature is for the model decision. All the obtained $\mathbb{E}[\mathrm{Imp}(P)]$ values are sorted from large to small and serve as the explanation extracted from the gradient lifting tree model, which is taken as the explanation of the initial complex model.
Example 1
The following is set forth for the main parts and implementation strategies of the present invention:
Fig. 1 shows the specific process of embodiment 1, including training the initial complex model, extracting input features and output soft labels, training the gradient lifting tree model, extracting feature importance from the gradient lifting tree model, and ranking it to obtain the explanation. Step one: perform parameter training on the initial complex model using a training data set, and extract input features.
Firstly, a training data set is constructed using the natural language processing data set SST2, and an initial complex model corresponding to the natural language processing task is designed and constructed. Its structure, shown in FIG. 3, comprises a word embedding layer, a long short-term memory network layer, an attention layer, a dropout layer and a fully connected network layer. The initial complex model is trained on the training data set. After the model is trained, the output of the word embedding layer is extracted as the input features.
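A minimal PyTorch sketch of such an initial complex model (embedding, LSTM, attention, dropout, fully connected layer) is given below; the layer sizes, vocabulary size and class names are illustrative assumptions rather than the patent's actual configuration:

```python
import torch
import torch.nn as nn

class AttnLSTMClassifier(nn.Module):
    """Embedding -> LSTM -> attention -> dropout -> fully connected classifier."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128, num_classes=2, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.attn = nn.Linear(hidden_dim, 1)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):                    # token_ids: (batch, seq_len)
        e = self.embed(token_ids)                    # word embeddings, later extracted as GBDT input features
        h, _ = self.lstm(e)                          # (batch, seq_len, hidden_dim)
        a = torch.softmax(self.attn(h), dim=1)       # attention weights over the time steps
        ctx = (a * h).sum(dim=1)                     # attention-weighted sentence representation
        return self.fc(self.dropout(ctx))            # logits z_i

model = AttnLSTMClassifier(vocab_size=20000)
logits = model(torch.randint(0, 20000, (4, 32)))     # 4 sentences of 32 token ids
print(logits.shape)                                  # torch.Size([4, 2])
```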
Step two: knowledge distillation is carried out on the trained model to obtain soft label output of input characteristics
Knowledge distillation is a technique commonly used in model compression and transfer learning to transfer the knowledge of one complex network to a simpler model. Directly explaining a complex model is very difficult; knowledge distillation is a very useful technique for enabling explanation, since a model with low interpretability can be distilled onto a model with high interpretability, and the former can then be explained by explaining the latter. Therefore, the invention uses knowledge distillation to extract soft labels corresponding to the input features from the final output layer of the initial model; the soft label formula is:

$$\mathrm{Label}_{soft}=\frac{\exp(z_i/T)}{\sum_{j}\exp(z_j/T)}$$

where $z_i$ is the logit output of the model for class $i$, $T$ is the distillation temperature, and $i$ ranges over the classes of the prediction task. For the natural language processing task, $T$ is set to 2. The output soft label $\mathrm{Label}_{soft}$ corresponding to the input features is obtained by this calculation. Step three: train the gradient lifting tree using the input features and output soft labels.
The input features and soft labels obtained in step two are assembled into a data set, and a gradient lifting tree model is trained on it. For different tasks, the parameters of the constructed gradient lifting tree are adjusted accordingly so that the accuracy of the trained gradient lifting tree is high enough; obtaining a high-accuracy gradient lifting tree model is a precondition for a good explanation of the extracted model. For the natural language processing task, the parameter M of the gradient lifting tree is set to 100, that is, the gradient lifting tree model contains 100 weak classifiers, i.e., 100 decision tree models.
Step four: extracting feature importance from the trained gradient lifting tree model, sequencing the feature importance, and selecting the features with higher feature importance as the explanation of the initial complex model
By utilizing the calculation method provided by the invention, any sample is selected from the test set, the sample is predicted by using the gradient lifting tree model trained in the step three,calculating the importance of the feature for each feature of the sample while predicting the sample, wherein the higher the expectation of the importance of the feature P is, the more important the feature is for model decision; all features to be obtainedAnd sequencing the models from large to small to serve as the explanation extracted from the gradient lifting tree model and serve as the explanation of the initial complex model, namely the local explanation. In the natural language processing task, the input sample is a sentence, each word in the sentence is the characteristic of the sample, the characteristic importance of each word can be obtained through the calculation, the importance ranking of the words can be obtained through ranking, and the important word is used as the explanation of the sample.
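A short sketch of this word-level ranking step, assuming the GBDT input is the concatenation of the sentence's word-embedding vectors so that each word corresponds to a block of K embedding dimensions (function names and sizes are illustrative):

```python
import numpy as np

def word_level_explanation(dim_importance, words, embed_dim):
    """Average the per-dimension importances Imp(P_k) over the K = embed_dim dimensions
    belonging to each word to obtain its importance expectation, then rank the words."""
    scores = {}
    for i, w in enumerate(words):
        block = dim_importance[i * embed_dim:(i + 1) * embed_dim]
        scores[w] = float(np.mean(block))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# hypothetical example: a 3-word sentence with 4-dimensional word embeddings
imp = np.random.rand(3 * 4)
print(word_level_explanation(imp, ["a", "great", "movie"], embed_dim=4))
```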
Example 2
The following is set forth for the main parts and implementation strategies of the present invention:
the method comprises the following steps: performing parameter training on the initial complex model by using a training data set, and extracting input features
Firstly, a training data set is constructed using the image processing data set MNIST, and an initial complex model corresponding to the image processing task is designed and constructed. Its structure, shown in fig. 2, comprises a convolutional layer, an activation layer, a pooling layer, a dropout layer and a fully connected network layer. The initial complex model is trained on the training data set. The two-dimensional pixel data of the image are used directly as input features.
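A minimal PyTorch sketch of such an initial complex model for MNIST (convolution, activation, pooling, dropout, fully connected layer); the channel counts and layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Convolution -> ReLU -> pooling -> dropout -> fully connected classifier for 28x28 images."""
    def __init__(self, num_classes=10, dropout=0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28 -> 14
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14 -> 7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Dropout(dropout), nn.Linear(32 * 7 * 7, num_classes),
        )

    def forward(self, x):                              # x: (batch, 1, 28, 28) grayscale images
        return self.classifier(self.features(x))       # logits z_i

logits = SmallCNN()(torch.randn(4, 1, 28, 28))
print(logits.shape)                                    # torch.Size([4, 10])
```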
Step two: knowledge distillation is carried out on the trained model to obtain soft label output of input characteristics
For the image processing task, T is set to 1. The output soft label $\mathrm{Label}_{soft}$ corresponding to the input features is obtained by calculation.
The subsequent steps were the same as in example 1.
Example 3
The following is set forth for the main parts and implementation strategies of the present invention:
the method comprises the following steps: performing parameter training on the initial complex model by using a training data set, and extracting input features
Firstly, a training data set is constructed using the tabular data set adult, and an initial complex model corresponding to the table processing task is designed and constructed. Its structure, shown in fig. 4, comprises fully connected network layers. The initial complex model is trained on the training data set. The tabular data are used directly as input features.
Step two: knowledge distillation is carried out on the trained model to obtain soft label output of input characteristics
For the table processing task, T is set to 1. The output soft label $\mathrm{Label}_{soft}$ corresponding to the input features is obtained by calculation.
The subsequent steps were the same as in example 1.
The specific embodiments of the present invention have been described above with reference to the accompanying drawings. It will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the spirit and scope of the invention. Technical solutions obtained by such modification and equivalent substitution all fall within the scope of the present invention as claimed.
Claims (5)
1. A local interpretable method based on a gradient lifting tree is characterized by comprising the following specific steps:
Step 1: performing parameter training on the initial complex model by using a training data set, and extracting input features;
Step 2: carrying out knowledge distillation on the trained model to obtain the soft label output of the input features;
Step 3: training a gradient lifting tree model by using the input features obtained in step 1 and the output soft labels obtained in step 2 to obtain a trained gradient lifting tree model;
Step 4: extracting feature importance from the trained gradient lifting tree model, ranking the feature importance, and selecting the features with higher feature importance as the explanation of the initial complex model.
2. The gradient lifting tree-based locally interpretable method of claim 1, wherein the training dataset of step 1 is a natural language dataset, an image dataset or a table dataset; the initial model is, respectively, an attention-based long short-term memory network, a convolutional neural network or a multilayer perceptron; the parameter training is carried out as follows: the natural language dataset uses the attention-based long short-term memory network; the image dataset uses the convolutional neural network; the tabular dataset uses the multilayer perceptron.
3. The gradient lifting tree-based locally interpretable method according to claim 1, wherein the knowledge distillation performed in step 2 obtains the soft label output of the input features, and the soft label output formula is:

$$\mathrm{Label}_{soft}=\frac{\exp(z_i/T)}{\sum_{j}\exp(z_j/T)}$$

where $\mathrm{Label}_{soft}$ is the soft label output, $z_i$ is the final output (logit) of the initial model for the $i$-th class, $T$ is the temperature parameter, $i$ indexes the predicted class, and $j$ ranges over all classes of the prediction task.
4. The gradient lifting tree-based locally interpretable method of claim 1, wherein the trained gradient lifting tree model obtained in step 3 comprises M weak classifiers, each weak classifier being a decision tree model, wherein M is a parameter of the gradient lifting tree model.
5. The local interpretable method of claim 1, wherein the step 4 of extracting feature importance from the trained gradient lifting tree model, sorting the feature importance, and selecting features with higher feature importance as an interpretation of the initial complex model specifically comprises:
The calculation formula of the feature importance is:

$$\mathbb{E}\big[\mathrm{Imp}(P)\big]=\frac{1}{K}\sum_{k=1}^{K}\mathrm{Imp}(P_k),\qquad \mathrm{Imp}(P_k)=\sum_{m=1}^{M}\gamma_m h_m(x)\,\widetilde{\mathrm{Gain}}_{T_m(x)}(P_k)$$

where $\mathbb{E}[\mathrm{Imp}(P)]$ denotes the importance expectation of a feature $P$ that is composed of $K$ data, $P_k$ being the $k$-th data of the feature, and $\mathrm{Imp}(P_k)$ is the feature importance of the $k$-th data of the feature. Each weight $\gamma_m h_m(x)$ in $\mathrm{Imp}(P_k)$ is the contribution degree of the $m$-th weak learner in the trained gradient lifting tree model to the whole model, and $\widetilde{\mathrm{Gain}}_{T_m(x)}(P_k)$ is defined as the normalized impurity reduction rate of the $m$-th weak learner when the input is $P_k$, i.e., when the weak learner predicts feature $P_k$, the ratio of the impurity reduction obtained from node splits on $P_k$ to the total impurity reduction along the path. The impurity reduction obtained by splitting node $n$ on feature $P_k$ in the decision tree model is $\mathrm{Gain}(P_k,n)=i(n)-p_L\,i(n_L)-p_R\,i(n_R)$, where $i(n)$ is the impurity of the node being split, and $p_L$ and $p_R$ are the proportions of samples that reach the left child $n_L$ and the right child $n_R$ after the split, respectively. In the trained gradient lifting tree model, $T_m$ denotes the $m$-th weak learner, i.e., the $m$-th decision tree model, and $T_m(x)$ denotes the path taken by decision tree $T_m$ when predicting the input sample $x$, where sample $x$ contains multiple features $P$. The higher the importance expectation of a feature $P$, the more important that feature is for the model decision. All the obtained $\mathbb{E}[\mathrm{Imp}(P)]$ values are sorted from large to small and serve as the explanation extracted from the gradient lifting tree model, which is taken as the explanation of the initial complex model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010580912.6A CN111753995B (en) | 2020-06-23 | 2020-06-23 | Local interpretable method based on gradient lifting tree |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010580912.6A CN111753995B (en) | 2020-06-23 | 2020-06-23 | Local interpretable method based on gradient lifting tree |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111753995A true CN111753995A (en) | 2020-10-09 |
CN111753995B CN111753995B (en) | 2024-06-28 |
Family
ID=72676993
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010580912.6A Active CN111753995B (en) | 2020-06-23 | 2020-06-23 | Local interpretable method based on gradient lifting tree |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111753995B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113240119A (en) * | 2021-04-08 | 2021-08-10 | 南京大学 | Cross-model distilling device for game AI strategy explanation |
CN113902978A (en) * | 2021-09-10 | 2022-01-07 | 长沙理工大学 | Interpretable SAR image target detection method and system based on deep learning |
CN114841233A (en) * | 2022-03-22 | 2022-08-02 | 阿里巴巴(中国)有限公司 | Path interpretation method, device and computer program product |
CN116704208A (en) * | 2023-08-04 | 2023-09-05 | 南京理工大学 | Local interpretable method based on characteristic relation |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180158552A1 (en) * | 2016-12-01 | 2018-06-07 | University Of Southern California | Interpretable deep learning framework for mining and predictive modeling of health care data |
CN108363714A (en) * | 2017-12-21 | 2018-08-03 | 北京至信普林科技有限公司 | A kind of method and system for the ensemble machine learning for facilitating data analyst to use |
CN108960434A (en) * | 2018-06-28 | 2018-12-07 | 第四范式(北京)技术有限公司 | The method and device of data is analyzed based on machine learning model explanation |
CN109978050A (en) * | 2019-03-25 | 2019-07-05 | 北京理工大学 | Decision Rules Extraction and reduction method based on SVM-RF |
CN110443346A (en) * | 2019-08-12 | 2019-11-12 | 腾讯科技(深圳)有限公司 | A kind of model explanation method and device based on input feature vector importance |
CN111027060A (en) * | 2019-12-17 | 2020-04-17 | 电子科技大学 | Knowledge distillation-based neural network black box attack type defense method |
CN111091179A (en) * | 2019-12-03 | 2020-05-01 | 浙江大学 | Heterogeneous depth model mobility measurement method based on attribution graph |
CN111160473A (en) * | 2019-12-30 | 2020-05-15 | 深圳前海微众银行股份有限公司 | Feature mining method and device for classified labels |
CN111311400A (en) * | 2020-03-30 | 2020-06-19 | 百维金科(上海)信息科技有限公司 | Modeling method and system of grading card model based on GBDT algorithm |
-
2020
- 2020-06-23 CN CN202010580912.6A patent/CN111753995B/en active Active
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180158552A1 (en) * | 2016-12-01 | 2018-06-07 | University Of Southern California | Interpretable deep learning framework for mining and predictive modeling of health care data |
CN108363714A (en) * | 2017-12-21 | 2018-08-03 | 北京至信普林科技有限公司 | A kind of method and system for the ensemble machine learning for facilitating data analyst to use |
CN108960434A (en) * | 2018-06-28 | 2018-12-07 | 第四范式(北京)技术有限公司 | The method and device of data is analyzed based on machine learning model explanation |
CN109978050A (en) * | 2019-03-25 | 2019-07-05 | 北京理工大学 | Decision Rules Extraction and reduction method based on SVM-RF |
CN110443346A (en) * | 2019-08-12 | 2019-11-12 | 腾讯科技(深圳)有限公司 | A kind of model explanation method and device based on input feature vector importance |
CN111091179A (en) * | 2019-12-03 | 2020-05-01 | 浙江大学 | Heterogeneous depth model mobility measurement method based on attribution graph |
CN111027060A (en) * | 2019-12-17 | 2020-04-17 | 电子科技大学 | Knowledge distillation-based neural network black box attack type defense method |
CN111160473A (en) * | 2019-12-30 | 2020-05-15 | 深圳前海微众银行股份有限公司 | Feature mining method and device for classified labels |
CN111311400A (en) * | 2020-03-30 | 2020-06-19 | 百维金科(上海)信息科技有限公司 | Modeling method and system of grading card model based on GBDT algorithm |
Non-Patent Citations (1)
Title |
---|
NICHOLAS FROSST ET AL: "Distilling a Neural Network Into a Soft Decision Tree", ARXIV:1711.09784V1, 27 November 2017 (2017-11-27), pages 1 - 8, XP080840510 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113240119A (en) * | 2021-04-08 | 2021-08-10 | 南京大学 | Cross-model distilling device for game AI strategy explanation |
CN113240119B (en) * | 2021-04-08 | 2024-03-19 | 南京大学 | Cross-model distillation device for game AI strategy interpretation |
CN113902978A (en) * | 2021-09-10 | 2022-01-07 | 长沙理工大学 | Interpretable SAR image target detection method and system based on deep learning |
CN114841233A (en) * | 2022-03-22 | 2022-08-02 | 阿里巴巴(中国)有限公司 | Path interpretation method, device and computer program product |
CN114841233B (en) * | 2022-03-22 | 2024-05-31 | 阿里巴巴(中国)有限公司 | Path interpretation method, apparatus and computer program product |
CN116704208A (en) * | 2023-08-04 | 2023-09-05 | 南京理工大学 | Local interpretable method based on characteristic relation |
CN116704208B (en) * | 2023-08-04 | 2023-10-20 | 南京理工大学 | Local interpretable method based on characteristic relation |
Also Published As
Publication number | Publication date |
---|---|
CN111753995B (en) | 2024-06-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20210166112A1 (en) | Method for neural network and apparatus performing same method | |
Titsias et al. | Spike and slab variational inference for multi-task and multiple kernel learning | |
CN111753995B (en) | Local interpretable method based on gradient lifting tree | |
US11996116B2 (en) | Methods and systems for implementing on-device non-semantic representation fine-tuning for speech classification | |
CN110443372B (en) | Transfer learning method and system based on entropy minimization | |
CN113963165B (en) | Small sample image classification method and system based on self-supervision learning | |
Liu et al. | EACP: An effective automatic channel pruning for neural networks | |
CN109977094A (en) | A method of the semi-supervised learning for structural data | |
CN116644755B (en) | Multi-task learning-based few-sample named entity recognition method, device and medium | |
CN112861936A (en) | Graph node classification method and device based on graph neural network knowledge distillation | |
Jin et al. | Cold-start active learning for image classification | |
CN113159072B (en) | Online ultralimit learning machine target identification method and system based on consistency regularization | |
US20220036150A1 (en) | System and method for synthesis of compact and accurate neural networks (scann) | |
Qin et al. | Active learning with extreme learning machine for online imbalanced multiclass classification | |
CN111259938B (en) | Manifold learning and gradient lifting model-based image multi-label classification method | |
Sharma et al. | Transfer learning and its application in computer vision: A review | |
Su et al. | Low‐Rank Deep Convolutional Neural Network for Multitask Learning | |
US20220138425A1 (en) | Acronym definition network | |
CN111783688B (en) | Remote sensing image scene classification method based on convolutional neural network | |
Xie et al. | Scalenet: Searching for the model to scale | |
Shen et al. | On image classification: Correlation vs causality | |
Shetty et al. | Comparative analysis of different classification techniques | |
Yang et al. | iCausalOSR: invertible Causal Disentanglement for Open-set Recognition | |
CN115063374A (en) | Model training method, face image quality scoring method, electronic device and storage medium | |
Zu et al. | Consecutive layer collaborative filter similarity for differentiable neural network pruning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |