CN111753995A - Local interpretable method based on gradient lifting tree - Google Patents

Local interpretable method based on gradient lifting tree

Info

Publication number
CN111753995A
CN111753995A
Authority
CN
China
Prior art keywords
model
gradient lifting
importance
feature
tree model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010580912.6A
Other languages
Chinese (zh)
Other versions
CN111753995B (en)
Inventor
仇鑫
李鑫
张瑞
徐宏刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University filed Critical East China Normal University
Priority to CN202010580912.6A priority Critical patent/CN111753995B/en
Publication of CN111753995A publication Critical patent/CN111753995A/en
Application granted Critical
Publication of CN111753995B publication Critical patent/CN111753995B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a local interpretable method based on a gradient lifting tree (gradient boosting tree). A complex model is distilled by knowledge distillation to obtain a gradient lifting tree model; the traditional mean decrease in impurity (MDI) importance calculation is improved to obtain a weighted average of each gradient lifting tree's contribution to node information gain, and ranking these weighted averages yields an importance ranking of the input features, thereby interpreting the complex model. The invention is a universal interpretable method capable of extracting interpretations for datasets in various fields, such as natural language processing datasets, image datasets and tabular datasets. Meanwhile, the method can use a submodular selection method to promote the local interpretations into a global interpretation of the model.

Description

Local interpretable method based on gradient lifting tree
Technical Field
The invention relates to the field of artificial intelligence, in particular to a local interpretable method based on a gradient lifting tree, applied to extracting interpretations of various artificial intelligence models.
Background
As machine learning models are increasingly used in critical areas such as automotive systems, healthcare, financial markets, and legal systems, understanding the predictions made by machine learning algorithms becomes essential. Many complex models (e.g., deep neural networks and ensemble learners) are fine-tuned to optimize prediction accuracy, which makes their predictions difficult to interpret. Interpretable machine learning addresses this problem from two directions. The first attempts to build inherently interpretable models based on decision trees, rule sets, GAMs (generalized additive models), logistic regression, and the like; these models often come at the cost of reduced prediction accuracy. The other provides a global understanding of the entire model or a local interpretation of a single prediction. Some interpretation methods are model-agnostic and can be applied to any classifier or regressor, while others are designed for a particular model. The form of the interpretation ranges from feature importance to decision sets or rules.
Interpretable machine learning has recently attracted an increasing number of researchers. With the revival of deep learning, understanding complex neural networks has become increasingly important, yet it remains challenging because deep neural networks typically contain large numbers of hidden layers and parameters, along with learned features that reside on those hidden layers. Meanwhile, the GBM (gradient boosting machine) is a powerful ensemble learning algorithm with proven competitive performance on many tasks, such as online advertising. Boosting is a powerful supervised learning method that iteratively refines and combines multiple weak learners (typically decision trees) to enhance the predictive performance of the model. Gradient boosting extends boosting to any differentiable loss function and can be used for both regression and classification problems. In practice, GBMs work well in many application domains and are supported by many publicly available implementations. Winning solutions in learning competitions such as Kaggle are frequently tree-based gradient boosting methods, especially LightGBM and XGBoost. Probably the most popular base learner for GBMs is the fixed-size CART (classification and regression tree), which yields GBDT (gradient boosted decision trees, also called gradient tree boosting). In the present invention we focus on interpreting single predictions of tree-based GBMs in terms of feature importance. For tree-based ensemble methods, while an individual decision tree is relatively easy to understand, the final additive model becomes far less transparent after ensembling.
Recent advances in model-agnostic interpretation methods can be applied to interpreting ensemble methods. A model-agnostic interpretation method treats the target model as a black box and can therefore interpret any classifier or regressor. Existing model-agnostic work typically performs post-hoc analysis on a given black-box model fitted to the data. One common approach is to learn another model that approximates the predictions of the original model and is relatively easy to interpret. Earlier work approximated the original predictions globally, while more recently methods such as LIME and Anchor have been proposed that obtain a locally interpretable model for a single sample. Most model-agnostic methods perturb the input instance according to some perturbation distribution in order to explain it; through such perturbation, the features most likely to be important for the prediction can be identified. For complex models, it is often difficult to interpret the behavior of the model globally using simple interpretable sets or rules formed from selected significant features, and an interpretation of the entire model may not perfectly explain a single prediction. In such cases it is therefore preferable to use a local interpretation method with a concise interpretable description. To further evaluate the entire model, one may choose to apply interpretations generated on a subset of inputs to unknown instances.
Disclosure of Invention
The invention aims to provide a local interpretable method based on a gradient lifting tree, which improves the feature importance calculation of an ensemble model so as to improve the model's interpretability, and which uses knowledge distillation to interpret the original complex model.
The specific technical scheme for realizing the purpose of the invention is as follows:
a local interpretable method based on a gradient lifting tree is characterized in that: the method comprises the following specific steps:
Step 1: perform parameter training on the initial complex model using a training dataset, and extract the input features;
Step 2: perform knowledge distillation on the trained model to obtain the soft label output of the input features;
Step 3: train a gradient lifting tree model using the input features obtained in step 1 and the output soft labels obtained in step 2, obtaining a trained gradient lifting tree model;
Step 4: extract feature importance from the trained gradient lifting tree model, rank the feature importance, and select the features with the highest importance as the interpretation of the initial complex model.
In step 1, the training dataset is a natural language dataset, an image dataset or a tabular dataset, and the initial model is, correspondingly, an attention-based long short-term memory network, a convolutional neural network or a multilayer perceptron. For the parameter training: the natural language dataset uses an attention-based long short-term memory network; the image dataset uses a convolutional neural network; the tabular dataset uses a multilayer perceptron.
In step 2, knowledge distillation is performed to obtain the soft label output of the input features; the soft label output formula is:

$$\mathrm{Label}_{\mathrm{soft}} = \frac{\exp(z_i / T)}{\sum_{j} \exp(z_j / T)}$$

where Label_soft denotes the soft label output, z_i denotes the final (logit) output of the initial model for the i-th class, T is the temperature parameter, and j ranges over all prediction classes of the task.
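As an illustration of this step, a minimal NumPy sketch of the temperature-scaled softmax above is given below; the function name `soft_labels` and the array shapes are our own assumptions, not part of the invention.

```python
import numpy as np

def soft_labels(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Soft label output: exp(z_i / T) / sum_j exp(z_j / T).

    logits -- shape (n_samples, n_classes), the z_i of the trained
    initial model; T -- the distillation temperature.
    """
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# A higher temperature flattens the distribution, exposing more of the
# teacher's "dark knowledge" in the non-argmax classes:
print(soft_labels(np.array([[4.0, 1.0]]), T=1.0))  # approx [[0.95, 0.05]]
print(soft_labels(np.array([[4.0, 1.0]]), T=2.0))  # approx [[0.82, 0.18]]
```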
The trained gradient lifting tree model obtained in step 3 comprises M weak classifiers, each of which is a decision tree model; M is a parameter of the gradient lifting tree model.
In step 4, feature importance is extracted from the trained gradient lifting tree model, the importance values are ranked, and the features with the highest importance are selected as the interpretation of the initial complex model. Specifically, the feature importance is calculated as:

$$\mathrm{Imp}(P) = \mathbb{E}\big[\mathrm{Imp}(P_k)\big] = \frac{1}{K} \sum_{k=1}^{K} \mathrm{Imp}(P_k)$$

where Imp(P) denotes the importance expectation of a feature P composed of K data, P_k denotes the k-th datum of the feature, and Imp(P_k) is the feature importance of that k-th datum:

$$\mathrm{Imp}(P_k) = \sum_{m=1}^{M} \gamma_m h_m(x)\, \widetilde{\mathrm{Gain}}_m(P_k)$$

Each weight γ_m h_m(x) in Imp(P_k) is the contribution degree of the m-th weak classifier of the trained gradient lifting tree model to the model as a whole, and

$$\widetilde{\mathrm{Gain}}_m(P_k) = \frac{\sum_{n \in T_m(x)\,:\, \mathrm{split}(n) = P_k} \mathrm{Gain}(P_k, n)}{\sum_{n \in T_m(x)} \mathrm{Gain}(n)}$$

is the normalized impurity reduction rate of the m-th weak classifier for input P_k, i.e., when the weak classifier predicts on feature P_k, the ratio of the impurity reduction at node splits that use P_k to the total impurity reduction. The impurity reduction is calculated from the splitting of node n of the decision tree model on feature P_k, i.e.

$$\mathrm{Gain}(P_k, n) = i(n) - p_L\, i(n_L) - p_R\, i(n_R)$$

where i(n) denotes the impurity of the node n being split, and p_L and p_R denote the fractions of samples that reach n_L and n_R respectively after the split. In the trained gradient lifting tree model, T_m denotes the m-th weak classifier, i.e., the m-th decision tree model, and T_m(x) denotes the path taken by decision tree T_m when predicting an input sample x (a sample x contains multiple features P). The higher the importance expectation of feature P, the more important that feature is for the model's decision. All the obtained Imp(P) values are sorted from large to small and serve as the interpretation extracted from the gradient lifting tree model, and hence as the interpretation of the initial complex model.
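For concreteness, a minimal sketch of this per-sample calculation over a scikit-learn gradient boosting model follows. It assumes the weight γ_m h_m(x) can be taken as the learning rate times the m-th tree's raw prediction, and that each datum P_k corresponds to one input column; the exact weighting used by the invention may differ.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def sample_feature_importance(gbt: GradientBoostingRegressor, x: np.ndarray) -> np.ndarray:
    """Path-based importance of every input column for ONE sample x:
    sum over trees of gamma_m * h_m(x) times the normalized impurity
    gain contributed by each column along the tree's decision path."""
    x = x.reshape(1, -1)
    imp = np.zeros(x.shape[1])
    for est in gbt.estimators_[:, 0]:          # the M weak classifiers
        t = est.tree_
        weight = gbt.learning_rate * est.predict(x)[0]   # gamma_m * h_m(x)
        node, gains, feats = 0, [], []
        while t.children_left[node] != -1:     # walk the decision path T_m(x)
            left, right = t.children_left[node], t.children_right[node]
            w, wl, wr = (t.weighted_n_node_samples[n] for n in (node, left, right))
            # Gain(P_k, n) = i(n) - p_L * i(n_L) - p_R * i(n_R)
            gains.append(t.impurity[node] - (wl / w) * t.impurity[left]
                         - (wr / w) * t.impurity[right])
            feats.append(t.feature[node])
            node = left if x[0, t.feature[node]] <= t.threshold[node] else right
        total = sum(gains)
        if total > 0:
            for f, g in zip(feats, gains):
                imp[f] += weight * g / total   # weighted normalized gain
    return imp
```

For a feature P composed of K columns (for example, the pixels of an image region), the per-column scores would then be averaged to obtain Imp(P).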
The invention is a universal interpretable method capable of extracting interpretations for datasets in various fields, such as natural language processing datasets, image datasets and tabular datasets. Meanwhile, the method can use a submodular selection method to promote the local interpretations into a global interpretation of the model.
Drawings
FIG. 1 is a detailed flowchart of embodiment 1 of the present invention;
FIG. 2 is a diagram of an initial model framework for image processing according to embodiment 2 of the present invention;
FIG. 3 is a diagram of an initial model framework of natural language processing according to embodiment 1 of the present invention;
FIG. 4 is a table task initial model framework diagram according to embodiment 3 of the present invention;
FIG. 5 is a flow chart of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to be limiting.
The invention provides a local interpretation algorithm based on a gradient lifting tree model: relative importance is calculated over the nodes traversed while predicting a sample, and ranking yields an importance ordering of the input features, giving a locally interpretable result. The invention is a universal interpretable method capable of extracting interpretations for datasets in various fields, such as natural language processing datasets, image datasets and tabular datasets.
The process of the invention, shown in FIG. 5, comprises training the initial complex model, extracting the input features and output soft labels, training the gradient lifting tree model, extracting feature importance, and ranking the feature importance to generate the interpretation.
Firstly, the original sample data are divided into a training set and a test set; secondly, the original model is trained on the training set, and its soft label output is extracted using knowledge distillation; then, a gradient lifting tree model is trained using the training-set inputs and the soft label outputs; finally, for a single sample from the test set, the feature importance is calculated with the calculation method of the invention, and ranking the importance values yields the interpretation of that sample.
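The flow just described can be sketched end to end as follows; the stub teacher model and all names here are illustrative stand-ins, and `soft_labels` and `sample_feature_importance` refer to the helper sketches given earlier in this description.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))                 # stand-in for the original samples

def complex_model_logits(X):
    """Stub for the trained initial complex model's logit output."""
    z = 2.0 * X[:, 0] + X[:, 1]                # pretend two features drive it
    return np.stack([-z, z], axis=1)

X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

# Distill: fit the gradient lifting tree on the teacher's soft labels
y_soft = soft_labels(complex_model_logits(X_train), T=2.0)[:, 1]
gbt = GradientBoostingRegressor(n_estimators=100).fit(X_train, y_soft)

# Explain one test sample: rank its features by local importance
x = X_test[0]
ranking = np.argsort(sample_feature_importance(gbt, x))[::-1]
print("features ranked by local importance:", ranking[:5])
```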
The invention provides the following formula for calculating the feature importance:

$$\mathrm{Imp}(P) = \mathbb{E}\big[\mathrm{Imp}(P_k)\big] = \frac{1}{K} \sum_{k=1}^{K} \mathrm{Imp}(P_k)$$

where Imp(P) denotes the importance expectation of a feature P composed of K data, P_k denotes the k-th datum of the feature, and Imp(P_k) is the feature importance of that k-th datum:

$$\mathrm{Imp}(P_k) = \sum_{m=1}^{M} \gamma_m h_m(x)\, \widetilde{\mathrm{Gain}}_m(P_k)$$

Each weight γ_m h_m(x) in Imp(P_k) is the contribution degree of the m-th weak classifier of the trained gradient lifting tree model to the model as a whole, and

$$\widetilde{\mathrm{Gain}}_m(P_k) = \frac{\sum_{n \in T_m(x)\,:\, \mathrm{split}(n) = P_k} \mathrm{Gain}(P_k, n)}{\sum_{n \in T_m(x)} \mathrm{Gain}(n)}$$

is the normalized impurity reduction rate of the m-th weak classifier for input P_k, i.e., when the weak classifier predicts on sample x, the ratio of the impurity reduction at node splits that use P_k to the total impurity reduction. The impurity reduction is calculated from the splitting of node n of the decision tree model on feature P_k, i.e.

$$\mathrm{Gain}(P_k, n) = i(n) - p_L\, i(n_L) - p_R\, i(n_R)$$

where i(n) denotes the impurity of the node n being split, and p_L and p_R denote the fractions of samples that reach n_L and n_R respectively after the split. In the trained gradient lifting tree model, T_m denotes the m-th weak classifier, i.e., the m-th decision tree model, and T_m(x) denotes the path taken by decision tree T_m when predicting an input sample x (the sample x contains multiple features P). The higher the importance expectation of feature P, the more important that feature is for the model's decision. All the obtained Imp(P) values are sorted from large to small and serve as the interpretation extracted from the gradient lifting tree model, and hence as the interpretation of the initial complex model.
Example 1
The following is set forth for the main parts and implementation strategies of the present invention:
FIG. 1 shows the specific process of embodiment 1: training the initial complex model, extracting the input features and output soft labels, training the gradient lifting tree model, extracting feature importance from it, and ranking to obtain the interpretation.
Step one: perform parameter training on the initial complex model using the training dataset, and extract the input features
Firstly, a training dataset is constructed using the natural language processing dataset SST2; then an initial complex model for the natural language processing task is designed and built. Its structure, shown in FIG. 3, comprises a word embedding layer, a long short-term memory network layer, an attention layer, a dropout layer and a fully connected layer. The initial complex model is trained on the training dataset; after training, the output of the word embedding layer is extracted as the input features.
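A compact PyTorch sketch of such an architecture is shown below; the layer sizes and the particular attention form are illustrative assumptions rather than the exact configuration of the embodiment.

```python
import torch
import torch.nn as nn

class AttnLSTMClassifier(nn.Module):
    """Word embedding -> LSTM -> attention -> dropout -> fully connected."""
    def __init__(self, vocab_size=20000, emb_dim=128, hid_dim=256, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)   # its output = input features
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.attn = nn.Linear(hid_dim, 1)
        self.drop = nn.Dropout(0.5)
        self.fc = nn.Linear(hid_dim, n_classes)

    def forward(self, tokens):                           # tokens: (batch, seq_len)
        e = self.embed(tokens)
        h, _ = self.lstm(e)                              # (batch, seq_len, hid_dim)
        a = torch.softmax(self.attn(h).squeeze(-1), dim=1)
        ctx = (a.unsqueeze(-1) * h).sum(dim=1)           # attention-weighted context
        return self.fc(self.drop(ctx))                   # logits z_i for distillation
```

A forward pass on a batch of token indices returns the logits z_i that step two converts into soft labels.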
Step two: knowledge distillation is carried out on the trained model to obtain soft label output of input characteristics
Knowledge distillation is a technique commonly used in model compression and transfer learning to transfer the knowledge of one complex network to another, simpler model. Directly explaining a complex model is very difficult; knowledge distillation is a very useful technique for enabling explanation: a model with low interpretability is distilled onto a model with high interpretability, and the former can then be explained by explaining the latter. The invention therefore uses knowledge distillation to extract, from the final output layer of the initial model, the soft label corresponding to the input features:

$$\mathrm{Label}_{\mathrm{soft}} = \frac{\exp(z_i / T)}{\sum_{j} \exp(z_j / T)}$$

where z_i is the logit output of the model, T is the distillation temperature, and i indexes the prediction classes of the task. For the natural language processing task, T is set to 2, and the output soft label Label_soft corresponding to the input features is obtained by this calculation.
Step three: train the gradient lifting tree using the input features and output soft labels
The input features and the soft labels obtained in step two are assembled into a dataset on which a gradient lifting tree model is trained; for different tasks, the parameters of the gradient lifting tree are adjusted accordingly so that the accuracy of the trained tree is sufficiently high, since obtaining a high-accuracy gradient lifting tree model is a precondition for a better interpretation of the distilled model. For the natural language processing task, the parameter M of the gradient lifting tree is set to 100, i.e., the gradient lifting tree model contains 100 weak classifiers (100 decision tree models).
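Since a high-accuracy gradient lifting tree is stated to be a precondition for a faithful interpretation, one simple check (our own suggestion, reusing the names from the illustrative pipeline sketch above) is to measure how closely the distilled tree tracks the teacher's soft labels on held-out data:

```python
from sklearn.metrics import r2_score

# gbt, X_test, complex_model_logits and soft_labels come from the earlier
# illustrative pipeline sketch; T = 2 matches this NLP embodiment.
y_soft_test = soft_labels(complex_model_logits(X_test), T=2.0)[:, 1]
print("distillation fidelity (R^2 on soft labels):",
      r2_score(y_soft_test, gbt.predict(X_test)))
```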
Step four: extract feature importance from the trained gradient lifting tree model, rank the feature importance, and select the features with the highest importance as the interpretation of the initial complex model
Using the calculation method provided by the invention, an arbitrary sample is selected from the test set and predicted with the gradient lifting tree model trained in step three; while predicting the sample, the importance of each of its features is calculated. The higher the importance expectation of a feature P, the more important that feature is for the model's decision. All the obtained Imp(P) values are sorted from large to small and serve as the interpretation extracted from the gradient lifting tree model, i.e., the local interpretation of the initial complex model. In the natural language processing task, the input sample is a sentence and each word in the sentence is a feature of the sample; the above calculation gives the feature importance of every word, ranking gives the importance ordering of the words, and the most important words serve as the interpretation of the sample.
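For example, a word-level explanation could be rendered as below; the sentence and scores are stand-ins, with one importance value per word as described above.

```python
import numpy as np

sentence = "the movie was surprisingly good".split()
# Stand-in per-word scores; in the method these come from the Imp(P)
# calculation, one value per word of the input sentence.
scores = np.array([0.01, 0.08, 0.02, 0.15, 0.74])
order = np.argsort(scores)[::-1]
print([(sentence[i], float(scores[i])) for i in order])
# -> [('good', 0.74), ('surprisingly', 0.15), ('movie', 0.08), ...]
```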
Example 2
The following is set forth for the main parts and implementation strategies of the present invention:
the method comprises the following steps: performing parameter training on the initial complex model by using a training data set, and extracting input features
Firstly, a training dataset is constructed using the image processing dataset MNIST; then an initial complex model for the image processing task is designed and built. Its structure, shown in FIG. 2, comprises convolutional layers, activation layers, pooling layers, a dropout layer and a fully connected layer. The initial complex model is trained on the training dataset. The two-dimensional pixel data of the images are used directly as the input features.
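A compact PyTorch sketch of such a network follows; channel counts and kernel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MnistCNN(nn.Module):
    """Convolution -> activation -> pooling (x2) -> dropout -> fully connected."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.drop = nn.Dropout(0.5)
        self.fc = nn.Linear(64 * 7 * 7, 10)

    def forward(self, x):                      # x: (batch, 1, 28, 28) MNIST pixels
        h = self.features(x).flatten(1)
        return self.fc(self.drop(h))           # logits; distilled here with T = 1
```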
Step two: knowledge distillation is carried out on the trained model to obtain soft label output of input characteristics
For the image processing task, T is set to 1, and the output soft label Label_soft corresponding to the input features is obtained by the same calculation.
The subsequent steps are the same as in embodiment 1.
Example 3
The following is set forth for the main parts and implementation strategies of the present invention:
the method comprises the following steps: performing parameter training on the initial complex model by using a training data set, and extracting input features
Firstly, a training dataset is constructed using the tabular dataset adult; then an initial complex model for the table processing task is designed and built. Its structure, shown in FIG. 4, comprises fully connected layers. The initial complex model is trained on the training dataset. The tabular data are used directly as the input features.
Step two: knowledge distillation is carried out on the trained model to obtain soft label output of input characteristics
For the table processing task, T is set to 1, and the output soft label Label_soft corresponding to the input features is obtained by the same calculation.
The subsequent steps are the same as in embodiment 1.
The specific embodiments of the present invention have been described above with reference to the accompanying drawings. It will be understood by those skilled in the art that various changes and equivalent substitutions may be made without departing from the spirit and scope of the invention; technical solutions so modified or substituted by equivalents all fall within the scope of the invention as claimed.

Claims (5)

1. A local interpretable method based on a gradient lifting tree is characterized by comprising the following specific steps:
step 1: performing parameter training on the initial complex model by using a training dataset, and extracting input features;
step 2: performing knowledge distillation on the trained model to obtain the soft label output of the input features;
step 3: training a gradient lifting tree model by using the input features obtained in step 1 and the output soft labels obtained in step 2 to obtain a trained gradient lifting tree model;
step 4: extracting feature importance from the trained gradient lifting tree model, ranking the feature importance, and selecting the features with the highest importance as the interpretation of the initial complex model.
2. The gradient lifting tree-based locally interpretable method of claim 1, wherein the training dataset of step 1 is a natural language dataset, an image dataset or a tabular dataset, and the initial model is, correspondingly, an attention-based long short-term memory network, a convolutional neural network or a multilayer perceptron; for the parameter training, the natural language dataset uses an attention-based long short-term memory network, the image dataset uses a convolutional neural network, and the tabular dataset uses a multilayer perceptron.
3. The gradient lifting tree-based locally interpretable method according to claim 1, wherein the knowledge distillation performed in step 2 obtains the soft label output of the input features according to the formula:

$$\mathrm{Label}_{\mathrm{soft}} = \frac{\exp(z_i / T)}{\sum_{j} \exp(z_j / T)}$$

wherein Label_soft denotes the soft label output, z_i denotes the final (logit) output of the initial model for the i-th class, T is the temperature parameter, and j ranges over all prediction classes of the task.
4. The gradient lifting tree-based locally interpretable method of claim 1, wherein the trained gradient lifting tree model obtained in step 3 comprises M weak classifiers, each of which is a decision tree model, M being a parameter of the gradient lifting tree model.
5. The gradient lifting tree-based locally interpretable method of claim 1, wherein the step 4 of extracting feature importance from the trained gradient lifting tree model, ranking the feature importance, and selecting the features with the highest importance as the interpretation of the initial complex model specifically comprises:

the feature importance is calculated as

$$\mathrm{Imp}(P) = \mathbb{E}\big[\mathrm{Imp}(P_k)\big] = \frac{1}{K} \sum_{k=1}^{K} \mathrm{Imp}(P_k)$$

wherein Imp(P) denotes the importance expectation of a feature P composed of K data, P_k denotes the k-th datum of the feature, and Imp(P_k) is the feature importance of that k-th datum:

$$\mathrm{Imp}(P_k) = \sum_{m=1}^{M} \gamma_m h_m(x)\, \widetilde{\mathrm{Gain}}_m(P_k)$$

each weight γ_m h_m(x) in Imp(P_k) being the contribution degree of the m-th weak classifier of the trained gradient lifting tree model to the model as a whole, and

$$\widetilde{\mathrm{Gain}}_m(P_k) = \frac{\sum_{n \in T_m(x)\,:\, \mathrm{split}(n) = P_k} \mathrm{Gain}(P_k, n)}{\sum_{n \in T_m(x)} \mathrm{Gain}(n)}$$

being the normalized impurity reduction rate of the m-th weak classifier for input P_k, i.e., when the weak classifier predicts on feature P_k, the ratio of the impurity reduction at node splits that use P_k to the total impurity reduction; the impurity reduction is calculated from the splitting of node n of the decision tree model on feature P_k, i.e.

$$\mathrm{Gain}(P_k, n) = i(n) - p_L\, i(n_L) - p_R\, i(n_R)$$

wherein i(n) denotes the impurity of the node n being split, and p_L and p_R denote the fractions of samples reaching n_L and n_R respectively after the split; in the trained gradient lifting tree model, T_m denotes the m-th weak classifier, i.e., the m-th decision tree model, and T_m(x) denotes the path taken by decision tree T_m when predicting an input sample x, the sample x containing multiple features P; the higher the importance expectation of feature P, the more important the feature is for the model's decision; all the obtained Imp(P) values are sorted from large to small and serve as the interpretation extracted from the gradient lifting tree model, and hence as the interpretation of the initial complex model.
CN202010580912.6A 2020-06-23 2020-06-23 Local interpretable method based on gradient lifting tree Active CN111753995B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010580912.6A CN111753995B (en) 2020-06-23 2020-06-23 Local interpretable method based on gradient lifting tree

Publications (2)

Publication Number Publication Date
CN111753995A true CN111753995A (en) 2020-10-09
CN111753995B CN111753995B (en) 2024-06-28

Family

ID=72676993

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010580912.6A Active CN111753995B (en) 2020-06-23 2020-06-23 Local interpretable method based on gradient lifting tree

Country Status (1)

Country Link
CN (1) CN111753995B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180158552A1 (en) * 2016-12-01 2018-06-07 University Of Southern California Interpretable deep learning framework for mining and predictive modeling of health care data
CN108363714A (en) * 2017-12-21 2018-08-03 北京至信普林科技有限公司 A kind of method and system for the ensemble machine learning for facilitating data analyst to use
CN108960434A (en) * 2018-06-28 2018-12-07 第四范式(北京)技术有限公司 The method and device of data is analyzed based on machine learning model explanation
CN109978050A (en) * 2019-03-25 2019-07-05 北京理工大学 Decision Rules Extraction and reduction method based on SVM-RF
CN110443346A (en) * 2019-08-12 2019-11-12 腾讯科技(深圳)有限公司 A kind of model explanation method and device based on input feature vector importance
CN111091179A (en) * 2019-12-03 2020-05-01 浙江大学 Heterogeneous depth model mobility measurement method based on attribution graph
CN111027060A (en) * 2019-12-17 2020-04-17 电子科技大学 Knowledge distillation-based neural network black box attack type defense method
CN111160473A (en) * 2019-12-30 2020-05-15 深圳前海微众银行股份有限公司 Feature mining method and device for classified labels
CN111311400A (en) * 2020-03-30 2020-06-19 百维金科(上海)信息科技有限公司 Modeling method and system of grading card model based on GBDT algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
NICHOLAS FROSST ET AL: "Distilling a Neural Network Into a Soft Decision Tree", ARXIV:1711.09784V1, 27 November 2017 (2017-11-27), pages 1 - 8, XP080840510 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240119A (en) * 2021-04-08 2021-08-10 南京大学 Cross-model distilling device for game AI strategy explanation
CN113240119B (en) * 2021-04-08 2024-03-19 南京大学 Cross-model distillation device for game AI strategy interpretation
CN113902978A (en) * 2021-09-10 2022-01-07 长沙理工大学 Interpretable SAR image target detection method and system based on deep learning
CN114841233A (en) * 2022-03-22 2022-08-02 阿里巴巴(中国)有限公司 Path interpretation method, device and computer program product
CN114841233B (en) * 2022-03-22 2024-05-31 阿里巴巴(中国)有限公司 Path interpretation method, apparatus and computer program product
CN116704208A (en) * 2023-08-04 2023-09-05 南京理工大学 Local interpretable method based on characteristic relation
CN116704208B (en) * 2023-08-04 2023-10-20 南京理工大学 Local interpretable method based on characteristic relation

Also Published As

Publication number Publication date
CN111753995B (en) 2024-06-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant