WO2021012220A1 - Evasion attack method and device for integrated tree classifier - Google Patents

Evasion attack method and device for integrated tree classifier

Info

Publication number
WO2021012220A1
WO2021012220A1 (PCT/CN2019/097532, CN2019097532W)
Authority
WO
WIPO (PCT)
Prior art keywords
target
feature
shortest path
classifier
decision
Prior art date
Application number
PCT/CN2019/097532
Other languages
French (fr)
Chinese (zh)
Inventor
张福勇
王艺
李宽
Original Assignee
东莞理工学院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 东莞理工学院
Priority to PCT/CN2019/097532
Publication of WO2021012220A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements

Definitions

  • the present invention relates to the technical field of network security research, in particular to an evasion attack method and device for an integrated tree classifier.
  • Machine learning algorithms, such as random forests, aim to learn predictive models from training data to distinguish malicious samples from legitimate samples.
  • security-related tasks involve smart adversaries who can analyze the vulnerabilities of the learning-based model and attack based on the system output.
  • traditional learning-based classifiers are vulnerable to evasion attacks in security-based applications.
  • the attacker can manipulate the sample to evade system detection.
  • in malware detection, in order to make malicious code evade detection, the attacker will modify some typical malicious statements in the code (statements that often appear in malicious code but rarely in normal code, on which malicious code detection systems usually base their detection), or add some normal statements to the malicious code (statements that frequently appear in normal code but rarely in malicious code). In spam filtering, attackers can disguise their email behavior through spelling mistakes or by adding normal words.
  • the existing vulnerability analysis for learning-based classification models mainly uses gradient-based attack methods, which are only effective for models with differentiable loss functions, and cannot be applied to ensemble tree classification models.
  • the ensemble tree classifier attack method based on mixed integer linear programming proposed by Kantchelian et al. can only be applied to white-box attack scenarios, and the algorithm complexity is high, which cannot be applied to larger data sets.
  • the query-based black box attack method proposed by Cheng et al. requires that the feature value must be a continuous real value, which cannot be applied to the binary features widely used in the field of network security; moreover, this method is not specifically designed for ensemble tree classifiers, so its attack effect is poor.
  • the technical problem to be solved by the present invention is to provide an evasion attack method and device for the ensemble tree classifier, so as to conduct in-depth research on black box attack methods for ensemble tree classifiers and thereby provide a basis and reference for the design of robust classifiers.
  • an embodiment of the present invention provides an evasion attack method for an integrated tree classifier, including the steps:
  • step (3) Determine whether the current number of feature modifications has reached the preset maximum modification threshold; if not, use the shortest path algorithm and the preset evasion attack strategy according to the alternative classifier to find the optimal modification feature, modify the corresponding feature of the original input sample according to the optimal modification feature to generate a trial sample, and execute step (4); if so, end the operation;
  • step (4) Use the target classifier to classify the trial sample to obtain a trial classification result, and determine whether the trial classification result is consistent with the pre-stored original classification result; if yes, return to step (3); if not, output the trial sample; wherein the original classification result is the result of the target classifier classifying the original input sample.
  • the shortest path algorithm and the preset evasion attack strategy are used to find the optimal modification feature, specifically:
  • the shortest path algorithm is used to search for the target shortest decision path of each decision tree according to the target classification result type, obtaining the target shortest path set;
  • the weights of each feature are accumulated and the accumulated weights of each feature are compared to obtain the optimal modified feature.
  • the target shortest path set includes a first target shortest path set and a second target shortest path set
  • the shortest path algorithm is used to search the target shortest decision path of each decision tree according to the target classification result type to obtain the target shortest path set, specifically:
  • the alternative classifier is divided into a first type of decision tree and a second type of decision tree; wherein the decision value of the first type of decision tree is inconsistent with the target classification result type, and the decision value of the second type of decision tree is consistent with the target classification result type;
  • the shortest path algorithm is used to find the first target shortest path in the first type of decision tree, obtaining the first target shortest path set; the shortest path algorithm is used to find the second target shortest path in the second type of decision tree, obtaining the second target shortest path set.
  • the assigned weight value of the feature in the first target shortest path set is a positive number
  • the assigned weight value of the feature in the second target shortest path set is a negative number
  • the step of accumulating the weight of each feature and comparing the cumulative weight of each feature to obtain the optimal modified feature is specifically:
  • the weight of each feature is accumulated and the cumulative weight of each feature is compared, and the feature with the largest cumulative weight is found as the optimal modified feature.
  • the features in the first target shortest path set are weighted according to the formula 1/10^(n-1), where n represents the position of the feature in the decision path.
  • the present invention also provides an evasion attack device for the integrated tree classifier, which includes a data acquisition module, an alternative classifier training module, a feature modification module, and an evasion attack detection module; wherein,
  • the data acquisition module is used to acquire original input samples, a replacement data set, and a learning model of a target classifier, where the replacement data set is a data set that has consistent distribution characteristics with the target classifier training data;
  • the alternative classifier training module is configured to train according to the alternative data set and the learning model to obtain an alternative classifier
  • the feature modification module is used to determine whether the current number of feature modifications has reached the preset maximum modification threshold; if not, use the shortest path algorithm and the preset evasion attack strategy according to the alternative classifier to find the optimal modification feature, and modify the corresponding feature of the original input sample according to the optimal modification feature to generate a trial sample; if so, end the operation;
  • the evasion attack detection module is used to classify the trial sample with the target classifier to obtain a trial classification result, and determine whether the trial classification result is consistent with the pre-stored original classification result; if so, repeat the feature modification process; if not, output the trial sample; wherein the original classification result is the result of the target classifier classifying the original input sample.
  • the shortest path algorithm and the preset evasion attack strategy are used to find the optimal modification feature, specifically:
  • the shortest path algorithm is used to search for the target shortest decision path of each decision tree according to the target classification result type, obtaining the target shortest path set;
  • the weights of each feature are accumulated and the accumulated weights of each feature are compared to obtain the optimal modified feature.
  • the target shortest path set includes a first target shortest path set and a second target shortest path set
  • the shortest path algorithm is used to search the target shortest decision path of each decision tree according to the target classification result type to obtain the target shortest path set, specifically:
  • the alternative classifier is divided into a first type of decision tree and a second type of decision tree; wherein the decision value of the first type of decision tree is inconsistent with the target classification result type, and the decision value of the second type of decision tree is consistent with the target classification result type;
  • the shortest path algorithm is used to find the first target shortest path in the first type of decision tree, obtaining the first target shortest path set; the shortest path algorithm is used to find the second target shortest path in the second type of decision tree, obtaining the second target shortest path set.
  • the assigned weight value of the feature in the first target shortest path set is a positive number
  • the assigned weight value of the feature in the second target shortest path set is a negative number
  • the step of accumulating the weight of each feature and comparing the cumulative weight of each feature to obtain the optimal modified feature is specifically:
  • the weight of each feature is accumulated and the cumulative weight of each feature is compared, and the feature with the largest cumulative weight is found as the optimal modified feature.
  • the features in the first target shortest path set are weighted according to the formula 1/10^(n-1), where n represents the position of the feature in the decision path.
  • the present invention has the following beneficial effects:
  • the present invention starts from the decision structure of the decision tree, finds the decision path of each base classifier, analyzes the decision path set of the ensemble tree classifier to identify the key features that can mislead its decision, and finally realizes the attack by modifying these key decision features.
  • the black box attack method for ensemble tree classifiers (gradient boosting trees, random forests, etc.) can be studied in depth, thereby providing a basis and reference for designing a robust ensemble tree classifier.
  • FIG. 1 is a schematic flowchart of an evasion attack method for an integrated tree classifier provided by an embodiment of the present invention
  • FIG. 2 is a schematic diagram of the structure of an integrated tree classifier provided by an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of the structure of the first type of decision tree in the integrated classifier provided by an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a second type of decision tree in an integrated classifier provided by an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of an attack evasion process and model provided by an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of an evasion attack device for an integrated tree classifier provided by an embodiment of the present invention.
  • an embodiment of the present invention provides an evasion attack method for an integrated tree classifier, including the steps:
  • step (3) Determine whether the current number of feature modifications has reached the preset maximum modification threshold; if not, use the shortest path algorithm and the preset evasion attack strategy according to the alternative classifier to find the optimal modification feature, modify the corresponding feature of the original input sample according to the optimal modification feature to generate a trial sample, and execute step (4); if so, end the operation;
  • step (4) Use the target classifier to classify the trial sample to obtain a trial classification result, and determine whether the trial classification result is consistent with the pre-stored original classification result; if yes, return to step (3); if not, output the trial sample; wherein the original classification result is the result of the target classifier classifying the original input sample.
  • the existing vulnerability analysis for learning-based classification models mainly uses gradient-based attack methods, which are only effective for models with differentiable loss functions and cannot be applied to ensemble tree classification models.
  • the ensemble tree classifier attack method based on mixed integer linear programming proposed by Kantchelian et al. can only be applied to white-box attack scenarios, and the algorithm complexity is high, which cannot be applied to larger data sets.
  • the query-based black box attack method proposed by Cheng et al. requires that the feature value must be a continuous real value, which cannot be applied to the binary features widely used in the field of network security; moreover, this method is not specifically designed for ensemble tree classifiers, so its attack effect is poor.
  • the present invention starts from the decision structure of the decision tree, finds the decision path of each base classifier, and analyzes the set of decision paths of the ensemble tree classifier to identify the key features that can mislead its decision; the attack is finally realized by modifying these key decision features.
  • the key issue is how to find the key decision feature from the set of decision paths.
  • the present invention starts with the voting-based integration strategy of the ensemble tree classifier, and finds the key features that can change (or mislead) the decision value of the majority base classifier from the ensemble tree classifier.
  • in each iteration, the method finds one key decision feature and modifies the corresponding feature of the input sample to generate an attack sample; if the attack does not succeed, the next key feature is found on this basis, until the attack succeeds or the maximum number of modified features is reached.
  • the attacker's purpose is to mislead the target model's decision by estimating the decision boundary of the target model and manipulating input samples.
  • the output of the target model is c(x).
  • the attack strategy is to find a sample x' with the minimum modification to x such that c(x') ≠ c(x).
  • d(x, x') is a distance function describing the amount of modification.
  • the evasion attack problem can be described as formula (1): x* = arg min_{x'} d(x, x') subject to c(x') ≠ c(x), where:
  • x is the input sample
  • c(x) is the output category of the classification model to x
  • sample x' is the attack sample.
  • the meaning of formula (1) is to change the output category (i.e., to attack) with the minimum modification to x.
  • the knowledge of the target system can be divided into four levels: 1) training data D; 2) feature space X; 3) learning algorithm F; 4) target model parameter w.
  • in the black box attack scenario considered here, the attacker is assumed to have a certain understanding of the target system.
  • the attacker knows the learning algorithm F and the feature space X, but does not know the training data D and the target model parameter w.
  • the attacker can collect an alternative data set D' through the Internet or other sources, and use this data set to estimate the target model parameters w'.
  • the attacker may also obtain a subset of the original training set.
  • the ensemble tree classifier f: R^n → R shown in Figure 2 is a set T composed of multiple decision trees.
  • each decision tree T_i ∈ T is a binary tree in which each internal node n ∈ T_i.nodes carries a predicate; if the predicate evaluates to true, the output edge points to its left child n.leftchild, otherwise to its right child n.rightchild. Each leaf node l ∈ T_i.leaves has a category value l.class ∈ R.
  • a decision path of T_i is a path from the root node to a leaf node. The classification result T_i.class of sample x is the category value l.class of the leaf node on its classification path.
  • the decision value f(x) of the ensemble tree is the result of the majority vote of all decision trees.
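  • As an illustrative aside (not part of the patent text), the tree walk and majority vote described above can be sketched in Python; Node, classify, and ensemble_decision are hypothetical names, and the predicate is simplified to testing a binary feature:

```python
class Node:
    """A decision tree node: a leaf carries klass, an internal node a feature test."""
    def __init__(self, feature=None, left=None, right=None, klass=None):
        self.feature = feature  # feature index tested at an internal node
        self.left = left        # followed when the predicate x[feature] == 1 is true
        self.right = right      # followed otherwise
        self.klass = klass      # leaf category value in {-1, 1}; None for internal nodes

def classify(tree, x):
    """Walk from the root to a leaf and return the leaf's category value."""
    node = tree
    while node.klass is None:
        node = node.left if x[node.feature] == 1 else node.right
    return node.klass

def ensemble_decision(trees, x):
    """Decision value f(x) of the ensemble: the majority vote of all trees."""
    votes = sum(classify(t, x) for t in trees)
    return 1 if votes > 0 else -1
```

For example, with three such trees, flipping a single binary feature of x can change enough individual votes to flip the ensemble's decision, which is exactly the lever the attack exploits.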
  • the embodiment of the present invention mainly aims at a binary classification tree based on binary features.
  • the parameters of such a binary classification tree are: R = {-1, 1}, x_i ∈ {0, 1}.
  • d(·, ·) corresponds to the L0 norm (Hamming distance), which means that a feature can only be added (changed from 0 to 1) or deleted (changed from 1 to 0) relative to the initial sample x.
  • the decision value of more than half of the decision trees in the set should be -1.
  • the basic idea of the attack algorithm is to modify the minimum number of features so that more than half of the trees get a decision value of -1.
  • the shortest path algorithm and the preset evasion attack strategy are used to find the optimal modification feature, specifically:
  • the weights of each feature are accumulated and the accumulated weights of each feature are compared to obtain the optimal modified feature.
  • the target shortest path set includes a first target shortest path set and a second target shortest path set
  • the shortest path algorithm is used to search the target shortest decision path of each decision tree according to the target classification result type to obtain the target shortest path set, specifically:
  • the alternative classifier is divided into a first type of decision tree and a second type of decision tree; wherein the decision value of the first type of decision tree is inconsistent with the target classification result type, and the decision value of the second type of decision tree is consistent with the target classification result type;
  • the shortest path algorithm is used to find the first target shortest path in the first type of decision tree, obtaining the first target shortest path set; the shortest path algorithm is used to find the second target shortest path in the second type of decision tree, obtaining the second target shortest path set.
  • the weight assigned to the feature in the first target shortest path set is a positive number, and the weight assigned to the feature in the second target shortest path set is a negative number;
  • the step of accumulating the weight of each feature and comparing the cumulative weight of each feature to obtain the optimal modified feature is specifically:
  • the weight of each feature is accumulated and the cumulative weight of each feature is compared, and the feature with the largest cumulative weight is found as the optimal modified feature.
  • the features in the first target shortest path set are weighted according to the formula 1/10^(n-1), where n represents the position of the feature in the decision path.
  • since the classification result type of most decision trees is 1, the target classification result type is determined to be -1; therefore, T_1 and T_2 are divided into the first type of decision tree, and T_3 is divided into the second type of decision tree.
  • the features that can be modified are x 2 , x 4 , and x 7 .
  • the first consideration is to modify as few features as possible so that the decision value of as many trees with current decision value 1 (the first type of decision tree) as possible becomes -1. For a tree whose current decision value is 1, we need to find the shortest path from each internal node on the tree's classification path to a leaf node with value -1.
  • Algorithm 1: shortest path algorithm. Inputs: T, the ensemble tree classifier; x, the input sample.
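  • As a sketch of what Algorithm 1 may look like (an assumption about its structure, not the patent's literal code), the search for the shortest root-to-leaf path ending in a leaf of the target class can be written recursively; here a tree is encoded as nested tuples, either ("leaf", class) or (feature, left_subtree, right_subtree):

```python
def shortest_target_path(tree, target_class, path=()):
    """Return the shortest list of features on a root-to-leaf path that ends
    in a leaf labelled target_class, or None if no such leaf exists."""
    if tree[0] == "leaf":                     # leaf node: ("leaf", class)
        return list(path) if tree[1] == target_class else None
    feature, left, right = tree               # internal node tests `feature`
    best = None
    for subtree in (left, right):             # explore both branches
        p = shortest_target_path(subtree, target_class, path + (feature,))
        if p is not None and (best is None or len(p) < len(best)):
            best = p
    return best
```

Running this per tree over the first and second type of decision trees would yield the first and second target shortest path sets described above.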
  • the feature x_2 appears twice (in P_21 and P_22) as the first feature of the paths of T_2 that need modification, and both paths are shortest paths. The path P_21 can therefore be selected as the shortest path of T_2, and the set of shortest paths in T that change the decision value from 1 to -1 (the first target shortest path set) is shown in equation (4).
  • each tree may have multiple shortest paths, so we need to determine which feature is best to modify at each step, so that as many trees as possible obtain a decision value of -1.
  • the weight assignment rule can be to assign the weight 1/10^(n-1) to the nth feature in a shortest path. Note that practical applications are not limited to this assignment rule.
  • in the example, the feature x_2 in the third path is assigned weight 1 and x_1 is assigned weight 0.1; the feature x_3 in the fourth path is assigned weight 1 and x_8 is assigned weight 0.1.
  • the optimal modified feature can be found in the first target shortest path set, so that the decision value of as many trees as possible in the integrated tree changes from 1 to -1.
  • the above process only considers the tree with the current decision value of 1, and there may be trees with the current decision value of -1 in the integrated tree.
  • the corresponding feature of the input sample x is then modified. Since a modified feature may change the classification paths of multiple trees in the random forest, the sets P and P' must be recalculated and the next optimal feature selected according to the paths in the new sets, until detection is evaded or the maximum modification limit is reached ("evasion of detection" means the attack succeeded; "maximum modification limit" means the attack did not succeed within the maximum number of modifications).
  • the specific process of the evasion detection model is shown in Algorithm 2.
  • the symbol P ijk used here refers to the k-th feature of the j-th path in the shortest path set of the i-th tree with a decision value of 1.
  • P ijk .weight refers to the weight of P ijk .
  • Algorithm 2 inputs: T, the ensemble tree classifier; x, the input sample; m_max, the maximum number of modified features.
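  • The overall loop of Algorithm 2 might be sketched as follows (a simplified assumption of its structure; target_classify and find_optimal_feature stand in for the target classifier and the feature-selection step described above):

```python
def evasion_attack(target_classify, find_optimal_feature, x, m_max):
    """Flip one optimal binary feature per iteration until the target
    classifier's decision changes or the budget m_max is exhausted."""
    original = target_classify(x)
    x_adv = list(x)
    for _ in range(m_max):
        feat = find_optimal_feature(x_adv)   # shortest-path sets recomputed each round
        x_adv[feat] = 1 - x_adv[feat]        # add (0 -> 1) or delete (1 -> 0) the feature
        if target_classify(x_adv) != original:
            return x_adv                     # evasion succeeded
    return None                              # budget reached without evading
```

Re-querying the selection step inside the loop reflects the patent's note that a single modification can change the classification paths of many trees at once.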
  • FIG. 5: to more intuitively illustrate the main working principle of the present invention, the embodiment assumes that the attacker knows the learning model f and a replacement data set D' whose distribution is consistent with the training data. First, the attacker trains an alternative ensemble tree model based on this knowledge. Second, the evasion attack method is used to locate and modify the key features of the input sample x. Finally, the modified sample x' is used to attack the target classifier.
  • by analyzing the attack samples (adversarial samples) that successfully perform evasion attacks, the security of the decision tree can be significantly improved.
  • the ensemble tree model (including random forest, gradient boosting tree, etc.) is a commonly used classification model because it is easy to use and can significantly improve the classification accuracy.
  • the embodiment of the present invention proposes a new evasion attack method for the integrated tree classifier to study its security against evasion attacks.
  • the present invention uses the shortest path algorithm to find the fewest features that can change the decision value of the ensemble tree classifier.
  • Table 1 compares the time complexity of finding a modified feature in the present invention with that of Kantchelian's method and Cheng's method for the same task.
  • the data held by the attacker can be divided into 20%, 50%, 80%, and 100% to evaluate the security of the classifier when the attacker has different amounts of data.
  • the security evaluation of the classifier adopts two evaluation criteria: Hardness of evasion and Evasion rate.
  • the present invention starts from the decision structure of the decision tree, finds the decision path of each base classifier, analyzes the decision path set of the ensemble tree classifier to identify the key features that can mislead its decision, and finally realizes the attack by modifying these key decision features.
  • black box attack methods for the widely used binary-feature-based ensemble tree classifiers (gradient boosting trees, random forests, etc.) can thus be studied, thereby providing a basis and reference for designing robust classifiers.
  • the present invention also provides an evasion attack device for the integrated tree classifier, including a data acquisition module 1, an alternative classifier training module 2, a feature modification module 3, and an evasion attack detection module 4; among them,
  • the data acquisition module 1 is configured to acquire original input samples, a replacement data set, and a learning model of a target classifier, where the replacement data set is a data set that has consistent distribution characteristics with the target classifier training data;
  • the alternative classifier training module 2 is configured to train according to the alternative data set and the learning model to obtain an alternative classifier
  • the feature modification module 3 is used to determine whether the current number of feature modifications has reached the preset maximum modification threshold; if not, use the shortest path algorithm and the preset evasion attack strategy according to the alternative classifier to find the optimal modification feature, and modify the corresponding feature of the original input sample according to the optimal modification feature to generate a trial sample; if so, end the operation;
  • the evasion attack detection module 4 is used to classify the trial sample with the target classifier to obtain a trial classification result, and determine whether the trial classification result is consistent with the pre-stored original classification result; if so, repeat the feature modification process; if not, output the trial sample; wherein the original classification result is the result of the target classifier classifying the original input sample.
  • the shortest path algorithm and the preset evasion attack strategy are used to find the optimal modification feature, specifically:
  • the shortest path algorithm is used to search for the target shortest decision path of each decision tree according to the target classification result type, obtaining the target shortest path set;
  • the weights of each feature are accumulated and the accumulated weights of each feature are compared to obtain the optimal modified feature.
  • the target shortest path set includes a first target shortest path set and a second target shortest path set
  • the shortest path algorithm is used to search the target shortest decision path of each decision tree according to the target classification result type to obtain the target shortest path set, specifically:
  • the alternative classifier is divided into a first type of decision tree and a second type of decision tree; wherein the decision value of the first type of decision tree is inconsistent with the target classification result type, and the decision value of the second type of decision tree is consistent with the target classification result type;
  • the shortest path algorithm is used to find the first target shortest path in the first type of decision tree, obtaining the first target shortest path set; the shortest path algorithm is used to find the second target shortest path in the second type of decision tree, obtaining the second target shortest path set.
  • the assigned weight value of the feature in the first target shortest path set is a positive number
  • the assigned weight value of the feature in the second target shortest path set is a negative number
  • the step of accumulating the weight of each feature and comparing the cumulative weight of each feature to obtain the optimal modified feature is specifically:
  • the weight of each feature is accumulated and the cumulative weight of each feature is compared, and the feature with the largest cumulative weight is found as the optimal modified feature.
  • the features in the first target shortest path set are weighted according to the formula 1/10^(n-1), where n represents the position of the feature in the decision path.
  • the present invention provides an evasion attack device for the integrated tree classifier, which can realize the evasion attack method for the integrated tree classifier described in any method embodiment of the present invention.
  • the present invention has the following beneficial effects:
  • the present invention starts from the decision structure of the decision tree, finds the decision path of each base classifier, analyzes the decision path set of the ensemble tree classifier to identify the key features that can mislead its decision, and finally realizes the attack by modifying these key decision features.
  • black box attack methods for the widely used binary-feature-based ensemble tree classifiers (gradient boosting trees, random forests, etc.) can thus be studied, thereby providing a basis and reference for designing robust classifiers.


Abstract

Disclosed in the present invention are an evasion attack method and device for an integrated tree classifier. The method comprises: acquiring a substitution data set and a learning model, and training them to obtain a substitution classifier; and searching for an optimal modification feature using a shortest path algorithm and an evasion attack strategy according to the substitution classifier, locating and modifying the corresponding feature of an original input sample, and generating a tentative sample with which to perform a tentative evasion attack on a target classifier, until the evasion attack succeeds or the maximum number of modifications is reached. By implementing the present method, black box attack methods for integrated tree classifiers (gradient boosting trees, random forests, etc.) can be deeply researched, so that a basis and a reference are provided for designing a robust integrated tree classifier.

Description

Evasion Attack Method and Device for an Ensemble Tree Classifier

Technical Field

The present invention relates to the technical field of network security research, and in particular to an evasion attack method and device for an ensemble tree classifier.

Background Art
With the continuous growth of information data, machine learning, as an important data analysis tool, has been successfully applied to many network security applications such as intrusion detection, malicious code detection, spam filtering, and malicious webpage detection. Machine learning algorithms such as random forests aim to learn predictive models from training data in order to distinguish malicious samples from legitimate ones. Unlike applications whose operating environment is static, security-related tasks involve intelligent adversaries who can analyze the vulnerabilities of a learning-based model and attack based on the system's output. In such an adversarial environment, traditional learning-based classifiers are vulnerable to evasion attacks in security applications. In an evasion attack, the attacker manipulates samples so that they evade detection by the system. For example, in malicious code detection, an attacker seeking to evade detection may modify typical malicious statements in the code (statements that appear frequently in malicious code but rarely in normal code, on which detection systems usually rely to detect malicious code), or insert normal statements into the malicious code (statements that appear frequently in normal code but rarely in malicious code). In spam filtering, attackers can disguise their email behavior through spelling mistakes or by adding normal words.

In an adversarial environment, to prevent attackers from inferring sensitive information from the training data and the target model, the robustness of the system against potential intelligent attacks must be considered when training a detection classifier. The biggest difference between adversarial machine learning and traditional machine learning is that algorithm design is treated as a game model: not only must algorithm performance be achieved by learning from the training set and optimizing the objective function, but the opponent's possible attack strategies at each stage must also be anticipated and corresponding defensive measures proposed. In the face of a new generation of intelligent attacks based on adversarial machine learning, security protection technology for machine learning models is not yet mature. Therefore, studying the behavior and weaknesses of machine learning methods in adversarial environments is of great importance for network-security-related applications.

Existing vulnerability analysis of learning-based classification models mainly relies on gradient-based attack methods. Such methods are effective only for models with differentiable loss functions and cannot be applied to ensemble tree classification models. Two methods are currently available for attacking ensemble tree models. The attack on ensemble tree classifiers based on mixed-integer linear programming proposed by Kantchelian et al. can only be applied in white-box scenarios, and its high algorithmic complexity prevents its use on larger data sets. In addition, the query-based black-box attack method proposed by Cheng et al. requires feature values to be continuous real numbers, so it cannot be applied to the binary features widely used in the network security field; moreover, since the method is not designed specifically for ensemble tree classifiers, its attack effectiveness is poor.

In summary, in the field of network security research, black-box attacks on ensemble tree classifiers (gradient boosting trees, random forests, etc.) have not yet been effectively studied, so no basis or reference is available in this respect for designing robust classifiers.
Summary of the Invention

The technical problem to be solved by the present invention is to provide an evasion attack method and device for an ensemble tree classifier, so as to study in depth black-box attack methods against ensemble tree classifiers and thereby provide a basis and reference for designing robust classifiers.
To solve the above technical problem, an embodiment of the present invention provides an evasion attack method for an ensemble tree classifier, comprising the steps of:

(1) obtaining an original input sample, a substitute data set, and the learning model of the target classifier, wherein the substitute data set is a data set whose distribution is consistent with that of the target classifier's training data;

(2) training on the substitute data set with the learning model to obtain a substitute classifier;

(3) determining whether the current number of feature modifications has reached a preset maximum modification threshold; if not, finding the optimal feature to modify based on the substitute classifier by using a shortest path algorithm and a preset evasion attack strategy, modifying the corresponding feature of the original input sample according to the optimal feature, generating a trial sample, and executing step (4); if so, terminating;

(4) classifying the trial sample with the target classifier to obtain a trial classification result, and determining whether the trial classification result is consistent with the pre-stored original classification result; if so, executing step (3); if not, outputting the trial sample; wherein the original classification result is the result of the target classifier classifying the original input sample.
Further, finding the optimal feature to modify based on the substitute classifier by using a shortest path algorithm and a preset evasion attack strategy specifically comprises:

determining the target classification result type to be obtained, according to the classification result type produced by the substitute classifier on the input sample;

using the shortest path algorithm to find the target shortest decision path of each decision tree according to the target classification result type, obtaining a target shortest path set;

assigning a weight to each feature in the target shortest path set according to the preset evasion attack strategy; and

accumulating the weights of each feature and comparing the accumulated weights to obtain the optimal feature to modify.
Further, the target shortest path set comprises a first target shortest path set and a second target shortest path set;

using the shortest path algorithm to find the target shortest decision path of each decision tree according to the target classification result type to obtain the target shortest path set specifically comprises:

dividing the decision trees of the substitute classifier into a first type and a second type according to the target classification result type, wherein the decision value of a first-type decision tree is inconsistent with the target classification result type and the decision value of a second-type decision tree is consistent with it; and

using the shortest path algorithm to find the first-type target shortest paths of the first-type decision trees, obtaining the first target shortest path set, and likewise using the shortest path algorithm to find the second-type target shortest paths of the second-type decision trees, obtaining the second target shortest path set.
Further, the features in the first target shortest path set are assigned positive weights, and the features in the second target shortest path set are assigned negative weights;

accumulating the weights of each feature and comparing the accumulated weights to obtain the optimal feature to modify specifically comprises:

accumulating the weights of each feature, comparing the accumulated weights, and selecting the feature with the largest accumulated weight as the optimal feature to modify.
Further, the features in the first target shortest path set are assigned weights according to the formula 1/10^(n-1), where n denotes the position of the feature along its decision path.
To solve the same technical problem, the present invention further provides an evasion attack device for an ensemble tree classifier, comprising a data acquisition module, a substitute classifier training module, a feature modification module, and an evasion attack trial module, wherein:

the data acquisition module is configured to obtain an original input sample, a substitute data set, and the learning model of the target classifier, wherein the substitute data set is a data set whose distribution is consistent with that of the target classifier's training data;

the substitute classifier training module is configured to train on the substitute data set with the learning model to obtain a substitute classifier;

the feature modification module is configured to determine whether the current number of feature modifications has reached a preset maximum modification threshold; if not, to find the optimal feature to modify based on the substitute classifier by using a shortest path algorithm and a preset evasion attack strategy, and to modify the corresponding feature of the original input sample according to the optimal feature, generating a trial sample; if so, to terminate; and

the evasion attack trial module is configured to classify the trial sample with the target classifier to obtain a trial classification result and to determine whether the trial classification result is consistent with the pre-stored original classification result; if so, to repeat the feature modification process; if not, to output the trial sample; wherein the original classification result is the result of the target classifier classifying the original input sample.
Further, finding the optimal feature to modify based on the substitute classifier by using a shortest path algorithm and a preset evasion attack strategy specifically comprises:

determining the target classification result type to be obtained, according to the classification result type produced by the substitute classifier on the input sample;

using the shortest path algorithm to find the target shortest decision path of each decision tree according to the target classification result type, obtaining a target shortest path set;

assigning a weight to each feature in the target shortest path set according to the preset evasion attack strategy; and

accumulating the weights of each feature and comparing the accumulated weights to obtain the optimal feature to modify.
Further, the target shortest path set comprises a first target shortest path set and a second target shortest path set;

using the shortest path algorithm to find the target shortest decision path of each decision tree according to the target classification result type to obtain the target shortest path set specifically comprises:

dividing the decision trees of the substitute classifier into a first type and a second type according to the target classification result type, wherein the decision value of a first-type decision tree is inconsistent with the target classification result type and the decision value of a second-type decision tree is consistent with it; and

using the shortest path algorithm to find the first-type target shortest paths of the first-type decision trees, obtaining the first target shortest path set, and likewise using the shortest path algorithm to find the second-type target shortest paths of the second-type decision trees, obtaining the second target shortest path set.
Further, the features in the first target shortest path set are assigned positive weights, and the features in the second target shortest path set are assigned negative weights;

accumulating the weights of each feature and comparing the accumulated weights to obtain the optimal feature to modify specifically comprises:

accumulating the weights of each feature, comparing the accumulated weights, and selecting the feature with the largest accumulated weight as the optimal feature to modify.
Further, the features in the first target shortest path set are assigned weights according to the formula 1/10^(n-1), where n denotes the position of the feature along its decision path.
Compared with the prior art, the present invention has the following beneficial effects:

For classification models without a differentiable loss function, such as ensemble trees, the present invention starts from the decision structure of the decision tree, finds the decision path of each base classifier, analyzes the set of decision paths of the ensemble tree classifier to identify key features capable of misleading its decisions, and finally realizes the attack by modifying these key decision features. By implementing the present invention, black-box attack methods against ensemble tree classifiers (gradient boosting trees, random forests, etc.) can be studied in depth, thereby providing a basis and reference for designing robust ensemble tree classifiers.
Brief Description of the Drawings

Fig. 1 is a schematic flowchart of an evasion attack method for an ensemble tree classifier provided by an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of an ensemble tree classifier provided by an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a first-type decision tree in the ensemble classifier provided by an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a second-type decision tree in the ensemble classifier provided by an embodiment of the present invention;

Fig. 5 is a schematic diagram of the evasion attack flow and model provided by an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of an evasion attack device for an ensemble tree classifier provided by an embodiment of the present invention.
Detailed Description

The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative work fall within the protection scope of the present invention.
Referring to Fig. 1, an embodiment of the present invention provides an evasion attack method for an ensemble tree classifier, comprising the steps of:

(1) obtaining an original input sample, a substitute data set, and the learning model of the target classifier, wherein the substitute data set is a data set whose distribution is consistent with that of the target classifier's training data;

(2) training on the substitute data set with the learning model to obtain a substitute classifier;

(3) determining whether the current number of feature modifications has reached a preset maximum modification threshold; if not, finding the optimal feature to modify based on the substitute classifier by using a shortest path algorithm and a preset evasion attack strategy, modifying the corresponding feature of the original input sample according to the optimal feature, generating a trial sample, and executing step (4); if so, terminating;

(4) classifying the trial sample with the target classifier to obtain a trial classification result, and determining whether the trial classification result is consistent with the pre-stored original classification result; if so, executing step (3); if not, outputting the trial sample; wherein the original classification result is the result of the target classifier classifying the original input sample.
At present, existing vulnerability analysis of learning-based classification models mainly relies on gradient-based attack methods. Such methods are effective only for models with differentiable loss functions and cannot be applied to ensemble tree classification models. Two methods are currently available for attacking ensemble tree models. The attack on ensemble tree classifiers based on mixed-integer linear programming proposed by Kantchelian et al. can only be applied in white-box scenarios, and its high algorithmic complexity prevents its use on larger data sets. The query-based black-box attack method proposed by Cheng et al. requires feature values to be continuous real numbers, so it cannot be applied to the binary features widely used in network security; moreover, since the method is not designed specifically for ensemble tree classifiers, its attack effectiveness is poor.

For classification models without a differentiable loss function, such as ensemble trees, the present invention starts from the decision structure of the decision tree, finds the decision path of each base classifier, and analyzes the set of decision paths of the ensemble tree classifier to identify key features capable of misleading its decisions; the attack is finally realized by modifying these key decision features. The key question is how to find the key decision features from the set of decision paths. The present invention starts from the voting-based integration strategy of the ensemble tree classifier and finds within it the key features that can change (or mislead) the decision values of the majority of the base classifiers. In each iteration, the method finds one key decision feature and modifies the corresponding feature of the input sample to generate an attack sample; if the attack does not succeed, the next key feature is sought on this basis, until the attack succeeds or the maximum number of modified features is reached.
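The iterative procedure just described (find one key decision feature per round, flip it in the sample, probe the target classifier, and repeat) can be sketched as follows. This is a minimal illustration, not the patent's implementation: `target_classify` and `find_key_feature` are assumed callables standing in for the black-box target model and for the shortest-path feature search described below.

```python
def evasion_attack(x, target_classify, find_key_feature, max_mods):
    """Flip one key binary feature per round until the target
    classifier's output changes or the modification budget runs out."""
    original = target_classify(x)
    x2 = list(x)
    for _ in range(max_mods):
        i = find_key_feature(x2)        # key decision feature for this round
        if i is None:                   # no candidate feature left
            break
        x2[i] = 1 - x2[i]               # add (0 -> 1) or delete (1 -> 0) the feature
        if target_classify(x2) != original:
            return x2                   # evasion succeeded
    return None                         # attack failed within the budget
```

A trial sample is submitted to the target classifier after every single-feature change, matching steps (3) and (4) of the method.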
It can be understood that in an evasion attack, the attacker's goal is to mislead the target model's decision by estimating its decision boundary and manipulating the input sample. Suppose that for an input sample x the target model outputs c(x). The attack strategy is to modify x as little as possible so as to find a sample x' such that c(x') ≠ c(x). Let d(x, x') be a distance function describing the amount of modification. The evasion attack problem can then be described as:
A(x) = argmin_{x'} d(x, x'),  s.t.  c(x') ≠ c(x)   (1)
where x is the input sample, c(x) is the output class of the classification model for x, and x' is the attack sample. Formula (1) expresses the goal of changing the output class (i.e., attacking) while modifying x as little as possible.
To carry out an evasion attack, the attacker needs a certain understanding of the target system. Knowledge of the target system can be divided into four levels: 1) the training data D; 2) the feature space X; 3) the learning algorithm F; and 4) the target model parameters w. The attacker's knowledge of the target system can therefore be represented as θ = (D, X, F, w). According to the attacker's level of knowledge, two attack scenarios can be distinguished:

White-box attack: in this scenario, the attacker is assumed to have full knowledge of the target system, i.e., θ = (D, X, F, w), and can therefore evade detection at minimal cost. In practice, an attacker is unlikely to possess all of this knowledge; nevertheless, this scenario can be used to evaluate the worst-case security of a learning-based classifier.

Black-box attack: this scenario assumes that the attacker has only partial knowledge of the target system. Here we assume that the attacker knows the learning algorithm F and the feature space X, but not the training data D or the target model parameters w. However, the attacker can collect a substitute data set D' from the Internet or other sources and use it to estimate target model parameters w'. The attacker may, of course, also obtain a subset of the original training set. In this case, the attacker's knowledge can be defined as θ' = (D', X, F, w').
As shown in Fig. 2, an ensemble tree classifier f: R^n → R is a set T composed of multiple decision trees. Without loss of generality, suppose each decision tree T_i ∈ T is a binary tree in which every internal node n ∈ T_i.nodes carries a predicate. If the predicate evaluates to true, the outgoing edge points to its left child n.leftchild; otherwise, the outgoing edge points to its right child n.rightchild. Each leaf node l ∈ T_i.leaves holds a class value l.class ∈ R. For a given sample x ∈ R^n, the decision path of tree T_i is the path from the root node to one of the leaf nodes. The classification result T_i.class of T_i for sample x is the value l.class of the leaf node on that path. The decision value f(x) of the ensemble is obtained by majority vote over all decision trees.
The embodiment of the present invention mainly targets binary classification trees with binary features. As an example, such a tree has parameters R ∈ {-1, 1} and x_i ∈ {0, 1}. Suppose the classifier's result for input sample x is f(x) = 1; the attack goal is then to find a sample x' such that f(x') = -1 while minimizing d(x, x'). When the feature values are binary, d(·,·) corresponds to the L0 norm, or Hamming distance, meaning that features can only be added to the initial sample x (flipped from 0 to 1) or deleted from it (flipped from 1 to 0).
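For binary feature vectors, the modification cost d(x, x') is simply the number of flipped positions (the L0 norm of the difference, i.e., the Hamming distance); a minimal sketch:

```python
def hamming(x, x2):
    """d(x, x') for binary vectors: the number of features flipped,
    whether added (0 -> 1) or deleted (1 -> 0)."""
    return sum(a != b for a, b in zip(x, x2))
```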
According to the majority voting strategy, if we want f(x') = -1, then more than half of the decision trees in the set must yield the decision value -1. The basic idea of the attack algorithm is therefore to modify the minimum number of features such that more than half of the trees decide -1. In addition, we make two observations: first, for a decision tree with binary features, no feature splits twice along the decision path from the root to a leaf; second, for a decision tree with binary features, some feature on the classification path must first be modified if the decision class is to change. These two observations are essential for finding the key features that can mislead the decision of an ensemble tree classifier.
In the embodiment of the present invention, further, finding the optimal feature to modify based on the substitute classifier by using a shortest path algorithm and a preset evasion attack strategy specifically comprises:

determining the target classification result type to be obtained, according to the classification result type produced by the substitute classifier on the input sample; in this embodiment, the classifier's result for the input sample is f(x) = 1, so the target classification result type to be determined in this step is f(x') = -1;

using the shortest path algorithm to find the target shortest decision path of each decision tree according to the target classification result type, obtaining a target shortest path set;

assigning a weight to each feature in the target shortest path set according to the preset evasion attack strategy; and

accumulating the weights of each feature and comparing the accumulated weights to obtain the optimal feature to modify.
In the embodiment of the present invention, further, the target shortest path set comprises a first target shortest path set and a second target shortest path set;

using the shortest path algorithm to find the target shortest decision path of each decision tree according to the target classification result type to obtain the target shortest path set specifically comprises:

dividing the decision trees of the substitute classifier into a first type and a second type according to the target classification result type, wherein the decision value of a first-type decision tree is inconsistent with the target classification result type and the decision value of a second-type decision tree is consistent with it; and

using the shortest path algorithm to find the first-type target shortest paths of the first-type decision trees, obtaining the first target shortest path set, and likewise using the shortest path algorithm to find the second-type target shortest paths of the second-type decision trees, obtaining the second target shortest path set.
In the embodiment of the present invention, further, the features in the first target shortest path set are assigned positive weights, and the features in the second target shortest path set are assigned negative weights;

accumulating the weights of each feature and comparing the accumulated weights to obtain the optimal feature to modify specifically comprises:

accumulating the weights of each feature, comparing the accumulated weights, and selecting the feature with the largest accumulated weight as the optimal feature to modify.
在本发明实施例中，进一步地，所述第一目标最短路径集合中的特征按照公式 1/10^(n-1) 进行权值分配，其中，n代表该特征相对于所在决策路径的位置次序。In the embodiment of the present invention, further, the features in the first target shortest path set are assigned weights according to the formula 1/10^(n-1), where n represents the position order of the feature in its decision path.
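作为该权值分配公式的一个极简 Python 示意（仅演示公式本身，并非专利的官方实现）：As a minimal Python sketch of this weighting formula (an illustration only, not the official implementation):

```python
def path_weight(n):
    # 为最短路径中第 n 个特征分配权值 1/10^(n-1)
    # Weight 1/10^(n-1) for the n-th feature on a shortest path
    return 1 / 10 ** (n - 1)

# 路径中第 1、2、3 个特征的权值依次为 1.0、0.1、0.01
weights = [path_weight(n) for n in (1, 2, 3)]
```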
下面采用一个集成树分类器的例子来解释我们的模型。假设一个集成树分类器由图3和图4所示的决策树T1、T2和T3组成，而x=[x0...x9]=1100101100是一个10维的样本。深色结点显示了三棵树对样本x的分类路径，且T1.class=1，T2.class=1，T3.class=-1。在本实施例中，由于多数决策树的分类结果类型为1，那么可以确定目标分类结果类型为-1，因此，将T1、T2划分为第一类决策树，将T3划分为第二类决策树。The following uses an example of an ensemble tree classifier to explain our model. Suppose an ensemble tree classifier is composed of the decision trees T1, T2 and T3 shown in Figure 3 and Figure 4, and x = [x0...x9] = 1100101100 is a 10-dimensional sample. The dark nodes show the classification paths of the three trees for the sample x, with T1.class=1, T2.class=1 and T3.class=-1. In this embodiment, since the classification result type of the majority of the decision trees is 1, the target classification result type can be determined to be -1; therefore, T1 and T2 are classified as the first type of decision tree, and T3 as the second type of decision tree.
从T1的分类路径可以看到，如果我们想使得T1.class=-1，首先必须修改分类路径中的某个特征。在本实施例中，可以修改的特征是x2、x4、x7。其次我们需要知道依次修改哪些特征才能使d(x,x')最小化。为了实现这个目标，首先要考虑的是修改尽可能少的特征，使尽可能多的当前决策值为1的树（第一类决策树）的决策值变为-1。对一棵当前决策值为1的树，我们需要找到这棵树的分类路径中的每个内部结点到值为-1的叶子结点的最短路径。首先，我们列出从分类路径中的每个内部结点到值为-1的叶子结点的所有路径，并且这些路径中不包含分类路径上其它的内部结点。这些路径是树中结点的路径，我们称之为树的路径。但是，树的路径中的特征并不意味着它们都需要修改。然后，根据输入样本x确定哪些特征是需要修改的，得到一个需要修改的特征路径集合。最后，我们列出要将该树决策类别变为-1需要修改特征的最短路径（第一类目标最短路径）。具体流程详见算法1。对于本例子中决策值为1的树T1和T2，其最短路径的生成过程如式(二)和式(三)所示。From the classification path of T1 we can see that if we want to make T1.class=-1, we must first modify some feature on the classification path. In this embodiment, the modifiable features are x2, x4 and x7. Next, we need to know which features to modify, and in what order, to minimize d(x,x'). To achieve this goal, the first consideration is to modify as few features as possible so that as many trees whose current decision value is 1 (first-type decision trees) as possible change their decision value to -1. For a tree whose current decision value is 1, we need to find, for each internal node on the tree's classification path, the shortest path to a leaf node with value -1. First, we list all paths from each internal node on the classification path to leaf nodes with value -1, where these paths contain no other internal nodes of the classification path. These are paths over nodes of the tree, which we call tree paths. However, a feature appearing on a tree path does not mean it needs to be modified. Then, according to the input sample x, we determine which features need to be modified, obtaining a set of feature paths to modify. Finally, we list the shortest paths of features that must be modified to change the tree's decision class to -1 (first-type target shortest paths). The specific procedure is detailed in Algorithm 1. For the trees T1 and T2 with decision value 1 in this example, the generation of their shortest paths is shown in equations (2) and (3).
算法1. 最短路径算法. Algorithm 1. Shortest path algorithm.
输入：T：集成树分类器，x：输入样本。Input: T: ensemble tree classifier, x: input sample.
输出：P：最短路径集合。Output: P: shortest path set.
FOR Ti ∈ T and Ti.class = 1 DO
  列出该树分类路径上的所有内部结点 Ti.innodes（List all internal nodes Ti.innodes on the tree's classification path）
  FOR 每一个 n ∈ Ti.innodes DO（FOR each n ∈ Ti.innodes DO）
    IF n.leftchild ∈ Ti.innodes or n.leftchild = 1 THEN
      列出从 n 经过 n.rightchild 到值为 -1 的叶子的所有路径（List all paths from n through n.rightchild to leaves with value -1）
    ELSEIF n.rightchild ∈ Ti.innodes or n.rightchild = 1 THEN
      列出从 n 经过 n.leftchild 到值为 -1 的叶子的所有路径（List all paths from n through n.leftchild to leaves with value -1）
    ENDIF
  ENDFOR
  用样本 x 找到需要修改的路径 PMi（Use sample x to find the paths PMi that need modification）
  Pi ← 列出 PMi 中的最短路径（List the shortest paths in PMi）
ENDFOR
RETURN: P
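下面给出算法1思路的一个简化 Python 示意。树结构与取路规则均为本文假设：内部结点用 {"feat", "left", "right"} 表示，叶子用 {"leaf": 1 或 -1} 表示，二进制特征取 0 走左子树、取 1 走右子树；该片段仅演示"枚举分类路径内部结点的未走分支、收集需修改的特征并取最短"的过程，并非专利的官方实现。A simplified Python sketch of the idea of Algorithm 1, under assumed tree structures (not the official implementation):

```python
def shortest_flip_paths(tree, x, target=-1):
    # 1) 沿分类路径记录每个内部结点及其未走的分支
    path, node = [], tree
    while "leaf" not in node:
        taken, other = (("left", "right") if x[node["feat"]] == 0
                        else ("right", "left"))
        path.append((node, other))
        node = node[taken]
    if node["leaf"] == target:          # 已是目标类别，无需修改
        return []

    def collect(sub, need, out):        # 2) 枚举未走分支子树中到 target 叶子的路径
        if "leaf" in sub:
            if sub["leaf"] == target:
                out.append(need)
            return
        f = sub["feat"]
        collect(sub["left"], need + ([f] if x[f] != 0 else []), out)   # 左分支要求 x[f]==0
        collect(sub["right"], need + ([f] if x[f] != 1 else []), out)  # 右分支要求 x[f]==1

    candidates = []
    for n, other in path:
        collect(n[other], [n["feat"]], candidates)  # 进入未走分支须先翻转 n.feat
    if not candidates:
        return []
    best = min(len(p) for p in candidates)          # 3) 只保留需修改特征最少的路径
    return [p for p in candidates if len(p) == best]

# 例：根结点测 x0（左叶 -1；右为测 x1 的结点：左叶 1，右叶 -1），样本 x=[1,0] 被该树判为 1
t = {"feat": 0, "left": {"leaf": -1},
     "right": {"feat": 1, "left": {"leaf": 1}, "right": {"leaf": -1}}}
paths = shortest_flip_paths(t, [1, 0])   # 翻转 x0 或翻转 x1 均可使该树输出 -1
```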
本发明实施例中第一类目标最短路径的生成过程如下:The shortest path generation process of the first type of target in the embodiment of the present invention is as follows:
[式(二)与式(三)在原文中为图片（PCTCN2019097532-appb-000001、PCTCN2019097532-appb-000002），给出树T1和T2的最短路径生成过程，其文本内容无法恢复。] (Equations (2) and (3) appear as images in the original and give the shortest-path generation for trees T1 and T2; their text cannot be recovered.)
在本例子中，特征x2在T2需要修改的路径中作为首个特征出现了两次（P21和P22），且这两条路径均为最短路径。当某个特征作为最短路径中的第一个特征出现多次时，我们随机选择其中一条。在式(三)的例子中，可以选择路径P21作为T2的最短路径。因此，T的树中使决策值从1变为-1的最短路径集合（第一目标最短路径集合）如式(四)所示。In this example, the feature x2 appears twice as the first feature in the paths of T2 that need modification (P21 and P22), and both paths are shortest paths. When a feature appears multiple times as the first feature of a shortest path, we randomly select one of them. In the example of equation (3), path P21 can be selected as the shortest path of T2. Therefore, the set of shortest paths that change decision values from 1 to -1 over the trees of T (the first target shortest path set) is shown in equation (4).
[式(四)在原文中为图片（PCTCN2019097532-appb-000003）；按下文的权值示例重构，四条最短路径应为 P = { (x2), (x7), (x2, x1), (x3, x8) }。] (Equation (4) appears as an image in the original; reconstructed from the weight example below, the four shortest paths are presumably P = {(x2), (x7), (x2, x1), (x3, x8)}.)
因为集成分类器中有多棵决策树，每棵树都有多条最短路径，我们需要找出每次修改哪个特征是最优的，以便让更多的树得到-1的决策值。我们为最短路径集合P中的每个特征分配权值，并选择权值最大的特征作为每次的最优修改特征。权值的分配规则可以是对一条最短路径中的第n个特征分配权值1/10^(n-1)，需要说明的是实际应用中不仅限于此分配规则。对于式(四)的最短路径集合中的四条路径，按照上述分配规则，前两条路径中的特征x2和x7分别赋权值1（x2和x7在所在路径中的位置次序为1，因此权值为1/10^(1-1)=1），第三条路径中的特征x2赋权值1、x1赋权值0.1，第四条路径中的特征x3赋权值1、x8赋权值0.1。Because there are multiple decision trees in the ensemble classifier, and each tree has multiple shortest paths, we need to find out which feature is optimal to modify at each step so that more trees obtain a decision value of -1. We assign a weight to each feature in the shortest path set P and select the feature with the largest weight as the optimal modification feature at each step. One possible assignment rule is to give the n-th feature of a shortest path the weight 1/10^(n-1); note that practical applications are not limited to this rule. For the four paths in the shortest path set of equation (4), under this rule the features x2 and x7 in the first two paths are each assigned weight 1 (their position order in their paths is 1, so the weight is 1/10^(1-1)=1), in the third path x2 is assigned weight 1 and x1 weight 0.1, and in the fourth path x3 is assigned weight 1 and x8 weight 0.1.
虽然通过以上的流程可以在第一目标最短路径集合中找出最优的修改特征，使得集成树中尽可能多的树的决策值从1变为-1，但以上流程只考虑了当前决策值为1的树，而集成树中还可能存在当前决策值为-1的树。选择最优修改特征时应当考虑当前决策值为-1的树是否有可能因为特征的修改而使得决策值变为1。因此，我们将当前决策值为-1的树中可能导致决策值变为1的路径列到集合P'（第二目标最短路径集合）中。对于本实施例中决策值为-1的树T3，可能由于特征修改而决策值变为1的路径集合如式(五)所示。Although the above procedure can find the optimal modification feature in the first target shortest path set so that as many trees as possible in the ensemble change their decision value from 1 to -1, it only considers trees whose current decision value is 1, while the ensemble may also contain trees whose current decision value is -1. When selecting the optimal modification feature, one should consider whether a tree with current decision value -1 might have its decision value changed to 1 by the feature modification. Therefore, we list the paths in trees with current decision value -1 that may cause the decision value to become 1 into the set P' (the second target shortest path set). For the tree T3 with decision value -1 in this embodiment, the set of paths whose decision value may become 1 due to feature modification is shown in equation (5).
本发明实施例中第二类目标最短路径的生成过程如下:The shortest path generation process of the second type of target in the embodiment of the present invention is as follows:
[式(五)在原文中为图片（PCTCN2019097532-appb-000004），其文本内容无法完全恢复；按下文描述，其中长度为1的路径含特征x3。] (Equation (5) appears as an image in the original and cannot be fully recovered; per the description below, its length-1 path contains feature x3.)
如式(五)所示，考虑到路径中多于一个特征时改变某个特征不会直接导致决策值的改变，因此，针对决策值从-1变为1的路径集合，我们只考虑修改一个特征便能导致决策值改变的情况，并给这个特征赋权值-1。对式(五)的例子，特征x3赋权值-1。将集合P和P'中所有相同特征的权值加和后，通过比较得到权值最大的特征为x2，其权值和为2。As shown in equation (5), when a path contains more than one feature, changing a single feature does not directly change the decision value. Therefore, for the set of paths whose decision value would change from -1 to 1, we only consider the cases where modifying one feature alone changes the decision value, and assign that feature the weight -1. In the example of equation (5), the feature x3 is assigned the weight -1. After summing the weights of identical features across the sets P and P', comparison shows that the feature with the largest weight is x2, with a weight sum of 2.
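上述 P 与 P' 的权值加和与比较过程可用如下 Python 片段示意。式(四)的四条路径按正文描述手工列出；式(五)中仅长度为 1 的路径含特征 x3 是基于正文的假设；此片段并非官方实现。The weight summation over P and P' can be sketched as follows (paths transcribed from the description; the content of equation (5) is an assumption):

```python
def aggregate_weights(P, P_prime):
    w = {}
    for pth in P:                                  # P 中第 n 个特征赋权 1/10^(n-1)
        for n, f in enumerate(pth, 1):
            w[f] = w.get(f, 0.0) + 1 / 10 ** (n - 1)
    for pth in P_prime:                            # P' 中仅长度为 1 的路径，其唯一特征赋 -1
        if len(pth) == 1:
            w[pth[0]] = w.get(pth[0], 0.0) - 1.0
    return w

P = [["x2"], ["x7"], ["x2", "x1"], ["x3", "x8"]]   # 式(四)的四条最短路径（据正文）
P_prime = [["x3"]]                                 # 式(五)中长度为 1 的路径（假设）
w = aggregate_weights(P, P_prime)
best = max(w, key=w.get)                           # 最优修改特征：x2，权值和为 2
```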
找到本次最优的修改特征x2之后，需要修改输入样本x的对应特征。由于特征被修改后会导致随机森林中多棵树的分类路径发生改变，因此，需要重新计算集合P和P'，并根据新集合中路径的情况选择下一个最优特征，直到逃避检测或达到最大修改限制（"逃避检测"即说明攻击成功，"达到最大修改限制"意味着达到最大修改次数时未攻击成功）。该逃避检测模型的具体流程见算法2。其中用到的符号Pijk指的是第i棵决策值为1的树的最短路径集合中第j条路径的第k个特征；Pijk.weight指的是Pijk的权值。After finding the current optimal modification feature x2, the corresponding feature of the input sample x needs to be modified. Since modifying a feature changes the classification paths of multiple trees in the random forest, the sets P and P' need to be recalculated, and the next optimal feature is selected according to the paths in the new sets, until detection is evaded or the maximum modification limit is reached ("evading detection" means the attack succeeds; "reaching the maximum modification limit" means the attack has not succeeded within the maximum number of modifications). The specific procedure of this evasion model is shown in Algorithm 2, where the symbol Pijk denotes the k-th feature of the j-th path in the shortest path set of the i-th tree with decision value 1, and Pijk.weight denotes the weight of Pijk.
算法2. 攻击方法. Algorithm 2. Attack method.
输入：T：集成树分类器，x：输入样本，m_max：最大修改特征数。Input: T: ensemble tree classifier, x: input sample, m_max: maximum number of modified features.
输出：攻击样本x'。Output: attack sample x'.
用最短路径算法得到T对x的最短路径集合P和集合P'（Use the shortest path algorithm to obtain the shortest path sets P and P' of T for x）
m ← 0
WHILE [条件在原文中为图片，按上下文应为P非空] and m < m_max DO（WHILE [condition shown as an image in the original; from context, presumably P is non-empty] and m < m_max DO）
  FOR 每个特征Pijk DO（FOR each feature Pijk DO）
    [赋权公式在原文中为图片，按上文规则应为 Pijk.weight ← 1/10^(k-1)]（[weight formula shown as an image; per the rule above, presumably Pijk.weight ← 1/10^(k-1)]）
  ENDFOR
  为P'中长度为1的路径中的唯一特征赋权值-1（Assign weight -1 to the unique feature of each length-1 path in P'）
  相同特征的权值加和并找到权值最大的特征x_w（Sum the weights of identical features and find the feature x_w with the largest weight）
  x' ← 修改样本的对应特征x_w（Modify the corresponding feature x_w of the sample）
  m ← m+1
  IF f(x') = -1
    RETURN: x'
  ELSE
    重新计算集合P和集合P'（Recalculate sets P and P'）
  ENDIF
ENDWHILE
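算法2的整体控制流程（修改权值最大的特征、重新分类、直到逃避成功或达到最大修改次数）可用下面的 Python 片段示意。其中 naive_weights 用"决策值为 1 的树的分类路径上特征出现次数"这一简化权值代替正文基于最短路径集合 P、P' 的权值，树结构沿用前述假设，仅演示控制流程，并非专利的官方实现。The control flow of Algorithm 2 can be sketched as follows (naive_weights is a simplified stand-in for the P/P'-based weights; assumed tree structures; illustration only):

```python
def classify(tree, x):
    node = tree
    while "leaf" not in node:
        node = node["left"] if x[node["feat"]] == 0 else node["right"]
    return node["leaf"]

def ensemble_class(trees, x):
    # 多数表决：决策值之和为正判为 1，否则判为 -1
    return 1 if sum(classify(t, x) for t in trees) > 0 else -1

def naive_weights(trees, x):
    # 简化权值：统计当前决策值为 1 的树的分类路径上各特征的出现次数（仅为演示）
    w = {}
    for t in trees:
        if classify(t, x) != 1:
            continue
        node = t
        while "leaf" not in node:
            w[node["feat"]] = w.get(node["feat"], 0) + 1
            node = node["left"] if x[node["feat"]] == 0 else node["right"]
    return w

def evade(trees, x, m_max, weight_fn=naive_weights):
    x = list(x)
    for _ in range(m_max):
        if ensemble_class(trees, x) == -1:
            return x                         # 逃避成功
        w = weight_fn(trees, x)
        best = max(w, key=w.get)             # 权值最大的特征
        x[best] = 1 - x[best]                # 翻转该二进制特征
    return x if ensemble_class(trees, x) == -1 else None  # None：达到最大修改次数仍未成功

# 例：三个树桩，样本 [1, 0] 初始被判为 1；翻转特征 0 后三棵树都输出 -1
trees = [{"feat": 0, "left": {"leaf": -1}, "right": {"leaf": 1}},
         {"feat": 0, "left": {"leaf": -1}, "right": {"leaf": 1}},
         {"feat": 1, "left": {"leaf": -1}, "right": {"leaf": 1}}]
x_adv = evade(trees, [1, 0], m_max=3)
```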
请参见图5，为了更直观说明本发明的主要工作原理，在本发明实施例中，我们假设攻击者知道学习模型f和与训练数据具有一致分布的替代数据集D'。首先，攻击者需要基于自己的知识训练一个替代的集成树模型。其次，采用逃避攻击方法对输入样本x的关键特征进行定位并修改。最后，利用修改后的样本x'攻击目标分类器。Please refer to Figure 5. To illustrate the main working principle of the present invention more intuitively, in this embodiment we assume that the attacker knows the learning model f and a replacement data set D' that has the same distribution as the training data. First, the attacker trains a substitute ensemble tree model based on his own knowledge. Second, the evasion attack method is used to locate and modify the key features of the input sample x. Finally, the modified sample x' is used to attack the target classifier.
需要说明的是，通过实施本发明实施例获得成功进行逃避攻击的攻击样本（对抗样本）后，在决策树的训练过程中，通过将对抗样本加入到训练数据集中，可以显著提升决策树的安全性。It should be noted that after the attack samples (adversarial samples) that successfully evade detection are obtained by implementing the embodiments of the present invention, adding these adversarial samples to the training data set during decision tree training can significantly improve the security of the decision tree.
集成树模型(包括随机森林,梯度提升树等)是一种常用的分类模型,因为它易于使用并能显著提高分类准确率。本发明实施例提出了一种新的针对集成树分类器的逃避攻击方法,来研究其对抗逃避攻击的安全性。The ensemble tree model (including random forest, gradient boosting tree, etc.) is a commonly used classification model because it is easy to use and can significantly improve the classification accuracy. The embodiment of the present invention proposes a new evasion attack method for the integrated tree classifier to study its security against evasion attacks.
与现有技术的其他方法相比，本发明通过最短路径算法找出可以改变集成树分类器决策值的最少特征。本方案中寻找一个修改特征的时间复杂度为[复杂度公式在原文中为图片，无法从文本恢复]，而Kantchelian的方法完成同样任务的时间复杂度为[原文为图片]。表1给出了本发明与Kantchelian的方法和Cheng的方法的比较。Compared with other existing methods, the present invention uses the shortest path algorithm to find the fewest features that can change the decision value of the ensemble tree classifier. The time complexity of finding one modification feature in this scheme is [complexity formula shown as an image in the original; not recoverable from the text], while the time complexity of Kantchelian's method for the same task is [image in original]. Table 1 compares the present invention with Kantchelian's method and Cheng's method.
表1 三种方法比较（Table 1. Comparison of the three methods）

方法 (Method)           | 算法效率 (Efficiency) | 支持二进制特征 (Binary features) | 黑盒攻击 (Black-box attack)
本发明 (This invention) | 高 (High)             | 是 (Yes)                         | 是 (Yes)
Kantchelian的方法       | 低 (Low)              | 是 (Yes)                         | 否 (No)
Cheng的方法             | 高 (High)             | 否 (No)                          | 是 (Yes)
可以理解的是，在实际应用中，我们同时采用白盒攻击和黑盒攻击两种方式评估集成树分类器对抗逃避攻击的安全性。对于白盒攻击，我们假设攻击者拥有跟目标系统相同的知识。对于黑盒攻击（本发明实施例），我们根据攻击者掌握训练数据的程度考虑两种攻击场景：第一个攻击场景称为训练子集场景，该场景假设攻击者知道原始训练数据的子集；第二个场景称为替代数据场景，该场景假设攻击者不知道原始的训练数据，但是能够通过网络或其它方式收集到与原始训练数据同分布的替代数据集。在这两个攻击场景中，可以将攻击者掌握的数据划分为20%、50%、80%、100%四种比例，来评估攻击者掌握不同数据量时分类器的安全性。分类器安全性的评价采用攻击难度（Hardness of evasion）和逃避率（Evasion rate）两种评价标准。It is understandable that, in practical applications, both white-box and black-box attacks are used to evaluate the security of the ensemble tree classifier against evasion attacks. For white-box attacks, we assume the attacker has the same knowledge as the target system. For black-box attacks (the embodiments of the present invention), we consider two attack scenarios according to how much of the training data the attacker holds: the first, called the training-subset scenario, assumes the attacker knows a subset of the original training data; the second, called the replacement-data scenario, assumes the attacker does not know the original training data but can collect, via the network or other means, a replacement data set with the same distribution as the original training data. In both scenarios, the data held by the attacker can be set to 20%, 50%, 80% and 100% to evaluate the classifier's security under different amounts of attacker data. Classifier security is evaluated with two criteria: hardness of evasion and evasion rate.
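两种评价标准的一个示意计算如下（此处将"攻击难度"理解为成功逃避样本的平均修改特征数，属本文为演示所作的假设，具体定义以原文为准）：A sketch of the two evaluation criteria (taking "hardness of evasion" as the average number of modified features over successfully evading samples — an assumption for illustration):

```python
def evaluate(mod_counts):
    # mod_counts：每个恶意样本逃避成功所需的修改特征数；None 表示未能逃避
    succ = [m for m in mod_counts if m is not None]
    evasion_rate = len(succ) / len(mod_counts)          # 逃避率
    hardness = sum(succ) / len(succ) if succ else None  # 攻击难度（假设：平均修改数）
    return evasion_rate, hardness

rate, hard = evaluate([1, 2, None, 3])   # 4 个样本中 3 个成功：rate=0.75, hard=2.0
```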
本发明针对集成树这类不具有可微损失函数的分类模型，从决策树的决策结构入手，找出每个基分类器的决策路径，并从集成树分类器的决策路径集合中分析能够误导其决策的关键特征，最后通过修改关键决策特征实现攻击。通过实施本发明实施例，能够对针对应用广泛的基于二进制特征的集成树分类器（梯度提升树、随机森林等）的黑盒攻击方法进行深入研究，从而为设计鲁棒的分类器提供依据和参考。For classification models without a differentiable loss function, such as ensemble trees, the present invention starts from the decision structure of the decision tree, finds the decision path of each base classifier, analyzes from the set of decision paths of the ensemble tree classifier the key features that can mislead its decisions, and finally realizes the attack by modifying these key decision features. By implementing the embodiments of the present invention, black-box attack methods against widely used binary-feature-based ensemble tree classifiers (gradient boosting trees, random forests, etc.) can be studied in depth, providing a basis and reference for designing robust classifiers.
需要说明的是，对于以上方法或流程实施例，为了简单描述，故将其都表述为一系列的动作组合，但是本领域技术人员应该知悉，本发明实施例并不受所描述的动作顺序的限制，因为依据本发明实施例，某些步骤可以采用其他顺序或者同时进行。其次，本领域技术人员也应该知悉，说明书中所描述的实施例均属于可选实施例，所涉及的动作并不一定是本发明实施例所必须的。It should be noted that, for the sake of brevity, the above method or process embodiments are all expressed as a series of action combinations; however, those skilled in the art should know that the embodiments of the present invention are not limited by the described sequence of actions, because according to the embodiments of the present invention, some steps can be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all optional embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
请参见图6,为了解决相同的技术问题,本发明还提供了一种针对集成树分类器的逃避攻击装置,包括数据获取模块1、替代分类器训练模块2、特征修改模块3和逃避攻击试探模块4;其中,Please refer to FIG. 6, in order to solve the same technical problem, the present invention also provides an evasion attack device for the integrated tree classifier, including a data acquisition module 1, an alternative classifier training module 2, a feature modification module 3, and an evasion attack probe Module 4; among them,
所述数据获取模块1,用于获取原始输入样本、替代数据集和目标分类器的学习模型,其中,所述替代数据集为与目标分类器训练数据具有一致分布特征的数据集;The data acquisition module 1 is configured to acquire original input samples, a replacement data set, and a learning model of a target classifier, where the replacement data set is a data set that has consistent distribution characteristics with the target classifier training data;
所述替代分类器训练模块2,用于根据所述替代数据集和所述学习模型进行训练,得到替代分类器;The alternative classifier training module 2 is configured to train according to the alternative data set and the learning model to obtain an alternative classifier;
所述特征修改模块3,用于判断当前特征修改次数是否达到预设的最大修改次数阈值;若否,则根据所述替代分类器,利用最短路径算法以及预设的逃避攻击策略寻找最优修改特征,并根据所述最优修改特征对所述原始输入样本的对应特征进行修改,生成试探样本;若是,则结束运行;The feature modification module 3 is used to determine whether the current feature modification times reach the preset maximum modification times threshold; if not, the shortest path algorithm and the preset evasion attack strategy are used to find the optimal modification according to the alternative classifier Feature, and modify the corresponding feature of the original input sample according to the optimal modified feature to generate a trial sample; if yes, end the operation;
所述逃避攻击试探模块4，用于利用所述目标分类器对所述试探样本进行分类得到试探分类结果，判断所述试探分类结果与预存的原始分类结果是否一致；若是，则重复执行特征修改过程；若否，则输出所述试探样本；其中，所述原始分类结果为所述目标分类器对所述原始输入样本进行分类的结果。The evasion attack detection module 4 is used to classify the trial samples by using the target classifier to obtain a trial classification result, and determine whether the trial classification result is consistent with the pre-stored original classification result; if so, the feature modification process is repeated; if not, the trial sample is output; wherein the original classification result is the result of the target classifier classifying the original input sample.
进一步地,根据所述替代分类器,利用最短路径算法以及预设的逃避攻击策略寻找最优修改特征,具体为:Further, according to the alternative classifier, the shortest path algorithm and the preset evasion attack strategy are used to find the optimal modification feature, specifically:
根据所述替代分类器对输入样本进行分类得到的分类结果类型,确定需要得到的目标分类结果类型;Determine the target classification result type that needs to be obtained according to the classification result type obtained by classifying the input sample by the alternative classifier;
利用最短路径算法,根据所述目标分类结果类型查找每一决策树的目标最短决策路径,得到目标最短路径集合;Using the shortest path algorithm, search for the shortest target decision path of each decision tree according to the target classification result type, and obtain the shortest path set of the target;
根据所述预设的逃避攻击策略对所述目标最短路径集合中的每个特征进行权值分配;Assigning weights to each feature in the shortest path set of the target according to the preset evasion attack strategy;
对各个特征的权值进行累加并比较每个特征的累计权值,得到所述最优修改特征。The weights of each feature are accumulated and the accumulated weights of each feature are compared to obtain the optimal modified feature.
进一步地,所述目标最短路径集合包括第一目标最短路径集合和第二目标最短路径集合;Further, the target shortest path set includes a first target shortest path set and a second target shortest path set;
所述利用最短路径算法,根据所述目标分类结果类型查找每一决策树的目标最短决策路径,得到目标最短路径集合,具体为:The shortest path algorithm is used to search the target shortest decision path of each decision tree according to the target classification result type to obtain the target shortest path set, specifically:
根据所述目标分类结果类型将所述替代分类器分为第一类决策树和第二类决策树；其中，所述第一类决策树的决策值与所述目标分类结果类型不一致，所述第二类决策树的决策值与所述目标分类结果类型相一致；According to the target classification result type, the alternative classifier is divided into a first type of decision tree and a second type of decision tree; wherein the decision value of the first type of decision tree is inconsistent with the target classification result type, and the decision value of the second type of decision tree is consistent with the target classification result type;
利用最短路径算法，查找所述第一类决策树的第一类目标最短路径，得到第一目标最短路径集合，同时，利用最短路径算法，查找所述第二类决策树的第二类目标最短路径，得到第二目标最短路径集合。The shortest path algorithm is used to find the first-type target shortest paths of the first type of decision tree to obtain the first target shortest path set; at the same time, the shortest path algorithm is used to find the second-type target shortest paths of the second type of decision tree to obtain the second target shortest path set.
进一步地,所述第一目标最短路径集合中的特征被分配的权值为正数,所述第二目标最短路径集合中的特征被分配的权值为负数;Further, the assigned weight value of the feature in the first target shortest path set is a positive number, and the assigned weight value of the feature in the second target shortest path set is a negative number;
所述对各个特征的权值进行累加并比较每个特征的累计权值,得到所述最优修改特征,具体为:The step of accumulating the weight of each feature and comparing the cumulative weight of each feature to obtain the optimal modified feature is specifically:
对各个特征的权值进行累加并比较每个特征的累计权值,查找出累计权值最大的特征作为所述最优修改特征。The weight of each feature is accumulated and the cumulative weight of each feature is compared, and the feature with the largest cumulative weight is found as the optimal modified feature.
进一步地，所述第一目标最短路径集合中的特征按照公式 1/10^(n-1) 进行权值分配，其中，n代表该特征相对于所在决策路径的位置次序。Further, the features in the first target shortest path set are assigned weights according to the formula 1/10^(n-1), where n represents the position order of the feature in its decision path.
可以理解的是，上述系统项实施例是与本发明方法项实施例相对应的，本发明提供的一种针对集成树分类器的逃避攻击装置，可以实现本发明任意一项方法项实施例所提供的针对集成树分类器的逃避攻击方法。It is understandable that the above system embodiments correspond to the method embodiments of the present invention; the evasion attack device for the ensemble tree classifier provided by the present invention can implement the evasion attack method for the ensemble tree classifier provided by any method embodiment of the present invention.
相比于现有技术,本发明具有如下有益效果:Compared with the prior art, the present invention has the following beneficial effects:
本发明针对集成树这类不具有可微损失函数的分类模型，从决策树的决策结构入手，找出每个基分类器的决策路径，并从集成树分类器的决策路径集合中分析能够误导其决策的关键特征，最后通过修改关键决策特征实现攻击。通过实施本发明实施例，能够对针对应用广泛的基于二进制特征的集成树分类器（梯度提升树、随机森林等）的黑盒攻击方法进行深入研究，从而为设计鲁棒的分类器提供依据和参考。For classification models without a differentiable loss function, such as ensemble trees, the present invention starts from the decision structure of the decision tree, finds the decision path of each base classifier, analyzes from the set of decision paths of the ensemble tree classifier the key features that can mislead its decisions, and finally realizes the attack by modifying these key decision features. By implementing the embodiments of the present invention, black-box attack methods against widely used binary-feature-based ensemble tree classifiers (gradient boosting trees, random forests, etc.) can be studied in depth, providing a basis and reference for designing robust classifiers.
以上所述是本发明的优选实施方式，应当指出，对于本技术领域的普通技术人员来说，在不脱离本发明原理的前提下，还可以做出若干改进和润饰，这些改进和润饰也视为本发明的保护范围。The above are preferred embodiments of the present invention. It should be pointed out that, for those of ordinary skill in the art, several improvements and refinements can be made without departing from the principle of the present invention, and these improvements and refinements are also regarded as falling within the protection scope of the present invention.

Claims (10)

  1. 一种针对集成树分类器的逃避攻击方法,其特征在于,包括步骤:An evasion attack method for ensemble tree classifiers is characterized in that it comprises the steps:
    (1)获取原始输入样本、替代数据集和目标分类器的学习模型,其中,所述替代数据集为与目标分类器训练数据具有一致分布特征的数据集;(1) Obtain an original input sample, a replacement data set, and a learning model of the target classifier, wherein the replacement data set is a data set that has consistent distribution characteristics with the target classifier training data;
    (2)根据所述替代数据集和所述学习模型进行训练,得到替代分类器;(2) Training according to the replacement data set and the learning model to obtain a replacement classifier;
    (3)判断当前特征修改次数是否达到预设的最大修改次数阈值;若否,则根据所述替代分类器,利用最短路径算法以及预设的逃避攻击策略寻找最优修改特征,并根据所述最优修改特征对所述原始输入样本的对应特征进行修改,生成试探样本,执行步骤(4);若是,则结束运行;(3) Determine whether the current feature modification times reach the preset maximum modification times threshold; if not, according to the alternative classifier, use the shortest path algorithm and the preset evasion attack strategy to find the optimal modification feature, and according to the The optimal modification feature modifies the corresponding feature of the original input sample, generates a trial sample, and executes step (4); if it is, the operation ends;
    (4)利用所述目标分类器对所述试探样本进行分类得到试探分类结果，判断所述试探分类结果与预存的原始分类结果是否一致；若是，则执行步骤(3)；若否，则输出所述试探样本；其中，所述原始分类结果为所述目标分类器对所述原始输入样本进行分类的结果。(4) The target classifier is used to classify the trial sample to obtain a trial classification result, and it is determined whether the trial classification result is consistent with the pre-stored original classification result; if yes, step (3) is performed; if not, the trial sample is output; wherein the original classification result is the result of the target classifier classifying the original input sample.
  2. 如权利要求1所述的针对集成树分类器的逃避攻击方法,其特征在于,根据所述替代分类器,利用最短路径算法以及预设的逃避攻击策略寻找最优修改特征,具体为:The evasion attack method for the ensemble tree classifier according to claim 1, wherein, according to the alternative classifier, the shortest path algorithm and a preset evasion attack strategy are used to find the optimal modification feature, specifically:
    根据所述替代分类器对输入样本进行分类得到的分类结果类型,确定需要得到的目标分类结果类型;Determine the target classification result type that needs to be obtained according to the classification result type obtained by classifying the input sample by the alternative classifier;
    利用最短路径算法,根据所述目标分类结果类型查找每一决策树的目标最短决策路径,得到目标最短路径集合;Using the shortest path algorithm, search for the shortest target decision path of each decision tree according to the target classification result type, and obtain the shortest path set of the target;
    根据所述预设的逃避攻击策略对所述目标最短路径集合中的每个特征进行权值分配;Assigning weights to each feature in the shortest path set of the target according to the preset evasion attack strategy;
    对各个特征的权值进行累加并比较每个特征的累计权值,得到所述最优修改特征。The weights of each feature are accumulated and the accumulated weights of each feature are compared to obtain the optimal modified feature.
  3. 如权利要求2所述的针对集成树分类器的逃避攻击方法,其特征在于,所述目标最短路径集合包括第一目标最短路径集合和第二目标最短路径集合;The evasion attack method for the ensemble tree classifier according to claim 2, wherein the shortest path set for the target includes a shortest path set for a first target and a shortest path set for a second target;
    所述利用最短路径算法,根据所述目标分类结果类型查找每一决策树的目标最短决策路径,得到目标最短路径集合,具体为:The shortest path algorithm is used to search the target shortest decision path of each decision tree according to the target classification result type to obtain the target shortest path set, specifically:
    根据所述目标分类结果类型将所述替代分类器分为第一类决策树和第二类决策树；其中，所述第一类决策树的决策值与所述目标分类结果类型不一致，所述第二类决策树的决策值与所述目标分类结果类型相一致；According to the target classification result type, the alternative classifier is divided into a first type of decision tree and a second type of decision tree; wherein the decision value of the first type of decision tree is inconsistent with the target classification result type, and the decision value of the second type of decision tree is consistent with the target classification result type;
    利用最短路径算法，查找所述第一类决策树的第一类目标最短路径，得到第一目标最短路径集合，同时，利用最短路径算法，查找所述第二类决策树的第二类目标最短路径，得到第二目标最短路径集合。The shortest path algorithm is used to find the first-type target shortest paths of the first type of decision tree to obtain the first target shortest path set; at the same time, the shortest path algorithm is used to find the second-type target shortest paths of the second type of decision tree to obtain the second target shortest path set.
  4. 如权利要求3所述的针对集成树分类器的逃避攻击方法,其特征在于,所述第一目标最短路径集合中的特征被分配的权值为正数,所述第二目标最短路径集合中的特征被分配的权值为负数;The evasion attack method for the ensemble tree classifier according to claim 3, wherein the weights assigned to the features in the first target shortest path set are positive numbers, and the second target shortest path set is The assigned weight of the feature is negative;
    所述对各个特征的权值进行累加并比较每个特征的累计权值,得到所述最优修改特征,具体为:The step of accumulating the weight of each feature and comparing the cumulative weight of each feature to obtain the optimal modified feature is specifically:
    对各个特征的权值进行累加并比较每个特征的累计权值,查找出累计权值最大的特征作为所述最优修改特征。The weight of each feature is accumulated and the cumulative weight of each feature is compared, and the feature with the largest cumulative weight is found as the optimal modified feature.
  5. 如权利要求4所述的针对集成树分类器的逃避攻击方法，其特征在于，所述第一目标最短路径集合中的特征按照公式 1/10^(n-1) 进行权值分配，其中，n代表该特征相对于所在决策路径的位置次序。The evasion attack method for the ensemble tree classifier according to claim 4, wherein the features in the first target shortest path set are assigned weights according to the formula 1/10^(n-1), where n represents the position order of the feature in its decision path.
  6. 一种针对集成树分类器的逃避攻击装置,其特征在于,包括数据获取模块、替代分类器训练模块、特征修改模块和逃避攻击试探模块;其中,An evasion attack device for an integrated tree classifier, which is characterized by comprising a data acquisition module, a replacement classifier training module, a feature modification module, and an evasion attack detection module; wherein,
    所述数据获取模块,用于获取原始输入样本、替代数据集和目标分类器的学习模型,其中,所述替代数据集为与目标分类器训练数据具有一致分布特征的数据集;The data acquisition module is used to acquire original input samples, a replacement data set, and a learning model of a target classifier, where the replacement data set is a data set that has consistent distribution characteristics with the target classifier training data;
    所述替代分类器训练模块,用于根据所述替代数据集和所述学习模型进行训练,得到替代分类器;The alternative classifier training module is configured to train according to the alternative data set and the learning model to obtain an alternative classifier;
    所述特征修改模块,用于判断当前特征修改次数是否达到预设的最大修改次数阈值;若否,则根据所述替代分类器,利用最短路径算法以及预设的逃避攻击策略寻找最优修改特征,并根据所述最优修改特征对所述原始输入样本的对应特征进行修改,生成试探样本;若是,则结束运行;The feature modification module is used to determine whether the current feature modification times reach the preset maximum modification times threshold; if not, according to the alternative classifier, the shortest path algorithm and the preset evasion attack strategy are used to find the optimal modification feature , And modify the corresponding feature of the original input sample according to the optimal modification feature to generate a trial sample; if yes, end the operation;
    所述逃避攻击试探模块，用于利用所述目标分类器对所述试探样本进行分类得到试探分类结果，判断所述试探分类结果与预存的原始分类结果是否一致；若是，则重复执行特征修改过程；若否，则输出所述试探样本；其中，所述原始分类结果为所述目标分类器对所述原始输入样本进行分类的结果。The evasion attack detection module is used to classify the trial sample by using the target classifier to obtain a trial classification result, and determine whether the trial classification result is consistent with the pre-stored original classification result; if so, the feature modification process is repeated; if not, the trial sample is output; wherein the original classification result is the result of the target classifier classifying the original input sample.
7. The evasion attack device for an ensemble tree classifier according to claim 6, characterized in that finding the optimal modification feature according to the substitute classifier, using the shortest path algorithm and the preset evasion attack strategy, specifically comprises:
    determining, according to the type of classification result obtained when the substitute classifier classifies the input sample, the type of target classification result to be obtained;
    searching, using the shortest path algorithm, for the target shortest decision path of each decision tree according to the target classification result type, to obtain a target shortest path set;
    assigning a weight to each feature in the target shortest path set according to the preset evasion attack strategy;
    accumulating the weights of each feature and comparing the accumulated weights of the features to obtain the optimal modification feature.
8. The evasion attack device for an ensemble tree classifier according to claim 7, characterized in that the target shortest path set comprises a first target shortest path set and a second target shortest path set;
    and in that searching, using the shortest path algorithm, for the target shortest decision path of each decision tree according to the target classification result type to obtain the target shortest path set specifically comprises:
    dividing the substitute classifier into first-class decision trees and second-class decision trees according to the target classification result type, wherein the decision values of the first-class decision trees are inconsistent with the target classification result type, and the decision values of the second-class decision trees are consistent with the target classification result type;
    searching, using the shortest path algorithm, for the first-class target shortest paths of the first-class decision trees to obtain the first target shortest path set, and likewise searching, using the shortest path algorithm, for the second-class target shortest paths of the second-class decision trees to obtain the second target shortest path set.
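A minimal sketch of the per-tree shortest-path search of claim 8, under an assumed tree encoding (a nested dict with `feature`/`left`/`right` for internal nodes and `label` for leaves — the patent does not specify a representation): breadth-first search naturally returns the shallowest root-to-leaf path whose leaf predicts the target class.

```python
from collections import deque

def shortest_target_path(tree, target_label):
    """BFS for the shallowest root-to-leaf path whose leaf predicts
    target_label; returns the list of feature names tested along it,
    or None if no leaf of that class exists."""
    queue = deque([(tree, [])])
    while queue:
        node, path = queue.popleft()
        if 'label' in node:                      # leaf node
            if node['label'] == target_label:
                return path                      # shallowest match, by BFS order
            continue
        queue.append((node['left'], path + [node['feature']]))
        queue.append((node['right'], path + [node['feature']]))
    return None

# Toy tree: the root tests f0; its left child tests f1.
tree = {'feature': 'f0',
        'left': {'feature': 'f1',
                 'left': {'label': 0}, 'right': {'label': 1}},
        'right': {'label': 1}}

print(shortest_target_path(tree, 1))  # → ['f0'] (the depth-1 leaf labelled 1)
print(shortest_target_path(tree, 0))  # → ['f0', 'f1']
```

Running this over every tree of the ensemble, with the tree sorted into the first or second set according to whether its own prediction matches the target type, would yield the two path sets the claim names.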
9. The evasion attack device for an ensemble tree classifier according to claim 8, characterized in that the features in the first target shortest path set are assigned positive weights, and the features in the second target shortest path set are assigned negative weights;
    and in that accumulating the weights of each feature and comparing the accumulated weights of the features to obtain the optimal modification feature specifically comprises:
    accumulating the weights of each feature, comparing the accumulated weights of the features, and selecting the feature with the largest accumulated weight as the optimal modification feature.
10. The evasion attack device for an ensemble tree classifier according to claim 9, characterized in that the features in the first target shortest path set are assigned weights according to the formula 1/10^(n-1), where n denotes the position of the feature along its decision path.
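A sketch of the weighting scheme of claims 9-10 under one reading of the formula: the n-th feature on a path from the first set contributes +1/10^(n-1), features on paths from the second set contribute negatively (the magnitude shown here reuses the same formula, which is an assumption — claim 9 only fixes the sign), and the feature with the largest accumulated weight wins.

```python
from collections import defaultdict

def optimal_feature(first_set_paths, second_set_paths):
    """Accumulate per-feature weights across all shortest decision paths
    and return the feature with the largest total weight."""
    weights = defaultdict(float)
    for path in first_set_paths:        # trees whose decision differs from the target type
        for n, feature in enumerate(path, start=1):
            weights[feature] += 1 / 10 ** (n - 1)
    for path in second_set_paths:       # trees already agreeing with the target type
        for n, feature in enumerate(path, start=1):
            weights[feature] -= 1 / 10 ** (n - 1)   # assumed symmetric magnitude
    return max(weights, key=weights.get)

first = [['f0', 'f2'], ['f0', 'f1']]    # f0: +1+1, f2: +0.1, f1: +0.1
second = [['f1']]                        # f1: -1.0
print(optimal_feature(first, second))    # → 'f0' (accumulated weight 2.0)
```

The rapidly decaying 1/10^(n-1) weight favours features tested near the roots of trees that still classify against the attacker, while the negative contribution penalises features that would disturb trees already voting the desired way.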
PCT/CN2019/097532 2019-07-24 2019-07-24 Evasion attack method and device for integrated tree classifier WO2021012220A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/097532 WO2021012220A1 (en) 2019-07-24 2019-07-24 Evasion attack method and device for integrated tree classifier


Publications (1)

Publication Number Publication Date
WO2021012220A1 true WO2021012220A1 (en) 2021-01-28

Family

ID=74192984


Country Status (1)

Country Link
WO (1) WO2021012220A1 (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7546278B2 (en) * 2006-03-13 2009-06-09 Microsoft Corporation Correlating categories using taxonomy distance and term space distance
CN108764267A (en) * 2018-04-02 2018-11-06 上海大学 A kind of Denial of Service attack detection method integrated based on confrontation type decision tree
CN109086791A (en) * 2018-06-25 2018-12-25 阿里巴巴集团控股有限公司 A kind of training method, device and the computer equipment of two classifiers
CN109615020A (en) * 2018-12-25 2019-04-12 深圳前海微众银行股份有限公司 Characteristic analysis method, device, equipment and medium based on machine learning model


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11568317B2 (en) * 2020-05-21 2023-01-31 Paypal, Inc. Enhanced gradient boosting tree for risk and fraud modeling
US11893465B2 (en) 2020-05-21 2024-02-06 Paypal, Inc. Enhanced gradient boosting tree for risk and fraud modeling
CN118247493A (en) * 2024-05-23 2024-06-25 杭州海康威视数字技术股份有限公司 Fake picture detection and positioning method and device based on segmentation integrated learning

Similar Documents

Publication Publication Date Title
Weber et al. Rab: Provable robustness against backdoor attacks
Rabbani et al. A hybrid machine learning approach for malicious behaviour detection and recognition in cloud computing
CN112738015B (en) Multi-step attack detection method based on interpretable convolutional neural network CNN and graph detection
Idhammad et al. Semi-supervised machine learning approach for DDoS detection
CN112200243B (en) Black box countermeasure sample generation method based on low query image data
Singh et al. An efficient approach for intrusion detection in reduced features of KDD99 using ID3 and classification with KNNGA
WO2021012220A1 (en) Evasion attack method and device for integrated tree classifier
CN110458209B (en) Attack evasion method and device for integrated tree classifier
Do Xuan et al. Optimization of network traffic anomaly detection using machine learning.
Ao Using machine learning models to detect different intrusion on NSL-KDD
Desai et al. Iot bonet and network intrusion detection using dimensionality reduction and supervised machine learning
Harbola et al. Improved intrusion detection in DDoS applying feature selection using rank & score of attributes in KDD-99 data set
Guo et al. An IoT intrusion detection system based on TON IoT network dataset
Farrahi et al. KCMC: A hybrid learning approach for network intrusion detection using K-means clustering and multiple classifiers
Ensafi et al. Optimizing fuzzy k-means for network anomaly detection using pso
Bhuyan et al. Towards an unsupervised method for network anomaly detection in large datasets
Grill Combining network anomaly detectors
Sharma et al. Recent trend in Intrusion detection using Fuzzy-Genetic algorithm
Taylor et al. A smart system for detecting behavioural botnet attacks using random forest classifier with principal component analysis
Leevy et al. IoT attack prediction using big Bot-IoT data
Memon et al. A design and implementation of new hybrid system for anomaly intrusion detection system to improve efficiency
Tien et al. Automatic device identification and anomaly detection with machine learning techniques in smart factories
Sugi et al. Optimal feature selection in intrusion detection using SVM-CA
Li et al. MetaIoT: Few Shot Malicious Traffic Detection in Internet of Things Networks Based on HIN
Dantwala et al. A Novel Technique to Detect URL Phishing based on Feature Count

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19938606; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19938606; Country of ref document: EP; Kind code of ref document: A1)