CN116886443B - Opponent action preference estimation method, device and electronic equipment for offensive and defensive games - Google Patents

Info

Publication number
CN116886443B
Authority
CN
China
Prior art keywords
distribution
network
information set
action
attacker
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311123325.4A
Other languages
Chinese (zh)
Other versions
CN116886443A (en
Inventor
陈少飞
胡振震
袁唯淋
陆丽娜
吉祥
李鹏
陈佳星
陈璟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202311123325.4A
Publication of CN116886443A
Application granted
Publication of CN116886443B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer And Data Communications (AREA)

Abstract

The present application relates to an opponent action preference estimation method, device and electronic equipment for offensive and defensive games. The method includes: counting the types and proportions of the action sequences of the attacker and the defender in the previous round of network attack and defense; making a two-dimensional approximate estimate of the probability of each action sequence based on the proportions of all action sequences of both parties; determining the posterior distribution of the network attacker's information set at the end of the previous round from the action sequence probabilities and the joint distribution of both parties' information sets; obtaining the distribution of the network attacker's information set at the start of the next round from the smoothed posterior distribution and the functional relationship between the information sets of successive rounds; and inferring the attacker's actions with an opponent strategy model built from the information set distribution, so that the defender can adopt the optimal countermeasure strategy for defense. The method improves the accuracy of predicting the attacker's actions, makes the defense strategy better targeted, and thereby improves the capability and effectiveness of network defense.

Description

Opponent action preference estimation method, device and electronic equipment for offensive and defensive games

Technical field

The present application relates to the field of network security technology, and in particular to an opponent action preference estimation method, device and electronic equipment for offensive and defensive games.

Background art

As informatization deepens, networks bring people more convenience, but network attacks are also becoming more frequent and cause huge losses to their victims. Attack and defense confrontation over network information has therefore become one of the key concerns of network security. Network attack and defense is a multi-stage, stochastic, repeated game in which both parties have incomplete information. Whether a network attack succeeds depends not only on the strength of the attack capability but also on targeted defense measures, so the attack and defense process is also a process of confrontation and gaming between the two sides. Reconstructing the opponent's strategy model in order to predict the attack strategy of the network attacker in this game is challenging.

Because information is incomplete during network attack and defense, neither party can fully observe the other's hidden information. When the observation data of completed games are used to explicitly build the opponent's strategy model (that is, its action probability model), some method is therefore needed to associate the opponent's actions with both the opponent information sets whose hidden information was not observed and those whose hidden information was observed. (In a game, the information set of a player is the set of histories that the player cannot tell apart when acting, because part of the other party's information is unknown. Reconstructing the opponent's strategy model essentially means building a probability model that associates the opponent's information sets with the opponent's actions, and the opponent's information set is determined by the opponent's hidden information.) The current decision-point-based method for explicitly reconstructing the opponent strategy model aggregates the different information sets of the game into decision points and, using their independent and identically distributed property, treats the opponent information sets with unobserved hidden information and those with observed hidden information as a single distribution. It then obtains the probability model of the different actions from a probability density estimate of the information sets corresponding to each action.

However, this method assumes that the information sets within a decision point are uniformly distributed. This overly strong assumption biases the probability density estimate of the actions and thus limits the accuracy of the reconstructed opponent model, for two reasons. (1) Within a stage, the actions of both players are driven by preferences, so the opponent's current action is affected by the earlier actions of both players, and the distribution of the information sets within the opponent's current decision point is not uniform. (2) After a stage transition, random factors reduce, to some extent, the influence of both parties' action preferences in the earlier stage on the information set distribution of the current stage, but that influence still cannot be ignored entirely. The non-uniform distribution of the opponent's information sets caused by both parties' action preferences introduces a bias into any opponent action probability model that assumes a uniform distribution.

In summary, the existing decision-point-based method for explicitly reconstructing the opponent's strategy assumes a uniform distribution of information sets within a decision point, which does not match the actual non-uniform distribution of the opponent's information sets under the influence of preferences. This limits the accuracy of the resulting opponent strategy model, biases the inference of the opponent's actions, makes the defender's defense strategy poorly targeted, and degrades the capability and effectiveness of network defense.

Summary of the invention

In view of the above technical problems, it is necessary to provide an opponent action preference estimation method, device and electronic equipment for offensive and defensive games.

An opponent action preference estimation method for offensive and defensive games, the method including:

Counting the types and proportions of the action sequences of the attacker and the defender in the previous round of network attack and defense.

Making a two-dimensional approximate estimate of the probability of the action sequences based on the proportions of all action sequences of both parties.

Determining the posterior distribution of the network attacker's information set at the end of the previous round from the probability of the action sequences and the joint distribution of both parties' information sets.

Obtaining the distribution of the network attacker's information set at the start of the next round from the smoothed posterior distribution of the network attacker's information set and the functional relationship between the information sets of successive rounds, and using it as the information set distribution within the opponent's first-order decision point in that round.

Building the network attacker's opponent strategy model from the distribution of the network attacker's information set and using that model to infer the network attacker's actions, so that the defender adopts the optimal countermeasure strategy against the inferred actions.
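The last step, in which the defender chooses the optimal countermeasure against the inferred attacker actions, can be illustrated with a minimal sketch. It assumes that the opponent strategy model has already produced a probability for each attacker action and that the defender's payoff for each pair of actions is known. The action names, the payoff values and the function `best_defense` are hypothetical and are not taken from the patent.

```python
def best_defense(attack_probs, payoff):
    """Return the defender action with the highest expected payoff.

    attack_probs: {attacker_action: predicted probability}
    payoff: {defender_action: {attacker_action: defender payoff}}
    """
    def expected(defense):
        # Expected defender payoff against the predicted attacker mix.
        return sum(p * payoff[defense][attack] for attack, p in attack_probs.items())

    return max(payoff, key=expected)


# Hypothetical prediction from an opponent strategy model.
predicted = {"scan": 0.2, "exploit": 0.8}
# Hypothetical defender payoffs.
payoff = {
    "monitor": {"scan": 1.0, "exploit": -2.0},
    "patch": {"scan": -0.5, "exploit": 1.5},
}
choice = best_defense(predicted, payoff)
```

With these illustrative numbers the expected payoff of `monitor` is negative and that of `patch` is positive, so `patch` is selected.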

In one embodiment, counting the types and proportions of the action sequences of both parties in the previous round of network attack and defense includes:

Counting the types and numbers of the action sequences of both parties in the previous round, using the data observed over multiple historical games between the attacker and the defender;

Determining, from the types and numbers of all the action sequences, the proportion of each type of action sequence as:

$$P(acts) = \frac{N_{acts}}{N} = \iint P(acts \mid I_P^{pre}, I_O^{pre})\, f(I_P^{pre}, I_O^{pre})\, dI_P^{pre}\, dI_O^{pre};$$

where $P(acts)$ is the ratio of the number of action sequences $acts$ to the total number of action sequences; $N_{acts}$ is the number of action sequences of one type; $N$ is the total number of action sequences; $P(acts \mid I_P^{pre}, I_O^{pre})$ is the conditional probability of the action sequence under the joint distribution of both parties' information sets; $f(I_P^{pre}, I_O^{pre})$ is the joint distribution of both parties' information sets; and $I_P^{pre}$, $I_O^{pre}$ are the information sets of the two parties in the previous round. The subscript $P$ denotes the defender, the subscript $O$ denotes the network attacker (the opponent), and the superscript $pre$ denotes the previous round.
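The counting step can be sketched as follows. This is a minimal illustration that assumes each observed action sequence is recorded as a tuple of action labels; the labels and the function name `action_sequence_proportions` are hypothetical.

```python
from collections import Counter


def action_sequence_proportions(observed):
    """Return {action sequence: share of all observed sequences}."""
    counts = Counter(observed)          # number of sequences of each type
    total = sum(counts.values())        # total number of sequences
    return {acts: n / total for acts, n in counts.items()}


# Hypothetical previous-round sequences from four historical games.
history = [
    ("scan", "block"),
    ("scan", "block"),
    ("exploit", "patch"),
    ("scan", "allow"),
]
proportions = action_sequence_proportions(history)
```

Here the sequence `("scan", "block")` appears in two of the four games, so its proportion is 0.5.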

In one embodiment, making a two-dimensional approximate estimate of the probability of the action sequences in the previous round based on the proportions of all action sequences of both parties includes:

According to the general decision-making logic of both parties' actions, the approximate probability of an action sequence within its corresponding two-dimensional information set interval is set as:

$$\hat{P}(acts \mid I_P^{pre}, I_O^{pre}) = \begin{cases} 1, & I_P^{pre} \in R_P^{acts} \text{ and } I_O^{pre} \in R_O^{acts} \\ 0, & \text{otherwise} \end{cases};$$

where $\hat{P}(acts \mid I_P^{pre}, I_O^{pre})$ is the approximate probability of the action sequence over the two-dimensional information set space; $I_P^{pre}$ and $I_O^{pre}$ are the information sets of the defender and the attacker, respectively; and $R_P^{acts}$ and $R_O^{acts}$ are the ranges of the two-dimensional information set interval corresponding to the action sequence $acts$ for the defender and the attacker, respectively. The probability of each action sequence $acts$ should satisfy:

$$P(acts) = \iint \hat{P}(acts \mid I_P^{pre}, I_O^{pre})\, f(I_P^{pre}, I_O^{pre})\, dI_P^{pre}\, dI_O^{pre};$$

where $P(acts)$ is the proportion of the action sequence and $f(I_P^{pre}, I_O^{pre})$ is the prior joint distribution of both parties' information sets in the previous round.

According to the proportions of all action sequences of both parties and the general decision-making logic of the actions, the two-dimensional information set interval corresponding to each action sequence $acts$ is obtained by iterating the two equations above, which yields the approximate probability of the action sequences over the complete two-dimensional information set space.
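One way to picture how proportions determine two-dimensional intervals is the sketch below. It is only an illustration under strong assumptions that are not stated in the patent: the prior over the unit square is uniform, each action sequence is reduced to a pair (defender action, attacker action), and the "general decision-making logic" is modelled as a fixed ordering of each party's actions along its information set axis. The function names and the example data are hypothetical. The attacker axis is first split in proportion to each attacker action, and each strip is then split along the defender axis, so that the area of every rectangle equals the proportion of its sequence.

```python
def rectangles_for_sequences(proportions, defender_order, attacker_order):
    """Assign each (defender_action, attacker_action) a rectangle in [0, 1]^2.

    Returns {sequence: ((p_low, p_high), (o_low, o_high))}.
    """
    regions = {}
    o_start = 0.0
    for a_o in attacker_order:
        # Width of the strip on the attacker axis: marginal share of a_o.
        strip = sum(p for (_d, o), p in proportions.items() if o == a_o)
        if strip == 0.0:
            continue
        p_start = 0.0
        for a_p in defender_order:
            p = proportions.get((a_p, a_o), 0.0)
            if p == 0.0:
                continue
            height = p / strip  # conditional share of a_p given a_o
            regions[(a_p, a_o)] = ((p_start, p_start + height),
                                   (o_start, o_start + strip))
            p_start += height
        o_start += strip
    return regions


def area(region):
    (p_low, p_high), (o_low, o_high) = region
    return (p_high - p_low) * (o_high - o_low)


# Hypothetical proportions keyed by (defender_action, attacker_action).
proportions = {
    ("block", "scan"): 0.3,
    ("allow", "scan"): 0.2,
    ("block", "exploit"): 0.1,
    ("patch", "exploit"): 0.4,
}
regions = rectangles_for_sequences(
    proportions,
    defender_order=["allow", "block", "patch"],
    attacker_order=["scan", "exploit"],
)
```

Under a non-uniform prior the rectangle boundaries would have to be adjusted until the prior mass inside each rectangle matches the proportion, which is where an iterative solution becomes necessary.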

In one embodiment, determining the posterior distribution of the network attacker's information set at the end of the previous round from the probability of the action sequences and the joint distribution of both parties' information sets includes:

From the approximate probability $\hat{P}(acts \mid I_P^{pre}, I_O^{pre})$ of the action sequences over the two-dimensional information set space, and the set $Acts$ of action sequences that can enter the next round, the posterior joint distribution of both parties' information sets at the end of the previous round is obtained as:

$$f(I_P^{pre}, I_O^{pre} \mid Acts) = \frac{\sum_{acts \in Acts} \hat{P}(acts \mid I_P^{pre}, I_O^{pre})\, f(I_P^{pre}, I_O^{pre})}{\iint \sum_{acts \in Acts} \hat{P}(acts \mid I_P^{pre}, I_O^{pre})\, f(I_P^{pre}, I_O^{pre})\, dI_P^{pre}\, dI_O^{pre}};$$

where $f(I_P^{pre}, I_O^{pre} \mid Acts)$ is the posterior joint distribution of both parties' information sets at the end of the previous round; $f(I_P^{pre}, I_O^{pre})$ is the prior joint distribution of both parties' information sets in the previous round; $\hat{P}(acts \mid I_P^{pre}, I_O^{pre})$ is the approximate probability of the action sequence over the two-dimensional information set space; $Acts$ is the set of action sequences that lead into the next round; and $acts$ is each action sequence in that set.

The posterior distribution of the network attacker's information set is then obtained as the marginal distribution of the joint posterior distribution:

$$f(I_O^{pre} \mid Acts) = \int f(I_P^{pre}, I_O^{pre} \mid Acts)\, dI_P^{pre};$$

where $f(I_O^{pre} \mid Acts)$ is the information set distribution of the network attacker in the previous round.
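The posterior and its marginal can be sketched on a discretized grid. The sketch assumes that the information sets of both parties have been discretized into cells, with rows indexing the defender's cells and columns the attacker's cells; the grid size, the prior and the likelihood values are hypothetical.

```python
def posterior_joint(prior, likelihoods):
    """Joint posterior over (defender cell, attacker cell).

    prior[i][j]: prior mass of defender cell i and attacker cell j.
    likelihoods: one grid per action sequence that enters the next round,
                 giving the approximate probability of that sequence per cell.
    """
    rows, cols = len(prior), len(prior[0])
    unnorm = [
        [prior[i][j] * sum(lik[i][j] for lik in likelihoods) for j in range(cols)]
        for i in range(rows)
    ]
    total = sum(sum(row) for row in unnorm)  # normalizing constant
    return [[v / total for v in row] for row in unnorm]


def attacker_marginal(joint):
    """Sum the joint posterior over the defender axis (rows)."""
    return [sum(row[j] for row in joint) for j in range(len(joint[0]))]


# Hypothetical 2 x 2 example with a uniform prior.
prior = [[0.25, 0.25], [0.25, 0.25]]
lik_a = [[0.0, 1.0], [0.0, 0.0]]   # sequence a: defender cell 0, attacker cell 1
lik_b = [[0.0, 0.0], [1.0, 1.0]]   # sequence b: defender cell 1, any attacker cell
joint = posterior_joint(prior, [lik_a, lik_b])
marginal = attacker_marginal(joint)
```

Even though the prior is uniform, the attacker marginal is no longer uniform after conditioning on the sequences that continue into the next round, which is the effect the method sets out to capture.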

In one embodiment, obtaining the distribution of the network attacker's information set at the start of the next round from the smoothed posterior distribution of the network attacker's information set and the functional relationship between the information sets of successive rounds includes:

Smoothing the posterior distribution of the network attacker's information set with a preset kernel function to obtain a smooth nonlinear density function:

$$\tilde{f}_{pre}(I_O^{pre}) = \sum_j K(I_O^{pre} - I_{O,j}^{pre})\, f(I_{O,j}^{pre} \mid Acts);$$

where $\tilde{f}_{pre}(I_O^{pre})$ is the smooth nonlinear density function of the previous round; $K(\cdot)$ is the preset kernel function; the subscript $j$ is the index obtained after uniformly discretizing the information set; $f(I_{O,j}^{pre} \mid Acts)$ is the posterior distribution of the network attacker's information set in the previous round; and $I_O^{pre}$ is the network attacker's information set in the previous round.

From the smoothed posterior distribution of the network attacker's information set and the functional relationship between the information sets of successive rounds, the distribution of the network attacker's information set at the start of the next round is obtained and used as the information set distribution within the opponent's first-order decision point in that round:

$$f_{next}(I_O^{next}) = \tilde{f}_{pre}\big(g^{-1}(I_O^{next})\big) \left| \frac{d\, g^{-1}(I_O^{next})}{d\, I_O^{next}} \right|;$$

where $f_{next}(I_O^{next})$ is the distribution of the network attacker's information set at the start of the next round; $g$ is the functional relationship between the information sets of the previous round and the next round, with $I_O^{next} = g(I_O^{pre})$; and $I_O^{next}$ is the network attacker's information set at the start of the next round.
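The smoothing step can be sketched as a kernel smoother on a uniform grid. The patent only says that a preset kernel function is used; the Gaussian kernel, the bandwidth, the grid and the renormalization to unit area below are assumptions made for illustration.

```python
import math


def smooth_density(grid, posterior, bandwidth=0.1):
    """Smooth a discretized density with a Gaussian kernel (assumed kernel).

    grid: uniformly spaced information set positions in (0, 1).
    posterior: density value at each grid position.
    Returns the smoothed density at the same positions, rescaled to integrate to 1.
    """
    step = grid[1] - grid[0]
    raw = [
        sum(math.exp(-0.5 * ((x - xj) / bandwidth) ** 2) * pj
            for xj, pj in zip(grid, posterior))
        for x in grid
    ]
    area = sum(raw) * step               # Riemann-sum normalization
    return [v / area for v in raw]


# Hypothetical step-shaped posterior: all mass on the upper half of the axis.
n = 20
grid = [(j + 0.5) / n for j in range(n)]
posterior = [0.0 if x < 0.5 else 2.0 for x in grid]
smoothed = smooth_density(grid, posterior)
```

The hard step at 0.5 is replaced by a gradual rise, and every grid position receives some positive density, which avoids assigning zero probability to information sets that were merely unobserved.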

In one embodiment, the method further includes a step of estimating the opponent's action preference within a second-order decision point, which specifically includes:

Counting the types and proportions of all action sequences in the network attack and defense process before the second-order decision point;

Making a two-dimensional approximate estimate of the probability of both parties' action sequences based on the proportions of all action sequences before the second-order decision point in the current round;

Determining, from the probability of both parties' action sequences and the joint prior distribution of both parties' information sets, the joint posterior distribution of both parties' information sets within the second-order decision point as affected by the preceding action preferences, and determining the distribution of the network attacker's information set within the second-order decision point from the marginal distribution of the joint posterior distribution.

In one embodiment, making a two-dimensional approximate estimate of the probability of both parties' action sequences based on the proportions of all action sequences before the second-order decision point in the current round includes:

According to the general decision-making logic of both parties' actions, each action sequence is set to be uniformly distributed within its corresponding two-dimensional information set interval, and the two-dimensional information set intervals corresponding to all action sequences are determined from the proportions of the action sequences.

In one embodiment, determining the joint posterior distribution of both parties' information sets within the second-order decision point from the probability of both parties' action sequences and the joint prior distribution of both parties' information sets, and determining the distribution of the network attacker's information set within the second-order decision point from the marginal distribution of the joint posterior distribution, includes:

From the action sequence $acts$ that can enter the second-order decision point, determining the joint posterior distribution of both parties' information sets within the second-order decision point:

$$f(I_P, I_O \mid acts) = \frac{P(acts \mid I_P, I_O)\, f(I_P, I_O)}{P(acts)};$$

where $f(I_P, I_O \mid acts)$ is the joint posterior distribution of both parties' information sets within the second-order decision point; $P(acts \mid I_P, I_O)$ is the conditional distribution of the action sequences of both parties that precede the second-order decision point and can enter it; $f(I_P, I_O)$ is the joint prior distribution of both parties' information sets at the start of the current round; and $P(acts)$ is the total probability of the action sequence $acts$, equal to $\iint P(acts \mid I_P, I_O)\, f(I_P, I_O)\, dI_P\, dI_O$;

Determining the distribution of the network attacker's information set within the second-order decision point from the marginal distribution of the joint posterior distribution:

$$f(I_O \mid acts) = \int f(I_P, I_O \mid acts)\, dI_P;$$

where $f(I_O \mid acts)$ is the information set distribution within the network attacker's second-order decision point.
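A small worked case may make the second-order update concrete. It assumes, purely for illustration, a uniform joint prior on the unit square and an action sequence whose conditional probability is 1 on a rectangle and 0 elsewhere. The symbols $a_P, b_P, a_O, b_O$ for the rectangle bounds are introduced here and do not come from the patent.

```latex
% Assumptions (illustrative only):
%   f(I_P, I_O) = 1 on [0,1]^2,
%   P(acts | I_P, I_O) = 1 if I_P in [a_P, b_P] and I_O in [a_O, b_O], else 0.
\begin{align*}
P(acts) &= \int_{a_O}^{b_O}\!\int_{a_P}^{b_P} 1 \, dI_P \, dI_O
         = (b_P - a_P)(b_O - a_O), \\
f(I_P, I_O \mid acts) &= \frac{1}{(b_P - a_P)(b_O - a_O)}
         \quad \text{for } I_P \in [a_P, b_P],\ I_O \in [a_O, b_O], \\
f(I_O \mid acts) &= \int_{a_P}^{b_P} \frac{dI_P}{(b_P - a_P)(b_O - a_O)}
         = \frac{1}{b_O - a_O}
         \quad \text{for } I_O \in [a_O, b_O].
\end{align*}
```

In this special case the attacker's information set within the second-order decision point is uniform on $[a_O, b_O]$ rather than on the whole interval, so the preceding actions have narrowed the range of information sets the attacker can plausibly hold.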

An opponent action preference estimation device for offensive and defensive games, the device including:

A module for determining the proportions of both parties' action sequences, configured to count the types and proportions of the action sequences of both parties in the previous round of network attack and defense.

A module for estimating the probability of both parties' action sequences, configured to make a two-dimensional approximate estimate of the probability of the action sequences in the previous round based on the proportions of all action sequences of both parties.

A module for determining the posterior distribution of the network attacker's current-round information set, configured to determine the posterior distribution of the network attacker's information set at the end of the previous round from the probability of the action sequences and the joint distribution of both parties' information sets.

A module for determining the network attacker's next-round information set distribution, configured to obtain the distribution of the network attacker's information set at the start of the next round from the smoothed posterior distribution of the network attacker's information set and the functional relationship between the information sets of successive rounds, and to use it as the information set distribution within the opponent's first-order decision point in that round.

An opponent action inference module, configured to build the network attacker's opponent strategy model from the information set distribution within the network attacker's decision point and to use that model to infer the network attacker's actions, so that the defender adopts the optimal countermeasure strategy against the inferred actions.

An electronic device including a memory and a processor, the memory storing a computer program, wherein the processor implements any of the methods described above when executing the computer program.

In the above opponent action preference estimation method, device and electronic equipment for offensive and defensive games, the method includes: counting the types and proportions of the action sequences of both parties in the previous round of network attack and defense; making a two-dimensional approximate estimate of the probability of the action sequences in the previous round based on the proportions of all action sequences of both parties; determining the posterior distribution of the network attacker's information set at the end of the previous round from the probability of the action sequences and the joint distribution of both parties' information sets; obtaining the distribution of the network attacker's information set at the start of the next round from the smoothed posterior distribution of the network attacker's information set and the functional relationship between the information sets of successive rounds, and using it as the information set distribution within the opponent's first-order decision point in that round; and building the network attacker's opponent strategy model from the information set distribution within the network attacker's decision point and using that model to infer the network attacker's actions, so that the defender adopts the optimal countermeasure strategy against the inferred actions. The method removes the limitation of the existing assumption that information sets within a decision point are uniformly distributed. It can effectively improve the accuracy of explicit reconstruction of the opponent's strategy and the accuracy of predicting the attacker's actions, which in turn makes our defense strategy better targeted and improves the capability and effectiveness of network defense.

Description of the drawings

Figure 1 is a schematic flowchart of an opponent action preference estimation method for offensive and defensive games in one embodiment;

Figure 2 is a schematic flowchart of a method for estimating the opponent's action preference within a second-order decision point during network attack and defense in one embodiment;

Figure 3 is a structural block diagram of an opponent action preference estimation device for offensive and defensive games in one embodiment;

Figure 4 is an internal structure diagram of an electronic device in one embodiment.

具体实施方式Detailed ways

为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。In order to make the purpose, technical solutions and advantages of the present application more clear, the present application will be further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not used to limit the present application.

在网络攻防过程中,网络攻击方的决策实际是在信息集上做出的,重建网络攻击方的策略模型实质是建立网络攻击方的信息集与网络攻击方行动的关联概率模型。In the process of network attack and defense, the network attacker's decision is actually made based on the information set. The essence of reconstructing the network attacker's strategy model is to establish the correlation probability model between the network attacker's information set and the network attacker's actions.

由于网络攻防中随机性因素,一局网络攻防对抗开始时,网络攻击方的隐藏信息是均匀分布的,因而隐藏信息决定的信息集也是均匀分布的,所以网络攻防刚开始时,决策点内的信息集是均匀分布的。但当网络攻防双方开展一定的行动后,只要网络攻防双方的行动不是随机选择的,即存在一定的行动偏好,那么后续的决策点内的网络攻击方信息集分布必然受到影响,而不再是均匀分布的,因而要构建更准确的行动概率模型,必须要考虑这种偏好带来的影响。而这种影响的结果实质是决策点内的信息集的不均匀分布,若能准确估计出网络攻击方信息集的具体分布,则等价于估计了网络攻击方行动偏好的影响。Due to the random factors in network attack and defense, at the beginning of a network attack and defense confrontation, the network attacker's hidden information is evenly distributed, so the information set determined by the hidden information is also evenly distributed. Therefore, when the network attack and defense just begins, the decision point within the The information set is evenly distributed. However, when both the network attacker and defender carry out certain actions, as long as the actions of the network attacker and defender are not randomly selected, that is, there is a certain action preference, then the distribution of the network attacker's information set in subsequent decision points will inevitably be affected, instead of uniformly distributed, so to build a more accurate action probability model, the impact of this preference must be considered. The result of this influence is essentially the uneven distribution of the information set within the decision point. If the specific distribution of the network attacker's information set can be accurately estimated, it is equivalent to estimating the impact of the network attacker's action preferences.

The method treats the information set as a random variable and converts the action preference estimation problem into the problem of estimating the distribution of the opponent's information sets within a decision point. It distinguishes first-order from second-order decision points and considers two influences: the action preferences of both sides in the previous round at a round transition, and the action preferences of both sides before a second-order decision point within a round. From probability statistics of both sides' action sequences in the previous round, and of both sides' action sequences before the second-order decision point in the current round, the method forms a two-dimensional approximate probability estimate of the action sequences. This yields the joint posterior distribution of both sides' information sets, conditioned on the action sequences that reach the next round or that reach the second-order decision point of the current round. Marginalizing the joint posterior gives the opponent's information set distribution: the former serves as the distribution within the opponent's first-order decision point for the round, and the latter as the distribution within the opponent's current second-order decision point. By applying Bayesian modeling to the practical problem that the opponent's information set distribution is affected by the action preferences of both sides, the method estimates the distribution of the opponent's information sets within a decision point, removes the limitation of the existing assumption that information sets are uniformly distributed within a decision point, and can effectively improve the accuracy of explicitly reconstructing the opponent's strategy.

Consider the distribution of the information set as a random variable. The information set distribution of the previous round can be expressed by a density function $f(I_{\mathrm{pre}})$. Because the number of information sets is very large, they can be represented by a continuous random variable $I$, obtained by ordering all information sets by some characteristic quantity (such as expected win rate) and mapping them onto the real interval $[0, 1]$. The information set distribution of the next round can likewise be expressed by a density function $f(I_{\mathrm{next}})$. Because of random factors between stages, the information set of the next round is not identical to that of the previous round, so a functional relationship $I_{\mathrm{next}} = g(I_{\mathrm{pre}})$ exists between them. If this function is monotonic, the two distributions are related by $f(I_{\mathrm{next}}) = f(I_{\mathrm{pre}})\left|\frac{\mathrm{d}I_{\mathrm{pre}}}{\mathrm{d}I_{\mathrm{next}}}\right|$ with $I_{\mathrm{pre}} = g^{-1}(I_{\mathrm{next}})$. The function $g$ can be obtained by traversing the information sets. The information set distribution at the start of the next round can thus be derived, through this functional relationship, from the distribution at the end of the previous round.

Starting from the initial distribution $f(I_{\mathrm{pre}})$ at the beginning of the previous round, the action preferences of both sides cause the information set distribution to change to $f(I_{\mathrm{pre}} \mid \mathit{Acts})$ by the end of that round. This change can be modeled with Bayes' rule: the initial distribution at the start of the round is the prior, the distribution at the end of the round is the posterior, and the probabilities of all action sequences observed in the round are conditional probabilities under the prior. Some actions in the previous round may end the current game early, so that play never reaches the next round. The information set distribution of the next round is therefore derived from the posterior distribution conditioned on observing an action sequence that does lead into the next round. Accordingly, the posterior distribution of the opponent's information set at the end of the previous round must be computed from observations of the action sequences that proceed to the next round.

An action sequence $\mathit{acts}$ in a round is produced jointly by the attacker and the defender and depends on both. The probability of observing a given action sequence is therefore the conditional probability $P(\mathit{acts} \mid I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})$ under the joint distribution $f(I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})$ of both sides' information sets, where the subscript P denotes the network defender and the subscript O denotes the network attacker. Because of random factors, the two sides' information set distributions can be regarded as independent at the start of a round, so the joint distribution is the product of the two sides' information set distributions.

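Across observations of $n$ games, $m$ kinds of action sequences can be observed. Only some of them, forming the set $\mathit{Acts}$, allow play to proceed from the previous round to the next. The posterior joint distribution of the information sets at the end of the previous round, conditioned on proceeding to the next round, is:

$$f(I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}} \mid \mathit{Acts}) = \frac{\sum_{\mathit{acts} \in \mathit{Acts}} P(\mathit{acts} \mid I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})\, f(I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})}{\int_0^1\!\!\int_0^1 \sum_{\mathit{acts} \in \mathit{Acts}} P(\mathit{acts} \mid I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})\, f(I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})\, \mathrm{d}I_{P,\mathrm{pre}}\, \mathrm{d}I_{O,\mathrm{pre}}} \tag{1}$$

Once this posterior joint distribution of both sides' information sets at the end of the previous round has been computed, the attacker's information set distribution follows by taking the marginal density:

$$f(I_{O,\mathrm{pre}} \mid \mathit{Acts}) = \int_0^1 f(I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}} \mid \mathit{Acts})\, \mathrm{d}I_{P,\mathrm{pre}} \tag{2}$$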
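The posterior update and marginalization described above can be sketched on a discretized grid. This is a minimal illustration, not the patented implementation: the function name `posterior_and_marginal`, the 2x2 grid, the sequence names, and the likelihood values are all hypothetical.

```python
import numpy as np

def posterior_and_marginal(prior, likelihoods, continuing):
    """Grid version of a Bayesian update followed by marginalization.

    prior       -- (nP, nO) array of joint prior mass over (defender, attacker) cells
    likelihoods -- dict: sequence name -> (nP, nO) array of P(acts | I_P, I_O)
    continuing  -- names of the sequences that lead into the next round (the set Acts)
    """
    # Probability, per cell, of observing any continuing sequence.
    weight = sum(likelihoods[name] for name in continuing)
    unnormalized = weight * prior
    joint_posterior = unnormalized / unnormalized.sum()
    # Sum out the defender axis (axis 0) to get the attacker's marginal.
    attacker_marginal = joint_posterior.sum(axis=0)
    return joint_posterior, attacker_marginal

# Hypothetical example: the attacker tends to continue from its stronger cell.
prior = np.full((2, 2), 0.25)
likelihoods = {
    "continue": np.array([[0.2, 0.9], [0.2, 0.9]]),
    "abort": np.array([[0.8, 0.1], [0.8, 0.1]]),
}
joint, attacker = posterior_and_marginal(prior, likelihoods, ["continue"])
```

In this toy case the attacker's posterior mass shifts toward the second cell (9/11 against 2/11), even though the prior was uniform.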

Given the attacker's information set distribution at the end of the previous round, the distribution $f(I_{O,\mathrm{next}})$ at the start of the next round is obtained through the functional relationship $g$ between the information sets of the two rounds.

The analysis above shows that, to obtain the initial distribution of the attacker's information set in the next round from its initial distribution in the previous round, the key is to obtain the conditional probabilities $P(\mathit{acts} \mid I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})$ of all the different kinds of action sequences in the previous round. From these, the posterior joint distribution conditioned on observing an action sequence that proceeds to the next round is computed, and finally the marginal distribution of the attacker's information set is obtained.

In one embodiment, as shown in Figure 1, a method for estimating opponent action preferences in attack-defense games is provided. The method comprises the following steps:

Step 100: Count the types and proportions of the action sequences of both the attacker and the defender in the previous round of the network attack-defense process.

Specifically, the network attack-defense process is essentially a multi-stage (multi-round) stochastic repeated game of incomplete information between the attacker and the defender.

In reconstructing the attacker's strategy model, decision points serve as the basis for modeling: reconstructing the strategy model is equivalent to reconstructing the action probability model at each decision point. A decision point represents a similar decision scenario formed by a class of similar information sets and can be regarded as a collection of identically distributed information sets. Decision points are divided into first-order and second-order decision points. A first-order decision point is defined by only the last action before the decision point. It therefore does not need to account for the action preferences of the two sides within the current stage (also called a round), only for those of the previous stage, so a first-order decision point needs only the opponent's information set distribution at the start of each round. A second-order decision point is defined by the last two actions before the decision point, so the action preferences of both sides within the current round must be considered when determining its information set distribution.

Considering the factors that influence the opponent's decisions in this multi-stage (multi-round) stochastic repeated game of incomplete information, information sets in which the most critical decision factor (usually the hidden information) varies while the other decision factors are equal constants are compressed into one decision point. The distribution of the information sets within a decision point is therefore determined by the most critical decision factor.

Step 102: Based on the proportions of all action sequences of both sides, make a two-dimensional approximate estimate of the probabilities of the action sequences in the previous round.

Step 104: Based on the action sequence probabilities and the joint distribution of both sides' information sets, determine the posterior distribution of the attacker's information set at the end of the previous round.

Step 106: Based on the smoothed posterior distribution of the attacker's information set and the functional relationship between the information sets of consecutive rounds, obtain the distribution of the attacker's information set at the start of the next round, and use it as the information set distribution within the opponent's first-order decision point for that round.

Step 108: Based on the distribution of the attacker's information set, build an opponent strategy model of the attacker and use it to predict the attacker's actions; the defender then adopts the optimal countermeasure against the predicted actions.

The above method for estimating opponent action preferences in attack-defense games comprises: counting the types and proportions of the action sequences of both sides in the previous round of the network attack-defense process; making a two-dimensional approximate estimate of the probabilities of the action sequences in the previous round based on the proportions of all action sequences of both sides; determining the posterior distribution of the attacker's information set at the end of the previous round based on the action sequence probabilities and the joint distribution of both sides' information sets; obtaining the distribution of the attacker's information set at the start of the next round from the smoothed posterior distribution and the functional relationship between the information sets of consecutive rounds, and using it as the information set distribution within the opponent's first-order decision point for that round; and building an opponent strategy model of the attacker from the information set distribution within the attacker's decision points, using that model to predict the attacker's actions, with the defender adopting the optimal countermeasure against the predicted actions. The method removes the limitation of the existing assumption that information sets are uniformly distributed within a decision point. It can effectively improve the accuracy of explicitly reconstructing the opponent's strategy and of predicting the attacker's actions, which in turn makes the defender's strategy better targeted and improves the capability and effectiveness of network defense.

In one embodiment, step 100 comprises: counting the types and numbers of the action sequences of both sides in the previous round, using data observed over multiple historical games between the attacker and the defender; and determining, from the types and numbers of all action sequences, the proportion of each class of action sequence as:

$$P(\mathit{acts}) = \frac{N_{\mathit{acts}}}{N} = \int_0^1\!\!\int_0^1 P(\mathit{acts} \mid I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})\, f(I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})\, \mathrm{d}I_{P,\mathrm{pre}}\, \mathrm{d}I_{O,\mathrm{pre}} \tag{3}$$

where $P(\mathit{acts})$ is the ratio of the number of occurrences of action sequence $\mathit{acts}$ to the total number of action sequences; $N_{\mathit{acts}}$ is the number of action sequences of that class; $N$ is the total number of action sequences; $P(\mathit{acts} \mid I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})$ is the conditional probability of the action sequence under the joint distribution of both sides' information sets; $f(I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})$ is the joint distribution of both sides' information sets; and $I_{P,\mathrm{pre}}$ and $I_{O,\mathrm{pre}}$ are the information sets of the two sides in the previous round. The subscript P denotes the defender, the subscript O denotes the attacker (the opponent), and the subscript pre denotes the previous round.

After all types of action sequences have been counted, the expected probability of a class of action sequence is given by the ratio of the number of occurrences of that sequence to the total number of action sequences.

This ratio is also the integral of the conditional probability $P(\mathit{acts} \mid I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})$ against the joint distribution. Because the left-hand side of equation (3) is known, the conditional probability inside the integral on the right-hand side can be estimated approximately under suitable assumptions.
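The counting step can be sketched as follows. The representation of an action sequence as a tuple of action labels, the function name `sequence_proportions`, and the sample observations are assumptions made for illustration only.

```python
from collections import Counter

def sequence_proportions(observed):
    """Return the empirical proportion of each distinct action sequence."""
    counts = Counter(observed)
    total = sum(counts.values())
    return {acts: count / total for acts, count in counts.items()}

# Hypothetical observations: (defender action, attacker action) per game.
observed = (
    [("patch", "exploit")] * 3
    + [("patch", "abort")] * 1
    + [("monitor", "exploit")] * 4
)
proportions = sequence_proportions(observed)
```

Each value in `proportions` is the empirical estimate of the left-hand side of equation (3) for one class of action sequence.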

In one embodiment, step 102 comprises: setting the general decision logic of the actions and setting each action sequence to be uniformly distributed within its corresponding two-dimensional information set interval; and estimating the probabilities of the action sequences from the proportions of all action sequences, the general decision logic of the actions, and the distribution of the action sequences within their corresponding two-dimensional information set intervals.

Specifically, because each side decides on the basis of its own information set, different action choices usually indicate different information sets. If the value ranges of both sides' information sets are known and $f(I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})$ is known, and the conditional probability of each action sequence can be estimated from general action decision logic, then an estimate of the action probabilities over the whole value range of the information sets is obtained.

Under general action decision logic, each side chooses among actions according to the win rate of its information set (that is, the success rate of the action): when the win rate is high, higher-risk actions are more likely, and otherwise lower-risk actions are more likely. Suppose each action sequence corresponds to a specific range of both sides' information sets, that is, $P(\mathit{acts} \mid I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})$ is approximated by a uniform distribution over a region $\Omega_P^{\mathit{acts}} \times \Omega_O^{\mathit{acts}}$ of the $(I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})$ plane (the probability of the action sequence is 1 inside the region). The full value range of $(I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})$ is then covered by the regions corresponding to the different action sequences:

$$\bigcup_{\mathit{acts}} \Omega_P^{\mathit{acts}} \times \Omega_O^{\mathit{acts}} = [0, 1] \times [0, 1] \tag{4}$$

The occurrence probability of each action sequence equals the integral of the joint distribution over the region corresponding to that sequence:

$$P(\mathit{acts}) = \iint_{\Omega_P^{\mathit{acts}} \times \Omega_O^{\mathit{acts}}} f(I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})\, \mathrm{d}I_{P,\mathrm{pre}}\, \mathrm{d}I_{O,\mathrm{pre}} \tag{5}$$

In equation (5) the joint distribution $f(I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})$ is known and the conditional probability $P(\mathit{acts} \mid I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})$ is fixed by the region of each action sequence, so the area of a sequence's two-dimensional region follows from $P(\mathit{acts})$. The regions of all sequences are then arranged according to the order of the actions in each sequence and the information set preference associated with each action (that is, by information set win rate), which gives the action sequence probabilities over the full two-dimensional value range of the information sets.

Once all action sequences have been assigned regions, the conditional probabilities $P(\mathit{acts} \mid I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})$ of all action sequences over the full value range of the defender's and attacker's information sets are available.
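One simple way to realize such a partition is sketched below. It rests on strong assumptions that the text does not state: the prior is uniform on the unit square, every sequence is a single (defender action, attacker action) pair, and each side's actions can be ranked from lowest to highest risk. The function name `partition_unit_square` and the action labels are hypothetical. Under these assumptions, the area of each rectangle equals the observed proportion of its sequence.

```python
def partition_unit_square(proportions, defender_order, attacker_order):
    """Assign each (defender, attacker) action pair a rectangle in [0, 1]^2.

    The defender axis is cut by the cumulative marginal proportion of each
    defender action; within each defender strip, the attacker axis is cut by
    the cumulative conditional proportion of each attacker action.
    Orders run from lowest-risk to highest-risk action (an assumption).
    """
    rectangles = {}
    p_low = 0.0
    for d in defender_order:
        marginal = sum(p for (dd, _), p in proportions.items() if dd == d)
        if marginal == 0:
            continue  # defender action never observed
        p_high = p_low + marginal
        o_low = 0.0
        for a in attacker_order:
            p = proportions.get((d, a), 0.0)
            if p == 0:
                continue  # pair never observed
            o_high = o_low + p / marginal
            rectangles[(d, a)] = ((p_low, p_high), (o_low, o_high))
            o_low = o_high
        p_low = p_high
    return rectangles

# Hypothetical proportions for four observed pairs.
proportions = {
    ("monitor", "abort"): 0.25,
    ("monitor", "exploit"): 0.25,
    ("patch", "abort"): 0.125,
    ("patch", "exploit"): 0.375,
}
rects = partition_unit_square(
    proportions, ["monitor", "patch"], ["abort", "exploit"]
)
```

With a non-uniform prior, the cut points would instead have to be found so that the integral of the prior over each rectangle matches the observed proportion.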

In one embodiment, step 104 comprises: from the approximate probabilities $\tilde{P}(\mathit{acts} \mid I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})$ of the action sequences over the two-dimensional information set space, and the set $\mathit{Acts}$ of action sequences that proceed to the next round, computing the posterior joint distribution of both sides' information sets at the end of the previous round:

$$f(I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}} \mid \mathit{Acts}) = \frac{\sum_{\mathit{acts} \in \mathit{Acts}} \tilde{P}(\mathit{acts} \mid I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})\, f(I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})}{\int_0^1\!\!\int_0^1 \sum_{\mathit{acts} \in \mathit{Acts}} \tilde{P}(\mathit{acts} \mid I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})\, f(I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})\, \mathrm{d}I_{P,\mathrm{pre}}\, \mathrm{d}I_{O,\mathrm{pre}}} \tag{6}$$

where $f(I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}} \mid \mathit{Acts})$ is the posterior joint distribution of both sides' information sets at the end of the previous round, $f(I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})$ is the prior joint distribution of both sides' information sets in the previous round, $\tilde{P}(\mathit{acts} \mid I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})$ is the approximate probability of an action sequence over the two-dimensional information set space, $\mathit{Acts}$ is the set of action sequences that proceed to the next round, and $\mathit{acts}$ is an individual action sequence.

The posterior distribution of the attacker's information set is then the marginal of the joint posterior distribution:

$$f(I_{O,\mathrm{pre}} \mid \mathit{Acts}) = \int_0^1 f(I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}} \mid \mathit{Acts})\, \mathrm{d}I_{P,\mathrm{pre}} \tag{7}$$

where $f(I_{O,\mathrm{pre}} \mid \mathit{Acts})$ is the attacker's information set distribution at the end of the previous round.

In one embodiment, step 106 comprises: smoothing the posterior distribution of the attacker's information set with a preset kernel function to obtain a smooth nonlinear density function:

$$\tilde{f}(I_{O,\mathrm{pre}}) = \frac{\sum_{j} K(I_{O,\mathrm{pre}} - I_{O,\mathrm{pre},j})\, f(I_{O,\mathrm{pre},j} \mid \mathit{Acts})}{\sum_{j} K(I_{O,\mathrm{pre}} - I_{O,\mathrm{pre},j})} \tag{8}$$

where $\tilde{f}(I_{O,\mathrm{pre}})$ is the smooth nonlinear density function of the previous round, $K(\cdot)$ is the preset kernel function, the subscript $j$ indexes the uniformly discretized information set, $f(I_{O,\mathrm{pre},j} \mid \mathit{Acts})$ is the posterior distribution of the attacker's information set in the previous round, and $I_{O,\mathrm{pre}}$ is the attacker's information set in the previous round.

Specifically, the two-dimensional approximation of the action sequences assumes a uniform distribution within each interval, so the resulting posterior distribution of the information set of opponent O is a piecewise linear function. A smoothing step therefore converts the density of this distribution into a smooth nonlinear density function, as shown in equation (8).
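A smoothing step of this kind can be sketched with a normalized kernel-weighted average on a uniform grid. The Gaussian kernel, the bandwidth value, the final renormalization, and the function name `smooth_density` are all assumptions; the text only says that a preset kernel function is used.

```python
import numpy as np

def smooth_density(grid, density, bandwidth=0.05):
    """Smooth a discretized density with a Gaussian kernel (assumed kernel).

    grid    -- uniformly spaced points in [0, 1]
    density -- density values at those points
    """
    diffs = grid[:, None] - grid[None, :]
    weights = np.exp(-0.5 * (diffs / bandwidth) ** 2)
    # Kernel-weighted average of the density around each grid point.
    smoothed = (weights @ density) / weights.sum(axis=1)
    # Renormalize so the Riemann sum over the grid equals 1.
    step = grid[1] - grid[0]
    return smoothed / (smoothed.sum() * step)

# Hypothetical piecewise-constant posterior with a jump at 0.5.
grid = np.linspace(0.005, 0.995, 100)  # cell midpoints, step 0.01
posterior = np.where(grid < 0.5, 0.4, 1.6)
smoothed = smooth_density(grid, posterior)
```

A larger bandwidth gives a smoother curve but blurs the boundaries between the intervals assigned to different action sequences.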

The distribution $f(I_{O,\mathrm{next}})$ at the start of the next round is then obtained through the functional relationship $g$ between the information sets of consecutive rounds, and is used as the information set distribution within the opponent's first-order decision point for that round:

$$f(I_{O,\mathrm{next}}) = \tilde{f}(I_{O,\mathrm{pre}})\left|\frac{\mathrm{d}I_{O,\mathrm{pre}}}{\mathrm{d}I_{O,\mathrm{next}}}\right|, \qquad I_{O,\mathrm{pre}} = g^{-1}(I_{O,\mathrm{next}}) \tag{9}$$

where $f(I_{O,\mathrm{next}})$ is the distribution of the attacker's information set at the start of the next round, $g$ is the functional relationship between the information sets of the previous round and the next round, and $I_{O,\mathrm{next}}$ is the attacker's information set at the start of the next round.
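The change of variables for a monotonic mapping between rounds can be sketched as follows. The mapping used in the example, the uniform starting density, and the function name `transform_density` are hypothetical; in practice the mapping would be obtained by traversing the information sets.

```python
import numpy as np

def transform_density(pre_density, g_inverse, g_inverse_derivative, next_grid):
    """Density of I_next = g(I_pre) for a monotonic g, evaluated on next_grid.

    pre_density          -- callable, density of I_pre
    g_inverse            -- callable, inverse mapping I_pre = g^{-1}(I_next)
    g_inverse_derivative -- callable, derivative of g^{-1} with respect to I_next
    """
    return pre_density(g_inverse(next_grid)) * np.abs(g_inverse_derivative(next_grid))

# Hypothetical mapping g(I) = sqrt(I), so g^{-1}(I') = I'^2 with derivative 2 I'.
next_grid = np.array([0.25, 0.5, 0.75])
next_density = transform_density(
    pre_density=lambda i: np.ones_like(i),  # uniform density on [0, 1]
    g_inverse=lambda i: i ** 2,
    g_inverse_derivative=lambda i: 2.0 * i,
    next_grid=next_grid,
)
```

Here a uniform density in the previous round becomes the density $2 I_{\mathrm{next}}$ in the next round, which still integrates to 1 over $[0, 1]$.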

In one embodiment, as shown in Figure 2, the method further comprises estimating the opponent's action preference at a second-order decision point within a round, with the following steps:

Step 200: Count the types and proportions of all action sequences in the network attack-defense process before the second-order decision point.

Specifically, a second-order decision point is defined by the two actions preceding the current decision point, so the action preferences of both sides before the decision point within the current stage must be considered.

Step 202: Based on the proportions of all action sequences before the second-order decision point in the current round, make a two-dimensional approximate estimate of the probabilities of both sides' action sequences.

Step 204: Based on the probabilities of both sides' action sequences and the joint prior distribution of both sides' information sets, determine the joint posterior distribution of both sides' information sets within the second-order decision point, as affected by the preceding action preferences, and determine the information set distribution within the attacker's second-order decision point from the marginal of the joint posterior distribution.

In one embodiment, step 202 comprises: according to the general decision logic of both sides' actions, setting each action sequence to be uniformly distributed within its corresponding two-dimensional information set interval, and determining the two-dimensional information set intervals corresponding to all action sequences from the proportions of the action sequences.

In one embodiment, step 204 comprises: for an action sequence $\mathit{acts}_2$ that leads into the second-order decision point, determining the joint posterior distribution of both sides' information sets within the second-order decision point:

$$f(I_P, I_O \mid \mathit{acts}_2) = \frac{P(\mathit{acts}_2 \mid I_P, I_O)\, f(I_P, I_O)}{P(\mathit{acts}_2)} \tag{10}$$

where $f(I_P, I_O \mid \mathit{acts}_2)$ is the joint posterior distribution of both sides' information sets within the second-order decision point, $P(\mathit{acts}_2 \mid I_P, I_O)$ is the conditional distribution of the action sequence of both sides that precedes and leads into the second-order decision point, $f(I_P, I_O)$ is the joint prior distribution of both sides' information sets at the start of the current round, and $P(\mathit{acts}_2)$ is the total probability of the action sequence $\mathit{acts}_2$, equal to $\int_0^1\!\!\int_0^1 P(\mathit{acts}_2 \mid I_P, I_O)\, f(I_P, I_O)\, \mathrm{d}I_P\, \mathrm{d}I_O$.

The information set distribution within the attacker's second-order decision point is determined from the marginal of the joint posterior distribution:

$$f(I_O \mid \mathit{acts}_2) = \int_0^1 f(I_P, I_O \mid \mathit{acts}_2)\, \mathrm{d}I_P \tag{11}$$

where $f(I_O \mid \mathit{acts}_2)$ is the information set distribution within the attacker's second-order decision point.

Estimating the attacker's information set distribution at a second-order decision point is similar to estimating the effect of both sides' action preferences in the previous round at a round transition. The difference is that only the action sequence of both sides before the second-order decision point needs to be considered, not the action sequence of the whole stage, and no change of variables through the functional relationship between the information sets of consecutive rounds is required.
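The within-round update can be sketched on a grid as follows, using an indicator likelihood over the rectangle assigned to the observed prefix of actions. The function name `second_order_posterior`, the four-cell grid, the non-uniform attacker prior, and the rectangle bounds are hypothetical values chosen for illustration.

```python
import numpy as np

def second_order_posterior(prior, p_range, o_range, p_grid, o_grid):
    """Posterior over (I_P, I_O) given a prefix of actions within the round.

    prior   -- (nP, nO) joint prior mass at the start of the current round
    p_range -- (low, high) defender interval assigned to the observed prefix
    o_range -- (low, high) attacker interval assigned to the observed prefix
    """
    in_p = ((p_grid >= p_range[0]) & (p_grid < p_range[1])).astype(float)
    in_o = ((o_grid >= o_range[0]) & (o_grid < o_range[1])).astype(float)
    likelihood = np.outer(in_p, in_o)  # 1 inside the rectangle, 0 outside
    unnormalized = likelihood * prior
    total = unnormalized.sum()  # total probability of the observed prefix
    joint = unnormalized / total
    return joint, joint.sum(axis=0), total

# Hypothetical grid of cell midpoints and a non-uniform attacker prior.
cells = np.array([0.125, 0.375, 0.625, 0.875])
prior = np.outer(np.full(4, 0.25), np.array([0.1, 0.2, 0.3, 0.4]))
joint, attacker, total = second_order_posterior(
    prior, p_range=(0.0, 0.5), o_range=(0.5, 1.0), p_grid=cells, o_grid=cells
)
```

No mapping between rounds is applied here, because the prior and the posterior both refer to information sets of the same round.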

It should be understood that although the steps in the flowcharts of Figures 1 and 2 are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in Figures 1 and 2 may comprise multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be executed at different times. Nor are they necessarily executed sequentially: they may be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in Figure 3, a device for estimating opponent action preferences in attack-defense games is provided, comprising: an action sequence proportion determination module 301, an action sequence probability estimation module 302, a posterior distribution determination module 303 for the attacker's current-round information set, a distribution determination module 304 for the attacker's next-round information set, and an opponent action prediction module 305, wherein:

The action sequence proportion determination module 301 is configured to count the types and proportions of the action sequences of both the attacker and the defender in the previous round of the network attack-defense process.

The action sequence probability estimation module 302 is configured to make a two-dimensional approximate estimate of the probabilities of the action sequences in the previous round, based on the proportions of all action sequences of both sides.

The posterior distribution determination module 303 for the attacker's current-round information set is configured to determine the posterior distribution of the attacker's information set at the end of the previous round, based on the action sequence probabilities and the joint distribution of both sides' information sets.

The distribution determination module 304 for the attacker's next-round information set is configured to obtain the distribution of the attacker's information set at the start of the next round, based on the smoothed posterior distribution of the attacker's information set and the functional relationship between the information sets of consecutive rounds, and to use it as the information set distribution within the opponent's first-order decision point for that round.

The opponent action prediction module 305 is configured to build an opponent strategy model of the attacker from the information set distribution within the attacker's decision points and to use that model to predict the attacker's actions; the defender then adopts the optimal countermeasure against the predicted actions.

In one embodiment, the action sequence proportion determination module 301 is further configured to count the types and numbers of the action sequences of both sides in the previous round, using data observed over multiple historical games between the attacker and the defender, and to determine the proportion of each class of action sequence from the types and numbers of all action sequences, as shown in equation (3).

In one embodiment, the action sequence probability estimation module 302 is further configured to set the general decision logic of the actions and to set each action sequence to be uniformly distributed within its corresponding two-dimensional information set interval, and to estimate the probabilities of the action sequences from the proportions of all action sequences of both sides, the general decision logic of the actions, and the distribution of the action sequences within their corresponding two-dimensional information set intervals.

According to the general decision logic of both sides' actions, the approximate probability of an action sequence within its corresponding two-dimensional information set interval is set to:

$$\tilde{P}(\mathit{acts} \mid I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}}) = \begin{cases} 1, & I_{P,\mathrm{pre}} \in \Omega_P^{\mathit{acts}} \text{ and } I_{O,\mathrm{pre}} \in \Omega_O^{\mathit{acts}} \\ 0, & \text{otherwise} \end{cases} \tag{12}$$

where $\tilde{P}(\mathit{acts} \mid I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})$ is the approximate probability of the action sequence over the two-dimensional information set space; $I_{P,\mathrm{pre}}$ and $I_{O,\mathrm{pre}}$ are the information sets of the defender and the attacker; and $\Omega_P^{\mathit{acts}}$ and $\Omega_O^{\mathit{acts}}$ are the defender's and attacker's interval ranges that together form the two-dimensional information set interval corresponding to action sequence $\mathit{acts}$. The probability of each action sequence $\mathit{acts}$ must satisfy:

$$P(\mathit{acts}) = \iint_{\Omega_P^{\mathit{acts}} \times \Omega_O^{\mathit{acts}}} f(I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})\, \mathrm{d}I_{P,\mathrm{pre}}\, \mathrm{d}I_{O,\mathrm{pre}} \tag{13}$$

where $P(\mathit{acts})$ is the proportion of the action sequence and $f(I_{P,\mathrm{pre}}, I_{O,\mathrm{pre}})$ is the prior joint distribution of both sides' information sets in the previous round.

From the proportions of all action sequences of both sides and the general decision logic of the actions, the two-dimensional information set interval corresponding to each action sequence $\mathit{acts}$ is found by iterating equations (12) and (13), which gives the approximate probabilities of the action sequences over the complete two-dimensional information set space.

In one embodiment, the posterior distribution determination module 303 for the network attacker's current-round information set is further configured to obtain the posterior joint distribution of both parties' information sets at the end of the previous round, as shown in Equation (6), from the approximate probability of the action sequences over the two-dimensional information-set space and the set Acts of action sequences that can enter the next round. The posterior distribution of the network attacker's information set at the end of the previous round is then obtained as the marginal distribution of the joint posterior distribution, as shown in Equation (7).
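A minimal sketch of this update on a discretized grid follows. It assumes the joint posterior has the usual Bayes form (prior weighted by the probability of the admissible action sequences, then normalized) and that the grid is indexed as `[defender][attacker]`. The function names and the example values are hypothetical.

```python
def posterior_joint(prior, likelihood):
    """Bayes update on a grid.

    prior[p][o]      : prior joint mass of (defender cell p, attacker cell o)
    likelihood[p][o] : summed approximate probability of the action sequences
                       in Acts at that cell (assumed given)
    """
    weighted = [
        [pr * lk for pr, lk in zip(prior_row, lik_row)]
        for prior_row, lik_row in zip(prior, likelihood)
    ]
    total = sum(sum(row) for row in weighted)  # total probability of Acts
    if total == 0:
        raise ValueError("no prior mass is compatible with the observed sequences")
    return [[w / total for w in row] for row in weighted]


def attacker_marginal(joint):
    """Marginalize the defender axis to get the attacker's posterior."""
    n_attacker = len(joint[0])
    return [sum(row[o] for row in joint) for o in range(n_attacker)]


prior = [[0.25, 0.25], [0.25, 0.25]]    # assumed uniform joint prior
likelihood = [[1.0, 0.0], [1.0, 1.0]]   # assumed indicator of admissible cells
joint = posterior_joint(prior, likelihood)
attacker_posterior = attacker_marginal(joint)
```

With these example values, three of the four cells remain admissible, so each keeps one third of the mass and the attacker's posterior puts two thirds on the first cell.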

In one embodiment, the next-round information set distribution determination module 304 for the network attacker is further configured to smooth the posterior distribution of the network attacker's information set with a preset kernel function, yielding a smooth nonlinear density function as shown in Equation (8). From the smoothed posterior distribution and the functional relationship between information sets across rounds, the distribution of the network attacker's information set at the beginning of the next round is obtained and used as the information-set distribution at the opponent's first-order decision point in that round, as shown in Equation (9).
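The two steps can be sketched as follows. The sketch assumes a Gaussian kernel (the text only says "preset kernel function"), an information set normalized to [0, 1], and a simple re-binning of mass through a hypothetical `transition` function standing in for the functional relationship between rounds. None of these choices is confirmed by the source.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the assumed preset kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def smooth_density(points, weights, bandwidth):
    """Return a smooth density built from discretized posterior masses."""
    def density(x):
        return sum(
            w * gaussian_kernel((x - xj) / bandwidth)
            for xj, w in zip(points, weights)
        ) / bandwidth
    return density


def push_forward(points, weights, transition, n_bins):
    """Move each mass through `transition` and re-bin it on [0, 1]."""
    out = [0.0] * n_bins
    for xj, w in zip(points, weights):
        y = min(max(transition(xj), 0.0), 1.0)      # clamp to the unit interval
        out[min(int(y * n_bins), n_bins - 1)] += w  # top edge goes to last bin
    return out


density = smooth_density([0.5], [1.0], bandwidth=0.1)
peak = density(0.5)
next_round = push_forward([0.1, 0.9], [0.4, 0.6], lambda x: 1.0 - x, n_bins=2)
```

In practice the smoothed density would be evaluated on the grid and normalized before being passed to `push_forward`; the two functions are shown separately to keep each step visible.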

In one embodiment, the device further includes an opponent action preference estimation module for the second-order decision point within a round, configured to: count the types and proportions of all action sequences in the network attack and defense process before the second-order decision point; perform a two-dimensional approximate estimation of the probability of the two parties' action sequences according to the proportions of all action sequences before the second-order decision point in the current round; and, from the probability of the two parties' action sequences and the joint prior distribution of the two parties' information sets, determine the joint posterior distribution of the two parties' information sets at the second-order decision point, which is affected by the preceding action preferences, and determine the information-set distribution at the network attacker's second-order decision point as the marginal distribution of that joint posterior distribution.

In one embodiment, the opponent action preference estimation module for the second-order decision point within a round is further configured to set, according to the general decision logic of both parties' actions, each action sequence to be uniformly distributed within its corresponding two-dimensional information-set interval, and to determine the two-dimensional information-set intervals corresponding to all action sequences according to the proportions of the action sequences.

In one embodiment, the opponent action preference estimation module for the second-order decision point within a round is further configured to determine, from the action sequence that can enter the second-order decision point, the joint posterior distribution of the two parties' information sets at the second-order decision point, as shown in Equation (10), and to determine the information-set distribution of the network attacker at the second-order decision point as the marginal distribution of the joint posterior distribution, as shown in Equation (11).
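For orientation, the second-order update can be written in the usual Bayes form. This is a sketch consistent with the quantities named in the text (conditional distribution of the action sequence, joint prior at the beginning of the current round, total probability of the action sequence), not a reproduction of Equations (10) and (11). The symbols $I_P$, $I_O$ and $\mathrm{acts}$ are assumed notation for the defender's information set, the attacker's information set, and the action sequence that reaches the second-order decision point.

```latex
% Joint posterior at the second-order decision point (assumed notation)
P(I_P, I_O \mid \mathrm{acts})
  = \frac{P(\mathrm{acts} \mid I_P, I_O)\, P(I_P, I_O)}{P(\mathrm{acts})},
\qquad
P(\mathrm{acts}) = \iint P(\mathrm{acts} \mid I_P, I_O)\, P(I_P, I_O)\,
  \mathrm{d}I_P\, \mathrm{d}I_O

% Attacker's information-set distribution as the marginal of the joint posterior
P(I_O \mid \mathrm{acts}) = \int P(I_P, I_O \mid \mathrm{acts})\, \mathrm{d}I_P
```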

For the specific limitations of the opponent action preference estimation device for offensive and defensive games, refer to the limitations of the opponent action preference estimation method for offensive and defensive games above, which are not repeated here. Each module of the device may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them and perform the operations corresponding to each module.

In one embodiment, an electronic device is provided. The electronic device may be a terminal, and its internal structure may be as shown in FIG. 4. The electronic device includes a processor 401, a memory 402, a network interface 403, a display screen 404, and an input device 405 connected through a system bus. The processor of the electronic device provides computing and control capabilities. The memory 402 of the electronic device includes a non-volatile storage medium 4022 and an internal memory 4021. The non-volatile storage medium 4022 stores an operating system and a computer program. The internal memory 4021 provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface 403 of the electronic device is used to communicate with an external terminal through a network connection. When executed by the processor 401, the computer program implements an opponent action preference estimation method for offensive and defensive games. The display screen 404 of the electronic device may be a liquid crystal display or an electronic ink display. The input device 405 of the electronic device may be a touch layer covering the display screen 404; a button, trackball, or touchpad provided on the housing of the computer device; or an external keyboard, touchpad, or mouse.

Those skilled in the art will understand that the structure shown in FIG. 4 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

In one embodiment, an electronic device is provided, including a memory and a processor. The memory stores a computer program, and the processor implements the steps of any of the above method embodiments when executing the computer program.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this specification.

The embodiments described above express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art may make several modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be determined by the appended claims.

Claims (10)

1. An opponent action preference estimation method for offensive and defensive games, characterized by comprising the following steps:
counting the types and proportions of the action sequences of the network attacker and defender in the previous round of the network attack and defense process;
performing a two-dimensional approximate estimation of the probability of the action sequences in the previous round according to the proportions of all two-party action sequences;
determining the posterior distribution of the network attacker's information set at the end of the previous round according to the probability of the action sequences and the joint distribution of the information sets of the network attacker and defender;
obtaining the distribution of the network attacker's information set at the beginning of the next round according to the smoothed posterior distribution of the network attacker's information set and the functional relationship between information sets across rounds, and taking that distribution as the information-set distribution at the opponent's first-order decision point in that round;
establishing an opponent strategy model of the network attacker according to the information-set distribution at the network attacker's decision points, inferring the network attacker's action using the opponent strategy model, and having the defender defend with an optimal counter-strategy according to the inferred action.
2. The method of claim 1, wherein counting the types and proportions of the action sequences of the network attacker and defender in the previous round of the network attack and defense process comprises:
counting the types and numbers of the action sequences of the network attacker and defender in the previous round from the data observed over multiple historical confrontations between the two parties;
determining, from the types and numbers of all action sequences, the proportion of each type of action sequence as follows:
wherein the quantities in the formula are, in order: the ratio of the number of action sequences acts to the total number of action sequences; the number of action sequences of type acts; the total number of action sequences; the conditional probability of the action sequence given the joint distribution of the information sets of the network attacker and defender; the joint distribution of the information sets of the network attacker and defender; and the information sets of the two parties in the previous round, in which the subscript P denotes the defender, the subscript O denotes the network attacker (that is, the opponent), and the subscript pre denotes the previous round.
3. The method of claim 1, wherein performing a two-dimensional approximate estimation of the probability of the action sequences in the previous round according to the proportions of all two-party action sequences comprises:
setting, according to the general decision logic of both parties' actions, the approximate probability of an action sequence within its corresponding two-dimensional information-set interval as follows:
wherein the quantities in the formula are, in order: the approximate probability of the action sequence over the two-dimensional information-set space; the information sets of the network attacker and defender; and the ranges of the two-dimensional information-set interval corresponding to the action sequence acts of the two parties; and the probability of each action sequence acts should satisfy:
wherein the quantities in the formula are, in order: the proportion of the action sequence; and the prior joint distribution of the two parties' information sets in the previous round;
iteratively solving the two formulas, according to the proportions of all two-party action sequences and the general decision logic of the actions, to obtain the two-dimensional information-set interval corresponding to each action sequence acts of the two parties, and thereby the approximate probability of the action sequences over the complete two-dimensional information-set interval.
4. The method of claim 1, wherein determining the posterior distribution of the network attacker's information set at the end of the previous round according to the probability of the action sequences and the joint distribution of the information sets of the network attacker and defender comprises:
obtaining, from the approximate probability of the action sequences over the two-dimensional information-set space and the set of action sequences that can enter the next round, the posterior joint distribution of the two parties' information sets at the end of the previous round:
wherein the quantities in the formula are, in order: the posterior joint distribution of the two parties' information sets at the end of the previous round; the prior joint distribution of the two parties' information sets in the previous round; and the information sets of the two parties in the previous round, in which the subscript P denotes the defender, the subscript O denotes the network attacker (that is, the opponent), and the subscript pre denotes the previous round;
obtaining the posterior distribution of the network attacker's information set as the marginal distribution of the joint posterior distribution:
wherein the left-hand side of the formula is the distribution of the network attacker's information set at the end of the previous round.
5. The method of claim 1, wherein obtaining the distribution of the network attacker's information set at the beginning of the next round according to the smoothed posterior distribution of the network attacker's information set and the functional relationship between information sets across rounds, and taking that distribution as the information-set distribution at the opponent's first-order decision point in that round, comprises:
smoothing the posterior distribution of the network attacker's information set with a preset kernel function to obtain a smooth nonlinear density function as follows:
wherein the quantities in the formula are, in order: the smoothed nonlinear density function of the previous round; the preset kernel function; the subscript j, which is the index after uniform discretization of the information set; the posterior distribution of the network attacker's information set in the previous round; and the network attacker's information set in the previous round;
obtaining, from the smoothed posterior distribution of the network attacker's information set and the functional relationship between information sets across rounds, the distribution of the network attacker's information set at the beginning of the next round, and taking it as the information-set distribution at the opponent's first-order decision point in that round:
wherein the quantities in the formula are, in order: the distribution of the network attacker's information set at the beginning of the next round; the functional relationship between the information sets of the previous round and the next round; and the network attacker's information set at the beginning of the next round.
6. The method of claim 1, further comprising estimating the opponent's action preference at a second-order decision point within the current round, the steps comprising:
counting the types and proportions of all action sequences in the network attack and defense process before the second-order decision point;
performing a two-dimensional approximate estimation of the probability of the action sequences of the network attacker and defender according to the proportions of all two-party action sequences before the second-order decision point in the current round;
determining, according to the probability of the two parties' action sequences and the joint prior distribution of the two parties' information sets, the joint posterior distribution of the two parties' information sets at the second-order decision point, which is affected by the preceding action preferences, and determining the information-set distribution at the network attacker's second-order decision point as the marginal distribution of the joint posterior distribution.
7. The method of claim 6, wherein performing a two-dimensional approximate estimation of the probability of the action sequences of the network attacker and defender according to the proportions of all action sequences before the second-order decision point in the current round comprises:
setting, according to the general decision logic of both parties' actions, each action sequence to be uniformly distributed within its corresponding two-dimensional information-set interval, and determining the two-dimensional information-set intervals corresponding to all action sequences according to the proportions of the action sequences.
8. The method of claim 6, wherein determining the joint posterior distribution of the two parties' information sets at the second-order decision point, which is affected by the preceding action preferences, according to the probability of the two parties' action sequences and the joint prior distribution of the two parties' information sets, and determining the information-set distribution at the network attacker's second-order decision point as the marginal distribution of the joint posterior distribution, comprises:
determining, from the action sequence that can enter the second-order decision point, the joint posterior distribution of the two parties' information sets at the second-order decision point:
wherein the quantities in the formula are, in order: the joint posterior distribution of the two parties' information sets at the second-order decision point; the information sets of the network attacker and defender; the conditional distribution of the two parties' action sequence that can enter the second-order decision point, taken before the second-order decision point; the joint prior distribution of the two parties' information sets at the beginning of the current round; and the total probability of the action sequence;
determining the information-set distribution at the network attacker's second-order decision point as the marginal distribution of the joint posterior distribution:
wherein the left-hand side of the formula is the information-set distribution at the network attacker's second-order decision point.
9. An opponent action preference estimation device for offensive and defensive games, characterized by comprising:
a proportion determination module for the action sequences of the network attacker and defender, configured to count the types and proportions of the action sequences of the two parties in the previous round of the network attack and defense process;
a probability estimation module for the action sequences of the network attacker and defender, configured to perform a two-dimensional approximate estimation of the probability of the action sequences according to the proportions of all action sequences in the previous round or of all action sequences before the second-order decision point of the current round;
a posterior distribution determination module for the network attacker's current-round information set, configured to determine, by solving for the marginal distribution, the posterior distribution of the network attacker's information set at the second-order decision point of the current round or at the end of the previous round, according to the probability of the action sequences and the joint posterior distribution of the two parties' information sets;
a next-round information set distribution determination module for the network attacker, configured to obtain the distribution of the network attacker's information set at the beginning of the next round according to the smoothed posterior distribution of the network attacker's information set at the end of the previous round and the functional relationship between information sets across rounds, and to take that distribution as the information-set distribution at the opponent's first-order decision point in that round;
an opponent action inference module, configured to establish an opponent strategy model of the network attacker according to the information-set distribution at the network attacker's decision points, infer the network attacker's action using the opponent strategy model, and defend with an optimal counter-strategy according to the inferred action.
10. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the method of any one of claims 1 to 8 when executing the computer program.
Application CN202311123325.4A (priority date 2023-09-01, filing date 2023-09-01): Opponent action preference estimation method, device and electronic equipment for offensive and defensive games. Status: Active. Publication: CN116886443B (en).

Publications (2)

Publication Number Publication Date
CN116886443A (en) 2023-10-13
CN116886443B (en) 2023-11-10

Family

ID=88257136

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant