CN117312810A - Incomplete information attack and defense game opponent identification method based on game history tree - Google Patents

Incomplete information attack and defense game opponent identification method based on game history tree

Info

Publication number
CN117312810A
CN117312810A (application CN202311618095.9A)
Authority
CN
China
Prior art keywords
game history
node
game
opponent
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311618095.9A
Other languages
Chinese (zh)
Other versions
CN117312810B
Inventor
陈少飞
胡振震
李鹏
陈佳星
陆丽娜
吉祥
刘鸿福
陈璟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202311618095.9A priority Critical patent/CN117312810B/en
Publication of CN117312810A publication Critical patent/CN117312810A/en
Application granted granted Critical
Publication of CN117312810B publication Critical patent/CN117312810B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/29 - Graphical models, e.g. Bayesian networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/042 - Knowledge-based neural networks; Logical representations of neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/047 - Probabilistic or stochastic networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/09 - Supervised learning
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 - Network architectures or network communication protocols for network security
    • H04L63/20 - Network architectures or network communication protocols for network security for managing network security; network security policies in general

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer And Data Communications (AREA)

Abstract

The method designs a game history tree, uses a graph neural network to extract the graph-structure information in the tree as opponent features, and realizes online opponent identification with an offline-training and online-identification framework. An opponent identifier is built on a graph neural network model; game history trees and the corresponding graph dataset are constructed from the game history data of different opponents, and the identifier is trained offline. Online, a game history tree and its graph model are built for the current opponent from the data of the game episodes that have already finished, the graph model data are fed into the offline-trained identifier to obtain an identification result, and the defender then performs network defense according to this result in the subsequent network attack and defense game. The method identifies opponents quickly and accurately, so the defender can adopt a targeted defense strategy as early as possible, which greatly improves network defense efficiency.

Description

Incomplete information attack and defense game opponent identification method based on game history tree
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a method for identifying incomplete information attack and defense game opponents based on a game history tree.
Background
In an incomplete-information network attack and defense game, acting with a strategy targeted at a specific opponent is an effective form of opponent exploitation for either the attacker or the defender, and can greatly increase the payoff of the game. When playing against an opponent drawn from a known set, such opponent exploitation is usually implemented with a framework that trains the targeted strategies offline and applies them online. A key requirement of this framework is the ability to determine, at the time the online strategy is applied, which of the known opponents is currently being faced. Accurate online identification of the opponent is therefore a precondition for applying targeted strategies online.
The essence of online opponent identification is to infer the opponent from the opponent information observed during the network attack and defense game. Bayesian policy reuse based on game payoff information uses a payoff model obtained in the offline training stage and Bayesian inference over the payoffs observed online to determine the opponent; however, when the game contains random factors, payoff fluctuations over short observation windows make the inference inaccurate. Policy-similarity methods compare an opponent policy model obtained in the offline training stage with a policy model reconstructed from the online game. These methods first face the problem that part of the opponent's information sets are unobservable, and even if assumptions are adopted to sidestep this, short-term observation still cannot provide enough data to build an accurate policy model. Feature-engineering methods mostly rely on action statistics: an identification model is built on hand-designed statistical features of the opponent's actions. They are widely used in practice, but features designed from human expert knowledge capture only part of the opponent information, and the statistics usually require a large amount of data to be reliable, so identification based on statistics obtained from short-term observation has limited accuracy.
In summary, existing online opponent identification methods for incomplete-information network attack and defense games are affected by random factors and incomplete information and require large amounts of data, so they cannot identify the opponent accurately from short-term observations. As a result, the opponent cannot be identified effectively within a short time after the online attack and defense game starts, which clearly harms network defense efficiency.
Disclosure of Invention
Based on the above, it is necessary to provide a game-history-tree-based method for identifying opponents in incomplete-information attack and defense games that achieves more accurate, end-to-end online opponent identification.
An incomplete information attack and defense game opponent identification method based on a game history tree, comprising the following steps:
Acquire historical data of multiple completed game episodes played between the defender and different attackers in finished network attack and defense games.
Construct a game history chain from the game history data of each episode.
Construct game history trees from sets of game history chains covering a preset number of episodes, obtaining a dataset of directed homogeneous graphs.
Construct an opponent identifier based on the game history tree; the identifier comprises a graph neural network, a READOUT function and a classification network, where the graph neural network takes the graph message-passing model as its framework, the graph isomorphism network as its base model, and uses multiple message-passing layers.
Train the opponent identifier on the directed homogeneous graph dataset to obtain a trained identifier, and save the parameters of its underlying neural network.
Construct an online game history tree directed graph from the attack and defense action data of the defender and the current attacker in the episodes of the online game that have already finished.
Input the online game history tree directed graph into the trained opponent identifier, identify the opponent from the output probabilities that the current opponent belongs to each of the known opponents, and let the defender perform network defense in the subsequent network attack and defense game according to the identification result.
In one embodiment, constructing a game history chain from the game history data of each episode includes:
Extract the action history data from the game history data of one episode and construct a chain whose starting node represents the initial state of the game.
Traverse the action data of each round in the action history. If the current action is the first action of a round, first add a nature node representing the round change and then an action node representing the current action, recording the corresponding node information; otherwise add the action node directly and record its node information. The node information includes the type of the acting player, the game position of that player, the current round, and the action sequences of both attacker and defender in every round before the current node.
Traverse all the action data to obtain the game history chain: a chain connected in a directed manner, in action order, from the starting node through several intermediate nodes to the leaf node.
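As a minimal illustrative sketch (not the patent's own code) of the chain construction just described, the following Python fragment builds one chain from an assumed record layout of (round_index, player, action) tuples; the toy action names are placeholders.

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        kind: str                      # "nature", "attacker" or "defender"
        action: str | None             # action leading to this node (None for nature nodes)
        round_index: int
        history: list = field(default_factory=list)   # both sides' actions before this node

    def build_history_chain(action_history):
        """action_history: list of (round_index, player, action) for one finished episode."""
        chain = [Node("nature", None, 0)]              # starting node: initial game state
        current_round = action_history[0][0] if action_history else 0
        prefix = []
        for round_index, player, action in action_history:
            if round_index != current_round:           # first action of a new round:
                chain.append(Node("nature", None, round_index))   # nature node marks the round change
                current_round = round_index
            chain.append(Node(player, action, round_index, list(prefix)))
            prefix.append((player, action))
        return chain

    # Toy two-round episode:
    chain = build_history_chain([
        (1, "attacker", "scan"), (1, "defender", "patch"),
        (2, "attacker", "exploit"), (2, "defender", "isolate"),
    ])
    print([(n.kind, n.action) for n in chain])
    # [('nature', None), ('attacker', 'scan'), ('defender', 'patch'),
    #  ('nature', None), ('attacker', 'exploit'), ('defender', 'isolate')]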
In one embodiment, constructing game history trees from a set of game history chains covering a preset number of episodes to obtain a homogeneous graph dataset includes:
Acquire the action data of attacker and defender for the preset number of episodes, build a game history chain for each episode from its action sequence together with the round changes, and form the set of game history chains.
Distinguish which side acts first and construct two game history trees containing only a root node: the first is the game history tree for episodes in which the attacker acts first, the second for episodes in which the defender acts first; the root node is a nature node.
Traverse every game history chain and, according to which side acts first in that chain, select the corresponding game history tree to expand, obtaining two complete game history trees.
Process the node information of the two complete game history trees and splice them into one complete directed homogeneous graph model by renumbering the nodes.
Traverse all game history data in groups of the preset number of episodes to obtain multiple game history trees and their corresponding directed homogeneous graphs, which together form the graph dataset.
In one embodiment, traversing each game history chain and selecting the corresponding game history tree to expand according to which side acts first, to obtain two complete game history trees, includes:
searching the game history tree, starting from the root node, for the node corresponding to each node traversed in the game history chain;
if the corresponding node exists, increasing its occurrence count by 1 and recording its other information;
if the node does not exist in the game history tree, expanding a new node from the last matched node, recording its information with an occurrence count of 1, and continuing to expand leaf nodes until the end of the game history chain;
after all game history chains have been traversed, obtaining the complete attacker-first game history tree and the complete defender-first game history tree.
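A minimal sketch of this expansion procedure, reusing the Node records from the chain sketch above; identifying each tree node by the sequence of (kind, action) pairs on its path is an implementation assumption, not something fixed by the method.

    from collections import defaultdict

    def build_history_tree(chains):
        """Merge game history chains into one game history tree with occurrence counts."""
        counts = defaultdict(int)        # path key -> how many chains passed through this node
        children = defaultdict(set)      # path key -> set of child path keys
        info = {}                        # path key -> node info from the first chain that created it
        root = ()                        # root nature node shared by all chains
        for chain in chains:
            key = root
            counts[root] += 1
            for node in chain[1:]:       # the chain's start node maps to the root
                child = key + ((node.kind, node.action),)
                children[key].add(child)
                counts[child] += 1       # existing node: count + 1; new node: count starts at 1
                info.setdefault(child, node)
                key = child
        return counts, children, info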
In one embodiment, processing the node information of the two complete game history trees and splicing them into one complete directed homogeneous graph model by renumbering the nodes includes:
Normalize the node occurrence counts of the two complete game history trees, i.e. take the ratio of the occurrence count of the current node to that of the root node as one element of the node feature information.
Obtain, from the per-round action sequences recorded at each node, the numbers of each kind of action taken by the two players before that node, and use them as another element of the node feature information.
Represent the expansion relation from root node toward leaf nodes as directed edges, and record the edge connections to form an edge list.
Take the node information and the edge list as graph elements to form the directed homogeneous graph model corresponding to the game history tree.
Renumber the nodes of the graph model of the defender-first game history tree by adding the total number of nodes of the attacker-first graph model to every node index, and concatenate its node information matrix and edge list with those of the attacker-first graph model to form one complete directed homogeneous graph model.
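A minimal sketch of turning the two trees into one directed homogeneous graph, continuing the helpers above; the exact feature layout (occurrence ratio followed by per-action counts along the path, without separating the two players) is a simplifying assumption for illustration.

    def tree_to_graph(counts, children, action_vocab):
        """Node features (occurrence ratio + action counts) and a directed edge list."""
        keys = sorted(counts, key=len)            # root (empty path) first, then by depth
        index = {k: i for i, k in enumerate(keys)}
        root_count = counts[()]
        features, edges = [], []
        for k in keys:
            ratio = counts[k] / root_count        # occurrence count normalised by the root's
            action_counts = [0] * len(action_vocab)
            for _, a in k:                        # actions on the path up to this node
                if a is not None:
                    action_counts[action_vocab.index(a)] += 1
            features.append([ratio] + action_counts)
            for c in children.get(k, ()):         # expansion relation as directed edge parent -> child
                edges.append((index[k], index[c]))
        return features, edges

    def splice_graphs(attacker_first, defender_first):
        """Renumber the defender-first graph's nodes and concatenate the two graphs."""
        feat_a, edges_a = attacker_first
        feat_d, edges_d = defender_first
        offset = len(feat_a)                      # shift every defender-first node index
        edges = edges_a + [(u + offset, v + offset) for u, v in edges_d]
        return feat_a + feat_d, edges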
In one embodiment, the feature update formula of a node in each message-passing layer of the graph neural network is:
$h'_v = f\big((1+\epsilon)\,h_v + \sum_{u\in\mathcal{N}(v)} h_u\big)$
wherein $f(\cdot)$ is the feature mapping function, $v$ is the node index, $h_v$ is the feature vector of node $v$, $h'_v$ is the updated feature vector of node $v$, $\epsilon$ is a weight parameter, and $\mathcal{N}(v)$ is the set of neighbours of node $v$.
The READOUT function is:
$h_G^{(j)} = \max_{v\in V} h_v^{(j)}$
wherein $h_G$ is the feature vector representing the entire graph, $h_G^{(j)}$ is its $j$-th dimensional feature, the maximum is taken over the $j$-th dimensional features of all nodes $v$ in the node set $V$, $h_v$ is the feature vector of node $v$, and $h_v^{(j)}$ is its $j$-th dimensional feature.
The classification network is a fully connected multi-layer network with Dropout.
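As a minimal NumPy sketch of the two formulas above: one message-passing update, with f modelled as a single linear map plus ReLU (an assumption purely for illustration), and the element-wise maximum READOUT.

    import numpy as np

    def gin_layer(H, neighbors, W, eps):
        """H: (num_nodes, d) node features; neighbors[v]: indices of v's neighbours; W: (d, d)."""
        out = np.zeros_like(H)
        for v in range(H.shape[0]):
            agg = (1.0 + eps) * H[v]
            for u in neighbors[v]:
                agg = agg + H[u]                  # sum over neighbouring node features
            out[v] = np.maximum(agg @ W, 0.0)     # f(.) taken as linear map + ReLU
        return out

    def readout_max(H):
        """Graph-level feature vector: element-wise maximum over all node features."""
        return H.max(axis=0)

    # Tiny example: 3 nodes with 2-dimensional features, a chain 0 -> 1 -> 2.
    H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    neighbors = {0: [1], 1: [0, 2], 2: [1]}
    print(readout_max(gin_layer(H, neighbors, np.eye(2), eps=0.0)))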
In one embodiment, inputting the online game history tree directed graph into the trained opponent identifier, identifying the opponent from the output probabilities that the current opponent belongs to each known opponent, and having the defender perform network defense with a targeted strategy in the subsequent network attack and defense game according to the identification result, includes:
inputting the online game history tree directed graph into the trained opponent identifier to obtain a probability distribution output that represents the probability that the current opponent is each opponent in the known set;
if the probability of some specific opponent in this distribution exceeds a given threshold, regarding the current opponent as that opponent, providing the opponent class information to the targeted strategy to be adopted, and having the defender perform network defense in the subsequent network attack and defense game according to this class information;
if no output probability exceeds the given threshold, continuing to collect data in the subsequent game to update the online game history tree directed graph, inputting the updated graph into the identifier again, and identifying the opponent from the new output.
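A minimal sketch of this threshold rule; the 0.9 value is only an illustrative placeholder, not a threshold fixed by the method.

    def decide_opponent(probs, threshold=0.9):
        """probs: identifier output, one probability per known opponent; returns an index or None."""
        best = max(range(len(probs)), key=lambda i: probs[i])
        if probs[best] >= threshold:
            return best        # confident enough: this opponent's targeted strategy can be applied
        return None            # below threshold: keep collecting episodes and re-run the identifier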
In the game-history-tree-based method for identifying opponents in incomplete-information attack and defense games, a game history tree model is designed, a graph neural network model is used to extract the graph-structure information of the tree as opponent features, and online opponent identification is realized with an offline-training and online-identification framework. The opponent identifier is built on a graph neural network model; game history trees and the corresponding graph dataset are constructed from the game history data of different opponents, and the identifier is trained offline. In the online application of the identifier, the game history tree of the current opponent and its graph model are constructed from the data of the episodes of the online game that have already finished, the graph model data are input into the identifier to obtain an identification result, and the defender performs network defense according to this result in the subsequent network attack and defense game. The method identifies the opponent end to end, without inference over a payoff model or construction of a policy model for similarity comparison. It can identify the opponent quickly and accurately, so the defender can adopt a targeted strategy as early as possible, which greatly improves network defense efficiency.
Drawings
Fig. 1 is a flowchart of a method for identifying an incomplete information attack and defense game opponent based on a game history tree in an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In one embodiment, as shown in fig. 1, there is provided a method for identifying an incomplete information attack game opponent based on a game history tree, the method comprising the steps of:
step 100: and acquiring multi-game historical data of games between the defending party and different attacking parties in the finished network attack and defense game.
Specifically, multi-game historical data of the attack and defense games of the defending party and various different opponents in the finished network attack and defense game process are obtained, and the opponents are regarded as opponents in a known range.
For these attacker opponents within a known range, the defender builds a corresponding targeted policy based on experience and related algorithms.
Step 102: construct a game history chain from the game history data of each episode.
Specifically, what the defender actually perceives of different opponents during the network attack and defense game is their different action choices, which reflect their different strategies; an opponent's strategy is essentially a set of action probability distributions over its information sets. Using these action probability distributions as feature information for end-to-end opponent identification is the most direct approach: it requires neither Bayesian probabilistic inference nor reconstruction of the opponent's strategy, as long as a reasonable way can be found to characterize these distributions and extract features from them.
Since different action choices can be represented by a branching structure, an opponent's probability distribution over action choices can, to some extent, be regarded as a structural feature. Such a branching structure is a typical connection structure in graph data, and it also appears in the game tree expansion used for strategy search. Inspired by the game tree, we therefore construct a tree-structured graph, called the game history tree, from the data of the episodes of the online game that have already finished, to represent the opponent's strategy information in different situations.
The purpose of building a game history tree is to characterize the opponent's strategic features through a suitable branching structure, a design inspired by the game tree. In incomplete-information repeated games, the game tree is the model of the interaction between the two players used when solving for strategies: its nodes represent states of the players, and its edges represent the actions available to the player at the current node. Because of incomplete information, random factors and the unknown subsequent actions of both sides, the game tree keeps growing from the root node toward the leaf nodes. For problems with a small state space, the game tree can be fully expanded to compute a Nash equilibrium as the strategy; for problems with a large state space, computational complexity makes full expansion infeasible, and only a depth-limited expansion is usually performed to search for an approximate equilibrium. The game tree contains many branching structures representing the choices of both players, but it can only be used for strategy search in an unfinished episode and cannot directly serve as a basis for identification, because it must be expanded over much unknown information and is therefore an uncertain object. It does, however, suggest that if a tree-structured graph that characterizes the opponent's strategy to some extent can be built from game history data, it can be used for identification.
Over N repeated episodes, once an episode ends, the unknown subsequent actions and random factors that both players had to consider when expanding the game tree during play have all become known facts. If the hidden information of the other side is ignored, the previously expanded game tree collapses to a single determined branch from the root node to the leaf node at which the episode ended. In other words, the actions from the start to the end of the episode can be viewed as a chain from the root node to a leaf node; we call it the game history chain, because all the information on the chain has become actual history. After more chains are obtained from more finished episodes, if they can be merged into one tree, the opponent's action choices can be described statistically, which is equivalent to describing the opponent's strategy features. Therefore, before building the game history tree, game history chains are first constructed from the collected game history data.
Step 104: construct game history trees from sets of game history chains covering a preset number of episodes, obtaining a dataset of directed homogeneous graphs.
Specifically, the game history tree describes the action characteristics of both players during the incomplete-information game. Through the game history chains, the action information of both players in each episode of the attack and defense process is integrated into one tree: the root node represents the natural state at the start of each episode, an edge represents the action a player chooses at the current node, and the successor node represents the state after that action. The node information includes the type of the action leading to the current node, the player who took it, that player's game position, the numbers of each kind of action before the current node, and the ratio of the occurrence count of the current node to that of the root node; with this information the game history tree is represented as a directed homogeneous graph model.
Compared with feature-engineering methods, which can only select statistics of part of the action information to build a limited set of features, the construction of the game history tree makes full use of the action sequence information of both players in every episode and describes their action choices statistically. Without relying on the opponent's unobservable hidden information, it can fully characterize the opponent's action probability distributions under the interaction of both players' actions. A graph neural network is then used to extract the structural features of the game history tree and build the opponent identification model, so that the features of the opponent's actions can be extracted completely and accurately from the short-term observation data of the online game. This effectively improves the efficiency and accuracy of online opponent identification, allowing the targeted strategy to be applied quickly to obtain a larger payoff.
Therefore, game history trees are built from sets of game history chains covering the preset number of episodes, and the homogeneous graph dataset is obtained for the subsequent training of the identifier.
Step 106: construct an opponent identifier based on the game history tree; the identifier comprises a graph neural network, a READOUT function and a classification network, where the graph neural network takes the graph message-passing model as its framework, the graph isomorphism network as its base model, and uses multiple message-passing layers.
Specifically, since the opponent's features are described by the game history tree as a whole, opponent identification based on the game history tree is a graph classification task over whole graphs, not a node classification task over individual nodes. Graph classification can be realized through similarity measures between graphs: one class of methods builds metrics from the graph edit distance between two graphs, graph isomorphism relations, attribute parameters of the graphs, and so on; another class, based on graph kernels, maps graphs into a high-dimensional space so that a kernel function can compute the inner-product similarity of two graphs in that representation. Using the similarity between any two graphs together with optimization objectives such as maximum inter-class margin, a classifier such as a support vector machine can effectively separate different classes. However, as the single-sample decision function of a similarity-based classifier shows, classifying a new sample requires computing the kernel between that sample and all training samples, so the computational complexity of this approach is relatively high. In addition, in such methods the feature representation and the classification process are separated and cannot be optimized jointly, which limits the final performance.
A graph neural network is therefore chosen to build the graph classifier for opponent identification. This has two advantages: (1) the model is trained end to end, so feature representation and classification can be optimized jointly, improving classification accuracy; (2) a single graph is taken as input and the probabilities of all possible classes as output, avoiding similarity computation between graphs and effectively reducing the inference complexity of the model.
The graph message-passing model is adopted as the framework of the graph neural network, the graph isomorphism model as the base model, and 5 message-passing layers are used. The element-wise maximum over node features is used as the READOUT function (without a jumping-knowledge mechanism) to obtain the overall feature representation of the input graph; finally, the graph feature vector is fed into a fully connected multi-layer network with Dropout for classification. The graph neural network extracts, from the game history tree, the graph structural features corresponding to the action probability distributions.
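One possible realisation of this architecture, shown here as an assumption about tooling (PyTorch Geometric) rather than as the patent's own code: five GIN message-passing layers, an element-wise max READOUT, and a fully connected classifier with Dropout.

    import torch
    import torch.nn as nn
    from torch_geometric.nn import GINConv, global_max_pool

    class OpponentIdentifier(nn.Module):
        def __init__(self, in_dim, hidden_dim, num_opponents, num_layers=5):
            super().__init__()
            self.convs = nn.ModuleList()
            dims = [in_dim] + [hidden_dim] * num_layers
            for d_in, d_out in zip(dims[:-1], dims[1:]):
                # Each GIN layer's feature mapping f is a small MLP.
                mlp = nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU(),
                                    nn.Linear(d_out, d_out), nn.ReLU())
                self.convs.append(GINConv(mlp))
            self.classifier = nn.Sequential(          # fully connected layers with Dropout
                nn.Linear(hidden_dim, hidden_dim), nn.ReLU(), nn.Dropout(0.5),
                nn.Linear(hidden_dim, num_opponents))

        def forward(self, data):
            x, edge_index, batch = data.x, data.edge_index, data.batch
            for conv in self.convs:                    # 5 rounds of message passing
                x = conv(x, edge_index)
            g = global_max_pool(x, batch)              # max READOUT per graph in the batch
            return self.classifier(g)                  # logits; softmax gives opponent probabilities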
Step 108: train the opponent identifier on the directed homogeneous graph dataset to obtain a trained identifier, and save the parameters of its underlying neural network.
Specifically, a dataset of game history tree directed graphs is built from the game history data of attack and defense games against the different opponents, grouped by the preset number of episodes; the game-history-tree-based opponent identifier is trained offline on this directed graph dataset, yielding an identifier for the known opponent set that is then applied to online opponent identification.
The construction and training of the opponent identifier proceed as follows:
(1) Construct a graph dataset based on the game history tree from the game history data of the opponents in the known set.
(2) Construct an opponent identifier based on a graph neural network model oriented to the graph classification task, according to the data characteristics of the graph dataset.
(3) Train the opponent identifier on the graph dataset until convergence, and save the neural network model parameters of the identifier's underlying layers. Specifically, during training the graph dataset is split into a training set and a test set, the classification cross-entropy is used as the loss function, the stochastic optimizer Adam back-propagates the loss, and training is repeated with varying batch sizes and learning rates to improve classification performance. After training, the model parameters are saved and the final performance of the model is evaluated with classification accuracy as the final metric.
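Continuing the sketch above, a minimal training loop matching the described procedure (train/test split, cross-entropy loss, the Adam optimizer, and accuracy as the final metric); the epoch count, batch size and learning rate are placeholder values.

    from torch_geometric.loader import DataLoader

    def train_identifier(model, dataset, epochs=100, batch_size=32, lr=1e-3):
        n_train = int(0.8 * len(dataset))             # simple train/test split
        train_loader = DataLoader(dataset[:n_train], batch_size=batch_size, shuffle=True)
        test_loader = DataLoader(dataset[n_train:], batch_size=batch_size)
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()               # classification cross-entropy
        for _ in range(epochs):
            model.train()
            for batch in train_loader:
                optimizer.zero_grad()
                loss = loss_fn(model(batch), batch.y)
                loss.backward()
                optimizer.step()
        model.eval()                                  # final evaluation: classification accuracy
        correct = total = 0
        with torch.no_grad():
            for batch in test_loader:
                correct += int((model(batch).argmax(dim=1) == batch.y).sum())
                total += batch.y.size(0)
        return correct / max(total, 1)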
Step 110: construct an online game history tree directed graph from the attack and defense action data of the defender and the current attacker in the episodes of the online game that have already finished.
Specifically, during online play the current opponent belongs to the known opponent set but has not yet been determined. A game history tree and the corresponding directed homogeneous graph model can then be constructed from the action data of both sides in the episodes of the current attack and defense game that have already ended, and used as input to the identifier for inference, so that the opponent is determined from the probability distribution output by the identifier.
Step 112: input the online game history tree directed graph into the trained opponent identifier, identify the opponent from the output probabilities that the current opponent belongs to each known opponent, and have the defender perform network defense with a targeted strategy in the subsequent network attack and defense game according to the identification result.
Specifically, when the defender plays against one of the opponents in the known set, the key question is how to quickly determine which of the known opponents it currently is. Once that is determined, the targeted strategy can be applied quickly to obtain a better game result.
An identifier (classifier) for the opponents in the known set is built by offline training on the attack and defense game history data of the different opponents; it is then used during the online game to identify the opponent quickly and accurately from the data collected from the episodes that have already finished in the current attack and defense process, in the shortest possible time (i.e. from as little observed data as possible), so that the targeted strategy can be applied as early as possible and network defense efficiency improved.
In the game-history-tree-based method for identifying opponents in incomplete-information attack and defense games, a game history tree model is designed, a graph neural network model is used to extract the graph-structure information of the tree as opponent features, and online opponent identification is realized with an offline-training and online-identification framework. The opponent identifier is built on a graph neural network model; game history trees and the corresponding graph dataset are constructed from the game history data of different opponents, and the identifier is trained offline. After offline training is completed, in the online application of the identifier the game history tree of the current opponent and its graph model are constructed from the collected online game data, the graph model data are input into the identifier to obtain an identification result, and the defender uses a targeted strategy to perform network defense according to this result in the subsequent network attack and defense game. The method identifies the opponent end to end, without inference over a payoff model or construction of a policy model for similarity comparison. It can identify the opponent quickly and accurately, so the defender can adopt a targeted strategy as early as possible, which greatly improves network defense efficiency.
In one embodiment, step 102 includes: extracting the action history data from the game history data of one episode and constructing a chain whose starting node represents the initial state of the game; traversing the action data of each round in the action history and, if the current action is the first action of a round, first adding a nature node representing the round change and then an action node representing the current action while recording the corresponding node information, otherwise adding the action node directly and recording its node information, where the node information includes the type of the acting player, the game position of that player, the current round, and the action sequences of both attacker and defender in every round before the current node; and traversing all the action data to obtain the game history chain, a chain connected in a directed manner, in action order, from the starting node through several intermediate nodes to the leaf node. The nodes in the game history chain fall into three categories: nature nodes representing the start of the game or a change of round, defender nodes, and attacker nodes.
In one embodiment, step 104 includes the steps of:
step 200: and acquiring action data of both the attack and defense parties of the preset number of offices, and constructing a game history chain by combining action sequences of each office with the change of the rounds to construct a set of game history chains.
Step 202: distinguishing the action sequence of the attack and defense parties, and constructing two game history trees with root nodes only; the first game history tree is a game history tree of which the attacker acts first, and the second game history tree is a game history tree of which the defender acts first; the root node is a natural node.
Step 204: traversing each game history chain, and selecting a corresponding game history tree to expand according to the action sequence of the attack and defense parties of each game history chain to obtain two complete game history trees.
Step 206: and processing node information of the two game history trees, and splicing the two game history trees into a complete directed homogeneous graph model through node serial number change.
Step 208: traversing all game history data according to the preset office number to obtain a plurality of game history trees and corresponding directed homogeneous graphs thereof, and forming a graph data set.
In one embodiment, step 204 includes: searching the game history tree, starting from the root node, for the node corresponding to each node traversed in the game history chain; if the corresponding node exists, increasing its occurrence count by 1 and recording its other node information; if the node does not exist in the game history tree, expanding a new node from the last matched node, recording its information with an occurrence count of 1, and continuing to expand leaf nodes until the end of the game history chain; and, after all game history chains have been traversed, obtaining the complete attacker-first game history tree and the complete defender-first game history tree.
In one embodiment, step 206 includes: normalizing the node occurrence counts of the two game history trees, i.e. taking the ratio of the occurrence count of the current node to that of the root node as one element of the node feature information; obtaining, from the per-round action sequences recorded at each node, the numbers of each kind of action taken by the two players before that node, as another element of the node feature information; representing the expansion relation from root node toward leaf nodes as directed edges and recording the edge connections to form an edge list; taking the node information and the edge list as graph elements to form the directed homogeneous graph model corresponding to the game history tree; and renumbering the nodes of the graph model of the defender-first game history tree by adding the total number of nodes of the attacker-first graph model to every node index, then concatenating its node information matrix and edge list with those of the attacker-first graph model to form one complete directed homogeneous graph model.
In one embodiment, the feature update formula of a node in each message-passing layer of the graph neural network of the game-history-tree-based opponent identifier in step 106 is:
$h'_v = f\big((1+\epsilon)\,h_v + \sum_{u\in\mathcal{N}(v)} h_u\big)$
wherein $f(\cdot)$ is the feature mapping function, $v$ is the node index, $h_v$ is the feature vector of node $v$, $h'_v$ is the updated feature vector of node $v$, $\epsilon$ is a weight parameter taken as a constant, and $\mathcal{N}(v)$ is the set of neighbours of node $v$.
The READOUT function is:
$h_G^{(j)} = \max_{v\in V} h_v^{(j)}$
wherein $h_G$ is the feature vector representing the entire graph, $h_G^{(j)}$ is its $j$-th dimensional feature, the maximum is taken over the $j$-th dimensional features of all nodes $v$ in the node set $V$, $h_v$ is the feature vector of node $v$, and $h_v^{(j)}$ is its $j$-th dimensional feature.
The classification network is a fully connected multi-layer network with Dropout.
In one embodiment, step 112 includes: inputting the online game history tree directed graph into the trained opponent identifier to obtain a probability distribution output that represents the probability that the current opponent is each opponent in the known set; if the probability of some specific opponent in this distribution exceeds a given threshold, regarding the current opponent as that opponent, providing the opponent class information to the targeted strategy to be adopted, and having the defender perform network defense in the subsequent network attack and defense game according to this class information; and, if no output probability exceeds the given threshold, continuing to collect data in the subsequent game to update the online game history tree directed graph, inputting the updated graph into the identifier again, and identifying the opponent from the new output.
Specifically, online opponent identification based on the opponent identifier includes the following steps:
(1) Construct the game history tree directed graph model from the game data of the episodes of the online game that have already finished. When computing resources are sufficient, a new graph is built at the end of every episode from the start of the game onward; when computing resources are limited, a new graph is built at preset episode intervals. To identify the opponent as quickly as possible, the interval should be as small as possible.
(2) Input the obtained online game history tree directed graph into the opponent identifier obtained by offline training; it outputs a probability distribution representing the probability that the current opponent is each opponent in the known set.
(3) If the probability of some specific opponent in this distribution exceeds a given threshold, regard the current opponent as that opponent and provide the opponent class information to the targeted strategy to be adopted. If no output probability exceeds the given threshold, the current data are considered insufficient for the identifier to make a sufficiently accurate judgment, and data must continue to be collected in the subsequent game to update the online game history tree directed graph for opponent identification.
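A minimal sketch of the online loop these three steps describe; build_graph and identifier stand in for the graph construction and identifier pieces sketched earlier, and the interval and threshold values are placeholders.

    def online_identification(finished_episodes_stream, build_graph, identifier,
                              interval=1, threshold=0.9):
        """build_graph: callable turning the episodes collected so far into the game history
        tree directed graph; identifier: callable returning one probability per known opponent."""
        episodes = []
        for episode in finished_episodes_stream:       # each item: one finished episode's action data
            episodes.append(episode)
            if len(episodes) % interval != 0:          # rebuild the graph only every `interval` episodes
                continue
            probs = identifier(build_graph(episodes))  # probabilities over the known opponents
            best = max(range(len(probs)), key=lambda i: probs[i])
            if probs[best] >= threshold:
                return best                            # confident: apply this opponent's targeted strategy
            # below threshold: keep playing and collecting data
        return None                                    # never confident within the observed episodes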
It should be understood that, although the steps in the flowchart of fig. 1 are shown in sequence as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution order of the steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in fig. 1 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and these sub-steps or stages need not be performed sequentially but may be performed in turn or alternately with at least part of the other steps or of the sub-steps or stages of other steps.
In an illustrative embodiment, we want to identify the opponent in as short a time as possible during online play, i.e. to build the game history tree from as little data as possible while still identifying the opponent. We therefore examine how the number of game episodes used affects identification, by analyzing the recognition performance of identifiers trained on game history tree graph datasets built with different preset episode counts.
First, consider the performance of the four identifiers RCG500, RCG750, RCG1000 and RCG3000 on the test sets of the four datasets GHT500, GHT750, GHT1000 and GHT3000, where RCG denotes an identifier, GHT a dataset, and the trailing number the preset episode count. The overall recognition accuracy is shown in Table 1.
TABLE 1. Recognition accuracy of the different identifiers on each test set.
Note: in the table, the first identifier is RCG500, the second RCG750, the third RCG1000 and the fourth RCG3000; the first, second, third and fourth datasets are GHT500, GHT750, GHT1000 and GHT3000, respectively.
It can be seen that, apart from RCG3000 having a lower recognition rate when applied to the other datasets, the identifiers all perform well on every dataset. This shows that an identifier trained on a dataset built with one preset episode count can identify opponents on datasets built with other preset episode counts; in other words, during online play the opponent can be identified from a game history tree graph built from a different number of episodes. Notably, RCG500 performs particularly well, reaching over 99% recognition accuracy on the GHT750, GHT1000 and GHT3000 datasets, while being slightly lower on the test set of GHT500, whose preset episode count matches that of its own training set. This indicates that the identifier RCG500 trained on the GHT500 dataset already captures the key features of the opponents, and that these features remain significant on datasets with larger preset episode counts. By contrast, RCG3000 trained on the GHT3000 dataset degrades markedly on datasets with smaller preset episode counts, indicating that it exploits additional features of the graph data in GHT3000 that are not significant when fewer episodes are used. This shows that the features captured when training with a smaller preset episode count remain usable with a larger count, whereas the reverse does not hold. The features extracted with a small preset episode count are already sufficient to distinguish the opponents, which is helpful for identifying opponents quickly in online games.
For comparison, a feature-engineering baseline using four simple action statistics as features was evaluated. Identifiers were trained on feature sets computed from the data of a given number of episodes, and their overall recognition accuracy on feature sets computed from different episode counts is shown in Table 2, where the training episode count is the preset episode count of the feature set used to train the identifier and the test episode count is the preset episode count of the feature set used for testing.
TABLE 2 recognition accuracy of different recognizers on each test set
It can be seen that an identifier achieves good recognition accuracy only when the training and test episode counts are the same, which indicates that the statistics change noticeably when computed over different preset episode counts, so an identifier trained on a feature set for one preset episode count cannot be applied to feature sets with other preset episode counts. Moreover, recognition accuracy exceeds 90% only when the preset episode count reaches 3000, showing that only features computed from about 3000 episodes of data reflect the differences between opponents effectively; with such features it is difficult to identify the opponent quickly and accurately in the early stage of an online game.
The experiments above show that GHT500 and RCG500 already characterize and identify opponents well. To investigate the identifier's performance with even smaller preset episode counts, we trained the identifiers RCG50, RCG100, RCG200 and RCG300 on the GHT50, GHT100, GHT200 and GHT300 datasets; their recognition accuracy on the test sets of the different datasets is shown in Table 3.
TABLE 3. Recognition accuracy of the different identifiers on each test set with smaller preset episode counts.
Note: the fifth identifier is RCG50, the sixth RCG100, the seventh RCG200, the eighth RCG300, and the first RCG500; the fifth, sixth, seventh, eighth, ninth and tenth datasets are GHT50, GHT100, GHT150, GHT200, GHT250 and GHT300, and the first, second, third and fourth datasets are GHT500, GHT750, GHT1000 and GHT3000, respectively.
It can be seen that every identifier performs well on test sets with different preset episode counts, with recognition accuracy mostly above 90%. The pattern of recognition rates shows that identification works best when the preset episode counts of the training and test sets are close, or when the training count is slightly smaller than the test count. To identify the opponent more quickly from fewer observations during online identification, the identifier should therefore be trained on a dataset with a smaller preset episode count. RCG50 reaches over 93% recognition accuracy on GHT100, which means that over a repeated game of thousands of episodes, the opponent can be identified accurately in the early stage of play.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of them that involves no contradiction should be regarded as within the scope of this description.
The above examples represent only a few embodiments of the present application; their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that those skilled in the art could make various modifications and improvements without departing from the concept of the present application, and these all fall within its scope of protection. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (7)

1. An incomplete information attack and defense game opponent identification method based on a game history tree, characterized by comprising the following steps:
acquiring historical data of multiple completed game episodes between the defender and different attackers in finished network attack and defense games;
constructing a game history chain from the game history data of each episode;
constructing game history trees from sets of game history chains covering a preset number of episodes to obtain a directed homogeneous graph dataset;
constructing an opponent identifier based on the game history tree, wherein the opponent identifier comprises a graph neural network, a READOUT function and a classification network, and the graph neural network takes the graph message-passing model as its framework, the graph isomorphism network as its base model, and uses multiple message-passing layers;
training the opponent identifier on the directed homogeneous graph dataset to obtain a trained opponent identifier, and saving the neural network parameters of its underlying layers;
constructing an online game history tree directed graph from the attack and defense action data of the defender and the current attacker in the episodes of the online game that have already finished;
inputting the online game history tree directed graph into the trained opponent identifier, identifying the opponent from the output probabilities that the current opponent belongs to each known opponent, and the defender performing network defense in the subsequent network attack and defense game according to the opponent identification result.
2. The method of claim 1, wherein constructing a game history chain from the game history data of each episode comprises:
extracting the action history data from the game history data of one episode and constructing a chain whose starting node represents the initial state of the game;
traversing the action data of each round in the action history; if the current action is the first action of a round, first adding a nature node representing the round change and then an action node representing the current action while recording the corresponding node information, otherwise adding the action node directly and recording the corresponding node information, wherein the node information comprises the type of the acting player, the game position of that player, the current round, and the action sequences of both attacker and defender in every round before the current node;
traversing all the action data to obtain the game history chain, which is a chain connected in a directed manner, in action order, from the starting node through several intermediate nodes to the leaf node.
3. The method of claim 1, wherein constructing a game history tree from a set of game history chains of a preset number of game plays to obtain a directed homogeneous graph dataset comprises:
obtaining the action data of both the attack and defense parties over the preset number of plays, combining the action sequence of each play with the turn changes to construct a game history chain, and assembling the set of game history chains;
distinguishing the acting order of the attack and defense parties, and constructing two game history trees containing only root nodes; the first game history tree is the tree in which the attacker acts first, the second is the tree in which the defender acts first, and each root node is a natural node;
traversing each game history chain and selecting the corresponding game history tree to expand according to the acting order of the attack and defense parties in that chain, obtaining two complete game history trees;
processing the node information of the two complete game history trees, and splicing the two trees into one complete directed homogeneous graph model by renumbering nodes;
traversing all game history data in groups of the preset number of plays to obtain a plurality of game history trees and their corresponding directed homogeneous graphs, which form the graph dataset.
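As a small illustration of claim 3's routing step, the sketch below partitions the chains of one group of plays by which party acts first; it reuses the hypothetical `ChainNode` type from the claim 2 sketch. Each partition is then inserted into its own tree (see the claim 4 sketch) and the two trees are converted and spliced into one graph (see the claim 5 sketch).

```python
def split_chains_by_first_actor(chains: list[ChainNode]) -> tuple[list[ChainNode], list[ChainNode]]:
    """Partition the chains of one group of plays by which party acts first (claim 3)."""
    attacker_first, defender_first = [], []
    for chain in chains:
        node = chain
        # walk past the start node and any natural (turn-change) nodes to the first action node
        while node.children and node.kind != "action":
            node = node.children[0]
        (attacker_first if node.party == "attacker" else defender_first).append(chain)
    return attacker_first, defender_first
```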
4. The method according to claim 3, wherein traversing each game history chain and selecting the corresponding game history tree to expand according to the acting order of the attack and defense parties of each game history chain, obtaining two complete game history trees, comprises:
traversing each node in the game history chain and, starting from the root node of the game history tree, searching for the corresponding node;
if the corresponding node exists, increasing its occurrence count by 1 and recording the other information of the node;
if the node does not exist in the game history tree, expanding it from the previously matched node, recording the node's information, setting its occurrence count to 1, and continuing to expand leaf nodes until the end of the game history chain;
after traversing all game history chains, obtaining a complete game history tree in which the attacker acts first and a complete game history tree in which the defender acts first.
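A minimal sketch of claim 4's expansion step follows, reusing the `ChainNode` type from the claim 2 sketch. The `count` attribute and the matching rule (same node type, action, party and turn) are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class TreeNode:
    kind: str
    action: str | None
    party: str | None
    turn: int
    count: int = 0                                     # occurrence count of this node
    history: list[str] = field(default_factory=list)
    children: list["TreeNode"] = field(default_factory=list)

def matches(tree_node: TreeNode, chain_node: ChainNode) -> bool:
    # illustrative matching rule: same node type, action, acting party and turn
    return (tree_node.kind, tree_node.action, tree_node.party, tree_node.turn) == (
        chain_node.kind, chain_node.action, chain_node.party, chain_node.turn)

def insert_chain(root: TreeNode, chain_start: ChainNode) -> None:
    """Expand a game history tree along one game history chain (claim 4)."""
    root.count += 1                                    # the root natural node occurs once per chain
    tree_node, chain_node = root, chain_start
    while chain_node.children:
        chain_node = chain_node.children[0]            # chains are linear: one child per node
        for child in tree_node.children:
            if matches(child, chain_node):             # node already present in the tree
                child.count += 1
                child.history = list(chain_node.history)
                tree_node = child
                break
        else:                                          # node missing: expand from the previous node
            new_node = TreeNode(kind=chain_node.kind, action=chain_node.action,
                                party=chain_node.party, turn=chain_node.turn,
                                count=1, history=list(chain_node.history))
            tree_node.children.append(new_node)
            tree_node = new_node
```

Calling `insert_chain` over all chains of a group yields the two complete game history trees.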
5. The method according to claim 3, wherein processing the node information of the two complete game history trees and splicing the two trees into a complete directed homogeneous graph model by renumbering nodes comprises:
normalizing the node occurrence counts of the two complete game history trees, i.e., taking the ratio of the current node's occurrence count to the root node's occurrence count as one item of node feature information;
counting, from the per-turn action sequences recorded at the node, the number of each kind of action taken by both game parties, and using these counts as another item of node feature information;
representing the expansion relation from the root node to the leaf nodes as directed edges and recording the edge connection information to form an edge list;
taking the node information and the edge list as graph elements to form the directed homogeneous graph model corresponding to each game history tree;
renumbering the nodes of the graph model of the defender-first game history tree by adding the total number of nodes in the graph model of the attacker-first game history tree to every node's serial number, and concatenating its node information matrix and edge list with those of the attacker-first graph model to form one complete directed homogeneous graph model.
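Claim 5's feature construction and splicing can be sketched as follows, reusing the `TreeNode` type from the claim 4 sketch. The feature layout (normalized occurrence ratio followed by action counts over a fixed hypothetical action vocabulary `ACTIONS`, without separating the two parties) is an illustrative assumption.

```python
from collections import Counter

ACTIONS = ["scan", "exploit", "patch", "monitor"]      # hypothetical action vocabulary

def tree_to_graph(root: TreeNode) -> tuple[list[list[float]], list[tuple[int, int]]]:
    """Flatten one game history tree into a node feature matrix and a directed edge list (claim 5)."""
    features: list[list[float]] = []
    edges: list[tuple[int, int]] = []

    def visit(node: TreeNode, parent_id: int | None) -> None:
        node_id = len(features)
        counts = Counter(node.history)                  # counts of each action before this node
        features.append([node.count / max(root.count, 1)]          # normalized occurrence count
                        + [float(counts[a]) for a in ACTIONS])     # action-count features
        if parent_id is not None:
            edges.append((parent_id, node_id))          # directed edge: parent -> child
        for child in node.children:
            visit(child, node_id)

    visit(root, None)
    return features, edges

def merge_graphs(attacker_first, defender_first):
    """Splice the two graphs: offset the defender-first node ids by the attacker-first node count."""
    feat_a, edges_a = attacker_first
    feat_d, edges_d = defender_first
    offset = len(feat_a)
    features = feat_a + feat_d
    edges = edges_a + [(u + offset, v + offset) for (u, v) in edges_d]
    return features, edges
```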
6. The method of claim 1, wherein the feature update formula of a node in the message passing process of each layer of the graph neural network is:

$$h_v' = f\Big( (1+\epsilon)\cdot h_v + \sum_{u \in \mathcal{N}(v)} h_u \Big)$$

where $f$ is the feature mapping function, $v$ is the node serial number, $h_v$ is the feature vector of node $v$, $h_v'$ is the updated feature vector of node $v$, $\epsilon$ is a weight parameter, and $\mathcal{N}(v)$ is the set of neighbors of node $v$;

the READOUT function is:

$$h_G^{(i)} = \max_{v} \, h_v^{(i)}$$

where $h_G$ is the feature vector representing the entire graph, $h_G^{(i)}$ is its $i$-th dimension feature, the $\max$ function takes the maximum of the $i$-th dimension feature over all nodes, $h_v$ is the feature vector of the $v$-th node, and $h_v^{(i)}$ is its $i$-th dimension feature;
The classification network is a fully connected multi-layer network with Dropout.
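Under the formulas above, one message-passing layer, the max READOUT, and the Dropout classification network could be sketched in PyTorch as follows. The two-layer MLP used for the feature mapping function $f$, the hidden sizes, the number of layers, and the Dropout rate are assumptions for illustration, not values fixed by the patent.

```python
import torch
import torch.nn as nn

class GINLayer(nn.Module):
    """One message-passing layer: h_v' = f((1 + eps) * h_v + sum of neighbor features)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))             # learnable weight parameter epsilon
        self.f = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU(),
                               nn.Linear(out_dim, out_dim))  # feature mapping function f

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        # x: [num_nodes, in_dim]; edge_index: [2, num_edges], directed edges (src -> dst)
        src, dst = edge_index
        agg = torch.zeros_like(x)
        agg.index_add_(0, dst, x[src])                       # sum the features of in-neighbors
        return self.f((1 + self.eps) * x + agg)

class OpponentIdentifierGNN(nn.Module):
    def __init__(self, feat_dim: int, hidden: int, num_opponents: int, num_layers: int = 3):
        super().__init__()
        dims = [feat_dim] + [hidden] * num_layers
        self.layers = nn.ModuleList(GINLayer(dims[i], dims[i + 1]) for i in range(num_layers))
        self.classifier = nn.Sequential(                     # fully connected multi-layer network with Dropout
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(hidden, num_opponents))

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x, edge_index)
        h_graph = x.max(dim=0).values                        # READOUT: per-dimension max over all nodes
        return self.classifier(h_graph)                      # logits over the known opponents
```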
7. The method of claim 1, wherein inputting the online game history tree directed graph into the trained opponent identifier, identifying the opponent according to the output probabilities that the current opponent belongs to the different known opponents, and the defender carrying out network defense with a targeted strategy according to the opponent identification result in the subsequent network attack and defense game, comprises:
inputting the online game history tree directed graph into the trained opponent identifier to obtain a probability distribution output, the probability distribution representing the probability that the current opponent is each of the opponents in the known range;
if the probability of a specific opponent in the probability distribution exceeds a given threshold, the current opponent is considered to be that specific opponent, the opponent class information is provided for the targeted strategy adopted subsequently, and the defender carries out network defense according to the opponent class information in the subsequent network attack and defense game;
if no output probability exceeds the given threshold, continuing to collect data during subsequent play to update the online game history tree directed graph, inputting the updated graph into the identifier, and identifying the opponent from the new output.
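The online decision rule of claim 7 can be sketched as follows; the softmax over logits, the 0.8 threshold, and the use of the identifier from the claim 6 sketch as `model` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def identify_opponent(model: torch.nn.Module, x: torch.Tensor, edge_index: torch.Tensor,
                      threshold: float = 0.8) -> int | None:
    """Return the index of the identified opponent, or None if no probability exceeds the threshold."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(x, edge_index), dim=-1)  # probability of each known opponent
    best = int(torch.argmax(probs))
    if float(probs[best]) > threshold:
        return best      # the defender can now adopt the strategy targeted at this opponent
    return None          # otherwise keep collecting data, rebuild the graph, and try again later
```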
CN202311618095.9A 2023-11-30 2023-11-30 Incomplete information attack and defense game opponent identification method based on game history tree Active CN117312810B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311618095.9A CN117312810B (en) 2023-11-30 2023-11-30 Incomplete information attack and defense game opponent identification method based on game history tree

Publications (2)

Publication Number Publication Date
CN117312810A true CN117312810A (en) 2023-12-29
CN117312810B CN117312810B (en) 2024-02-23

Family

ID=89255768

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311618095.9A Active CN117312810B (en) 2023-11-30 2023-11-30 Incomplete information attack and defense game opponent identification method based on game history tree

Country Status (1)

Country Link
CN (1) CN117312810B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130312092A1 (en) * 2012-05-11 2013-11-21 Wintermute, Llc System and method for forensic cyber adversary profiling, attribution and attack identification
EP3370219A1 (en) * 2017-03-03 2018-09-05 MBDA France Method and device for predicting optimal attack and defence solutions in the scenario of a military conflict
US20230351027A1 (en) * 2019-08-29 2023-11-02 Darktrace Holdings Limited Intelligent adversary simulator
CN112329348A (en) * 2020-11-06 2021-02-05 东北大学 Intelligent decision-making method for military countermeasure game under incomplete information condition
CN113599798A (en) * 2021-08-25 2021-11-05 上海交通大学 Chinese chess game learning method and system based on deep reinforcement learning method
CN113806546A (en) * 2021-09-30 2021-12-17 中国人民解放军国防科技大学 Cooperative training-based method and system for defending confrontation of graph neural network
CN116708042A (en) * 2023-08-08 2023-09-05 中国科学技术大学 Strategy space exploration method for network defense game decision

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YUAN WEILIN et al.: "Imperfect Information Game in Multiplayer No-limit Texas Hold’em Based on Mean Approximation and Deep CFVnet", 2021 CHINA AUTOMATION CONGRESS (CAC), pages 2459 - 2466 *
ZHANG Hongqi; YANG Junnan; ZHANG Chuanfu: "Defense decision-making method based on incomplete-information stochastic game and Q-learning", Journal on Communications (通信学报), no. 08, pages 60 - 72 *

Also Published As

Publication number Publication date
CN117312810B (en) 2024-02-23

Similar Documents

Publication Publication Date Title
Pang et al. Finding the best from the second bests-inhibiting subjective bias in evaluation of visual tracking algorithms
Dereszynski et al. Learning probabilistic behavior models in real-time strategy games
Effendy et al. Handling imbalanced data in customer churn prediction using combined sampling and weighted random forest
CN110472296B (en) Air combat target threat assessment method based on standardized full-connection residual error network
CN113392914B (en) Anomaly detection algorithm for constructing isolated forest based on weight of data features
CN113283590A (en) Defense method for backdoor attack
Joash Fernandes et al. Predicting plays in the national football league
CN115577795A (en) Policy model optimization method and device and storage medium
CN115987552A (en) Network intrusion detection method based on deep learning
CN116244647A (en) Unmanned aerial vehicle cluster running state estimation method
Scott et al. How does AI play football? An analysis of RL and real-world football strategies
Sikka et al. Basketball win percentage prediction using ensemble-based machine learning
Sudhamathy et al. PREDICTION ON IPL DATA USING MACHINE LEARNING TECHNIQUES IN R PACKAGE.
Ruderman et al. Uncovering surprising behaviors in reinforcement learning via worst-case analysis
CN117312810B (en) Incomplete information attack and defense game opponent identification method based on game history tree
CN116452904B (en) Image aesthetic quality determination method
CN112699957A (en) DARTS-based image classification optimization method
Soemers et al. Learning policies from self-play with policy gradients and MCTS value estimates
Tseng Predicting victories in video games: using single XGBoost with feature engineering: IEEE BigData 2021 Cup-Predicting Victories in Video Games
CN112364566B (en) Deduction prediction method based on typical time data characteristics
Hamilton et al. Opponent resource prediction in starcraft using imperfect information
CN114344916A (en) Data processing method and related device
Hao et al. Predication of NCAA bracket using recurrent neural network and combinatorial fusion
Campbell et al. A Curriculum Framework for Autonomous Network Defense using Multi-agent Reinforcement Learning
Papagiannis et al. Applying gradient boosting trees and stochastic leaf evaluation to MCTS on hearthstone

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant