CN112434791A - Multi-agent strong countermeasure simulation method and device and electronic equipment - Google Patents

Multi-agent strong countermeasure simulation method and device and electronic equipment Download PDF

Info

Publication number
CN112434791A
Authority
CN
China
Prior art keywords
network
confrontation
countermeasure
agent
strong
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011270335.7A
Other languages
Chinese (zh)
Inventor
白桦
王群勇
孙旭朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING SHENGTAOPING TEST ENGINEERING TECHNOLOGY RESEARCH INSTITUTE
Original Assignee
BEIJING SHENGTAOPING TEST ENGINEERING TECHNOLOGY RESEARCH INSTITUTE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING SHENGTAOPING TEST ENGINEERING TECHNOLOGY RESEARCH INSTITUTE filed Critical BEIJING SHENGTAOPING TEST ENGINEERING TECHNOLOGY RESEARCH INSTITUTE
Priority to CN202011270335.7A priority Critical patent/CN112434791A/en
Publication of CN112434791A publication Critical patent/CN112434791A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a multi-agent strong confrontation simulation method and device and electronic equipment. The method comprises: acquiring multi-round demonstration confrontation playback data from a confrontation simulation engine, and training a neural network strategy model with a generative adversarial network technique based on the confrontation playback data; and simulating the decision process of the multi-agent during the strong confrontation by using the neural network strategy model, thereby completing the multi-agent strong confrontation simulation. By learning from historical data, the training of the multi-agent strong confrontation model can be accelerated, so that operation efficiency is effectively improved and computing resources are saved.

Description

Multi-agent strong countermeasure simulation method and device and electronic equipment
Technical Field
The invention relates to the technical field of system simulation, in particular to a multi-agent strong countermeasure simulation method and device and electronic equipment.
Background
The multi-agent modeling method is based on the model theory of artificial intelligence and organizational behavior; it combines Multi-Agent System (MAS) research with mathematical models in specific fields, and already covers many traditional and emerging scientific fields such as bionic optimization algorithms, computational economics, artificial societies, knowledge propagation engineering, and complex systems of war and politics.
The existing deep reinforcement learning framework, typified by the Deep Q-Network (DQN), is one of the main methods for establishing a multi-agent strong countermeasure model. However, in multi-agent strong countermeasure applications, the dimension of the continuous, time-sequenced output action space is huge, so the number of parameters of the DQN model is also huge. If the model parameters are trained from initial values, a large amount of training time is consumed to obtain satisfactory results, and efficiency is low.
Disclosure of Invention
The invention provides a multi-agent strong countermeasure simulation method and device and electronic equipment, which are used to overcome the defect of low operation efficiency in the prior art and to effectively improve operation efficiency.
The invention provides a multi-agent strong confrontation simulation method, which comprises the following steps:
acquiring multi-round demonstration confrontation playback data from a confrontation simulation engine, and training and obtaining a neural network strategy model by adopting a generative adversarial network technique based on the confrontation playback data;
and simulating the decision process of the multi-agent in the strong confrontation process by using the neural network strategy model to complete the strong confrontation simulation of the multi-agent.
According to the multi-agent strong countermeasure simulation method of one embodiment of the invention, the neural network strategy model comprises a discrimination network and a strategy network;
wherein the discriminative network is configured to classify the input countermeasure data, and an output of the discriminative network is configured to indicate whether the input countermeasure data complies with a demonstration countermeasure policy;
the policy network is used for reading the state data of the strong countermeasure process and generating the countermeasure policy to be adopted under the state data based on the state data.
According to the multi-agent strong countermeasure simulation method of one embodiment of the invention, before the training and obtaining the neural network strategy model, the method further comprises the following steps:
determining a discriminant loss sum of the demonstration sample and the simulation sample as a loss of the discriminant network, wherein a loss function of the discriminant network is expressed as follows:
D_loss = D_loss-expert + D_loss-learner;
in the formula, D_loss represents the loss of the discrimination network, D_loss-expert represents the cross entropy between the actual output and the expected output of the discrimination network on the demonstration samples, and D_loss-learner represents the cross entropy between the actual output and the expected output of the discrimination network on the imitation samples;
the goal of the discriminating network is to minimize the discriminating loss sum.
According to the multi-agent strong confrontation simulation method of one embodiment of the invention, before the determining the discrimination loss sum of the demonstration sample and the simulation sample as the loss of the discrimination network, the method further comprises the following steps:
the cross entropy is calculated as follows:
l(x, y) = L = {l_1, ..., l_n, ..., l_N}^T;
l_n = -w_n[y_n·log x_n + (1 - y_n)·log(1 - x_n)];
in the formula, l(x, y) represents the cross entropy of the vectors x and y, defined as the vector {l_1, ..., l_n, ..., l_N}^T composed of the cross entropies of the corresponding components of x and y; l_n is the cross entropy of the corresponding components x_n and y_n; w_n is the weight of component n; and N is the dimension of the vectors x and y.
According to the multi-agent strong countermeasure simulation method of one embodiment of the invention, before the training and obtaining the neural network strategy model, the method further comprises the following steps:
determining a reward function for the policy network as follows:
Reward = -log(D(Π_L));
where Reward represents the return of the policy network, Π_L represents the imitation sample, and D(Π_L) represents the cross entropy between the actual output and the expected output of the discrimination network on the imitation sample;
determining a goal of the policy network to maximize a reward of the policy network;
and/or determining a loss function of the policy network as follows:
[formula image BDA0002777479480000031: the loss function of the policy network, not reproduced in the text]
in the formula, pd represents the confrontation command parameter probability distribution constructed from the parameters output by the policy network, action represents the command parameter value obtained by sampling the constructed probability distribution, log_prob represents the log probability density of the probability distribution at the sampled action value, entropy represents the entropy of the probability distribution, and beta represents a hyper-parameter.
According to the multi-agent strong confrontation simulation method of one embodiment of the invention, simulating the decision process of the multi-agent in the strong confrontation process by using the neural network strategy model comprises the following steps:
constructing the countermeasure command parameter probability distribution based on the output of the policy network, and sampling and acquiring countermeasure command parameters from the countermeasure command parameter probability distribution;
converting the countermeasure command parameters into a countermeasure command list according to an interface format required by the countermeasure simulation engine, and inputting the countermeasure command list into the countermeasure simulation engine.
According to the multi-agent strong countermeasure simulation method, the discrimination network is specifically a binary classification neural network; the input of the binary classification neural network is the tensor coding of the joint countermeasure state and the countermeasure command list, and its output is a binary classification scalar within [0, 1].
The invention also provides a multi-agent strong confrontation simulation device, which comprises:
the training module is used for acquiring multi-round demonstration confrontation playback data from the confrontation simulation engine, and for training and obtaining a neural network strategy model by adopting a generative adversarial network technique based on the confrontation playback data;
and the simulation module is used for simulating the decision process of the multi-agent in the strong confrontation process by utilizing the neural network strategy model so as to complete the strong confrontation simulation of the multi-agent.
The invention also provides an electronic device, which comprises a memory, a processor and a program or an instruction which is stored on the memory and can run on the processor, wherein when the processor executes the program or the instruction, the steps of the multi-agent strong countermeasure simulation method are realized.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a program or instructions which, when executed by a computer, implement the steps of the multi-agent strong confrontation simulation method as described in any of the above.
According to the multi-agent strong-confrontation simulation method, the multi-agent strong-confrontation simulation device and the electronic equipment, the training speed of the multi-agent strong-confrontation model can be accelerated by learning historical data, so that the operation efficiency is effectively improved, and the computing resources are effectively saved.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed in the description are briefly introduced below. The drawings in the following description show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the overall system structure in the multi-agent strong countermeasure simulation method provided by the present invention;
FIG. 2 is a schematic flow chart of a multi-agent strong countermeasure simulation method provided by the present invention;
FIG. 3 is a schematic flow chart of data acquisition of a demonstration countermeasure playback in the multi-agent strong countermeasure simulation method provided by the present invention;
FIG. 4 is a schematic diagram of a data structure in the multi-agent strong countermeasure simulation method provided by the present invention;
FIG. 5 is a schematic diagram of a reinforcement learning control loop in the multi-agent strong confrontation simulation method provided by the present invention;
FIG. 6 is a schematic diagram of a DQN behavior value function approximation network in the multi-agent strong countermeasure simulation method provided by the present invention;
FIG. 7 is a schematic flow chart of a neural network strategy model training in the multi-agent strong confrontation simulation method provided by the present invention;
FIG. 8 is a schematic structural diagram of a multi-agent strong countermeasure simulation apparatus provided in the present invention;
fig. 9 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Aiming at the problem of low operation efficiency of multi-agent strong confrontation simulation in the prior art, the invention accelerates the training of the multi-agent strong confrontation model by learning from historical data, thereby effectively improving operation efficiency and saving computing resources. The present invention is described and explained below with reference to a number of embodiments and the accompanying drawings.
Fig. 1 is a schematic diagram of the overall system structure in the multi-agent strong confrontation simulation method provided by the present invention. In order to quickly establish a highly intelligent neural network confrontation strategy, a generative adversarial network technique is first adopted, and existing high-level confrontation playback data are used to quickly optimize a neural network strategy model, so that the model can imitate the confrontation strategies adopted in the playback and reach the same level of intelligence. The neural network strategy model generated after training can be used directly for intelligent confrontation simulation, and can be further optimized through reinforcement learning to reach a higher level of intelligence.
Fig. 2 is a schematic flow chart of a multi-agent strong confrontation simulation method provided by the present invention, as shown in fig. 2, the method includes:
s201, acquiring multi-round demonstration countermeasure playback data from an countermeasure simulation engine, and training and acquiring a neural network strategy model by adopting a countermeasure network generation technology based on the countermeasure playback data.
It can be understood that, as shown in fig. 3, for the schematic flow chart of the acquisition of the demonstration confrontation playback data in the multi-agent strong confrontation simulation method provided by the present invention, the present invention firstly needs to obtain multiple rounds of demonstration confrontation playback data from the simulation engine. Alternatively, the confrontation playback data may be saved in a playback buffer.
As shown in fig. 4, for the structural diagram of data in the multi-agent strong confrontation simulation method provided by the present invention, each sample point in the playback buffer is data of one confrontation step, which includes a joint confrontation situation s and a presenter confrontation command list a.
The collected confrontation playback data can be generated by manual operation of a high-level human player or an automated confrontation rule program which is written by professional technicians and is highly optimized, and the confrontation playback records only need to be stored through a simulation engine without additional manual marking processing.
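As an illustration only, such a playback buffer can be organized as a simple list of (s, a) records; the sketch below is a minimal Python example, and the field names and encodings are assumptions rather than the actual storage format of the simulation engine:

```python
import random
from dataclasses import dataclass, field
from typing import List

@dataclass
class DemoStep:
    """One confrontation step from a demonstration playback: joint confrontation
    situation s and demonstrator confrontation command list a (both tensor-encoded)."""
    situation: List[float]
    command_list: List[float]

@dataclass
class ReplayBuffer:
    steps: List[DemoStep] = field(default_factory=list)

    def add(self, situation, command_list):
        self.steps.append(DemoStep(situation, command_list))

    def sample_batch(self, batch_size):
        # random mini-batch of demonstration (s, a) pairs for one training round
        return random.sample(self.steps, batch_size)
```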
After the confrontation playback data are collected, the generative adversarial network technique can be adopted to train the neural network strategy model, so that the neural network strategy model learns the confrontation strategy adopted by the demonstrator.
S202, simulating a decision process of the multi-agent in the strong confrontation process by using the neural network strategy model, and completing the strong confrontation simulation of the multi-agent.
It can be understood that, after training of the neural network strategy model is completed according to the above steps, a multi-agent strong confrontation simulation test can be performed, and the decision process of the multi-agent in the strong confrontation process can be simulated by the neural network strategy model. For example, in combination with the confrontation simulation engine, the imitator in Fig. 1 can be made to imitate the decision process of the demonstrator.
It should be understood that real-world complex, large-scale problems often cannot be described and solved by a single agent, so an application system usually contains multiple agents. The agents not only have their own problem-solving abilities and behavioural goals, but can also cooperate with each other to achieve a common overall goal; such a system is a MAS. A MAS has the following properties: each agent has incomplete information or incomplete problem-solving ability; data are stored and processed in a distributed manner, without a system-level centralized data processing structure; the system is interactive internally and encapsulated as a whole; and computation proceeds concurrently, so some shared resources need to be locked.
Multi-agent simulation uses system theory and multi-agent system modeling methods to establish a high-level model of the system, and uses a system computation model built with agent-model-based simulation software and hardware support technologies to realize the simulation.
According to the multi-agent strong confrontation simulation method provided by the invention, when historical experience data are available, the training of the multi-agent strong confrontation model can be accelerated by learning from the historical data, so that operation efficiency is effectively improved and computing resources are saved. In the military field, if combat data of a virtual enemy exist, the method can quickly establish a confrontation model of the virtual enemy and simulate the operational behaviour of the other side, for use in the simulation training of one's own commanders and fighters.
It should be understood that reinforcement learning (RL) modeling studies how agents interact with the environment to optimize an objective. Reinforcement learning is formulated as a Markov decision process, which is its theoretical basis.
Next, three main functions that an agent can learn are introduced:
policy → value function → model
Reinforcement learning is related to solving sequential decision problems, and many real-world problems, such as video game play, sports, driving, etc., can be solved in this manner.
In solving these problems, there is an objective or purpose, such as winning a game, reaching a destination safely, or minimizing the cost of manufacturing a product. Progress is made by taking actions and receiving feedback from the world about how close the objective is (the current score, the distance to the destination, or the price per unit). Achieving a goal typically requires taking many actions in turn, each of which changes the surrounding world. These changes in the world and the feedback received are observed before deciding on the next action.
The reinforcement learning problem can be represented as a system consisting of an agent and an environment. The environment generates information describing the state of the system, called the state. The agent interacts with the environment by observing the state and using this information to select an action. The environment accepts the action and transitions to the next state, then returns the next state and a reward to the agent. When one cycle of (state → action → reward) is completed, one step is completed. This cycle repeats until the environment terminates (for example, when the problem is solved). Fig. 5 is a schematic diagram of the reinforcement learning control loop in the multi-agent strong confrontation simulation method provided by the present invention, and describes the whole process of this loop.
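For illustration, the (state → action → reward) cycle described above can be written as the following loop; the env/agent interface (reset, step, act, observe) is a common convention assumed here, not part of the invention:

```python
def run_episode(env, agent):
    """Reinforcement learning control loop: repeat (state -> action -> reward)
    until the environment terminates."""
    state = env.reset()
    done, total_reward = False, 0.0
    while not done:
        action = agent.act(state)                     # agent observes the state and selects an action
        next_state, reward, done = env.step(action)   # environment transitions and returns feedback
        agent.observe(state, action, reward, next_state)
        state = next_state
        total_reward += reward
    return total_reward
```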
Consider how the environment transitions from one state to another, which is described by a transition function. In reinforcement learning, the transition function is formulated as a Markov Decision Process (MDP), a mathematical framework that models sequential decision-making. To understand how the transition function is expressed as an MDP, consider the following equation:
s_{t+1} ~ P(s_{t+1} | (s_0, a_0), (s_1, a_1), ..., (s_t, a_t));
where at time step t the next state s_{t+1} is sampled from a probability distribution P conditioned on the entire history; that is, the environment's transition from state s_t to s_{t+1} depends on all previous states s and actions a.
To make the environment transition function more practical, it is converted into an MDP by adding the following assumption: the transition to the next state s_{t+1} depends only on the previous state s_t and action a_t. This is called the Markov property. Under this assumption, the transition function becomes:
s_{t+1} ~ P(s_{t+1} | s_t, a_t);
The above formula states that the next state s_{t+1} is sampled from the probability distribution P(s_{t+1} | s_t, a_t). This is a simpler form of the original transition function. The Markov property means that the current state and action at time step t contain enough information to fully determine the transition probability of the next state at t + 1.
Combining the ideas of reinforcement learning with deep neural network technology yields deep reinforcement learning methods such as the Deep Q-Network (DQN), in which a deep neural network is constructed. Fig. 6 is a schematic diagram of the DQN behaviour value function approximation network in the multi-agent strong countermeasure simulation method provided by the present invention. The input is the environment variables and the output is the action variables, and the neural network is trained with maximization of the return value as the objective.
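As a sketch only (PyTorch, with placeholder dimensions), a DQN-style behaviour value function approximation network of the kind shown in Fig. 6 maps the encoded environment variables to one estimated value per action:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """DQN-style action-value approximation network: environment variables in,
    one estimated value per candidate action out."""
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# greedy action selection from the approximated value function
q_net = QNetwork(state_dim=32, num_actions=8)
action = q_net(torch.randn(1, 32)).argmax(dim=-1)
```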
On the basis of the multi-agent strong countermeasure simulation method provided by the above embodiments, optionally, the neural network policy model comprises a discrimination network and a policy network.
Wherein the discriminative network is configured to classify the input countermeasure data, and an output of the discriminative network is configured to indicate whether the input countermeasure data complies with a demonstration countermeasure policy; the policy network is used for reading the state data of the strong countermeasure process and generating the countermeasure policy to be adopted under the state data based on the state data.
It can be understood that, as shown in Fig. 1, the neural network policy model of the present invention consists of a discrimination network D and a policy network A. The discrimination network D classifies the input confrontation data and outputs a scalar value between 0 and 1 that indicates whether the input data comply with the demonstration confrontation strategy, where 0 means full compliance and 1 means complete non-compliance; the optimization goal of the discrimination network D is therefore to judge all data as accurately as possible.
The policy network A reads the confrontation situation (environment) data and generates the confrontation commands to be taken in that situation. Its goal is to imitate the demonstrated confrontation as accurately as possible, which also means fooling the discrimination network D so that it cannot distinguish whether the confrontation data were generated by a demonstration player or by the policy network. The discrimination network D and the policy network A therefore form an adversarial relationship, and the two networks are trained alternately. When the two networks reach equilibrium, the discrimination network D judges the demonstration confrontation data and the confrontation data generated by the policy network with nearly equal probability (i.e. it can no longer effectively distinguish between them; ideally this value is expected to be 0.5, meaning the discrimination network cannot discriminate at all), and at this point the policy network A has learned a confrontation strategy close to that of the demonstration player.
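As a minimal illustration, both networks can be sketched as multilayer perceptrons in PyTorch; all dimensions, layer sizes and activation choices below are placeholders rather than values specified by the invention:

```python
import torch
import torch.nn as nn

class DiscriminatorD(nn.Module):
    """Binary classifier: input is the tensor encoding of (joint confrontation
    situation + confrontation command list), output is a scalar in [0, 1]."""
    def __init__(self, input_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, situation_and_commands: torch.Tensor) -> torch.Tensor:
        return self.net(situation_and_commands)

class PolicyA(nn.Module):
    """Policy network: input is the tensor encoding of the joint confrontation
    situation, output are parameters used to build command probability distributions."""
    def __init__(self, situation_dim: int, param_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(situation_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, param_dim),
        )

    def forward(self, situation: torch.Tensor) -> torch.Tensor:
        return self.net(situation)
```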
Accordingly, an optional processing procedure for training the neural network strategy model is shown in Fig. 7, a schematic flow chart of neural network strategy model training in the multi-agent strong confrontation simulation method provided by the present invention, and comprises the following processing steps (a code sketch of one training round is given after the list):
(1) randomly sampling batch samples from the demonstration confrontation playback buffer;
(2) the batch samples comprise joint confrontation situations and demonstrator confrontation command lists and can be used directly as the demonstrator batch samples;
(3) generating the imitator batch samples, comprising:
(3.1) obtaining the joint confrontation situation samples from the batch samples;
(3.2) inputting the joint confrontation situation samples into the policy network A to generate outputs;
(3.3) generating imitator confrontation command lists from the outputs of the policy network A;
(3.4) combining the joint confrontation situations with the corresponding imitator confrontation command lists to form the imitator batch samples;
(4) inputting the demonstrator batch samples and the imitator batch samples together into the discrimination network D, calculating the loss function of the discrimination network D and performing one round of optimization training on the discrimination network D;
(5) judging the imitator batch samples with the discrimination network D to generate outputs;
(6) calculating the loss of the policy network A according to the judgment results of the discrimination network D and performing one round of optimization training on the policy network A.
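A minimal sketch of one such training round is given below in PyTorch, reusing the hypothetical playback-buffer sketch from earlier. It assumes the policy outputs the mean and log-standard-deviation of a Normal distribution over command parameters, reads D(Π_L) in the reward formula as the discriminator's raw output on the imitator batch, and writes the policy loss from the textual description (negative log-probability weighted by the return, minus a β-weighted entropy term); these are assumptions, since the exact formulas appear only as images in the original:

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

bce = nn.BCELoss()

def build_distribution(params: torch.Tensor) -> Normal:
    # assumption: the policy outputs a mean and a log-std per command parameter
    mean, log_std = params.chunk(2, dim=-1)
    return Normal(mean, log_std.exp())

def train_round(buffer, policy, discriminator, opt_a, opt_d, batch_size=64, beta=0.01):
    """One alternating round following steps (1)-(6) above."""
    # (1)-(2) demonstrator batch sampled from the demonstration playback buffer
    batch = buffer.sample_batch(batch_size)
    situations = torch.stack([torch.as_tensor(b.situation, dtype=torch.float32) for b in batch])
    demo_cmds = torch.stack([torch.as_tensor(b.command_list, dtype=torch.float32) for b in batch])

    # (3) imitator batch: policy network A generates command parameters for the same situations
    pd = build_distribution(policy(situations))
    actions = pd.sample()                      # sampled confrontation command parameters

    # (4) discriminator update: demonstrator samples -> 0, imitator samples -> 1
    d_demo = discriminator(torch.cat([situations, demo_cmds], dim=-1))
    d_imit = discriminator(torch.cat([situations, actions], dim=-1))
    d_loss = bce(d_demo, torch.zeros_like(d_demo)) + bce(d_imit, torch.ones_like(d_imit))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # (5) judge the imitator batch again with the updated discrimination network D
    d_imit = discriminator(torch.cat([situations, actions], dim=-1))
    reward = -torch.log(d_imit.clamp_min(1e-8)).squeeze(-1).detach()

    # (6) policy loss per the textual description: maximize log_prob * reward + beta * entropy
    a_loss = -(pd.log_prob(actions).sum(-1) * reward + beta * pd.entropy().sum(-1)).mean()
    opt_a.zero_grad()
    a_loss.backward()
    opt_a.step()
    return d_loss.item(), a_loss.item()
```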
Optionally, the discrimination network is specifically a binary classification neural network; the input of the binary classification neural network is the tensor coding of the joint confrontation state and the confrontation command list, and its output is a binary classification scalar within [0, 1].
Specifically, the discrimination network D of the present invention may be a typical binary classification neural network, the input of the network is the tensor coding of the joint countermeasure situation + countermeasure command list, and the output is a binary classification scalar of 0 to 1.
Optionally, the network structure and the network scale of the binary classification neural network may be selected in consideration of characteristics of input data, for example, a convolutional network CNN or a multilayer perceptron MLP may be generally adopted, and the parameter dimension and the network depth may be adjusted and selected according to the number of input data attributes and the complexity of the association relationship.
Further, on the basis of the multi-agent strong confrontation simulation method provided by each of the above embodiments, before the training to obtain the neural network policy model, the method further includes:
determining a discriminant loss sum of the demonstration sample and the simulation sample as a loss of the discriminant network, wherein a loss function of the discriminant network is expressed as follows:
D_loss = D_loss-expert + D_loss-learner;
in the formula, D_loss represents the loss of the discrimination network, D_loss-expert represents the cross entropy between the actual output and the expected output of the discrimination network on the demonstration samples, and D_loss-learner represents the cross entropy between the actual output and the expected output of the discrimination network on the imitation samples;
the goal of the discriminating network is to minimize the discriminating loss sum.
Specifically, the loss of the discrimination network D in the present invention is the discrimination loss sum of the demonstration sample and the simulation sample:
D_loss = D_loss-expert + D_loss-learner;
where the discrimination losses D_loss-expert and D_loss-learner are the cross entropies between the actual output and the expected output of the discrimination network D on the demonstration samples and on the imitation samples, respectively. A demonstration sample should be judged as fully complying with the demonstration confrontation strategy, so its expected output is 0; an imitation sample should be judged as completely non-compliant, so its expected output is 1.
Further, on the basis of the multi-agent strong confrontation simulation method provided by each of the above embodiments, before determining the discrimination loss sum of the demonstration sample and the simulation sample as the loss of the discrimination network, the method further includes:
the cross entropy is calculated as follows:
l(x, y) = L = {l_1, ..., l_n, ..., l_N}^T;
l_n = -w_n[y_n·log x_n + (1 - y_n)·log(1 - x_n)];
in the formula, l(x, y) represents the cross entropy of the vectors x and y, defined as the vector {l_1, ..., l_n, ..., l_N}^T composed of the cross entropies of the corresponding components of x and y; l_n is the cross entropy of the corresponding components x_n and y_n; w_n is the weight of component n; and N is the dimension of the vectors x and y.
Thus, the loss calculation function of the discrimination network D can be expressed as:
D_loss = BCELoss(D(Π_E), 0) + BCELoss(D(Π_L), 1);
where BCELoss is the cross entropy, Π_E denotes a demonstration sample, and Π_L denotes an imitation sample.
The optimization goal of the discrimination network D is to minimize the overall discrimination loss.
Further, on the basis of the multi-agent strong confrontation simulation method provided by each of the above embodiments, before the training to obtain the neural network policy model, the method further includes:
determining a reward function for the policy network as follows:
Reward = -log(D(Π_L));
where Reward represents the return of the policy network, Π_L represents the imitation sample, and D(Π_L) represents the cross entropy between the actual output and the expected output of the discrimination network on the imitation sample;
determining a goal of the policy network to maximize a reward of the policy network;
and/or determining a loss function of the policy network as follows:
[formula image BDA0002777479480000121: the loss function of the policy network, not reproduced in the text]
in the formula, pd represents the confrontation command parameter probability distribution constructed from the parameters output by the policy network, action represents the command parameter value obtained by sampling the constructed probability distribution, log_prob represents the log probability density of the probability distribution at the sampled action value, entropy represents the entropy of the probability distribution, and beta represents a hyper-parameter.
Specifically, the structure of the policy network A is designed similarly to a policy network in reinforcement learning; it can be constructed as a convolutional network (CNN) or a multilayer perceptron (MLP), among others, according to the input and output characteristics, and parameters such as the input and output dimensions and the network depth need to be selected and adjusted according to the characteristics of the simulation data.
The technical framework of the invention is applicable to different types of agents, denoted by subscript i, and different numbers of agents of the same type, denoted by subscript j.
Then, the reward calculation formula of the policy network a is:
Reward = -log(D(Π_L));
the optimization goal of policy network a is to maximize the payback.
The loss calculation function of the policy network a is:
[formula image BDA0002777479480000122: the loss function of the policy network, not reproduced in the text]
pd is the confrontation command parameter probability distribution constructed from the parameters output by the policy network A; the type of probability distribution can be selected according to the characteristics of the parameters, for example a Categorical distribution for discrete parameters such as the command type and a Normal distribution for continuous parameters such as the coordinates x and y. action is a command parameter value obtained by sampling the constructed probability distribution; log_prob is the log probability density of the probability distribution at the sampled action value; entropy is the entropy of the probability distribution; beta is a hyper-parameter that controls the proportion of the maximum-entropy objective in the policy network loss and is adjusted according to the training situation.
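As an illustration of the distribution choices just described (again an assumption, not the patented implementation), the sketch below builds a Categorical distribution for a discrete command type and a Normal distribution for continuous x, y coordinates from a single policy output vector, then evaluates the log_prob and entropy terms that enter the policy loss:

```python
import torch
from torch.distributions import Categorical, Normal

NUM_CMD_TYPES = 5  # placeholder number of discrete command types

def split_params(policy_out: torch.Tensor):
    """Assumed layout of the policy output: [command-type logits | x,y means | x,y log-stds]."""
    logits = policy_out[..., :NUM_CMD_TYPES]
    mean = policy_out[..., NUM_CMD_TYPES:NUM_CMD_TYPES + 2]
    log_std = policy_out[..., NUM_CMD_TYPES + 2:NUM_CMD_TYPES + 4]
    return Categorical(logits=logits), Normal(mean, log_std.exp())

policy_out = torch.randn(1, NUM_CMD_TYPES + 4)   # stand-in for the policy network output
cmd_dist, coord_dist = split_params(policy_out)

cmd_type = cmd_dist.sample()                     # discrete command parameter (Categorical)
coords = coord_dist.sample()                     # continuous coordinates x, y (Normal)

# terms entering the policy loss described above
log_prob = cmd_dist.log_prob(cmd_type) + coord_dist.log_prob(coords).sum(-1)
entropy = cmd_dist.entropy() + coord_dist.entropy().sum(-1)
```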
On the basis of the multi-agent strong confrontation simulation method provided by the above embodiments, optionally, simulating the decision process of the multi-agent in the strong confrontation process by using the neural network strategy model comprises: constructing the countermeasure command parameter probability distribution based on the output of the policy network, and sampling countermeasure command parameters from the countermeasure command parameter probability distribution; and converting the countermeasure command parameters into a countermeasure command list according to the interface format required by the countermeasure simulation engine, and inputting the countermeasure command list into the countermeasure simulation engine.
Specifically, the policy network A of the present invention is similar to a policy network in reinforcement learning: its input is the tensor encoding of the joint confrontation situation, and its output is the probability distribution parameters that can be used to construct a confrontation command list. The automatic confrontation program constructs the confrontation command parameter probability distribution pd from the output of the policy network A, samples pd to obtain confrontation command parameters, and finally converts the confrontation command parameters into a confrontation command list according to the interface format required by the confrontation simulation engine and inputs it into the confrontation simulation engine.
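The final step can be pictured as packing the sampled parameter values into whatever command-list structure the engine interface expects; the dictionary layout below is purely hypothetical, since the actual interface format of the confrontation simulation engine is not specified:

```python
def to_command_list(agent_ids, cmd_types, coords):
    """Pack sampled command parameters into a confrontation command list.
    The field names ('agent', 'command', 'target') are hypothetical placeholders
    for the interface format required by the confrontation simulation engine."""
    commands = []
    for agent_id, cmd_type, (x, y) in zip(agent_ids, cmd_types, coords):
        commands.append({
            "agent": int(agent_id),
            "command": int(cmd_type),
            "target": {"x": float(x), "y": float(y)},
        })
    return commands

# example: two agents, sampled command types and target coordinates
command_list = to_command_list(
    agent_ids=[0, 1],
    cmd_types=[2, 4],
    coords=[(10.5, -3.2), (7.0, 8.8)],
)
# command_list would then be passed to the confrontation simulation engine's input interface
```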
Based on the same inventive concept, the invention provides a multi-agent strong countermeasure simulation device according to the above embodiments, which is used to realize the strong countermeasure simulation of the multi-agent in the above embodiments. Therefore, the descriptions and definitions in the multi-agent strong countermeasure simulation method of the above embodiments can be used to understand the execution modules of the multi-agent strong countermeasure simulation device of the present invention; reference may be made to the above embodiments, and details are not repeated here.
According to an embodiment of the present invention, the structure of the multi-agent strong countermeasure simulation apparatus is shown in fig. 8, which is a schematic structural diagram of the multi-agent strong countermeasure simulation apparatus provided by the present invention, the apparatus can be used for the strong countermeasure simulation of the multi-agent, the apparatus includes: a training module 801 and a simulation module 802.
The training module 801 is configured to acquire multi-round demonstration countermeasure playback data from the countermeasure simulation engine, and to train and obtain a neural network strategy model by adopting a generative adversarial network technique based on the countermeasure playback data; the simulation module 802 is configured to simulate the decision process of the multi-agent in the strong countermeasure process by using the neural network strategy model, so as to complete the strong countermeasure simulation of the multi-agent.
Specifically, the training module 801 first obtains multiple rounds of demonstration confrontation playback data from the simulation engine. After the confrontation playback data are collected, the training module 801 can train the neural network strategy model by using the generative adversarial network technique, so that the neural network strategy model learns the confrontation strategy adopted by the demonstrator.
Then, the simulation module 802 can perform a multi-agent strong confrontation simulation test and simulate the decision process of the multi-agent in the strong confrontation process by using the neural network strategy model. For example, in combination with the confrontation simulation engine, the imitator in Fig. 1 can be made to imitate the decision process of the demonstrator.
The multi-agent strong-confrontation simulation device provided by the invention can accelerate the training speed of the multi-agent strong-confrontation model by learning historical data, thereby effectively improving the operation efficiency and effectively saving the computing resources.
Optionally, the neural network policy model includes a discriminant network and a policy network;
wherein the discriminative network is configured to classify the input countermeasure data, and an output of the discriminative network is configured to indicate whether the input countermeasure data complies with a demonstration countermeasure policy;
the policy network is used for reading the state data of the strong countermeasure process and generating the countermeasure policy to be adopted under the state data based on the state data.
Further, the training module is further configured to:
determining a discriminant loss sum of the demonstration sample and the simulation sample as a loss of the discriminant network, wherein a loss function of the discriminant network is expressed as follows:
D_loss = D_loss-expert + D_loss-learner;
in the formula, D_loss represents the loss of the discrimination network, D_loss-expert represents the cross entropy between the actual output and the expected output of the discrimination network on the demonstration samples, and D_loss-learner represents the cross entropy between the actual output and the expected output of the discrimination network on the imitation samples;
the goal of the discriminating network is to minimize the discriminating loss sum.
Further, the training module is further configured to:
the cross entropy is calculated as follows:
l(x, y) = L = {l_1, ..., l_n, ..., l_N}^T;
l_n = -w_n[y_n·log x_n + (1 - y_n)·log(1 - x_n)];
in the formula, l(x, y) represents the cross entropy of the vectors x and y, defined as the vector {l_1, ..., l_n, ..., l_N}^T composed of the cross entropies of the corresponding components of x and y; l_n is the cross entropy of the corresponding components x_n and y_n; w_n is the weight of component n; and N is the dimension of the vectors x and y.
Further, the training module is further configured to:
determining a reward function for the policy network as follows:
Reward = -log(D(Π_L));
where Reward represents the return of the policy network, Π_L represents the imitation sample, and D(Π_L) represents the cross entropy between the actual output and the expected output of the discrimination network on the imitation sample;
determining a goal of the policy network to maximize a reward of the policy network;
and/or determining a loss function of the policy network as follows:
[formula image BDA0002777479480000151: the loss function of the policy network, not reproduced in the text]
in the formula, pd represents the confrontation command parameter probability distribution constructed from the parameters output by the policy network, action represents the command parameter value obtained by sampling the constructed probability distribution, log_prob represents the log probability density of the probability distribution at the sampled action value, entropy represents the entropy of the probability distribution, and beta represents a hyper-parameter.
Optionally, the simulation module is configured to:
constructing the countermeasure command parameter probability distribution based on the output of the policy network, and sampling and acquiring countermeasure command parameters from the countermeasure command parameter probability distribution;
converting the countermeasure command parameters into a countermeasure command list according to an interface format required by the countermeasure simulation engine, and inputting the countermeasure command list into the countermeasure simulation engine.
Optionally, the discrimination network is specifically a binary classification neural network; the input of the binary classification neural network is the tensor coding of the joint confrontation state and the confrontation command list, and its output is a binary classification scalar within [0, 1].
It is understood that the relevant program modules in the devices of the above embodiments can be implemented by a hardware processor (hardware processor) in the present invention. Moreover, the multi-agent strong countermeasure simulation apparatus of the present invention can implement the multi-agent strong countermeasure simulation flow of each of the above method embodiments by using the above program modules, and when used for implementing the strong countermeasure simulation of the multi-agent in each of the above method embodiments, the apparatus of the present invention produces the same beneficial effects as those of the corresponding above method embodiments, and reference may be made to the above method embodiments, and details thereof are not repeated here.
As a further aspect of the present invention, the present invention provides an electronic device according to the above embodiments, the electronic device includes a memory, a processor and a program or instructions stored in the memory and executable on the processor, and the processor executes the program or instructions to implement the steps of the multi-agent robust simulation method according to the above embodiments.
Further, the electronic device of the present invention may further include a communication interface and a bus. Referring to fig. 9, a schematic structural diagram of an electronic device provided in the present invention includes: at least one memory 901, at least one processor 902, a communication interface 903, and a bus 904.
The memory 901, the processor 902 and the communication interface 903 communicate with each other through the bus 904, and the communication interface 903 is used for information transmission between the electronic equipment and the countermeasure data equipment; the memory 901 stores a program or instructions executable on the processor 902, and when the processor 902 executes the program or instructions, the steps of the multi-agent strong confrontation simulation method described in the above embodiments are implemented.
It is understood that the electronic device at least comprises a memory 901, a processor 902, a communication interface 903 and a bus 904, and the memory 901, the processor 902 and the communication interface 903 form a communication connection with each other through the bus 904, and can complete the communication with each other, for example, the processor 902 reads program instructions of the multi-agent robust simulation method from the memory 901. In addition, the communication interface 903 can also realize communication connection between the electronic device and the countermeasure data device, and can complete mutual information transmission, such as reading of the countermeasure playback data through the communication interface 903.
When the electronic device is running, the processor 902 invokes the program instructions in the memory 901 to perform the methods provided by the above method embodiments, for example: acquiring multi-round demonstration confrontation playback data from a confrontation simulation engine, and training and obtaining a neural network strategy model by adopting a generative adversarial network technique based on the confrontation playback data; and simulating the decision process of the multi-agent in the strong confrontation process by using the neural network strategy model to complete the strong confrontation simulation of the multi-agent, and the like.
The program instructions in the memory 901 may be implemented in the form of software functional units and stored in a computer readable storage medium when the program instructions are sold or used as independent products. Alternatively, all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, where the program may be stored in a computer-readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The present invention also provides a non-transitory computer readable storage medium according to the above embodiments, on which a program or instructions are stored; when executed by a computer, the program or instructions implement the steps of the multi-agent strong confrontation simulation method of the above embodiments, for example: acquiring multi-round demonstration confrontation playback data from a confrontation simulation engine, and training and obtaining a neural network strategy model by adopting a generative adversarial network technique based on the confrontation playback data; and simulating the decision process of the multi-agent in the strong confrontation process by using the neural network strategy model to complete the strong confrontation simulation of the multi-agent, and the like.
As a further aspect of the present invention, the present invention also provides a computer program product according to the above embodiments, the computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the multi-agent strong confrontation simulation method provided by the above method embodiments, the method comprising: acquiring multi-round demonstration confrontation playback data from a confrontation simulation engine, and training and obtaining a neural network strategy model by adopting a generative adversarial network technique based on the confrontation playback data; and simulating the decision process of the multi-agent in the strong confrontation process by using the neural network strategy model to complete the strong confrontation simulation of the multi-agent.
According to the electronic device, the non-transitory computer readable storage medium and the computer program product provided by the invention, by executing the steps of the multi-agent strong confrontation simulation method described in each embodiment, the training speed of the multi-agent strong confrontation model can be accelerated by learning historical data, so that the operation efficiency is effectively improved, and the calculation resources are effectively saved.
It is to be understood that the above-described embodiments of the apparatus, the electronic device and the storage medium are merely illustrative, and that elements described as separate components may or may not be physically separate, may be located in one place, or may be distributed on different network elements. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on such understanding, the technical solutions mentioned above may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a usb disk, a removable hard disk, a ROM, a RAM, a magnetic or optical disk, etc., and includes several instructions for causing a computer device (such as a personal computer, a server, or a network device, etc.) to execute the methods described in the method embodiments or some parts of the method embodiments.
In addition, it should be understood that, in the specification of the present invention, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of additional identical elements in the process, method, article, or apparatus that comprises that element.
In the description of the present invention, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description. Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A multi-agent strong confrontation simulation method is characterized by comprising the following steps:
acquiring multi-round demonstration confrontation playback data from a confrontation simulation engine, and training and obtaining a neural network strategy model by adopting a generative adversarial network technique based on the confrontation playback data;
and simulating the decision process of the multi-agent in the strong confrontation process by using the neural network strategy model to complete the strong confrontation simulation of the multi-agent.
2. The multi-agent strong countermeasure simulation method of claim 1, wherein the neural network policy model comprises a discrimination network and a policy network;
wherein the discriminative network is configured to classify the input countermeasure data, and an output of the discriminative network is configured to indicate whether the input countermeasure data complies with a demonstration countermeasure policy;
the policy network is used for reading the state data of the strong countermeasure process and generating the countermeasure policy to be adopted under the state data based on the state data.
3. The multi-agent strong countermeasure simulation method of claim 2, further comprising, prior to the training to obtain a neural network policy model:
determining a discriminant loss sum of the demonstration sample and the simulation sample as a loss of the discriminant network, wherein a loss function of the discriminant network is expressed as follows:
D_loss = D_loss-expert + D_loss-learner;
in the formula, D_loss represents the loss of the discrimination network, D_loss-expert represents the cross entropy between the actual output and the expected output of the discrimination network on the demonstration samples, and D_loss-learner represents the cross entropy between the actual output and the expected output of the discrimination network on the imitation samples;
the goal of the discriminating network is to minimize the discriminating loss sum.
4. The multi-agent strong countermeasure simulation method of claim 3, further comprising, before the determining the discriminant loss sum of the demonstration sample and the mock sample as the loss of the discriminant network:
the cross entropy is calculated as follows:
l(x, y) = L = {l_1, ..., l_n, ..., l_N}^T;
l_n = -w_n[y_n·log x_n + (1 - y_n)·log(1 - x_n)];
in the formula, l(x, y) represents the cross entropy of the vectors x and y, defined as the vector {l_1, ..., l_n, ..., l_N}^T composed of the cross entropies of the corresponding components of x and y; l_n is the cross entropy of the corresponding components x_n and y_n; w_n is the weight of component n; and N is the dimension of the vectors x and y.
5. The multi-agent strong countermeasure simulation method of claim 3 or 4, further comprising, before the training to obtain a neural network strategy model:
determining a reward function for the policy network as follows:
Reward = -log(D(Π_L));
wherein Reward represents the return of the policy network, Π_L represents the imitation sample, and D(Π_L) represents the cross entropy between the actual output and the expected output of the discrimination network on the imitation sample;
determining a goal of the policy network to maximize a reward of the policy network;
and/or determining a loss function of the policy network as follows:
[formula image FDA0002777479470000021: the loss function of the policy network, not reproduced in the text]
in the formula, pd represents the confrontation command parameter probability distribution constructed from the parameters output by the policy network, action represents the command parameter value obtained by sampling the constructed probability distribution, log_prob represents the log probability density of the probability distribution at the sampled action value, entropy represents the entropy of the probability distribution, and beta represents a hyper-parameter.
6. The multi-agent strong confrontation simulation method according to claim 5, wherein the simulating the decision process of the multi-agent in the strong confrontation process by using the neural network strategy model comprises:
constructing the countermeasure command parameter probability distribution based on the output of the policy network, and sampling countermeasure command parameters from the constructed probability distribution;
converting the countermeasure command parameters into a countermeasure command list according to an interface format required by the countermeasure simulation engine, and inputting the countermeasure command list into the countermeasure simulation engine.
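A minimal sketch of the decision step in claim 6, assuming a Gaussian command-parameter distribution and a hypothetical dict-based command-list format; the real interface format required by the countermeasure simulation engine is not specified in the claim.

    import torch
    from torch.distributions import Normal

    def decide(mean, log_std):
        """Build the command-parameter distribution from policy outputs and sample command parameters."""
        pd = Normal(mean, log_std.exp())
        params = pd.sample()                                   # countermeasure command parameters
        # convert to a command list in a hypothetical engine interface format
        return [{"agent_id": i, "command": float(p)} for i, p in enumerate(params)]

    command_list = decide(torch.zeros(3), torch.zeros(3))
    print(command_list)   # would be passed to the countermeasure simulation engine's input interface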
7. The multi-agent strong countermeasure simulation method of claim 2, wherein the discrimination network is specifically a binary classification neural network, the input of the binary classification neural network is a tensor coding of the joint countermeasure state and the countermeasure command list, and the output of the binary classification neural network is a binary classification scalar within [0, 1].
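A minimal sketch of the binary classification network in claim 7, assuming the joint countermeasure state and the command list have each been flattened into fixed-length tensors that are concatenated at the input; all dimensions and names are illustrative assumptions.

    import torch
    import torch.nn as nn

    class DiscriminationNetwork(nn.Module):
        """Binary classifier: tensor coding of (joint state, command list) -> scalar in [0, 1]."""
        def __init__(self, state_dim=32, cmd_dim=8):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + cmd_dim, 64), nn.ReLU(),
                nn.Linear(64, 1), nn.Sigmoid())   # sigmoid keeps the output within [0, 1]

        def forward(self, state, command):
            x = torch.cat([state, command], dim=-1)   # joint tensor coding
            return self.net(x)

    d = DiscriminationNetwork()
    score = d(torch.randn(1, 32), torch.randn(1, 8))
    print(score)   # closer to 1 read as "complies with the demonstration policy" (assumed convention)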
8. A multi-agent strong confrontation simulation device, comprising:
the training module is used for acquiring multi-round demonstration confrontation playback data from the confrontation simulation engine and training and acquiring a neural network strategy model by adopting a confrontation network generation technology based on the confrontation playback data;
and the simulation module is used for simulating the decision process of the multi-agent in the strong confrontation process by utilizing the neural network strategy model so as to complete the strong confrontation simulation of the multi-agent.
9. An electronic device comprising a memory, a processor and a program or instructions stored on the memory and executable on the processor, wherein the processor when executing the program or instructions implements the steps of the multi-agent strong challenge simulation method of any of claims 1 to 7.
10. A non-transitory computer readable storage medium having stored thereon a program or instructions, wherein the program or instructions, when executed by a computer, implement the steps of the multi-agent strong confrontation simulation method of any one of claims 1 to 7.
CN202011270335.7A 2020-11-13 2020-11-13 Multi-agent strong countermeasure simulation method and device and electronic equipment Pending CN112434791A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011270335.7A CN112434791A (en) 2020-11-13 2020-11-13 Multi-agent strong countermeasure simulation method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011270335.7A CN112434791A (en) 2020-11-13 2020-11-13 Multi-agent strong countermeasure simulation method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN112434791A true CN112434791A (en) 2021-03-02

Family

ID=74701309

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011270335.7A Pending CN112434791A (en) 2020-11-13 2020-11-13 Multi-agent strong countermeasure simulation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112434791A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298260A (en) * 2021-06-11 2021-08-24 中国人民解放军国防科技大学 Confrontation simulation deduction method based on deep reinforcement learning
CN113894780A (en) * 2021-09-27 2022-01-07 中国科学院自动化研究所 Multi-robot cooperative countermeasure method and device, electronic equipment and storage medium
CN114254722A (en) * 2021-11-17 2022-03-29 中国人民解放军军事科学院国防科技创新研究院 Game countermeasure oriented multi-intelligent model fusion method
CN114996856A (en) * 2022-06-27 2022-09-02 北京鼎成智造科技有限公司 Data processing method and device for airplane intelligent agent maneuver decision

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070016464A1 (en) * 2004-07-16 2007-01-18 John Yen Agent-based collaborative recognition-primed decision-making
EP1659468A2 (en) * 2004-11-16 2006-05-24 Rockwell Automation Technologies, Inc. Universal run-time interface for agent-based simulation and control systems
CN101964019A (en) * 2010-09-10 2011-02-02 北京航空航天大学 Against behavior modeling simulation platform and method based on Agent technology
US20200090042A1 (en) * 2017-05-19 2020-03-19 Deepmind Technologies Limited Data efficient imitation of diverse behaviors
CN108764453A (en) * 2018-06-08 2018-11-06 中国科学技术大学 The modeling method and action prediction system of game are synchronized towards multiple agent
CN109598342A (en) * 2018-11-23 2019-04-09 中国运载火箭技术研究院 A kind of decision networks model is from game training method and system
CN111507880A (en) * 2020-04-18 2020-08-07 郑州大学 Crowd confrontation simulation method based on emotional infection and deep reinforcement learning
CN111767786A (en) * 2020-05-11 2020-10-13 北京航空航天大学 Anti-attack method and device based on three-dimensional dynamic interaction scene

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XINYAN GAN et al.: "Multi-Agent Based Hybrid Evolutionary Algorithm", 2011 Seventh International Conference on Natural Computation *
孙旭朋 et al.: "Reliability prediction and approximate modeling method for underwater vehicle electronic equipment based on physics of failure", 《环境技术》 (Environment Technology), no. 5 *
谭浪: "Research on the application of reinforcement learning in multi-agent confrontation", 《中国优秀硕士论文期刊全文数据库(信息科技辑)》 (China Master's Theses Full-text Database, Information Science and Technology series) *

Similar Documents

Publication Publication Date Title
CN112434791A (en) Multi-agent strong countermeasure simulation method and device and electronic equipment
US11779837B2 (en) Method, apparatus, and device for scheduling virtual objects in virtual environment
CN112131786B (en) Target detection and distribution method and device based on multi-agent reinforcement learning
Noothigattu et al. Interpretable multi-objective reinforcement learning through policy orchestration
CN112884131A (en) Deep reinforcement learning strategy optimization defense method and device based on simulation learning
CN104102522B (en) The artificial emotion driving method of intelligent non-player roles in interactive entertainment
CN111856925B (en) State trajectory-based confrontation type imitation learning method and device
CN113408621B (en) Rapid simulation learning method, system and equipment for robot skill learning
O'Dowd et al. The distributed co-evolution of an embodied simulator and controller for swarm robot behaviours
CN113627596A (en) Multi-agent confrontation method and system based on dynamic graph neural network
Ugur et al. Refining discovered symbols with multi-step interaction experience
Tang et al. A review of computational intelligence for StarCraft AI
CN114290339B (en) Robot realistic migration method based on reinforcement learning and residual modeling
CN116841317A (en) Unmanned aerial vehicle cluster collaborative countermeasure method based on graph attention reinforcement learning
Lam et al. A simheuristic approach for evolving agent behaviour in the exploration for novel combat tactics
CN113230650B (en) Data processing method and device and computer readable storage medium
Cordeiro et al. A minimal training strategy to play flappy bird indefinitely with NEAT
WO2023038605A1 (en) Autonomous virtual entities continuously learning from experience
CN115212549A (en) Adversary model construction method under confrontation scene and storage medium
Wu et al. Analysis of strategy in robot soccer game
CN112044082A (en) Information detection method and device and computer readable storage medium
Tanskanen et al. Modeling Risky Choices in Unknown Environments
CN113239634B (en) Simulator modeling method based on robust simulation learning
CN112870716B (en) Game data processing method and device, storage medium and electronic equipment
CN117648585B (en) Intelligent decision model generalization method and device based on task similarity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination