CN112717415A - Information bottleneck theory-based AI (Artificial intelligence) training method for reinforcement learning battle game - Google Patents


Info

Publication number
CN112717415A
CN112717415A (application number CN202110091260.4A)
Authority
CN
China
Prior art keywords
training
model
game
reinforcement learning
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110091260.4A
Other languages
Chinese (zh)
Other versions
CN112717415B (en)
Inventor
张轶飞
程帆
张冬梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202110091260.4A priority Critical patent/CN112717415B/en
Publication of CN112717415A publication Critical patent/CN112717415A/en
Application granted granted Critical
Publication of CN112717415B publication Critical patent/CN112717415B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • A — HUMAN NECESSITIES
    • A63 — SPORTS; GAMES; AMUSEMENTS
    • A63F — CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 — Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/60 — Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
    • A63F13/67 — Generating or modifying game content before or while executing the game program adaptively or by learning from player actions, e.g. skill level adjustment or by storing successful combat sequences for re-use
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/08 — Learning methods


Abstract

The invention relates to an AI training method for a reinforcement learning fighting game based on the information bottleneck theory, comprising the following steps: 1) initializing an AI training model; 2) performing decision interaction in a simulation environment through the game AI to obtain a sample training batch data set; 3) iteratively training the AI training model with a reinforcement learning algorithm on the sample training batch data set obtained from the interaction between the game AI and the environment, and saving the parameters of the AI training model in stages; 4) fixing part of the saved parameters of the AI training models from different stages and retraining the remaining parameters with the reinforcement learning algorithm for fine-tuning, so as to obtain final AI training models for AIs of different levels and thereby generate the fighting-game AI files. Compared with the prior art, the method has the advantages of high sampling efficiency, fast training, high testing flexibility, graded AI levels, and the like.

Description

Information bottleneck theory-based AI (Artificial intelligence) training method for reinforcement learning battle game
Technical Field
The invention relates to the field of game AI learning, and in particular to a reinforcement learning battle game AI training method based on the information bottleneck theory.
Background
With the development of deep learning technology in recent years, many results have been achieved in the field of deep reinforcement learning, and a growing number of methods combining deep learning with reinforcement learning algorithms (such as DQN, A2C, PPO and DDPG) have shown strong performance on video game AI. However, in many reinforcement learning problems the cost of interaction between the agent and the environment is high, so it is desirable to make the algorithm converge as fast as possible in order to save training cost, that is, to learn a higher-level intelligent strategy from the same amount of sampling.
In existing fighting games, the human-versus-machine mode is one of the important components of the game. Existing game AIs are designed by manually specifying strategy distributions and targeted action mappings, so their play is monotonous and lacks the flexibility of matches between human players. Meanwhile, existing methods that train game AI with reinforcement learning use raw pixels as input, which carry a great deal of redundant information and degrade the learning efficiency of the network and the speed of the reinforcement learning algorithm. In deep learning experiments, a neural network during training first memorizes the input, as measured by the mutual information between the variables of the input layer and the representation layer, and then compresses the input information according to the specific learning task so as to discard useless redundant information, i.e. it reduces the mutual information between the input layer and the representation layer; this is the information extraction-compression (E-C) process. Existing reinforcement learning algorithms, however, are not optimized for this information extraction process.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide an AI training method for a reinforcement learning battle game based on an information bottleneck theory.
The purpose of the invention can be realized by the following technical scheme:
an AI training method for a reinforcement learning battle game based on an information bottleneck theory comprises the following steps:
1) initializing an AI training model;
2) carrying out decision interaction in a simulation environment through a game AI to obtain a sample training batch data set;
3) according to a sample training batch data set obtained by interaction of the game AI and the environment, iteratively training an AI training model by adopting a reinforcement learning algorithm, and storing parameters of the AI training model in stages;
4) fixing part of the saved parameters of the AI training models from different stages, and retraining the remaining parameters with the reinforcement learning algorithm for fine-tuning, so as to obtain the final AI training models for AIs of different levels and thereby generate the fighting-game AI files.
The step 1) specifically comprises the following steps:
11) determining the AI optional operation set A and the number n of AIs of different required capability levels according to the game operation description;
12) initializing all network parameters in the model, including the value network model parameter θ and the policy network model parameter φ;
13) determining the hyper-parameters β and η according to the resolution of the game picture;
14) setting the model learning rate ε;
15) setting the number m of samples drawn from the probability distribution model.
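By way of illustration only, the initialization of step 1) might be organized as in the following Python/PyTorch sketch; the class name, the concrete operation set, the network layer sizes and the default hyper-parameter values are assumptions for illustration and are not prescribed by the invention (the actual CNN architecture is the one shown in FIG. 3).

```python
import torch.nn as nn

# Hypothetical configuration object for step 1); all concrete values are illustrative only.
class TrainingConfig:
    def __init__(self):
        self.action_set = ["up", "down", "left", "right", "attack", "defend", "skill"]  # operation set A
        self.num_ai_levels = 3        # n: number of AI capability levels required
        self.beta = 1e-3              # hyper-parameter beta (chosen from the game-frame resolution)
        self.eta = 1e-2               # hyper-parameter eta
        self.learning_rate = 3e-4     # model learning rate
        self.num_action_samples = 8   # m: samples drawn from the policy distribution per frame

config = TrainingConfig()

# Value network (parameters theta) and policy network (parameters phi); the layer sizes below
# are placeholders, not the architecture of FIG. 3.
value_net = nn.Sequential(nn.Conv2d(3, 16, 8, 4), nn.ReLU(), nn.Flatten(), nn.LazyLinear(1))
policy_net = nn.Sequential(nn.Conv2d(3, 16, 8, 4), nn.ReLU(), nn.Flatten(),
                           nn.LazyLinear(len(config.action_set)))
```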
The step 2) specifically comprises the following steps:
21) sampling environmental sample batch data {(X_t, A_t, R_t, X_{t+1})}, t = 1, 2, ..., K, from the environment, wherein X_t and X_{t+1} are the game picture sampled at the current time t and the game picture at the next time t+1, A_t is the optional operation set of the game AI at the current time t, R_t is the environment reward corresponding to the selected operation, namely the game score and the real-time attributes of the game character, and K is the total number of sampling moments;
22) for each sampled game picture X_t, the game AI obtains m operation samples {Z_t^(1), ..., Z_t^(m)} according to the policy network model, wherein each Z_t^(i), sampled from the policy network model distribution P_φ(·|X_t), is an AI real-time operation and P_φ(·|X_t) is the probability distribution model;
23) the environmental sample batch data and the m operation samples obtained by the AI according to the policy network model are correspondingly combined to obtain the sample training batch data set D, which pairs each tuple (X_t, A_t, R_t, X_{t+1}) with its operation samples {Z_t^(i)}, i = 1, ..., m, for t = 1, 2, ..., K.
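A minimal sketch of the sampling procedure of step 2) is given below, assuming a Gym-style environment object env whose step() returns (next frame, reward, done flag, info) and the policy network sketched above; the helper name collect_batch and this environment interface are assumptions, not part of the invention.

```python
import torch

def collect_batch(env, policy_net, K, m):
    """Collect {(X_t, {Z_t^(i)}, A_t, R_t, X_{t+1})} for t = 1..K (sketch)."""
    batch = []
    X_t = env.reset()                                   # current game frame X_t (assumed CHW float array)
    for _ in range(K):
        logits = policy_net(torch.as_tensor(X_t).unsqueeze(0).float())
        dist = torch.distributions.Categorical(logits=logits)
        Z = dist.sample((m,))                           # m operation samples Z_t^(i) ~ P_phi(.|X_t)
        A_t = dist.sample().item()                      # operation actually executed in the environment
        X_next, R_t, done, _ = env.step(A_t)            # environment reward R_t and next frame X_{t+1}
        batch.append((X_t, Z, A_t, R_t, X_next))
        X_t = env.reset() if done else X_next
    return batch
```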
In the step 3), the AI training model is iteratively trained by adopting the A2C algorithm.
In the step 3), when the model is iteratively trained, the gradient of the model representation layer is calculated as an expectation over the game-picture distribution and the operations sampled from the policy network model [formula not reproduced], wherein P(X) is the distribution probability of the game picture X, φ(Z_i, X) is the probability of the AI operation Z_i produced by the policy network model when the game picture is X, and E denotes expectation.
In the step 3), when the model is iteratively trained, the gradient of the reinforcement learning algorithm is calculated from the loss function J(Z; θ) [formula not reproduced], where J(Z; θ) is the loss function of the A2C algorithm with an added information bottleneck loss term.
The loss function J(Z; θ) augments the existing loss function of the A2C algorithm framework with the information bottleneck loss term [formulas not reproduced], wherein R is the real-time reward value, i.e. the real-time change of the game score and of the game character attributes, α is the reward attenuation coefficient in the A2C algorithm, Θ_Φ(Z_t, X) is the real-time value estimate of the state-decision pair (Z_t, X) produced by the value network model when the policy network model is φ, and H(P_φ(a_t|X_t; θ)) is the entropy of the distribution P_φ(a_t|X_t; θ).
In the step 3), when the model is iteratively trained by the reinforcement learning algorithm, the network parameters θ and φ are updated respectively, and convergence of the network parameters is checked; if the parameters have not converged, training continues, and if convergence is reached, training stops and the model is saved in stages. The update expressions for θ and φ follow the gradients computed above scaled by the learning rate [formulas not reproduced].
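Because the exact loss and update expressions above are given only as formula images, the following sketch shows a generic advantage actor-critic (A2C) loss augmented with an information-bottleneck-style penalty. The particular penalty used here (a KL term of the policy distribution against a uniform reference, weighted by β) is only a stand-in for the mutual information penalty term of the invention, and the entropy weight η is likewise an assumption.

```python
import torch
import torch.nn.functional as F

def a2c_ib_loss(policy_net, value_net, X, A, R, X_next, beta, eta, gamma=0.99):
    """Sketch: A2C loss plus an information-bottleneck-style penalty.
    X, X_next: batched frames; A: action indices; R: rewards (all tensors)."""
    logits = policy_net(X)
    dist = torch.distributions.Categorical(logits=logits)
    value = value_net(X).squeeze(-1)
    with torch.no_grad():
        target = R + gamma * value_net(X_next).squeeze(-1)   # bootstrapped value target
    advantage = target - value
    policy_loss = -(dist.log_prob(A) * advantage.detach()).mean()
    value_loss = F.mse_loss(value, target)
    entropy = dist.entropy().mean()                          # exploration term, weight eta
    # Assumed stand-in for the mutual-information penalty I(X; Z) of the patent:
    # KL of the policy distribution to a fixed uniform reference, weighted by beta.
    uniform = torch.full_like(logits, 1.0 / logits.shape[-1])
    ib_penalty = F.kl_div(F.log_softmax(logits, dim=-1), uniform, reduction="batchmean")
    return policy_loss + value_loss - eta * entropy + beta * ib_penalty
```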
the step 4) specifically comprises the following steps:
41) taking out n-1 intermediate models M stored in the AI training process according to the required AI grade number nj(j ═ 1,2, …, n-1), each intermediate model comprising a value network model and a policy network model;
42) the intermediate model MjParameter fixing, re-taking andtraining in the same way in the AI training process and updating the middle model MjThe parameters of the middle full-connection layer part are converged, and the converged model parameters are stored;
43) the n-1 middle models M retrained in the step 42) are used1,M2,…,Mn-1And the final convergence model M obtained in the initial trainingnThe strategy network model in the game is taken out independently, and n game AI strategy distribution files F corresponding to different capability levels are generated according to the strategy network model1,F2,…,Fn
44) Distributing files F of game AI strategy1,F2,…,FnThe real-time operation strategy is used as a real-time operation strategy of the fighting AI with n different capability levels, and is merged with other code files to construct the fighting AI with n different capability levels.
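A minimal sketch of the fine-tuning set-up of step 42) follows, assuming the intermediate models are ordinary PyTorch modules whose convolutional layers can be identified by type; the use of the Adam optimizer is likewise an assumption.

```python
import torch
import torch.nn as nn

def freeze_conv_and_finetune_setup(model, lr):
    """Freeze convolutional-layer parameters and leave only the remaining (fully connected)
    layers trainable, as in step 42); detecting layers by type is an assumption."""
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            for p in module.parameters():
                p.requires_grad = False
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=lr)
```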
In the step 3), when the AI training model is iteratively trained with the A2C algorithm, the expectation is fitted by Monte Carlo estimation and the gradient update is performed by the Stein variational gradient descent method.
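For reference, a generic (textbook) Stein variational gradient descent step over a set of particles is sketched below; it illustrates the SVGD update rule itself, not the patent's specific application of SVGD to the policy gradient.

```python
import torch

def svgd_update(particles, log_prob_fn, step_size=1e-2, bandwidth=1.0):
    """One generic SVGD step: particles is an (n, d) tensor, log_prob_fn returns the
    per-particle log density of the target distribution."""
    particles = particles.detach().requires_grad_(True)
    logp = log_prob_fn(particles).sum()
    grad_logp = torch.autograd.grad(logp, particles)[0]           # score of the target
    diff = particles.unsqueeze(1) - particles.unsqueeze(0)        # diff[j, i] = x_j - x_i
    sq_dist = (diff ** 2).sum(-1)
    k = torch.exp(-sq_dist / (2.0 * bandwidth ** 2))              # RBF kernel matrix
    grad_k = -(k.unsqueeze(-1) * diff) / (bandwidth ** 2)         # grad of kernel w.r.t. x_j
    phi = (k @ grad_logp + grad_k.sum(0)) / particles.shape[0]    # SVGD update direction
    return (particles + step_size * phi).detach()
```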
Compared with the prior art, the invention has the following advantages:
Firstly, existing policy-gradient algorithms such as PPO, A2C and DDPG focus on the convergence of the reinforcement learning algorithm itself and do not consider the extraction of information from the environment state into the value function; by introducing the information bottleneck theory, the invention optimizes this information extraction process and thereby improves sampling efficiency.
Secondly, the invention obtains the optimized gradient by the Stein variational gradient descent method, and uses a lower bound in place of the unknown distribution, which overcomes the problem that, in the information bottleneck formulation, the probability distribution of the representation layer conditioned on the input layer cannot be computed directly.
Thirdly, compared with traditional manually designed AIs, the fighting-game AI designed by the invention can generate a richer variety of fighting strategies according to the real-time dynamic state of the game and is more flexible in actual testing.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
FIG. 2 is a diagram of the model architecture of the present invention.
FIG. 3 is a block diagram of an AI training model according to the present invention.
Fig. 4 shows a specific embodiment of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
Examples
As shown in fig. 1, the invention provides an AI training method for a reinforcement learning battle game based on an information bottleneck theory, comprising the following steps:
1) initializing the network parameters and hyper-parameters of the AI training model (in this example a CNN model is adopted, whose specific structure is shown in FIG. 3), and setting the learning rate and the number of samples drawn from the parameter distribution;
2) performing decision interaction in a simulation environment through the AI to obtain a sample training batch data set;
3) iteratively training the AI training model with a reinforcement learning algorithm (in this example, the A2C algorithm) on the sample training batch data set obtained from the interaction between the AI and the environment, and saving the model parameters in stages;
4) fixing part of the saved parameters of the models from different stages, and retraining the remaining parameters with the reinforcement learning algorithm for fine-tuning to obtain the final strategy models of AIs of different levels.
The specific process of each step is as follows:
the step 1) specifically comprises the following steps:
11) determining the AI optional operation set A (taking an arcade fighting game as an example, this specifically comprises up/down and left/right movement, attack operations, defense operations, skill operations, and the like) and the number n of AIs of different required capability levels according to the game operation description;
12) initializing all network parameters, namely the value network model parameter θ and the policy network model parameter φ;
13) determining the hyper-parameters β and η according to the resolution of the game picture;
14) setting the model learning rate ε;
15) setting the number m of samples drawn from the probability distribution model.
The step 2) specifically comprises the following steps:
21) sampling a batch of data {(X_t, A_t, R_t, X_{t+1})}, t = 1, 2, ..., K, from the environment, wherein X_t and X_{t+1} respectively denote the game picture at the current moment and the game picture at the next moment, A_t denotes the selected operation set of the game AI (including up/down and left/right movement, attack operations, defense operations and skill operations in the arcade fighting game), and R_t denotes the environment reward corresponding to the state behaviour (including the change in health points, the change in magic points, the distance to the opponent, skill cool-down times, the real-time game score, and the like);
22) for each game picture X_t sampled in real time, the game AI draws m samples {Z_t^(1), ..., Z_t^(m)} according to the policy network model, wherein each Z_t^(i) is an AI real-time operation sampled from the policy network model distribution (including how to move, whether to defend, how to attack, and the like);
23) the environmental sample batch data and the data obtained from the AI sampling are correspondingly combined to obtain the training batch data D.
In the step 3), the gradient of the representation layer over the sample training batch data D is calculated as an expectation over the game-picture distribution and the sampled operations [formula not reproduced], where Z_i denotes the real-time operation sample drawn by the agent from the probability distribution model P_φ(·|X_t), P(X) is the distribution probability of the game picture, and φ(Z_i, X) is the probability of the AI operation Z_i produced by the policy network model when the game picture is X.
The gradient of the reinforcement learning algorithm is likewise calculated from the loss J(Z; θ) [formula not reproduced], where Z_i denotes the real-time operation sample drawn by the agent from the probability distribution model P_φ(·|X_t) and J(Z; θ) is the loss function of the A2C algorithm with an added information bottleneck loss term, built on the usual loss function of the A2C framework.
Here R is the real-time reward value (covering, among others, the change in health points ΔHP, the change in magic points ΔMP, the distance d to the opponent, skill cool-down times, and the real-time game score Score; the specific expression can be designed as required, one example being R = ΔHP + ΔMP + ΔHP + Score), α is the reward attenuation coefficient in the algorithm, which can be adjusted according to different design requirements, Θ_Φ(Z_t, X) is the real-time value estimate of the state-decision pair (Z_t, X) produced by the value network model when the policy network model is φ, and H(P) is the entropy of the distribution P. Throughout the computation, the expectation E is fitted by Monte Carlo estimation, and the gradient optimization is performed by Stein variational gradient descent.
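Following the example reward discussed above, a sketch of a real-time reward function is given below; the state dictionary fields and the particular combination of terms are illustrative assumptions, since the specific expression can be designed as required.

```python
def compute_reward(prev_state, state):
    """Illustrative real-time reward combining quantities named above; the exact
    combination and any weighting are design choices, not fixed by the invention."""
    delta_hp = state["own_hp"] - prev_state["own_hp"]       # life value change (delta HP)
    delta_mp = state["own_mp"] - prev_state["own_mp"]       # magic value change (delta MP)
    delta_score = state["score"] - prev_state["score"]      # real-time game score change
    return delta_hp + delta_mp + delta_score
```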
The A2C algorithm adopted in this embodiment is implemented mainly in PyTorch, and the specific AI training model architecture is shown in FIG. 2.
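Since the model architectures of FIG. 2 and FIG. 3 are not reproduced here, the following is a generic convolutional actor-critic module in PyTorch with a shared encoder (representation layer), a policy head and a value head; the layer sizes are assumptions for illustration.

```python
import torch.nn as nn

class ActorCriticCNN(nn.Module):
    """Generic CNN actor-critic: shared convolutional encoder (representation layer Z),
    a policy head for P_phi(.|X) and a value head; layer sizes are illustrative."""
    def __init__(self, num_actions, in_channels=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.policy_head = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.Linear(256, num_actions))
        self.value_head = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, x):
        z = self.encoder(x)                       # representation layer Z
        return self.policy_head(z), self.value_head(z)
```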
The specific steps for updating the model parameters are:
31) updating the value network model parameter θ according to its update rule [formula not reproduced];
32) updating the policy network model parameter φ according to its update rule [formula not reproduced];
33) judging whether the model parameters have converged according to the magnitude of the parameter updates; if not, training resumes from step 2), and if convergence is reached, training stops and the model is saved. In particular, throughout the training process all network model parameters are saved in time order once every 100 parameter updates.
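A sketch of the update loop with the staged saving described in 33) follows; the optimizer handling, the loss-function signature and the checkpoint file naming are assumptions.

```python
import torch

def train_and_checkpoint(model, optimizer, loss_fn, data_iter, max_updates, save_every=100):
    """Sketch of the update loop with staged parameter saving every 100 updates (step 33)).
    A convergence check on the parameter-update magnitude would terminate the loop early."""
    for step in range(1, max_updates + 1):
        loss = loss_fn(model, next(data_iter))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if step % save_every == 0:
            # Save all network model parameters in time order, as stated in step 33).
            torch.save(model.state_dict(), f"checkpoint_{step:06d}.pt")
```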
In step 4), part of the saved parameters of the models from different stages are fixed, and the remaining parameters are retrained with the reinforcement learning algorithm for fine-tuning to obtain the final strategy models of AIs of different levels.
The method specifically comprises the following steps:
41) taking out the n-1 intermediate models M_i (i = 1, 2, ..., n-1) saved during the AI training process according to the required number n of AI levels, where each M_i comprises both a value network model and a policy network model;
42) fixing the convolutional-layer parameters of each model M_i, and again training in the same way as in the AI training process while updating only the parameters of the fully connected layers of M_i until convergence, then saving the converged model parameters;
43) taking out separately the policy network models of the n-1 models M_1, M_2, ..., M_{n-1} retrained in 42) and of the final converged model M_n obtained in the initial training, thereby obtaining n game AI strategy distribution files F_1, F_2, ..., F_n corresponding to the different capability levels;
44) using the strategy distribution files F_1, F_2, ..., F_n as the real-time operation strategies of fighting AIs of n different capability levels, and merging them with the other code files to construct the n fighting AIs of different capability levels.
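A sketch of generating the strategy distribution files F_1, ..., F_n from the converged models of step 44) is given below; it assumes the ActorCriticCNN-style module sketched earlier (so the policy network can be addressed as model.policy_head), and the on-disk file format is an assumption since the invention does not specify one.

```python
import torch

def export_policy_files(models, prefix="battle_ai_level"):
    """Export the policy network of each converged model as a standalone strategy file
    F_1..F_n (sketch; module layout and file format are assumptions)."""
    paths = []
    for level, model in enumerate(models, start=1):
        path = f"{prefix}_{level}.pt"
        torch.save(model.policy_head.state_dict(), path)   # policy network only
        paths.append(path)
    return paths
```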
According to the method, by introducing the information bottleneck theory a mutual information penalty term is added to the loss function of the A2C algorithm to accelerate the AI training process, and at the same time the output strategies of AIs of different grades are re-smoothed by means of pre-training and fine-tuning of the model parameters. Compared with the prior art, the invention achieves high sampling efficiency during training, further accelerates the training of fighting-game AIs of different capability grades, and provides sufficient flexibility.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A reinforcement learning fighting game AI training method based on an information bottleneck theory is characterized by comprising the following steps:
1) initializing an AI training model;
2) carrying out decision interaction in a simulation environment through a game AI to obtain a sample training batch data set;
3) according to a sample training batch data set obtained by interaction of the game AI and the environment, iteratively training an AI training model by adopting a reinforcement learning algorithm, and storing parameters of the AI training model in stages;
4) fixing part of the saved parameters of the AI training models from different stages, and retraining the remaining parameters with the reinforcement learning algorithm for fine-tuning, so as to obtain the final AI training models for AIs of different levels and thereby generate the fighting-game AI files.
2. The AI training method for the reinforcement learning battle game based on the information bottleneck theory as claimed in claim 1, wherein the step 1) specifically comprises the following steps:
11) determining the AI optional operation set A and the number n of AIs of different required capability levels according to the game operation description;
12) initializing all network parameters in the model, including the value network model parameter θ and the policy network model parameter φ;
13) determining the hyper-parameters β and η according to the resolution of the game picture;
14) setting the model learning rate ε;
15) setting the number m of samples drawn from the probability distribution model.
3. The AI training method for the reinforcement learning battle game based on the information bottleneck theory as claimed in claim 2, wherein the step 2) specifically comprises the following steps:
21) sampling environmental sample batch data {(X_t, A_t, R_t, X_{t+1})}, t = 1, 2, ..., K, from the environment, wherein X_t and X_{t+1} are the game picture sampled at the current time t and the game picture at the next time t+1, A_t is the optional operation set of the game AI at the current time t, R_t is the environment reward corresponding to the selected operation, namely the game score and the real-time attributes of the game character, and K is the total number of sampling moments;
22) for each sampled game picture X_t, the game AI obtains m operation samples {Z_t^(1), ..., Z_t^(m)} according to the policy network model, wherein each Z_t^(i), sampled from the policy network model distribution P_φ(·|X_t), is an AI real-time operation and P_φ(·|X_t) is the probability distribution model;
23) the environmental sample batch data and the m operation samples obtained by the AI according to the policy network model are correspondingly combined to obtain the sample training batch data set D.
4. The AI training method for the reinforcement learning battle game based on the information bottleneck theory as claimed in claim 1, wherein in the step 3), the AI training model is iteratively trained by using the A2C algorithm.
5. The AI training method for the reinforcement learning battle game based on the information bottleneck theory as claimed in claim 4, wherein in the step 3), when the model is iteratively trained, the gradient of the model representation layer is calculated as an expectation over the game-picture distribution and the operations sampled from the policy network model [formula not reproduced], wherein P(X) is the distribution probability of the game picture X, φ(Z_i, X) is the probability of the AI operation Z_i produced by the policy network model when the game picture is X, and E denotes expectation.
6. The AI training method for the reinforcement learning battle game based on the information bottleneck theory as claimed in claim 5, wherein in the step 3), when the model is iteratively trained, the gradient of the reinforcement learning algorithm is calculated from the loss function J(Z; θ) [formula not reproduced], where J(Z; θ) is the loss function of the A2C algorithm with an added information bottleneck loss term.
7. The AI training method for the reinforcement learning battle game based on the information bottleneck theory as claimed in claim 6, wherein the loss function J(Z; θ) augments the existing loss function of the A2C algorithm framework with the information bottleneck loss term [formula not reproduced], wherein R is the real-time reward value, i.e. the real-time change of the game score and of the game character attributes, α is the reward attenuation coefficient in the A2C algorithm, and Θ_Φ(Z_t, X) is the real-time value estimate of the state-decision pair (Z_t, X) produced by the value network model when the policy network model is φ.
8. The AI training method for the reinforcement learning battle game based on the information bottleneck theory as claimed in claim 7, wherein in the step 3), when the reinforcement learning algorithm is adopted to iteratively train the model, the network parameters θ and φ are updated respectively, and it is judged whether the network parameters have converged; if they have not converged, training continues, and if convergence is reached, training stops and the model is saved in stages; the network parameters θ and φ are updated according to their respective update expressions [formulas not reproduced].
9. the AI training method for the reinforcement learning battle game based on the information bottleneck theory as claimed in claim 1, wherein the step 4) specifically comprises the following steps:
41) taking out n-1 intermediate models M stored in the AI training process according to the required AI grade number nj(j 1, 2.., n-1), each intermediate model comprising a value network model and a policy network model;
42) the intermediate model MjThe convolution layer parameters are fixed, and the middle model M is updated by adopting the same training mode in the AI training process againjThe parameters of the middle full-connection layer part are converged, and the converged model parameters are stored;
43) the n-1 middle models M retrained in the step 42) are used1,M2,.....,Mn-1And the final convergence model M obtained in the initial trainingnThe strategy network model in the game is taken out independently, and n game AI strategy distribution files F corresponding to different capability levels are generated according to the strategy network model1,F2,...,Fn
44) Distributing files F of game AI strategy1,F2,...,FnThe real-time operation strategy is used as a real-time operation strategy of the fighting AI with n different capability levels, and is merged with other code files to construct the fighting AI with n different capability levels.
10. The AI training method for the reinforcement learning battle game based on the information bottleneck theory as claimed in claim 4, wherein in the step 3), when the AI training model is iteratively trained by the A2C algorithm, the expectation is fitted by the Monte Carlo estimation method and the gradient is updated by the Stein variational gradient descent method.
CN202110091260.4A 2021-01-22 2021-01-22 Information bottleneck theory-based AI (Artificial intelligence) training method for reinforcement learning battle game Active CN112717415B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110091260.4A CN112717415B (en) 2021-01-22 2021-01-22 Information bottleneck theory-based AI (Artificial intelligence) training method for reinforcement learning battle game

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110091260.4A CN112717415B (en) 2021-01-22 2021-01-22 Information bottleneck theory-based AI (Artificial intelligence) training method for reinforcement learning battle game

Publications (2)

Publication Number Publication Date
CN112717415A true CN112717415A (en) 2021-04-30
CN112717415B CN112717415B (en) 2022-08-16

Family

ID=75595220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110091260.4A Active CN112717415B (en) 2021-01-22 2021-01-22 Information bottleneck theory-based AI (Artificial intelligence) training method for reinforcement learning battle game

Country Status (1)

Country Link
CN (1) CN112717415B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113641905A (en) * 2021-08-16 2021-11-12 京东科技信息技术有限公司 Model training method, information pushing method, device, equipment and storage medium
CN114970714A (en) * 2022-05-26 2022-08-30 哈尔滨工业大学 Trajectory prediction method and system considering uncertain behavior mode of moving target
CN116109525A (en) * 2023-04-11 2023-05-12 北京龙智数科科技服务有限公司 Reinforcement learning method and device based on multidimensional data enhancement
CN116808590A (en) * 2023-08-25 2023-09-29 腾讯科技(深圳)有限公司 Data processing method and related device
CN117162102A (en) * 2023-10-30 2023-12-05 南京邮电大学 Independent near-end strategy optimization training acceleration method for robot joint action
CN113269315B (en) * 2021-06-29 2024-04-02 安徽寒武纪信息科技有限公司 Apparatus, method and readable storage medium for performing tasks using deep reinforcement learning


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150100530A1 (en) * 2013-10-08 2015-04-09 Google Inc. Methods and apparatus for reinforcement learning
CN109923560A (en) * 2016-11-04 2019-06-21 谷歌有限责任公司 Neural network is trained using variation information bottleneck
US20180207535A1 (en) * 2017-01-24 2018-07-26 Line Corporation Method, apparatus, computer program and recording medium for providing game service
CN111886059A (en) * 2018-03-21 2020-11-03 威尔乌集团 Automatically reducing use of cheating software in an online gaming environment
CN110327624A (en) * 2019-07-03 2019-10-15 广州多益网络股份有限公司 A kind of game follower method and system based on course intensified learning
CN111985640A (en) * 2020-07-10 2020-11-24 清华大学 Model training method based on reinforcement learning and related device
CN112169311A (en) * 2020-10-20 2021-01-05 网易(杭州)网络有限公司 Method, system, storage medium and computer device for training AI (Artificial Intelligence)
CN112221152A (en) * 2020-10-27 2021-01-15 腾讯科技(深圳)有限公司 Artificial intelligence AI model training method, device, equipment and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
黄学雨, 郭勤: 《融合环境模型与深度强化学习的游戏算法》 [A game algorithm fusing an environment model with deep reinforcement learning], 《江西理工大学学报》 [Journal of Jiangxi University of Science and Technology], vol. 39, no. 3, 30 June 2018 (2018-06-30), pages 84-89 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269315B (en) * 2021-06-29 2024-04-02 安徽寒武纪信息科技有限公司 Apparatus, method and readable storage medium for performing tasks using deep reinforcement learning
CN113641905A (en) * 2021-08-16 2021-11-12 京东科技信息技术有限公司 Model training method, information pushing method, device, equipment and storage medium
CN113641905B (en) * 2021-08-16 2023-10-03 京东科技信息技术有限公司 Model training method, information pushing method, device, equipment and storage medium
CN114970714A (en) * 2022-05-26 2022-08-30 哈尔滨工业大学 Trajectory prediction method and system considering uncertain behavior mode of moving target
CN114970714B (en) * 2022-05-26 2024-05-03 哈尔滨工业大学 Track prediction method and system considering uncertain behavior mode of moving target
CN116109525A (en) * 2023-04-11 2023-05-12 北京龙智数科科技服务有限公司 Reinforcement learning method and device based on multidimensional data enhancement
CN116109525B (en) * 2023-04-11 2024-01-05 北京龙智数科科技服务有限公司 Reinforcement learning method and device based on multidimensional data enhancement
CN116808590A (en) * 2023-08-25 2023-09-29 腾讯科技(深圳)有限公司 Data processing method and related device
CN116808590B (en) * 2023-08-25 2023-11-10 腾讯科技(深圳)有限公司 Data processing method and related device
CN117162102A (en) * 2023-10-30 2023-12-05 南京邮电大学 Independent near-end strategy optimization training acceleration method for robot joint action

Also Published As

Publication number Publication date
CN112717415B (en) 2022-08-16

Similar Documents

Publication Publication Date Title
CN112717415B (en) Information bottleneck theory-based AI (Artificial intelligence) training method for reinforcement learning battle game
CN112668235B (en) Robot control method based on off-line model pre-training learning DDPG algorithm
Hester et al. Deep q-learning from demonstrations
CN111766782B (en) Strategy selection method based on Actor-Critic framework in deep reinforcement learning
Kurin et al. The atari grand challenge dataset
CN111856925B (en) State trajectory-based confrontation type imitation learning method and device
CN111008449A (en) Acceleration method for deep reinforcement learning deduction decision training in battlefield simulation environment
CN109284812B (en) Video game simulation method based on improved DQN
CN111882476B (en) Image steganography method for automatic learning embedding cost based on deep reinforcement learning
CN113952733A (en) Multi-agent self-adaptive sampling strategy generation method
CN111282272B (en) Information processing method, computer readable medium and electronic device
CN113947022B (en) Near-end strategy optimization method based on model
Zhao et al. Handling large-scale action space in deep Q network
CN116090549A (en) Knowledge-driven multi-agent reinforcement learning decision-making method, system and storage medium
CN112257348B (en) Method for predicting long-term degradation trend of lithium battery
CN114169385A (en) MSWI process combustion state identification method based on mixed data enhancement
Tong et al. Enhancing rolling horizon evolution with policy and value networks
KR102209917B1 (en) Data processing apparatus and method for deep reinforcement learning
CN116596059A (en) Multi-agent reinforcement learning method based on priority experience sharing
CN116204849A (en) Data and model fusion method for digital twin application
CN115293361A (en) Rainbow agent training method based on curiosity mechanism
CN115964898A (en) Bignty game confrontation-oriented BC-QMIX on-line multi-agent behavior decision modeling method
Liu et al. Forward-looking imaginative planning framework combined with prioritized-replay double DQN
CN113469904A (en) General image quality enhancement method and device based on cycle consistency loss
Zhang et al. Side-Scrolling Platform Game Levels Reachability Repair Method and Its Applications to Super Mario Bros

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant